Community Ops/paas/Backups
From MozillaWiki
Infrastructure backup
Assumptions
- The majority of our infrastructure is based on the idea of having as many immutable parts as possible.
- Docker images
- Marathon deployed apps
- Software stack
- Mesos
- Marathon
- Zookeeper
- Consul checks
- We should be comfortable with the loss of some of our EC2 instances
- Our EC2-based infrastructure is highly available (HA)
- "Backup" refers to a point-in-time copy of a service or resource
- We should utilize AWS hosted services to avoid maintenance overhead
- All the backups should be encrypted
Mutable part
- Persistent storage
- EFS
- Marathon app definitions
- Chronos task definitions
- Databases
- Consul KV
- WP sites
External dependencies & redundancy
At deploy time we should not rely on a single external (3rd party) service, because it is a single point of failure (SPOF) that we don't control. We need redundant access to data that lives in external dependencies.
- Docker images
Backup implementation
EFS
- Backup is going to live in S3/Glacier
- Implement a script to do scheduled backups based on a backup tool
- Deploy it in Chronos
- Schedule policy
- 7 times a week
- Lives in S3
- 4 times per month
- Lives in Glacier
- 12 times per year
- Lives in Glacier
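The schedule policy above reads as a daily/weekly/monthly rotation. A minimal sketch of how a scheduled job might decide which tiers are due on a given day follows. The choice of Sunday for the weekly copy and the first of the month for the monthly copy, the function name `backup_tiers`, and the tier labels are all illustrative assumptions, not part of the plan.

```python
from __future__ import annotations

from datetime import date


def backup_tiers(day: date) -> list[str]:
    """Return the backup tiers that are due on `day`."""
    # 7 times a week: a daily copy that lives in S3.
    tiers = ["daily:s3"]
    # Roughly 4 times per month: a weekly copy in Glacier.
    # Sunday (weekday() == 6) is an assumed choice.
    if day.weekday() == 6:
        tiers.append("weekly:glacier")
    # 12 times per year: a monthly copy in Glacier.
    # The first of the month is an assumed choice.
    if day.day == 1:
        tiers.append("monthly:glacier")
    return tiers
```

A Chronos job that runs once a day could call `backup_tiers(date.today())` and hand each returned tier to the backup tool.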
Marathon/Chronos definitions
- Backup is going to live in a versioned S3 bucket
- Implement a script to do scheduled backups using the Marathon/Chronos HTTP APIs
- Deploy it in Chronos
- Schedule policy
- 7 times a week
- 4 times per month
- 12 times per year
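The export script described above could look roughly like the sketch below. It assumes Marathon lists apps at `/v2/apps` and that Chronos lists jobs at `/scheduler/jobs` (the Chronos path differs between releases, so treat it as an assumption). The function names and the snapshot layout are hypothetical, and the upload to the versioned S3 bucket is left out.

```python
import json
import urllib.request


def fetch_json(url):
    """GET a URL and parse the JSON response body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def collect_definitions(marathon_url, chronos_url, fetch=fetch_json):
    """Gather app and job definitions into one snapshot dict."""
    # Marathon wraps its app list in an object: {"apps": [...]}.
    apps = fetch(marathon_url + "/v2/apps")["apps"]
    # Assumed Chronos path; it returns a bare JSON list of jobs.
    jobs = fetch(chronos_url + "/scheduler/jobs")
    return {"marathon_apps": apps, "chronos_jobs": jobs}


def serialize(snapshot):
    """Stable JSON output so unchanged definitions produce identical bytes."""
    return json.dumps(snapshot, indent=2, sort_keys=True).encode("utf-8")
```

Because the target bucket is versioned, the script can write the serialized snapshot to the same object key on every run and let S3 keep the history.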
Databases
- Already backed by RDS
- Current policy
- 7 times a week
- Future policy
- 7 times a week on RDS
- 12 times a year on S3/Glacier
Consul K/V
- Backup is going to live in a versioned S3 bucket
- Implement a script to do scheduled backups using the Consul HTTP API
- Deploy it in Chronos
- Schedule policy
- 7 times a week
- 4 times per month
- 12 times per year
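For the Consul K/V export, one possible shape is sketched below. Consul's `GET /v1/kv/?recurse` returns every key with a base64-encoded `Value`. Keeping the value in that encoded form makes the backup lossless for binary data. The function names are hypothetical, and dropping the index and session fields is an assumption about what is worth keeping.

```python
import json
import urllib.request


def fetch_kv(consul_url):
    """Fetch every K/V entry under the root prefix."""
    url = consul_url + "/v1/kv/?recurse"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def to_backup(entries):
    """Keep only the fields needed to recreate each key.

    Value stays base64-encoded (or None for an empty value), so binary
    data survives the round trip. Cluster-specific fields such as
    CreateIndex, ModifyIndex, LockIndex and Session are dropped.
    """
    return [
        {"Key": e["Key"], "Flags": e.get("Flags", 0), "Value": e.get("Value")}
        for e in entries
    ]


def serialize(backup):
    """Stable JSON output, sorted by key, for upload to the bucket."""
    ordered = sorted(backup, key=lambda e: e["Key"])
    return json.dumps(ordered, indent=2, sort_keys=True).encode("utf-8")
```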
WP sites
- Backup is going to live in S3
- Use MainWP native backup functionality
- Schedule policy
- Once per week
3rd party services
Docker
- Docker registry mirror
- Maybe a hosted one
- EC2 container registry is not the best one but it’s hosted by AWS
Restoring from backup
Infrastructure
- Ansible playbooks for config management
- Terraform for resource management
Storage
- Use the backup tool to revert to a point in time
- Implementation
- Native tool functionality
Marathon/Chronos/Consul
- Redeploy the definition
- Implementation
- Write a script to populate the service definitions using the HTTP APIs
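A restore script along these lines might build one HTTP request per saved definition. The sketch below covers Marathon (`PUT /v2/apps/{id}`, which creates or updates an app) and Consul (`PUT /v1/kv/{key}`); Chronos is omitted. The helper names, the backup entry layout, and the set of read-only fields stripped from a Marathon app are assumptions and may need adjusting for the Marathon version in use.

```python
import base64
import json
import urllib.request
from urllib.parse import quote

# Assumed set of fields that Marathon reports but that should not be sent back.
# In particular, a "version" key in the PUT body requests a rollback.
READ_ONLY_APP_FIELDS = {
    "version", "versionInfo", "deployments", "tasks",
    "tasksStaged", "tasksRunning", "tasksHealthy", "tasksUnhealthy",
}


def marathon_restore_request(marathon_url, app):
    """Build a PUT request that recreates one saved Marathon app."""
    body = {k: v for k, v in app.items() if k not in READ_ONLY_APP_FIELDS}
    app_id = quote(app["id"].lstrip("/"))
    return urllib.request.Request(
        marathon_url + "/v2/apps/" + app_id,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def consul_restore_request(consul_url, entry):
    """Build a PUT request that rewrites one saved Consul key.

    `entry` is assumed to hold Key, Flags and a base64-encoded Value.
    """
    raw = base64.b64decode(entry["Value"]) if entry.get("Value") else b""
    url = "{}/v1/kv/{}?flags={}".format(
        consul_url, quote(entry["Key"]), entry.get("Flags", 0)
    )
    return urllib.request.Request(url, data=raw, method="PUT")
```

Each request can then be sent with `urllib.request.urlopen(req)`. Building the requests separately from sending them makes it easy to do a dry run before touching a live cluster.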
WP Sites
- Native restore functionality in MainWP
- Implementation
- Native tool functionality
Databases
- Restore from snapshot
- Implementation
- Native tool functionality