* Each user's data is completely independent; there's no need for queries that cross multiple user accounts.
** This means that our data storage problem is ''embarrassingly shardable''. Good times!
* Each user account is assigned to a particular '''shard''', identified by an integer.
** Their shard assignment will never change unless they delete and re-create their account.
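As a rough sketch of the "userid maps to a stable shard integer" idea, shard assignment could be as simple as a fixed hash (the shard count and function names here are illustrative assumptions, not part of the proposal):

```python
import hashlib

NUM_SHARDS = 1024  # fixed shard count; illustrative value only


def shard_for_user(userid):
    """Map a userid to a stable shard number in [0, NUM_SHARDS)."""
    # Use a stable cryptographic hash rather than the builtin hash(),
    # so the assignment never changes across processes or restarts.
    digest = hashlib.sha1(str(userid).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS
```

Because the hash is deterministic, a user's shard assignment only changes if their userid changes, i.e. if they delete and re-create their account.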
* All reads and writes for a shard go to a single '''master''' MySQL database.
** A single DB host machine might hold multiple shards.
** To keep things simple, there are no read slaves; sharding alone is responsible for distributing server load.
* Each master synchronously replicates to a '''hot standby''' in the same DC, to guard against individual machine failure.
** This may not be necessary, depending on instance reliability and our uptime requirements.
* Each master asynchronously replicates to a '''warm standby''' in a separate DC, to guard against whole-DC failure.
** Or maybe to multiple independent DCs. Same same.
** The warm standby should probably itself be a synchronous replication pair, both for symmetry and to make failover straightforward.
* All sharding logic and management lives in a proxy process, so that it's transparent to the application. For each incoming query, the proxy will:
** Look up the shard number and corresponding database for that userid.
** Forward the query to the appropriate database host, and proxy back the results.
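The lookup-and-forward flow of the proxy could be sketched roughly as below. The `SHARD_TO_HOST` table, and the injected `shard_for_user` and `execute_on` callables, are illustrative assumptions standing in for the real shard lookup and MySQL connection handling:

```python
# Illustrative mapping from shard number to the master DB for that shard.
# Note that a single host may hold multiple shards.
SHARD_TO_HOST = {
    0: "db1.internal:3306",
    1: "db1.internal:3306",  # db1 holds shards 0 and 1
    2: "db2.internal:3306",
}


def route_query(userid, query, shard_for_user, execute_on):
    """Look up the shard for userid, forward the query, return the results.

    shard_for_user: callable mapping userid -> shard number.
    execute_on: callable (host, query) -> results, i.e. the actual
    forwarding of the query to a MySQL master.
    """
    shard = shard_for_user(userid)
    host = SHARD_TO_HOST[shard]
    return execute_on(host, query)
```

The application never sees any of this; it issues queries as if there were a single database, which is the whole point of putting the logic in the proxy.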
The particulars of shard selection/lookup are not defined in this proposal. :rfkelly likes the consistent-hashing-plus-vbucket approach taken by Couchbase, but it could be as simple as a lookup table. We assume that the proxy implements this appropriately and efficiently.
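To make the consistent-hashing-plus-vbucket idea concrete, here is one possible shape for it; all counts and names are illustrative assumptions, since the proposal deliberately leaves the scheme open:

```python
import hashlib

NUM_VBUCKETS = 1024  # fixed, much larger than the number of shards
NUM_SHARDS = 4       # illustrative

# The vbucket -> shard assignments live in a small, centrally managed
# table. Rebalancing means rewriting some entries here and moving the
# affected vbuckets; users never need to be re-hashed.
VBUCKET_TO_SHARD = {vb: vb % NUM_SHARDS for vb in range(NUM_VBUCKETS)}


def vbucket_for_user(userid):
    """Hash a userid to a fixed vbucket; this mapping never changes."""
    digest = hashlib.md5(str(userid).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_VBUCKETS


def shard_for_user(userid):
    """Indirect through the vbucket table to find the user's shard."""
    return VBUCKET_TO_SHARD[vbucket_for_user(userid)]
```

The indirection through vbuckets is what a plain lookup table lacks: the per-user hash stays fixed forever, while the small vbucket table absorbs all shard moves.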