Services/Sync/Server/SyncChangesets

From MozillaWiki

Sync Changesets

To minimize database reads, we should cache the newest changes written to the system. Most devices sync within a fairly short time of the data being uploaded, so we can expire these cache entries aggressively and avoid filling the RAM allocated to membase. The expectation is that this will cut way down on database reads.

We can do this by logging each write as a changeset, then examining incoming queries to see whether the result can be derived solely from changesets. If so, we loop through the relevant changesets instead of querying the database.

This should be transparent from the API side. Here is the process to make it work:

Read

  1. Check info/collections to make sure there are changes after the requested time
  2. If the request covers a period longer than the changeset window (2 days?), go straight to the db
  3. Grab <user>:<collection>:changeset; if it doesn't exist, fall back to the db
  4. Loop through the changesets (<user>:<collection>:<timestamp>), grabbing them out of memcache. Merge and return.
  5. If any are missing or the server suspects an error, fall back to the db
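
The read steps above can be sketched as follows. This is a minimal illustration, not the server's actual code: `read_since`, `db_read`, the dict-like `cache`, and the representation of a changeset as a mapping from WBO id to payload are all assumptions made for the example. The 2-day window is the tentative figure from step 2, and only the "missing changeset" half of step 5 is modelled.

```python
import time

# Changeset window in seconds (the tentative "2 days?" from the text).
WINDOW = 2 * 24 * 3600


def read_since(cache, db_read, user, collection, since, last_modified, now=None):
    """Return changes newer than `since`, from the cache when possible.

    `cache` is any mapping with .get(); `db_read(user, collection, since)`
    is a hypothetical database fallback. Both are illustrative stand-ins.
    """
    now = time.time() if now is None else now

    # Step 1: info/collections says nothing changed after `since`.
    if last_modified <= since:
        return {}

    # Step 2: the request reaches back past the changeset window.
    if now - since > WINDOW:
        return db_read(user, collection, since)

    # Step 3: fetch the list of changeset timestamps; absent means fall back.
    index = cache.get(f"{user}:{collection}:changeset")
    if index is None:
        return db_read(user, collection, since)

    # Step 4: merge the relevant changesets, oldest first so newer data wins.
    merged = {}
    for ts in sorted(t for t in index if t > since):
        changes = cache.get(f"{user}:{collection}:{ts}")
        if changes is None:
            # Step 5: a changeset expired or was evicted; use the database.
            return db_read(user, collection, since)
        merged.update(changes)
    return merged
```

Merging oldest first means that when the same WBO appears in several changesets, the most recent version is the one returned.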

Write

  • Write to DB
  • Write <user>:<collection>:<timestamp> as set of processed wbos
  • Get <user>:<collection>:changeset
  • Add <user>:<collection>:<timestamp> to it. Strip any references to timestamps older than the changeset window.
  • Write <user>:<collection>:changeset back with a check-and-set, repeating the get and update until the write succeeds
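
A rough sketch of the write steps, including the check-and-set retry loop. `FakeCache` is a hypothetical in-memory stand-in for a memcache-style store (its `gets`/`cas` methods only mimic the general shape of such an API), and `record_write` and `db_write` are names invented for the example. Creating the timestamp list when it is absent is also an assumption, not something the steps above specify.

```python
WINDOW = 2 * 24 * 3600  # changeset window in seconds (tentative)


class FakeCache:
    """In-memory stand-in for a cache that supports check-and-set."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def set(self, key, value):
        _, version = self._data.get(key, (None, 0))
        self._data[key] = (value, version + 1)

    def gets(self, key):
        """Return (value, version); (None, 0) when the key is absent."""
        return self._data.get(key, (None, 0))

    def cas(self, key, value, version):
        """Store `value` only if nobody changed the key since `version`."""
        if self._data.get(key, (None, 0))[1] != version:
            return False
        self._data[key] = (value, version + 1)
        return True


def record_write(cache, db_write, user, collection, ts, wbos):
    """Persist `wbos`, log them as a changeset, and update the index."""
    db_write(user, collection, ts, wbos)          # write to the DB first
    cache.set(f"{user}:{collection}:{ts}", wbos)  # the changeset itself

    index_key = f"{user}:{collection}:changeset"
    while True:
        index, version = cache.gets(index_key)
        # Strip timestamps older than the changeset window, then add ours.
        kept = [t for t in (index or []) if t > ts - WINDOW]
        if ts not in kept:
            kept.append(ts)
        # Retry from the get if another writer updated the index meanwhile.
        if cache.cas(index_key, kept, version):
            return kept
```

Re-reading the index on every retry matters: reusing a stale copy would silently drop timestamps added by a concurrent writer.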

Delete

Individual record deletions will need to be logged as a special kind of write.

Collection deletes simply empty <user>:<collection>:changeset before the db update.
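
One way to picture the two kinds of delete, as a sketch under stated assumptions: the tombstone value `{"deleted": True}`, the function names, and the dict-like `cache` are all hypothetical. A record deletion becomes a changeset whose entries are tombstones, so merging it over older changesets replaces the record with a deletion marker; a collection delete empties the timestamp list (leaving the key present) before touching the database.

```python
def deletion_changeset(record_ids):
    """Build a changeset that logs deletions as tombstone entries."""
    return {rid: {"deleted": True} for rid in record_ids}


def delete_collection(cache, db_delete, user, collection):
    """Empty the changeset list first, then delete from the database."""
    # An empty list (rather than a missing key) says "no cached changes".
    cache[f"{user}:{collection}:changeset"] = []
    db_delete(user, collection)


# Merging a deletion over earlier changes leaves a tombstone for "a".
merged = {"a": {"payload": "x"}, "b": {"payload": "y"}}
merged.update(deletion_changeset(["a"]))
```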

We shouldn't be doing ranged deletes at this time. There are ways to accommodate them if we want to, but I'm still researching whether that's necessary.

Notes

We need a way to turn off reads without turning off writes, both to prepopulate the cache and to behave correctly when the changeset window is enlarged.
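
A small sketch of what such a switch might look like. The `ChangesetSettings` class and its field names are invented for illustration and do not correspond to any existing server configuration.

```python
from dataclasses import dataclass


@dataclass
class ChangesetSettings:
    """Hypothetical settings: writes and reads are toggled independently."""
    write_enabled: bool = True   # keep logging changesets
    read_enabled: bool = False   # serve reads from changesets
    window_seconds: int = 2 * 24 * 3600


def may_read_from_cache(settings, since, now):
    """True when a query for changes after `since` may use the cache."""
    return settings.read_enabled and (now - since) <= settings.window_seconds
```

With these settings, prepopulation would run with writes on and reads off, and reads would be switched on only once a full window of writes has been logged.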

<user>:<collection>:changeset should have a longer expiration than the individual changesets, so that we can comfortably fall back to the db when it's missing (as opposed to present but "empty").

We need a reliable way to invalidate the cache if an error occurs. This should be as easy as an early record that says "cache invalidated prior to this".
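
The invalidation record could be modelled roughly like this. The sketch assumes the value stored at <user>:<collection>:changeset is a small dict with a `valid_from` field next to the list of timestamps; that layout and both function names are hypothetical.

```python
def invalidate(now):
    """Return a fresh index that says: cache invalidated prior to `now`."""
    return {"valid_from": now, "timestamps": []}


def cache_can_answer(index, since):
    """True when cached changesets cover everything after `since`."""
    if index is None:
        # Index missing entirely: fall back to the db.
        return False
    # Queries reaching back before the invalidation point must use the db.
    return since >= index["valid_from"]
```

A reader would call `cache_can_answer` before looping through changesets and go to the database whenever it returns False.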