Raindrop/BackEnd

From MozillaWiki

Notes about the Raindrop back-end as it currently exists. You may also like to see the Software Architecture description and the Raindrop Document Model for some key concepts. The Back End Roadmap has some details on what we are planning and how you can get involved.

Introduction

The back-end consists of a few main components: the bootstrap process, the work queue (which has two processors), and the extension execution model. Each is described below.

Bootstrapping

The server/python/run-raindrop.py script always performs a 'bootstrap' process as it starts up. It creates the database if necessary, then creates the couch documents raindrop needs in order to operate. These documents include the entire user interface, the front-end and back-end extensions, "sample" documents, account information and so forth.

While the raindrop runtime loads all its content from the database, the bootstrap process always compares the files in the raindrop source tree with the versions in the couch and updates the couch documents as required. This is done primarily to make it simple for the raindrop developers to maintain this initial content in source control. So although the content is loaded from the database, the file-system implicitly 'overrides' any change made directly to a document the next time the bootstrap process runs (i.e., the next time run-raindrop.py is executed). In the future, and particularly as we get better support for editing this content directly in raindrop, we expect this behaviour to change so that changes made to these items directly in the database can override the defaults on the file-system.
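The compare-and-update behaviour described above can be sketched roughly as follows. This is only an illustration, not the actual run-raindrop.py code: a plain dict stands in for the couch database, and the function name bootstrap_sync and the document ids are assumptions made for the example.

```python
def bootstrap_sync(source_files, couch):
    """Make the couch documents match the source tree.

    source_files: dict mapping a document id to the file content on disk.
    couch: dict standing in for the database (hypothetical stand-in).
    Returns the ids of the documents that were created or overwritten.
    """
    updated = []
    for doc_id, content in sorted(source_files.items()):
        existing = couch.get(doc_id)
        # Create the document if it is missing, or overwrite it if the
        # database copy differs from the file-system copy.
        if existing is None or existing["content"] != content:
            couch[doc_id] = {"_id": doc_id, "content": content}
            updated.append(doc_id)
    return updated


# Hypothetical source tree with two bootstrap documents.
source_tree = {
    "ext/example-extension": "def handler(doc): pass",
    "ui/index.html": "<html></html>",
}
couch = {}

first_run = bootstrap_sync(source_tree, couch)    # everything is created
second_run = bootstrap_sync(source_tree, couch)   # nothing has changed

# An edit made directly in the database is overridden by the next bootstrap.
couch["ui/index.html"]["content"] = "<html>edited in the database</html>"
third_run = bootstrap_sync(source_tree, couch)
```

In this sketch the file-system always wins, which matches the current behaviour; the planned change would amount to skipping the overwrite when the database copy has been edited on purpose.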

The work queue

The work queue describes how raindrop runs extensions over documents. The term "work queue" is something of a misnomer: the queue is a logical one, and there are actually two different "queues" implemented. They share the same underlying principle, as both leverage couchdb's ability to report the changes made to the database in the order they happened, or as they are happening.

Incoming Queue Processor

The 'incoming queue processor' is code which asks couchdb for database changes as they are happening. Each time a new change is seen, raindrop determines the schema-id for the item, determines which extensions need to run over the item and executes those extensions. These extensions will generally create new documents, which are written as normal. Writing the new documents in turn causes couch to report them as database changes, and the process repeats. When couchdb reports that no more items are incoming, this queue is considered "stable" (although it continues to watch for any future new items).
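The loop described above can be sketched as follows. This is a simplified model under stated assumptions: an in-memory deque stands in for couchdb's changes feed, and the schema ids (example.raw and so on), the extension functions and run_until_stable are all hypothetical names chosen for the illustration.

```python
from collections import deque


def run_until_stable(changes, extensions):
    """Process changes until none remain, i.e. the queue is "stable".

    changes: deque of documents, standing in for the changes feed.
    extensions: dict mapping a schema-id to the extensions that run over it.
    Returns the ids of the documents in the order they were processed.
    """
    processed = []
    while changes:
        doc = changes.popleft()
        processed.append(doc["_id"])
        for extension in extensions.get(doc["schema_id"], []):
            for new_doc in extension(doc):
                # Writing a new document shows up as another change.
                changes.append(new_doc)
    return processed


def parse_raw(doc):
    # Hypothetical extension: turns a raw item into a parsed item.
    return [{"_id": doc["_id"] + "!parsed", "schema_id": "example.parsed"}]


def summarize(doc):
    # Hypothetical extension: turns a parsed item into a summary item.
    return [{"_id": doc["_id"] + "!summary", "schema_id": "example.summary"}]


extensions = {"example.raw": [parse_raw], "example.parsed": [summarize]}
incoming = deque([{"_id": "msg1", "schema_id": "example.raw"}])
order = run_until_stable(incoming, extensions)
```

A single raw item here leads to three processed documents, because each extension's output becomes a new change for the next extension to pick up.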

In this queue, individual extensions are not allowed to get behind in their processing. If a single extension takes a long time to execute, the entire queue takes longer to become "stable", because the queue effectively blocks while the slow extension is executing. In return, once the queue reports it is stable, you can be confident all the extensions have run over the new items.

The 'protocols' which provide new items leverage this behaviour. As a batch of new raw items arrives (eg, from an IMAP server), these items are written and the protocol is blocked until the queue becomes stable. Once the protocol has provided a set of IMAP messages, those messages are therefore fully processed before the protocol gets a chance to provide the next batch. This means new items can be used by the front-end as soon as they arrive, because every extension has already run over them. It also provides a kind of self-throttling for incoming items, with the side-effect that raw items are saved in the database only as fast as the extensions execute rather than as fast as the raw protocol can provide them.
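The self-throttling described above can be illustrated with a small sketch. It is not the real protocol code: the generator imap_batches, the function process_until_stable and the message ids are hypothetical, and the "processing" only records what happened so the ordering is visible.

```python
def imap_batches(log):
    """Hypothetical protocol: fetches raw items one batch at a time."""
    for number in (1, 2):
        log.append(f"fetch batch {number}")
        yield [f"msg-{number}a", f"msg-{number}b"]


def process_until_stable(batch, log):
    """Stand-in for writing the batch and waiting for the queue to be stable."""
    for item in batch:
        log.append(f"extensions ran over {item}")


def run_protocol():
    log = []
    for batch in imap_batches(log):
        # The protocol is blocked here, so the next batch is not fetched
        # until every extension has run over the current one.
        process_until_stable(batch, log)
    return log


events = run_protocol()
```

The resulting log alternates between one fetch and the processing of that batch, which is the throttling effect: the protocol never gets ahead of the extensions.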

Backlog Queue Processor

While the 'incoming queue processor' works fine in a perfect world, in practice we sometimes need to execute extensions over items which already exist in the database. Examples include:

  • When a new extension is introduced, this extension needs the opportunity to run over the older messages already in the DB.
  • When bugs are found or extensions are enhanced, they too need the opportunity to re-execute over existing items to correct or enhance the results previously produced.
  • When new schema items are introduced while raindrop isn't running (eg, via replication or external applications), or when the 'incoming queue processor' isn't running or has failed for some reason.

In this model, all extensions are executed independently of each other over these old documents, in the order the documents were created in the database. The "state" (ie, exactly where in this changelist the extension has reached) is maintained separately for each extension, so when raindrop restarts each extension can continue where it left off. As a result, slower extensions may take longer to get through the backlog than others; in other words, each extension has its own position in the changelist, completely independent of the others. Either individual extensions or all extensions can be processed in this way.

While there is some integration between these two models (eg, the 'backlog processor' will stop at the same place the 'incoming processor' started), more integration is needed (eg, the documents which record the state of the 'backlog processor' need to reflect that they are up-to-date with where the 'incoming processor' has currently reached, not where it started).
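A minimal sketch of this per-extension state might look like the following. The names are assumptions for the example: a list stands in for the changelist, a dict stands in for the state documents, and the limit parameter merely simulates an extension being interrupted part-way through.

```python
def run_backlog(changelist, extension, state, name, limit=None):
    """Run one extension over the backlog, resuming from its saved position.

    changelist: list of documents in the order they were created.
    state: dict mapping an extension name to its position in the changelist.
    limit: optional cap on how many items to process in this run.
    """
    start = state.get(name, 0)
    end = len(changelist)
    if limit is not None:
        end = min(end, start + limit)
    results = []
    for seq in range(start, end):
        results.append(extension(changelist[seq]))
        # Record progress after every item so a restart can resume here.
        state[name] = seq + 1
    return results


changelist = ["doc-a", "doc-b", "doc-c", "doc-d", "doc-e"]
state = {}

# A fast extension gets through the whole backlog in one run.
run_backlog(changelist, str.upper, state, "fast")
# A slow extension only manages two items before raindrop stops.
run_backlog(changelist, len, state, "slow", limit=2)
positions_after_first_run = dict(state)

# After a restart the slow extension continues where it left off.
resumed = run_backlog(changelist, len, state, "slow")
```

Because each extension has its own entry in the state, the fast extension is unaffected by how far the slow one has reached.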

Extension execution model

Extensions are currently loaded from couchdb at runtime and their code is dynamically compiled. They run in-process with the raindrop back-end, which means extensions must be written in Python for now. The intention is to move all extensions into external processes, at which time other languages can be supported. We intend to support Javascript "out of the box", and are likely to try to reuse (or abuse) the couchdb javascript engine for this purpose. Like the couchdb model, we intend to use a sub-process model in which raindrop itself communicates with these languages via json over stdin/stdout.
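Since the sub-process model is only planned, the following is a speculative sketch of what "json over stdin/stdout" could look like, not a description of any existing raindrop protocol. It assumes one JSON document per line in each direction, and it uses a small Python child process as a stand-in for an extension written in another language; the message fields are hypothetical.

```python
import json
import subprocess
import sys

# Source of the child process. In the planned model this would be an
# extension in some other language; Python is used here only so the
# sketch is self-contained.
CHILD_SOURCE = r'''
import json, sys
for line in iter(sys.stdin.readline, ""):
    doc = json.loads(line)
    reply = {"source": doc["_id"], "length": len(doc.get("body", ""))}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
'''


def run_external_extension(docs):
    """Send each document to the child as one JSON line and read one reply."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    replies = []
    try:
        for doc in docs:
            proc.stdin.write(json.dumps(doc) + "\n")
            proc.stdin.flush()
            replies.append(json.loads(proc.stdout.readline()))
    finally:
        # Closing stdin lets the child see end-of-file and exit.
        proc.stdin.close()
        proc.stdout.close()
        proc.wait()
    return replies


replies = run_external_extension([
    {"_id": "msg1", "body": "hello"},
    {"_id": "msg2", "body": ""},
])
```

A line-per-message framing keeps both sides simple, at the cost of requiring each side to flush after every message so the other is not left waiting.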

Extensions are provided with a handful of functions for them to use during their execution - these include functions for writing new schema items, performing queries on the database and so forth. Even with the current in-process model, no extensions rely on the raindrop state (although some do import raindrop helper functions).
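The combination of dynamically compiled code and a handful of provided functions can be sketched as below. This is an illustration of the general technique rather than raindrop's actual loader: the names load_extension, emit_schema and handler, and the schema id used, are assumptions made for the example.

```python
# Extension source as it might be stored in a couch document (hypothetical).
EXTENSION_SOURCE = '''
def handler(doc):
    # emit_schema is not imported here; the loader provides it.
    emit_schema("example.summary", {"chars": len(doc["body"])})
'''


def load_extension(source, name, written):
    """Compile extension source and give it the functions it may call.

    written: list collecting the schema items the extension writes,
    standing in for the database.
    """
    def emit_schema(schema_id, fields):
        item = {"schema_id": schema_id, "extension": name}
        item.update(fields)
        written.append(item)

    # The namespace holds only what the extension is provided with,
    # so the extension does not depend on the loader's own state.
    namespace = {"emit_schema": emit_schema}
    code = compile(source, f"<extension {name}>", "exec")
    exec(code, namespace)
    return namespace["handler"]


written = []
handler = load_extension(EXTENSION_SOURCE, "example-extension", written)
handler({"_id": "msg1", "body": "hello world"})
```

Keeping the extension's view of the world limited to the provided functions is also what would make a later move to external processes practical, since the same small set of calls could be carried over json instead.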

See the Back End Roadmap for more detail on this.