Gaia/Email/Implementation/MailSynchronization
Overview
We use completely different synchronization implementations for IMAP, ActiveSync, and POP3. This is an outgrowth of differences in the protocols.
Key IMAP / ActiveSync Differences
- Fixed sync windows versus flexible sync windows
  - For ActiveSync, we are constrained to sync along exact, pre-defined time boundaries. We can sync 1 day, 3 days, 1 week, 2 weeks, 1 month, or all messages. That's it.
  - For IMAP, we always synchronize whole days in their entirety, but otherwise we keep syncing until we have 'enough' messages for our display needs.
IMAP
Possibilities
IMAP sync algorithms can be classified into two major classes: whole folder sync, and partial folder sync.
Whole folder sync is what most desktop mail clients like Thunderbird do. This is simpler but has the potential for much greater resource usage and a greater delay before the client is first usable. The greater resource usage is traditionally not a problem for desktop-class devices with good bandwidth.
Partial folder sync is what the Gaia e-mail app does, and probably what most mobile mail clients do.
Any sync strategy can be improved by the various optional IMAP extensions out there, such as CONDSTORE and QRESYNC. The big problem is that, because these extensions are optional, many servers don't implement them, and we absolutely cannot depend on them. For example, Gmail and Yahoo implement the absolute minimum required by the IMAP spec.
Whole Folder Sync Basics
Whole-folder sync is potentially simpler because it avoids a lot of edge cases related to partial sync.
A simple implementation finds out what the highest unique id is in the folder and then asks for all of the envelopes for messages from UID 1 through the high UID. Relying on a correlation between UIDs and recent messages allows a client to start with higher UIDs and (usually) end up fetching the envelopes of the most recent messages first. Incremental synchronization after this initial synchronization is made much easier because
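A minimal Python sketch of that newest-first envelope fetch, assuming the client already knows an upper bound on the highest UID (for example UIDNEXT - 1 from the SELECT response). The chunk size of 100 and the function names are hypothetical; `UID FETCH` and the `UID FLAGS ENVELOPE` data items are standard IMAP.

```python
from typing import List


def envelope_fetch_ranges(high_uid: int, chunk_size: int = 100) -> List[str]:
    """Split UIDs 1..high_uid into 'low:high' ranges, newest (highest UID) first."""
    if high_uid < 1 or chunk_size < 1:
        return []
    ranges = []
    top = high_uid
    while top >= 1:
        bottom = max(1, top - chunk_size + 1)
        ranges.append(f"{bottom}:{top}")
        top = bottom - 1
    return ranges


def envelope_fetch_commands(high_uid: int, chunk_size: int = 100) -> List[str]:
    """One UID FETCH per range; UIDs that no longer exist are ignored by the server."""
    return [f"UID FETCH {r} (UID FLAGS ENVELOPE)"
            for r in envelope_fetch_ranges(high_uid, chunk_size)]
```

For a folder whose highest UID is 250, `envelope_fetch_ranges(250)` yields `151:250`, `51:150`, then `1:50`, so the envelopes most likely to be recent arrive first.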
Partial Folder Sync Basics
The tricky thing about partial folder sync is that what you usually want is the most recent messages by date, but messages in an IMAP folder are numbered based on when they were put in the folder. Since new messages must have been put in the folder recently, there is a correlation between a message's UID (the number / identifier it is given when it is added to the folder) and its recency, but it is only a correlation. The most recent messages in a folder by UID may be messages that are six months old because the user just moved them into that folder.
This not-good-enough correlation is what complicates things. So rather than synchronizing just based on UID, the Gaia E-Mail client synchronizes based on the date the message was received, also known as its INTERNALDATE. This is a (theoretically) immutable value based on when the SMTP server received the message, but it might also be when the message was fed into the IMAP server. Either way, the value is distinct from the 'Date' the author of the message indicates.
We use the SINCE and BEFORE SEARCH criteria to get a list of messages falling within a given time window.
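As an illustration, here is how such a window search might be built in Python. The helper names are hypothetical; the semantics come from the IMAP4rev1 spec (RFC 3501): SINCE matches messages whose INTERNALDATE is on or after the given day, BEFORE matches those strictly earlier, and both ignore time and timezone, which fits naturally with syncing whole days.

```python
from datetime import date

# IMAP search dates use English month abbreviations regardless of locale,
# so we avoid strftime("%b"), which is locale-dependent.
_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def imap_date(d: date) -> str:
    """Format a date as an IMAP search date, e.g. 1-Feb-2013."""
    return f"{d.day}-{_MONTHS[d.month - 1]}-{d.year}"


def window_search_command(since: date, before: date) -> str:
    """Search for messages whose INTERNALDATE falls on a day in [since, before)."""
    if before <= since:
        raise ValueError("window must cover at least one day")
    return f"UID SEARCH SINCE {imap_date(since)} BEFORE {imap_date(before)}"
```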
Partial Sync Implications
Because we synchronize based on time windows, we can always synchronize more messages by extending the window further into the past. Unlike ActiveSync, there is no hard boundary stopping us.
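A simplified sketch of growing the time window until there are 'enough' messages. The `count_in_window` callback (standing in for a SINCE/BEFORE search), the initial three-day window, and doubling the window on each step are all assumptions for illustration, not the Gaia app's actual policy. For simplicity the whole window is recounted each time; a real client would more likely search only the newly added days.

```python
from datetime import date, timedelta
from typing import Callable, Tuple

# Hypothetical callback: how many messages have an INTERNALDATE day in [since, before),
# e.g. the length of a UID SEARCH SINCE/BEFORE result.
CountFn = Callable[[date, date], int]


def grow_window(newest_day: date, oldest_day: date, wanted: int,
                count_in_window: CountFn, initial_days: int = 3) -> Tuple[date, int]:
    """Widen the window back in time, whole days at a time, until it holds at least
    `wanted` messages or reaches `oldest_day`. Returns (since, count)."""
    before = newest_day + timedelta(days=1)  # BEFORE is exclusive, so this includes newest_day
    days = max(1, initial_days)
    while True:
        since = max(oldest_day, before - timedelta(days=days))
        count = count_in_window(since, before)
        if count >= wanted or since <= oldest_day:
            return since, count
        days *= 2  # assumption: double the window each round
```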
ActiveSync
ActiveSync was more explicitly designed for the constrained mobile-device sync idiom. In fact, it's somewhat over-constrained in this regard. When we ask to synchronize a folder, our only choices are to synchronize a time range stretching 1 day, 3 days, 1 week, 2 weeks, or 1 month into the past, or ALL of the messages in the folder.
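To make the contrast with IMAP concrete, this sketch rounds a desired look-back period up to the nearest window ActiveSync allows. The numeric codes are the e-mail FilterType values as we understand them from Microsoft's MS-ASCMD protocol documentation; treating "1 month" as 30 days and the function name are assumptions.

```python
# E-mail FilterType codes as we understand MS-ASCMD; 0 means "no filter" (ALL messages).
FILTER_ALL = 0
EMAIL_FILTERS = [  # (window length in days, FilterType code), smallest first
    (1, 1),   # 1 day
    (3, 2),   # 3 days
    (7, 3),   # 1 week
    (14, 4),  # 2 weeks
    (30, 5),  # 1 month (approximated as 30 days here)
]


def pick_filter_type(days_wanted: int) -> int:
    """Round a desired look-back period up to the nearest window ActiveSync offers."""
    for days, code in EMAIL_FILTERS:
        if days >= days_wanted:
            return code
    return FILTER_ALL  # no fixed window is big enough, so the only option is everything
```

Asking for 10 days of mail yields the 2-week filter, and asking for 45 days leaves only ALL; there is no way to say "just a bit more than a month".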
Partial Sync Possibilities
It's conceivable that we could be clever and try to implement something similar to our IMAP partial sync strategy for ActiveSync. We haven't looked into it.
POP3
POP3 support landed in v1.3. It was implemented at the request of partners who wanted the e-mail app to support POP3. However, at the time of implementation and of writing this, we are not aware of any POP3 servers where IMAP is not also freely available and so should be used instead. There are mail providers out there that officially only provide POP3 support for free and charge for IMAP support, but it is our understanding that IMAP can in fact be used on those providers without paying an additional fee.
We do not support POP3 on servers where IMAP is freely available. This is because POP3 will always be an inferior experience to IMAP given the limitations of the protocol.
Our current implementation requires support for UIDL and TOP.
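One way to check for those two commands up front is the CAPA command from RFC 2449. This is a sketch, not necessarily how our implementation checks: CAPA is itself optional, so a server that rejects it might still support UIDL and TOP and would have to be probed directly.

```python
from typing import Iterable, Set

# Capabilities the POP3 implementation relies on.
REQUIRED_CAPABILITIES = {"UIDL", "TOP"}


def missing_capabilities(capa_lines: Iterable[str]) -> Set[str]:
    """Return the required capabilities absent from a CAPA listing.

    Each entry is a capability name optionally followed by parameters, e.g. "SASL PLAIN".
    """
    advertised = set()
    for line in capa_lines:
        words = line.split()
        if words:
            advertised.add(words[0].upper())
    return REQUIRED_CAPABILITIES - advertised
```

With Python's `poplib`, `POP3.capa()` returns a dict keyed by capability name, so `missing_capabilities(conn.capa())` iterates over those names.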
Envelope/Snippet fetching, Body fetching, Attachment complications
When we initially synchronize a message, we only fetch its headers and enough of the start of the message to (hopefully) extract a snippet. We only download the entire message body when the user clicks on a message to display it.
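POP3's `TOP msg n` command returns a message's headers plus its first `n` body lines, which is what makes this header-and-snippet fetch possible. The sketch below parses such output with Python's standard `email` package; the function name, the 100-character limit, and looking only at a text/plain body are assumptions.

```python
import email
from email import policy
from typing import List, Tuple


def header_and_snippet(top_lines: List[bytes], max_chars: int = 100) -> Tuple[str, str]:
    """Parse TOP output (headers plus the first few body lines) into (subject, snippet)."""
    msg = email.message_from_bytes(b"\r\n".join(top_lines), policy=policy.default)
    subject = str(msg.get("Subject", ""))
    # Best effort: the body is truncated, so a multipart message may be cut off mid-part.
    body = msg.get_body(preferencelist=("plain",))
    if body is None:
        return subject, ""
    try:
        text = body.get_content()
    except LookupError:  # unknown charset
        return subject, ""
    snippet = " ".join(text.split())[:max_chars]  # collapse whitespace and line breaks
    return subject, snippet
```

With `poplib`, `resp, lines, octets = conn.top(which, 20)` produces the byte lines this function expects; the figure of 20 lines is arbitrary.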
Because of limitations of POP3, we can't really know whether a message has attachments, or how many, until we've downloaded the part of the message body that contains them. We can make some educated guesses about whether a message contains attachments by looking at its headers, but these are just guesses. We cannot download just a message body without its attachments unless we try risky, tricky things that end up being wasteful.
(Specifically, we could try to close the connection if it looks like the message has a giant attachment that we don't want, but there's no guarantee that we'll save much bandwidth. And we have no way to resume a message download from where we left off; we always have to download the whole message from the start, so this can end up being very wasteful. Plus, as noted elsewhere, we don't really care to complicate our POP3 implementation that much, since we think people should be using IMAP.)
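The "educated guess" from headers might look something like this sketch, which inspects only the top-level Content-Type (for example from `TOP msg 0`, which returns headers only). The specific heuristic is our assumption: multipart/mixed usually, but not always, means attachments, and attachments can also hide inside other structures.

```python
from email import policy
from email.parser import BytesHeaderParser
from typing import List


def probably_has_attachments(header_lines: List[bytes]) -> bool:
    """Guess from top-level headers alone whether a message carries attachments."""
    headers = BytesHeaderParser(policy=policy.default).parsebytes(b"\r\n".join(header_lines))
    if headers.get_content_type() == "multipart/mixed":
        return True
    # A non-text, non-multipart top-level type (say application/pdf) is itself an attachment.
    return headers.get_content_maintype() not in ("text", "multipart")
```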
One side effect of this is that we *always* download the attachments in a message, and as of v1.3 and current plans for v1.4, this results in the attachment being saved to DeviceStorage if the device can manage the attachment, or the attachment simply being discarded if we cannot.
Always Leave On Server
Our implementation *never* deletes messages from the server. This is a simplifying assumption to help avoid data-loss.
In general, there are four ways a POP3 e-mail client can operate:
- it can delete the messages from the server immediately as they are downloaded. This really only makes sense if the storage space available for POP3 is extremely limited or in cases where it is (probably mistakenly) believed to enhance security because the only means of third party inspection of the messages is while they are in the POP3 spool.
- it can delete them from the server after some time interval. This gives multiple POP3 clients a chance to download the messages without interfering with each other, provided every client uses a deletion interval longer than the longest sync interval of any of the clients, while still bounding POP3 spool usage.
- it can leave the messages on the server forever. This avoids the potential for data loss because the messages never get deleted. Whether this is the right answer depends on why a POP3 server is being used. If it is because the mail provider wants to charge extra for IMAP service and the user does not want to pay or switch providers, then leaving messages on the server is probably the right choice. If it is because the mail provider has not upgraded their systems in the last 10+ years, and accordingly has serious storage limitations, and the user somehow does not want to switch providers, then this might not be so great. There are two major variants on leaving messages on the server:
  - Never deleting messages from the server, even if the user deletes them from the device. This is what we do. If you delete an account and re-add it, all of the messages that were previously deleted from the device will be eligible for re-synchronization.
  - Deleting messages from the server when the user deletes them from the device. This only works when there is no ambiguity between permanently deleting a message because the user never wants to see it again and removing it to manage device storage space. Our UI does not have a good way of differentiating between these two cases, so we choose to never delete messages. We are unlikely to ever resolve this ambiguity because there is no such ambiguity in the IMAP case, we believe that IMAP is a superior choice for users, and there are viable free IMAP services that also let users migrate while still receiving e-mail at their old address. (For example, Gmail and other IMAP providers can pull messages off a POP3 server, allowing the old e-mail address to still be used.)
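A sketch of the bookkeeping that never-delete behavior implies, assuming (hypothetically) that each account remembers which UIDLs it has already downloaded. Deleting on the device drops only the local copy, so the message is not fetched again; deleting the account discards the whole record, which would explain why a re-added account sees every server message as new. The class and method names are illustrative only.

```python
from typing import Iterable, List, Set


class Pop3SyncState:
    """Hypothetical per-account record of which POP3 messages the device has handled.

    Nothing here ever issues DELE: deleting on the device only drops the local copy.
    """

    def __init__(self) -> None:
        self.seen: Set[str] = set()   # every UIDL ever downloaded for this account
        self.local: Set[str] = set()  # UIDLs whose messages are still stored on the device

    def new_uidls(self, server_uidls: Iterable[str]) -> List[str]:
        """UIDLs on the server that this account has never downloaded."""
        return [u for u in server_uidls if u not in self.seen]

    def record_download(self, uidl: str) -> None:
        self.seen.add(uidl)
        self.local.add(uidl)

    def delete_on_device(self, uidl: str) -> None:
        # Remove only the local copy; keep it in `seen` so the next sync skips it.
        self.local.discard(uidl)
```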
Handling Excessive Numbers of Messages
Because we never delete messages and users may already have a large number of messages in their POP3 spool, we do not synchronize all messages at initial sync time. Instead, we retrieve the UIDLs of all messages in the spool, take only the 100 most recent messages (from the end of the UIDL list, which seems to correlate with recency), and put the rest in a backlog. The user can synchronize these older messages 100 at a time by using the "load more messages" affordance. Otherwise, these messages will not be synchronized, and only new messages received after the initial sync will be synchronized.
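The initial-sync split and the "load more messages" step can be sketched as follows. The 100-message batch and treating the end of the UIDL list as newest come from the text above; the function names and parsing details are hypothetical. With `poplib`, `conn.uidl()` returns the listing lines as bytes in the `"<msg-number> <uid>"` form this parser expects.

```python
from typing import List, Tuple

BATCH_SIZE = 100  # messages fetched at initial sync and per "load more"


def parse_uidl(lines: List[bytes]) -> List[str]:
    """Turn UIDL lines like b"1 whqtswO00WBw418f9t5JxYwZ" into UIDs in message-number order."""
    pairs = []
    for line in lines:
        num, uid = line.decode("ascii").split(None, 1)
        pairs.append((int(num), uid.strip()))
    return [uid for _, uid in sorted(pairs)]


def split_initial_sync(uidls: List[str], batch: int = BATCH_SIZE) -> Tuple[List[str], List[str]]:
    """Return (sync_now, backlog); the last `batch` UIDLs are treated as the most recent."""
    if batch <= 0:
        return [], list(uidls)
    return uidls[-batch:], uidls[:-batch]


def load_more(backlog: List[str], batch: int = BATCH_SIZE) -> Tuple[List[str], List[str]]:
    """Take the next `batch` most recent messages from the backlog: (to_sync, remaining)."""
    return split_initial_sync(backlog, batch)
```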