Gaia/Contacts/Data Refactor
Current situation
The current Contacts app has grown on top of the MozContacts API. It started as a simple application but has accumulated more and more requirements.
The MozContacts API is used by several applications with different requirements, and the end result is a store for vCard objects with limited search capabilities.
With the inclusion of external contact sources like Facebook, Gmail, Hotmail, vCards from the SD card, Bluetooth and so on, we have been extending the use of the API through hacks and obscure workarounds. Last but not least, we are also able to detect duplicates and merge them appropriately.
Motivation
Contacts list performance
The Contacts app interaction design includes a fast scroll functionality with letter-based (initials) shortcuts. To keep the MozContacts API independent of this application-specific requirement, we have had to do some black magic to keep performance at 60 fps when loading a huge number of contacts (> 2000).
Search restrictions
Currently, the Contacts search functionality is implemented using the DOM and regular expressions. There are no specific indexes on the contacts database, as the MozContacts API only provides generic search functionality.
Impossible to unmerge contacts
During the process of merging contacts we end up losing the information of the original contacts, because the MozContacts API is the only place where we can store this information. So there is no way to unmerge contacts.
Dependency on Facebook integration
We want apps other than the Contacts app to be able to read the contacts data imported from external sources like FB, Hotmail, Gmail, etc. However, legal restrictions prevent us from sharing the data imported from FB with an app that is not owned by Gaia. This made us take this restricted data out of the MozContacts API, because this API is accessible by privileged apps. Initially we had to create an independent IndexedDB with FB information that lived in the Communications app, but once we were able to use the DataStore API we moved this data to a shared DataStore. So now consumers of contacts information that are interested in FB data need to query both the MozContacts API and the FB datastore.
Hard to integrate with 3rd party contacts providers
We added extra support for Gmail and Hotmail contacts, but this is embedded in the Contacts application itself, which does not scale. We want any 3rd party app capable of providing contacts to integrate seamlessly with the Contacts app.
New potential features
Architecture
Current context
MozContacts API
The API is currently frozen. We know that, and it will remain a nice way of storing contacts locally for good, but we won't be able to add extra requirements to it.
DataStore API
The DataStore API is a new API designed to let apps share information, or collaborate on adding data to a common store. It supports synchronization and change notifications. It is a pretty simple key/value store, where both key and value can be any kind of object, and it doesn't implement any kind of search, only direct access by key.
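The whole proposal leans on this synchronization model, so here is a minimal sketch of it, assuming a simplified, synchronous store. `MiniDataStore`, its methods and the `added`/`updated`/`removed` operation names are hypothetical and only loosely inspired by the revision-based sync of the real DataStore API, which is asynchronous and cursor-based. What it shows is that an observer only needs to remember the last revision it processed in order to catch up later.

```javascript
// Hypothetical, simplified stand-in for a DataStore-like store. It is NOT the
// real DataStore API: it only illustrates direct access by key (no search)
// plus revision-based synchronization.
class MiniDataStore {
  constructor(name) {
    this.name = name;
    this.records = new Map(); // key -> value, direct access only
    this.log = [];            // ordered change log: { revision, operation, id, data }
  }

  // Latest revision: an observer stores this after processing changes.
  get revisionId() {
    return this.log.length;
  }

  get(id) {
    return this.records.get(id);
  }

  put(data, id) {
    const operation = this.records.has(id) ? 'updated' : 'added';
    this.records.set(id, data);
    this._record(operation, id, data);
  }

  remove(id) {
    if (this.records.delete(id)) {
      this._record('removed', id, undefined);
    }
  }

  // Every change made after `fromRevision`, oldest first, so an observer
  // can catch up without re-reading the whole store.
  sync(fromRevision = 0) {
    return this.log.slice(fromRevision);
  }

  _record(operation, id, data) {
    this.log.push({ revision: this.log.length + 1, operation, id, data });
  }
}
```

Keeping an append-only log is the simplest way to make "what changed since revision N" cheap to answer; a real store would also need to compact that log.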
Contacts data consumers (in Gaia)
- Contacts app: Obviously. This is the central hub of contacts information in Gaia. We want the user to have access to the whole list of contacts, whatever their source: local contacts as well as contacts imported from external sources like FB, Gmail, Hotmail, etc. The user won't be able to modify the contact information coming from an external source. All changes made to contacts will be kept locally and won't affect their remote copies. Each contact provider is responsible for its own data. In this case, the Contacts app is the only provider of the MozContacts API data and is therefore responsible for it, although this data can also be modified by other apps with enough permissions.
- Messaging app: Whenever the user sends or receives an SMS or MMS, we need to get the information associated with the contact(s) involved in the message. We also allow users to do quick searches by contact name and phone number when composing a new message. We are currently querying both the MozContacts API and the FB datastore.
- Dialer app: Same situation as above, but we are only doing searches by phone number. We are currently also querying both stores, MozContacts and FB.
- Emergency calls: We do a basic search by contact ID on the MozContacts store to show the ICE contacts bar if appropriate. No FB search so far, but we want users to be able to set FB contacts as ICE contacts, so we will need it in the future. No need for quick incremental searches by phone number or any other field.
- Email app: The email app queries the MozContacts API by ID and email. No need for incremental searches so far, but we might need it at some point. No queries to the FB datastore.
- Search app: Currently disabled. So far, search by name is implemented. At some point (?) it should be able to search any contact (including FB data) by any field.
Proposal
High level overview
Actors
Contact Providers
We want a generic mechanism that lets apps share contact information and so become Contact Providers. An app that wants to share its contact information should:
- Be the owner of its own information (CRUD). Contact Providers are the only ones responsible for adding, updating and removing contacts in their own stores. The Global Contacts DataStore won't have write access to other providers' stores.
- Provide a read-only datastore named contacts with the information it shares.
- That datastore will have a unique ID per contact as key and a *MozContact* object as value.
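The split between owning the data and sharing it read-only can be sketched as follows. `ContactProvider` and `readOnlyDatastore` are hypothetical names, and plain objects stand in for mozContact instances; in the real system the read-only guarantee would come from the datastore's `access` setting in the manifest, not from the provider's own code.

```javascript
// Hypothetical sketch: a Contact Provider owns its contacts (full CRUD) and
// shares them through a read-only view named 'contacts'.
class ContactProvider {
  constructor(owner) {
    this.owner = owner;         // provider name, illustrative only
    this._contacts = new Map(); // unique contact ID (key) -> mozContact-like object (value)
    this._nextId = 1;
  }

  // Create, update and delete are only available to the provider itself.
  add(contact) {
    const id = `${this.owner}-${this._nextId++}`;
    this._contacts.set(id, { ...contact, id });
    return id;
  }

  update(id, changes) {
    if (!this._contacts.has(id)) {
      throw new Error(`Unknown contact ${id}`);
    }
    this._contacts.set(id, { ...this._contacts.get(id), ...changes, id });
  }

  remove(id) {
    return this._contacts.delete(id);
  }

  // The only thing other apps see: read access by key, no write methods.
  get readOnlyDatastore() {
    const contacts = this._contacts;
    return Object.freeze({
      name: 'contacts',
      owner: this.owner,
      access: 'readonly',
      // Return a copy so readers cannot mutate the provider's data.
      get: (id) => (contacts.has(id) ? JSON.parse(JSON.stringify(contacts.get(id))) : undefined),
      keys: () => [...contacts.keys()],
    });
  }
}
```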
Global Contacts DataStore (GCDS)
This is the global hub of information about Contact Providers and the data they share. We will have a certified app with no UI that will do the following:
- Provide a datastore, the Global Contacts Datastore (GCDS), that gives a view of all contacts in the different datastores owned by the different Contact Providers of the system. This global datastore will contain an entry for each contact, with a pointer to the original contact source.
- Passively merge contacts that are detected to be the same. Active merge (with UI) is the responsibility of each Contact Provider.
- Provide a helper library so that its users get the view of a single contact coming from its different (or unique) sources.
- Listen to changes in other datastores to keep the whole view unified.
- Be able to unmerge contact references on demand.
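Keeping pointers instead of merged copies is what makes unmerging possible (the problem described in Impossible to unmerge contacts). Here is a minimal sketch, assuming entries shaped like the `{ owner, contactId }` arrays described in the initial fetch flow. `GlobalContactsStore` is hypothetical, and `fingerprint` (lowercased name plus digits-only phone numbers) is an assumed stand-in for whatever duplicate detection the GCDS would really use.

```javascript
// Assumed duplicate detection: same lowercased name and same set of
// digits-only phone numbers.
function fingerprint(contact) {
  const name = (contact.name || []).join(' ').trim().toLowerCase();
  const tels = (contact.tel || []).map((t) => t.value.replace(/\D/g, '')).sort();
  return JSON.stringify([name, tels]);
}

// Hypothetical sketch of GCDS entries: gcdsId -> [{ owner, contactId }, ...].
class GlobalContactsStore {
  constructor() {
    this.entries = new Map();
    this.byFingerprint = new Map(); // fingerprint -> gcdsId
    this._nextId = 1;
  }

  // Register a provider contact, passively merging it into an existing
  // entry when an identical contact is already known.
  addPointer(owner, contactId, contact) {
    const fp = fingerprint(contact);
    const existing = this.byFingerprint.get(fp);
    if (existing !== undefined) {
      this.entries.get(existing).push({ owner, contactId });
      return existing;
    }
    const gcdsId = String(this._nextId++);
    this.entries.set(gcdsId, [{ owner, contactId }]);
    this.byFingerprint.set(fp, gcdsId);
    return gcdsId;
  }

  // Unmerge: the original pointers were never discarded, so each one gets
  // its own entry again. Split entries are not indexed by fingerprint, so
  // they won't be passively merged back.
  unmerge(gcdsId) {
    const pointers = this.entries.get(gcdsId);
    if (!pointers || pointers.length < 2) {
      return [gcdsId];
    }
    this.entries.set(gcdsId, [pointers[0]]);
    const ids = [gcdsId];
    for (const pointer of pointers.slice(1)) {
      const newId = String(this._nextId++);
      this.entries.set(newId, [pointer]);
      ids.push(newId);
    }
    return ids;
  }
}
```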
Local contacts
- A new datastore owned by the system will be provided, containing the same contact information stored in the MozContacts API IndexedDB.
- The content of this datastore will be modifiable via the existing MozContacts API only, and the GCDS will treat this datastore as it treats any other datastore owned by any other Contact Provider.
- Because the MozContacts API can be accessed by privileged apps, certified-only data won't be added to this datastore. This is the case of FB data.
Contacts Data Consumer
- The Contacts app is considered a Contact Data Consumer, just like the rest of the apps listed in Contacts data consumers (in Gaia).
- Each consumer of the GCDS will need to build its own IndexedDB from the data obtained through the GCDS and the datastores owned by the Contact Providers it points to. This IndexedDB should contain the indexes required for the specific needs of the app, like incremental searches, jump to letter, better cursor walking, etc.
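For example, the jump-to-letter and phone number lookups mentioned above could be derived like this. This is only a sketch: `buildConsumerIndexes` is hypothetical, plain `Map`s stand in for IndexedDB indexes, and the A-Z bucketing ignores locale-specific alphabets.

```javascript
// Hypothetical sketch of consumer-specific indexes. Plain Maps stand in for
// IndexedDB indexes, which are not available outside the browser.
function addToIndex(index, key, id) {
  if (!index.has(key)) {
    index.set(key, []);
  }
  index.get(key).push(id);
}

function buildConsumerIndexes(contacts) {
  const byInitial = new Map(); // 'A'..'Z' or '#' -> contact IDs (jump to letter)
  const byPhone = new Map();   // digits-only phone number -> contact IDs

  for (const contact of contacts) {
    const name = ((contact.name && contact.name[0]) || '').trim();
    const first = name.charAt(0).toUpperCase();
    // Simplification: anything outside A-Z goes to the '#' bucket.
    addToIndex(byInitial, /^[A-Z]$/.test(first) ? first : '#', contact.id);

    for (const tel of contact.tel || []) {
      addToIndex(byPhone, tel.value.replace(/\D/g, ''), contact.id);
    }
  }
  return { byInitial, byPhone };
}
```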
Data flows
Contact consumer initial fetch
Preconditions: The GCDS has at least one Contact Provider registered and contains pointers to the mozContact information held by this (or these) Contact Provider(s) in the form of mozContact IDs.
Contact Consumers need to initially create a local IndexedDB to store a copy of all the mozContact information stored in the Contact Providers registered with the GCDS. They will need to add the appropriate indexes to this IndexedDB to support the specific searches each Contact Consumer needs.
A Contact Consumer that wants to obtain the global contact information and store it locally needs to query the GCDS via the DataStore API. It will need to do an initial sync to obtain the whole list of records stored in the GCDS. For each record, it should find an array of this form:
[{ owner: 'contactprovider_1', contactId: 'aId' }, { owner: 'contactprovider_n', contactId: 'anotherId' }]
This array will usually have only one element. If it contains more than one element, the contact has been passively merged, and the Contact Consumer can safely ignore the rest of the elements, because they contain IDs of mozContacts that are exact copies of the one identified by the ID in the first position of the array.
The owner field of the object is the name of the datastore that contains the actual mozContact information the Contact Consumer is interested in. So the Contact Consumer needs to query this datastore with the value of the contactId field to obtain the mozContact information. The consumer may not have enough privileges to access the provider's datastore; in that case, it should silently fail and continue with the next GCDS record. If it does have access, once it gets the mozContact information it can store it in its local IndexedDB and repeat the process for each record stored in the GCDS.
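The fetch loop described above can be sketched as follows. Everything here is an assumption for illustration: `initialFetch` is hypothetical, `gcdsRecords` stands in for the result of the initial GCDS sync, `providers` maps an owner name to a store the consumer is able to open (a missing entry models "not enough privileges"), and a `Map` stands in for the consumer's IndexedDB. The real DataStore calls are asynchronous.

```javascript
// Hypothetical sketch of a Contact Consumer's initial fetch.
// gcdsRecords: Map gcdsId -> [{ owner, contactId }, ...]
// providers:   Map owner -> { get(contactId) } for the stores we can open
// localDb:     Map gcdsId -> mozContact-like object (stand-in for IndexedDB)
function initialFetch(gcdsRecords, providers, localDb) {
  let skipped = 0;
  for (const [gcdsId, pointers] of gcdsRecords) {
    // Passively merged entries point to exact copies: the first one is enough.
    const { owner, contactId } = pointers[0];
    const store = providers.get(owner);
    if (!store) {
      // No access to this provider's datastore: silently skip the record.
      skipped++;
      continue;
    }
    const contact = store.get(contactId);
    if (contact !== undefined) {
      localDb.set(gcdsId, contact);
    }
  }
  return skipped; // number of records the consumer could not read
}
```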
Local contact addition or modification
Local contact additions or modifications need to be made through the MozContacts API, since Contact Providers' datastores are supposed to be read-only.
A Contact Consumer with access to the MozContacts API can add a new contact or modify an existing one via the MozContacts API, which will write the modification both to the MozContacts internal IndexedDB and to the Local Contacts datastore. This datastore modification will trigger the datastore-change-* system message (and the DataStore.onchange event), which will wake the GCDS up. Once it is up, the GCDS will request a sync with the Local Contacts datastore to fetch the latest changes and record them in the GCDS. This, in turn, will trigger a datastore-change-* notification and a DataStore.onchange event that will be received by the Contact Consumers observing these notifications. These Contact Consumers will need to sync with the GCDS to obtain the latest changes and query the Contact Provider where the changes originated, in this case the Local Contacts datastore. The Contact Consumer responsible for the addition or modification can probably safely ignore the change notification, but that's an implementation detail that may differ between consumers.
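The cascade can be simulated in a few lines. In this hypothetical sketch, synchronous callbacks stand in for the datastore-change-* system message and DataStore.onchange, the `origin` tag that lets the originating consumer skip its own change is an assumption (the flow only says skipping is an implementation detail), and for brevity the GCDS reuses the provider's contact ID as its own record ID.

```javascript
// Hypothetical simulation of the change cascade:
// Local Contacts datastore -> GCDS -> Contact Consumers.
function createChannel() {
  const listeners = [];
  return {
    subscribe: (fn) => listeners.push(fn),
    notify: (change) => listeners.forEach((fn) => fn(change)),
  };
}

const localContactsChanges = createChannel(); // fired by the Local Contacts datastore
const gcdsChanges = createChannel();          // fired by the GCDS datastore

// The GCDS wakes up, syncs with the provider and records a pointer, which
// in turn notifies the consumers observing the GCDS.
const gcdsEntries = new Map();
localContactsChanges.subscribe(({ contactId, origin }) => {
  gcdsEntries.set(contactId, [{ owner: 'local-contacts', contactId }]);
  gcdsChanges.notify({ gcdsId: contactId, origin });
});

// Each consumer re-syncs with the GCDS; the app that made the change may
// skip it because it already has the data.
function createConsumer(name, log) {
  gcdsChanges.subscribe(({ gcdsId, origin }) => {
    if (origin === name) {
      return;
    }
    log.push(`${name} re-syncs ${gcdsId}`);
  });
}
```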
External contact addition or modification
The flow in this case is pretty similar to the previous one, except that the contact change is not made through the MozContacts API but through whatever mechanism the Contact Provider has for modifying the contact data it owns. This could be an app-specific UI, a synchronization with an external service, etc. In any case, adding or modifying a contact stored in a Contact Provider datastore registered with the GCDS will trigger a notification that wakes the GCDS app up; the GCDS will then sync with the provider's store and trigger a new notification that should be received by Contact Consumers, just like in the previous flow.
Contact Consumer search
Each Contact Consumer should keep a local IndexedDB with the appropriate indexes to fulfill its specific search requirements. Contact searches will therefore be done by querying each Contact Consumer's local IndexedDB.
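As an example of the kind of query such an index enables, an incremental name search is a prefix range scan. In IndexedDB this would typically be an index queried with `IDBKeyRange.bound(prefix, prefix + '\uffff')`; the sketch below is a hypothetical in-memory equivalent (sorted keys plus a lower-bound binary search), and the accent-stripping normalization is an assumption about what a consumer might want.

```javascript
// Hypothetical in-memory equivalent of an IndexedDB prefix range scan over a
// normalized-name index. Keys are compared by UTF-16 code units, as
// IndexedDB does for strings.
function normalize(s) {
  return s.normalize('NFD').replace(/[\u0300-\u036f]/g, '').toLowerCase();
}

function buildNameIndex(contacts) {
  return contacts
    .map((c) => ({ key: normalize((c.name || [''])[0]), id: c.id }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// First position whose key is >= target (lower-bound binary search).
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (index[mid].key < target) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// Walk forward from the lower bound while keys still share the prefix.
function searchByPrefix(index, prefix) {
  const p = normalize(prefix);
  const results = [];
  for (let i = lowerBound(index, p); i < index.length && index[i].key.startsWith(p); i++) {
    results.push(index[i].id);
  }
  return results;
}
```

Each keystroke then costs a logarithmic seek plus the number of matches, instead of a regular expression run over every contact.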
Contact Provider installation/uninstallation
When a new application exposing a datastore named 'contacts' is installed or uninstalled, a datastore-change-contacts system message should be triggered to wake the GCDS up, as described in the Notification mechanism about datastore changes section. Once it is up, the GCDS should fetch the contact information from the new Contact Provider and populate its datastore with the corresponding information. This population will trigger a change notification that will be received by Contact Consumers, which should update their local IndexedDBs with the newly added information.
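Once woken up, the GCDS essentially has to diff the providers it knows about against the 'contacts' datastores that currently exist. A minimal sketch, assuming the available datastores have already been listed as owner names; `reconcileProviders` and `applyRemovals` are hypothetical helpers, and dropping pointers to uninstalled providers is an assumption about how uninstallation would be handled.

```javascript
// Hypothetical sketch: compare registered providers with the 'contacts'
// datastores currently available.
function reconcileProviders(registered, available) {
  const availableSet = new Set(available);
  const added = available.filter((owner) => !registered.has(owner));
  const removed = [...registered].filter((owner) => !availableSet.has(owner));
  return { added, removed };
}

// Drop pointers to uninstalled providers; entries left without any
// pointer disappear from the GCDS.
function applyRemovals(entries, removed) {
  const gone = new Set(removed);
  for (const [gcdsId, pointers] of entries) {
    const kept = pointers.filter((p) => !gone.has(p.owner));
    if (kept.length === 0) {
      entries.delete(gcdsId);
    } else {
      entries.set(gcdsId, kept);
    }
  }
}
```

Newly added providers would then go through the same population step as the initial fetch, which produces the change notification consumers react to.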
Platform requirements
In order to do this, we still have some requirements that need to be implemented in the platform.
Expose mozContacts API content as a datastore
Because we want the MozContacts API content to be another Contact Provider as defined in this model, we need this data to be exposed through a datastore. This way the GCDS will be able to create pointers to the local contacts data owned by the MozContacts API and receive notifications about changes done in this data. More background about this change can be read on this mailing list thread and bug 1016838.
Notification mechanism about datastore changes (aka DataStore onchange scheduler)
The GCDS app needs to receive notifications about changes on the datastores owned by all the Contact Providers that are registered in the GCDS. To do that we have two ways of receiving this kind of notifications:
- Via the onchange event, that will be received by the GCDS while the app is running.
- Via the datastore-change-* system message, which the GCDS needs to listen to in order to be woken up when a change in an observed datastore is detected. This system message was added in bug 1014023.
We believe that, as currently implemented, these two mechanisms are not enough to fulfill the needs of the proposed architecture, for the following reasons.
- There is no way to know when a new Contact Provider is added
In order to add a new Contact Provider's datastore information, and the pointers to the data it contains, to the list of providers stored in the GCDS, we need to be notified about newly installed apps exposing a datastore named contacts. Currently, the GCDS app can request a list of all the datastores with a specific name, but to do that the GCDS needs to be running. And this only happens when a contact is changed in any of the already registered Contact Providers, which could happen immediately, next month or never. So we need a way to wake the GCDS up to properly register newly installed Contact Providers (or unregister uninstalled ones). We can simply trigger the datastore-change-* notification as soon as we detect a new datastore named X. Once woken up by the system message, the GCDS will be responsible for looking for Contact Providers that are not registered yet.
- Potential desynchronization between Contact Providers content and GCDS
Because we have no way to acknowledge the reception and the completed processing of a change notification, we may end up in a scenario where:
- A bunch of contacts are added to and removed from a Contact Provider watched by the GCDS. This triggers the datastore-change-* notification.
- The GCDS wakes up as response to the datastore-change-* notification.
- The GCDS starts syncing with the observed Contact Provider datastore.
- The GCDS makes enough changes to its store to trigger a datastore-change-* notification to the Contact Consumers observing the GCDS.
- The Contact Consumers start syncing with the GCDS.
- The GCDS is killed by the system or the system crashes or we run out of battery or ...
- We have a desynchronization between the Contact Provider and the Contact Consumer that won't be solvable until the next time the GCDS wakes up because of a change on an observed Contact Provider, which again could happen immediately, next month or never.
We need the DataStore API to keep notifying (probably not indefinitely) the observer about the need of synchronization until the sync process is completed. Or at the very least we should make sure that the GCDS can't be killed while doing a synchronization.
- Potential performance foot gun
When a Contact Provider changes, this triggers a datastore-change-* notification observed by the GCDS app, which is woken up. The GCDS makes the corresponding changes in its datastore, which triggers a new datastore-change-* notification observed by every single Contact Consumer on the device. This means that we will be opening at least the six apps listed in Contacts data consumers (in Gaia) within a very short period of time, which will certainly cause a considerable memory peak and might even kill the GCDS or force the killing of any of these Contact Consumers, causing the desynchronization mentioned above.
To avoid this we could:
- Simply not observe the datastore-change-* system message on Contact Consumers. Instead, Contact Consumers will need to do a DataStore.sync() to synchronize with the GCDS every time they are opened.
- Send the datastore-change-* system message only to certified apps. This would limit the number of Contact Consumers that are woken up, and we can make sure they close themselves after the sync is completed, if they still want to observe the datastore-change-* notification.
- Add some kind of scheduler for these notifications so they will be delivered in a controlled way when we detect that there is enough memory to wake new apps.
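The scheduler option could be combined with the acknowledgement idea from the desynchronization item. A minimal sketch, assuming observers are identified by name and that something outside the sketch calls `ack` when an observer's sync completes and `fail` when the app dies mid-sync. `NotificationScheduler`, `maxConcurrent` and `maxAttempts` are hypothetical, not a proposed platform API.

```javascript
// Hypothetical scheduler: wakes at most `maxConcurrent` observers at a time
// and re-delivers a notification until the observer acknowledges that its
// sync completed (bounded by `maxAttempts`, i.e. "not indefinitely").
class NotificationScheduler {
  constructor(maxConcurrent, maxAttempts = 3) {
    this.maxConcurrent = maxConcurrent;
    this.maxAttempts = maxAttempts;
    this.pending = [];         // observers waiting to be woken, in order
    this.inFlight = new Set(); // woken observers that haven't acknowledged yet
    this.dirty = new Set();    // in-flight observers that got a newer change
    this.attempts = new Map(); // observer -> delivery attempts
  }

  schedule(observer) {
    if (this.inFlight.has(observer)) {
      this.dirty.add(observer); // re-queue after the current sync finishes
    } else if (!this.pending.includes(observer)) {
      this.pending.push(observer);
    }
  }

  // Wake as many pending observers as the concurrency budget allows.
  tick(wake) {
    while (this.inFlight.size < this.maxConcurrent && this.pending.length > 0) {
      const observer = this.pending.shift();
      this.attempts.set(observer, (this.attempts.get(observer) || 0) + 1);
      this.inFlight.add(observer);
      wake(observer);
    }
  }

  // The observer finished syncing: free its slot.
  ack(observer) {
    this.inFlight.delete(observer);
    this.attempts.delete(observer);
    if (this.dirty.delete(observer)) {
      this.pending.push(observer);
    }
  }

  // The observer died before acknowledging: retry it first, up to a limit.
  fail(observer) {
    if (!this.inFlight.delete(observer)) {
      return;
    }
    this.dirty.delete(observer); // a retry picks up the newest changes anyway
    if (this.attempts.get(observer) < this.maxAttempts) {
      this.pending.unshift(observer);
    } else {
      this.attempts.delete(observer); // give up on this observer
    }
  }
}
```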
DataStore permission model
Once the DataStore API is exposed to privileged apps, in addition to what is being proposed in bug 942641, we need a way for apps to define the minimum level of privileges another app must have to access a datastore they own. We need this granularity to prevent privileged apps from getting access to the FB datastore, which is supposed to be accessible to Gaia certified apps only. This could probably be an addition to the manifest like:
"datastores-owned": { "datastore-name": { "description": "My certified only datastore", "access": "readonly", "access-level": "certified" } }
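The check the platform would need to perform for such a field could look roughly like this. It is a sketch under assumptions: `canAccessDatastore` is hypothetical, the ranking web < privileged < certified mirrors the Firefox OS app types, and treating a missing 'access-level' as 'privileged' is a guess, not something this proposal specifies.

```javascript
// Hypothetical check for the proposed 'access-level' manifest field.
const PRIVILEGE_RANK = { web: 0, privileged: 1, certified: 2 };

function hasRank(level) {
  return Object.prototype.hasOwnProperty.call(PRIVILEGE_RANK, level);
}

function canAccessDatastore(appType, datastoreEntry) {
  // Assumption: without 'access-level', any app allowed to use DataStore
  // (privileged or certified) may read the datastore.
  const required = datastoreEntry['access-level'] || 'privileged';
  if (!hasRank(appType) || !hasRank(required)) {
    return false; // unknown values are rejected rather than guessed
  }
  return PRIVILEGE_RANK[appType] >= PRIVILEGE_RANK[required];
}
```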
Initial population of the GCDS
Once we move to this model, we will need a way to launch the GCDS app for the first time to do the initial population from the currently existing Contact Providers (e.g. the local contacts datastore, the FB datastore, etc.). This process needs to be atomic.
Shared indexedDBs for certified apps
I know, I know. We already have DataStore to share data between apps. Unfortunately, with the approach proposed here we will be duplicating contacts data in quite a massive way for very similar needs. Every Contact Consumer will need its own IndexedDB, built from the data obtained from the Contact Providers' datastores and with its own indexes. The Contact Consumers we currently have in Gaia have, or will have, very similar if not identical needs, so the IndexedDBs maintained by each app will contain exactly the same data and the same indexes. That is a lot of duplication for the same purpose. Having a way to share an IndexedDB containing the information about all the contacts shared by Contact Providers between Gaia certified Contact Consumers would allow us to save a significant amount of disk space.
We found this interesting discussion about DataStore, which mentions the possibility of allowing indexes in DataStore. That would be really helpful for this refactor.
FAQ
- What happens if a new Contact Provider is installed/uninstalled?
Check the Notification mechanism about datastore changes section. We need a datastore-change-* system message to be fired as soon as a new app exposing a 'contacts' datastore is installed. This will wake up the GCDS app, which should get the list of 'contacts' datastores and add the ones that are not already registered.
- Where does the data live?
In each application, on both sides: providers and consumers. Each Contact Provider will contain its specific data: the FB provider will contain FB contacts only, the Gmail provider Gmail contacts only, and so on. Each Contact Consumer will contain a copy of the data resulting from combining all the data owned by all the Contact Providers it is allowed to access. We are aware of what this could mean in terms of disk usage; see the Shared indexedDBs for certified apps section for a potential fix for this issue.
- What happens if I create a new contact?
That will be a local contact, accessible from the MozContacts API and its datastore. A reference in the GCDS pointing to the original source will be created as well (unless it is merged with an existing one).
- What happens if I modify a contact from an external source?
That's the responsibility of the Contact Provider owning the affected contact. Each Contact Provider needs to manage its own external replication and sync mechanisms. There is a parallel effort, out of the scope of this proposal, to provide a way for applications to register for notifications about synchronization windows.
- Will the GCDS have access to the contacts of a specific app?
We want the user to decide which external datastores can be accessed by the GCDS and thus shared with other Contact Consumers.
- What happens if a Contact Consumer receives a notification about an external modification made to a contact that has been modified locally?
Conflict resolution UI?
Some edge cases
- As a user, I want to merge 2 local contacts.
- As a user, I want to merge 2 contacts from external RW sources.
- As a user, I want to merge 2 contacts: 1 from an external RO source (like FB) and 1 from an external RW source.
- As a user, I want to merge 2 contacts from external RO sources. [Might be impossible, as FB is currently the only example of an RO source]
- As a user, I want to merge 2 contacts: 1 already merged and 1 from an external RW source.
- As a user, I want to merge 2 contacts: 1 already merged and another one already merged.
- As a user, I want to merge 3 contacts: 1 local, 1 from an external RO source and 1 from a RW source.
- As a user, I want to unmerge a contact coming from 3 sources.
- As a user, I want to delete a merged contact (local + RO external source).
- As a user, I want to edit a merged contact from the external source.
- As a user, I want to edit a merged contact locally and then from the external source.
- As a user, I want to edit a merged (RO+RW) contact from the external RW source.
Implementation
Bugs associated
14 Total; 14 Open (100%); 0 Resolved (0%); 0 Verified (0%);
Experiments
Below you can find a list of experiments we did to check the feasibility and performance of our ideas:
- bug 1031315 - [Contacts][Data Refactor]Check performance of building contact details from different DS
- Here we tried to check how much it costs to build a contact detail from different sources each time, compared with storing the result in IndexedDB and retrieving it.
- With 500 contacts, the IndexedDB option is obviously much faster, but we wanted to quantify the difference between both solutions: 30-40 ms to get the data from IndexedDB versus ~150 ms when composing it from different sources.
- To avoid consuming too much disk space, we decided to cache only the details of the contacts that are actually visited.
- bug 1031318 - [Contacts][Data Refactor] Better search in contacts
- All in memory: 1MB with 2000 contacts
- IndexedDB: problems with traversal; it takes 8 seconds.
- In the end we decided to go with IndexedDB, since the memory consumption of the in-memory approach is too high. We also need to investigate concepts like suffix arrays to see whether they can be applied to IndexedDB.
- bug 1031327 - [Contacts][Data Refactor] Measure time to finish cursor when information stored is a full mozContact or just the bare minimun information (for 2000 contacts)
- Time to first contact is roughly the same in both cases: ~80-100 ms
- Time to last contact with full mozContacts is around 6.6-6.8 seconds
- Time to last contact with "cooked" (precomputed) records in IndexedDB is around 2.8-3 seconds
- Storing a minimal set of information in IndexedDB, without having to compute what to display, helps enormously.
- Other findings:
- Thumbnails are expensive in terms of space; perhaps we could offer a thumbnail DS for certified apps.
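The cache-on-visit decision from bug 1031315 amounts to a read-through cache. A minimal sketch, assuming a `Map` in place of the IndexedDB cache and a caller-supplied `composeFromSources` function (hypothetical) that builds the detail from the different datastores; invalidating entries on change notifications is an assumption about how the cache would stay fresh.

```javascript
// Hypothetical read-through cache for contact details: compose the detail
// from its sources only on the first visit, then serve the cached copy.
function createDetailCache(composeFromSources) {
  const cache = new Map(); // stand-in for the IndexedDB cache
  return {
    get(gcdsId) {
      if (!cache.has(gcdsId)) {
        // Slow path (~150 ms in the experiment), first visit only.
        cache.set(gcdsId, composeFromSources(gcdsId));
      }
      return cache.get(gcdsId);
    },
    // Call when a change notification affects this contact.
    invalidate(gcdsId) {
      cache.delete(gcdsId);
    },
    get size() {
      return cache.size;
    },
  };
}
```

Only visited contacts ever occupy disk space, which is the trade-off the experiment settled on.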