Identity/AttachedServices/StorageServerProtocol

From MozillaWiki


Summary

This is a working proposal for the PiCL Storage API, to implement the concepts described in Identity/CryptoIdeas/05-Queue-Sync.

It's a work in progress that will eventually obsolete Identity/AttachedServices/StorageProtocolZero.


Queue-Sync Data Model

More details at Identity/CryptoIdeas/05-Queue-Sync.

Data is stored in independent named collections. A collection is a key-value store mapping keys to records. Each collection has a monotonically-increasing sequence number, which is incremented whenever a record is changed, and clients can request all changes made since a given sequence number.


Collection objects have the following fields:

Parameter | Type | Description
name | urlsafe string, 64 bytes | A unique identifier for this collection among all the user's data. Collection names may only contain characters from the urlsafe-base64 alphabet (i.e. alphanumerics, underscore and hyphen).
seqnum | integer, 8 bytes | A monotonically-increasing integer that is incremented with each change to the contents of the collection.
changeid | urlsafe string, XXX bytes | A hash that uniquely identifies the last change to this collection. It is derived from the new sequence number, the previous changeid, and the details of the change that was made.
signature | urlsafe string, XXX bytes | A client-generated HMAC signature of the current changeid. Not used or verified by the server, since it doesn't have the secret key.
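The naming rule above can be checked with a single regular expression; the record table below applies the same rule to record keys. A minimal Python sketch, in which the function name `is_valid_identifier` is hypothetical and two details are assumptions rather than part of this proposal: that "64 bytes" is an upper bound, and that empty names are rejected.

```python
import re

# Hypothetical validator for collection names (and record keys, which follow
# the same rule). Allowed characters are the urlsafe-base64 alphabet:
# A-Z, a-z, 0-9, underscore and hyphen. All of these are ASCII, so the
# 64-byte limit is also a 64-character limit.
# Assumptions: 64 bytes is a maximum, and empty names are not allowed.
_IDENTIFIER_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_identifier(name: str) -> bool:
    """Return True if `name` consists of 1-64 urlsafe-base64 characters."""
    return _IDENTIFIER_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["history", "my-bookmarks_2", "bad/name", "x" * 65]:
        print(candidate[:16], is_valid_identifier(candidate))
```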


Record objects have the following fields:

Parameter | Type | Description
key | urlsafe string, 64 bytes | A unique identifier for this record within the collection. Keys may only contain characters from the urlsafe-base64 alphabet (i.e. alphanumerics, underscore and hyphen).
payload | urlsafe string, 256 KB | The value currently stored in this record. Typically this would be encrypted and signed by the client.
seqnum | integer, 8 bytes | The collection-level sequence number at which this record was last modified.
changeid | urlsafe string, XXX bytes | The collection-level changeid corresponding to the modification of this record. It is derived from the new sequence number, the previous changeid, the record key, and the new record payload.
signature | urlsafe string, XXX bytes | A client-generated HMAC signature of the changeid for this record. Not used or verified by the server, since it doesn't have the secret key.


Change objects are identical to record objects, except that their payload field may have the value null to indicate a deletion rather than an update:

Parameter | Type | Description
key | urlsafe string, 64 bytes | A unique identifier for the changed record within the collection. Keys may only contain characters from the urlsafe-base64 alphabet (i.e. alphanumerics, underscore and hyphen).
payload | urlsafe string or null, 256 KB | The new value to be stored in the record, or null if the record is to be deleted. Typically this would be encrypted and signed by the client.
seqnum | integer, 8 bytes | The new collection-level sequence number after this change is applied.
changeid | urlsafe string, XXX bytes | The new collection-level changeid corresponding to this change. It is derived from the new sequence number, the previous changeid, the record key, and the new record payload.
signature | urlsafe string, XXX bytes | A client-generated HMAC signature of the changeid. Not used or verified by the server, since it doesn't have the secret key.
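The hash chain described by the changeid and signature fields can be sketched in Python. This page lists which inputs feed a changeid but does not fix a hash function, encoding or serialization, so everything concrete here is an assumption: SHA-256 over a length-prefixed encoding of the four inputs, HMAC-SHA256 for the signature, and unpadded urlsafe-base64 for both outputs. The names `compute_changeid`, `sign_changeid` and `make_change` are hypothetical.

```python
import base64
import hashlib
import hmac
import struct
from dataclasses import dataclass
from typing import Optional

# Everything concrete below is an assumption: the proposal only says which
# inputs feed the changeid, not how they are hashed or encoded.


def _b64(data: bytes) -> str:
    # Unpadded urlsafe-base64 keeps the result inside the urlsafe alphabet.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _field(data: bytes) -> bytes:
    # Length-prefix variable-length fields so different inputs never collide
    # by concatenation (e.g. key "ab" + payload "c" vs key "a" + payload "bc").
    return struct.pack(">I", len(data)) + data


def compute_changeid(seqnum: int, prev_changeid: str, key: str,
                     payload: Optional[str]) -> str:
    """SHA-256 over the new seqnum, previous changeid, record key and new payload."""
    h = hashlib.sha256(struct.pack(">Q", seqnum))  # seqnum as 8 bytes
    h.update(_field(prev_changeid.encode("utf-8")))
    h.update(_field(key.encode("utf-8")))
    # A null payload (deletion) must hash differently from an empty string.
    h.update(b"\x00" if payload is None else b"\x01" + _field(payload.encode("utf-8")))
    return _b64(h.digest())


def sign_changeid(secret: bytes, changeid: str) -> str:
    """Client-side HMAC-SHA256 of a changeid; the server never holds `secret`."""
    return _b64(hmac.new(secret, changeid.encode("utf-8"), hashlib.sha256).digest())


@dataclass
class Change:
    key: str
    payload: Optional[str]  # None means "delete this record"
    seqnum: int
    changeid: str
    signature: str


def make_change(secret: bytes, prev_seqnum: int, prev_changeid: str,
                key: str, payload: Optional[str]) -> Change:
    """Chain a new change onto the collection's current (seqnum, changeid)."""
    seqnum = prev_seqnum + 1
    changeid = compute_changeid(seqnum, prev_changeid, key, payload)
    return Change(key, payload, seqnum, changeid, sign_changeid(secret, changeid))
```

Each changeid depends on its predecessor, so editing or reordering an earlier change alters every changeid after it. The server could recompute plain hashes, but not the HMAC signatures, which is what lets a client holding the key detect such tampering.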


Authentication

To access the storage service, a client device must authenticate by providing a BrowserID assertion and a Device ID. It will receive in exchange:

  • a short-lived id/key pair that can be used to authenticate subsequent requests using the Hawk request-signing scheme
  • a mapping of collection names to access URLs


You can think of this as establishing a "login session" with the server. Access requests for a specific collection should then be directed to the appropriate URL.

Example:

   >  POST <server-url>
   >  {
   >   "assertion": <browserid assertion>,
   >   "device": <device UUID>
   >  }
   .
   <  200 OK
   <  Content-Type: application/json
   <  {
   <   "id": <hawk auth id>,
   <   "key": <hawk auth secret key>,
   <   "collections": {
   <     "history": <access url for history collection>,
   <     "bookmarks": <access url for bookmarks collection>,
   <     <...etc...>
   <   }
   <  }

The user and device identity information is encoded in the hawk auth id, to avoid re-sending it on each request. The server may also include additional state in this value, depending on the implementation. It's opaque to the client.

The collection-specific access URLs may include a unique identifier for the user, to make the API more RESTful. Alternatively, they might point the client to the specific data center that houses the write master for each collection. Either way, the URLs are opaque to the client.
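Because the Hawk id and the access URLs are opaque, a client only needs to store them and hand them back. A minimal Python sketch of building the login request body and parsing the response, using only the field names from the example above; `Session`, `build_login_body` and `parse_login_response` are hypothetical names, and the Hawk request signing itself is left out.

```python
import json
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Session:
    """State kept after login; `hawk_id` and the URLs are opaque to the client."""
    hawk_id: str                 # sent back verbatim as the Hawk credential id
    hawk_key: str                # secret used to sign subsequent requests
    collections: Dict[str, str]  # collection name -> access URL

    def url_for(self, collection: str) -> str:
        try:
            return self.collections[collection]
        except KeyError:
            raise KeyError(f"no access URL assigned for {collection!r}") from None


def build_login_body(assertion: str, device_id: str) -> str:
    """JSON body for POST <server-url>, using the field names from the example."""
    return json.dumps({"assertion": assertion, "device": device_id})


def parse_login_response(body: str) -> Session:
    """Turn the 200 OK response body into a Session without interpreting any values."""
    data = json.loads(body)
    return Session(hawk_id=data["id"], hawk_key=data["key"],
                   collections=dict(data["collections"]))
```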

Data Access

The client now makes Hawk-authenticated requests to a specific collection at its assigned access url. The following operations are available on each collection.


GET <collection-url>

Get the current metadata for a collection: its name, seqnum and changeid. Example:

   >  GET <collection-url>
   >  Authorization:  <hawk auth parameters>
   .
   <  200 OK
   <  Content-Type: application/json
   <  {
   <   "name": "history",
   <   "seqnum": 123,
   <   "changeid": "HASH_OF_DETAILS_OF_THE_MOST_RECENT_CHANGE",
   <   "signature": "HMAC_SIGNATURE_OF_CHANGEID"
   <  }


GET <collection-url>/records

Query parameters: start, end, limit.

Request headers: If-Match, If-None-Match

Response headers: ETag


Get the set of records currently contained in the collection. For small collections, the full set of records will be returned like so:

   >  GET <collection-url>/records
   >  Authorization:  <hawk auth parameters>
   .
   <  200 OK
   <  Content-Type: application/json
   <  {
   <   "records": {
   <    "key1": { "payload": "payload1", "seqnum": 123, "changeid": "HASH1", "signature": "sig1" },
   <    "key2": { "payload": "payload2", "seqnum": 124, "changeid": "HASH2", "signature": "sig2" }
   <   }
   <  }


If there are a large number of records in the collection then the server may choose to paginate the result, returning only some of the records in the initial response. It will include the key "next" in the output to indicate that more records are available:

   >  GET <collection-url>/records
   >  Authorization:  <hawk auth parameters>
   .
   <  200 OK
   <  Content-Type: application/json
   <  {
   <   "next": "key3",
   <   "items": {
   <     "key1": <record1>,
   <     "key2": <record2>
   <   }
   <  }

Clients can request the next batch using the 'start' query parameter:

   >  GET <collection-url>/records?start=key3
   >  Authorization:  <hawk auth parameters>
   .
   <  200 OK
   <  Content-Type: application/json
   <  {
   <   "items": {
   <     "key3": <record3>,
   <     "key4": <record4>
   <   }
   <  }

When no "next" value is included in the response, the client knows that all available records have been fetched.
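A client can fetch a whole collection by following "next" until it disappears. In this Python sketch the `fetch_page` callable is a hypothetical stand-in for a Hawk-signed GET of <collection-url>/records with the given query parameters, returning the decoded JSON body. Since the examples use both "records" and "items" as the body key, the sketch accepts either.

```python
from typing import Callable, Dict

# Hypothetical transport: performs a Hawk-signed GET <collection-url>/records
# with the given query parameters and returns the decoded JSON body.
FetchPage = Callable[[Dict[str, str]], dict]


def fetch_all_records(fetch_page: FetchPage) -> Dict[str, dict]:
    """Fetch every record, following "next" markers until none is returned."""
    records: Dict[str, dict] = {}
    params: Dict[str, str] = {}
    while True:
        body = fetch_page(params)
        # The examples use both "records" and "items" for the record map.
        records.update(body.get("records", body.get("items", {})))
        next_key = body.get("next")
        if next_key is None:
            return records
        params = {"start": next_key}
```

The collection may change between pages; the If-Match/ETag mechanism on this resource is how a client can detect that.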

Records are always batched in lexicographic order of their keys, and clients are free to request an arbitrary key range using the 'start' and 'end' parameters:

   >  GET <collection-url>/records?start=key2&end=key3
   >  Authorization:  <hawk auth parameters>
   .
   <  200 OK
   <  Content-Type: application/json
   <  {
   <   "items": {
   <     "key2": <record2>,
   <     "key3": <record3>
   <   }
   <  }

Clients may also choose to batch their requests by using the 'limit' query parameter. As with server-driven batching, the output key "next" will be used to indicate that more data is available:

   >  GET <collection-url>/records?start=key2&limit=2
   >  Authorization:  <hawk auth parameters>
   .
   <  200 OK
   <  Content-Type: application/json
   <  {
   <   "next": "key4",
   <   "items": {
   <     "key2": <record2>,
   <     "key3": <record3>
   <   }
   <  }
   .
   .
   >  GET <collection-url>/records?start=key4&limit=2
   >  Authorization:  <hawk auth parameters>
   .
   <  200 OK
   <  Content-Type: application/json
   <  {
   <   "items": {
   <     "key4": <record4>
   <   }
   <  }
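On the server side, start, end and limit reduce to slicing a sorted key list. A Python sketch under stated assumptions: both start and end are inclusive (as in the start=key2&end=key3 example), "lexicographic" means ordinary string ordering (byte order for these ASCII keys), and the server caps every page at a hypothetical `DEFAULT_PAGE_SIZE`. It emits "items", following the paginated examples; `select_records` is a hypothetical name.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, Optional

DEFAULT_PAGE_SIZE = 100  # hypothetical cap the server applies to every page


def select_records(records: Dict[str, dict], start: Optional[str] = None,
                   end: Optional[str] = None, limit: Optional[int] = None) -> dict:
    """Return one page of records in key order, plus "next" if more remain.

    Assumptions: `start` and `end` are both inclusive, and "lexicographic"
    means Python string ordering (byte order for these ASCII keys).
    """
    if limit is not None and limit < 1:
        raise ValueError("limit must be positive")
    keys = sorted(records)
    lo = 0 if start is None else bisect_left(keys, start)
    hi = len(keys) if end is None else bisect_right(keys, end)
    page_size = DEFAULT_PAGE_SIZE if limit is None else min(limit, DEFAULT_PAGE_SIZE)
    stop = min(hi, lo + page_size)
    body: dict = {"items": {k: records[k] for k in keys[lo:stop]}}
    if stop < hi:
        body["next"] = keys[stop]  # first key in range that was not returned
    return body
```

A real server would keep keys in an ordered index rather than sorting on every request; the slicing logic stays the same.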


Each server response will include an "ETag" header, formed from the combination of the current seqnum and changeid of the collection. Clients can use this in combination with standard If-Match and If-None-Match headers to ensure that they're getting a consistent view of the collection:

   >  GET <collection-url>/records?start=key2&limit=2
   >  Authorization:  <hawk auth parameters>
   .
   <  200 OK
   <  Content-Type: application/json
   <  ETag: 124-HASH2
   <  {
   <   "next": "key4",
   <   "items": {
   <     "key2": <record2>,
   <     "key3": <record3>
   <   }
   <  }
   .
   .
   >  GET <collection-url>/records?start=key4&limit=2
   >  Authorization:  <hawk auth parameters>
   >  If-Match: 124-HASH2
   .
   <  412 Precondition Failed
   <  ETag: 125-HASH3
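The server side of this check can be sketched in Python. The ETag format is read off the examples; `make_etag`, `parse_etag` and `check_preconditions` are hypothetical names, and the matching is deliberately simpler than full HTTP conditional-request handling.

```python
from typing import Optional, Tuple


def make_etag(seqnum: int, changeid: str) -> str:
    # "<seqnum>-<changeid>", as in the examples (e.g. 124-HASH2). The examples
    # leave ETags unquoted, although standard HTTP ETags are quoted strings.
    return f"{seqnum}-{changeid}"


def parse_etag(etag: str) -> Tuple[int, str]:
    # Split on the first hyphen only: the seqnum never contains one, but a
    # changeid drawn from the urlsafe-base64 alphabet may.
    seqnum, _, changeid = etag.partition("-")
    return int(seqnum), changeid


def check_preconditions(current_etag: str, method: str = "GET",
                        if_match: Optional[str] = None,
                        if_none_match: Optional[str] = None) -> Optional[int]:
    """Return the status to send instead of the normal response, or None.

    Simplified: exact string comparison only, with no support for lists of
    ETags, weak validators or "*".
    """
    if if_match is not None and if_match != current_etag:
        return 412  # Precondition Failed; also send the current ETag
    if if_none_match is not None and if_none_match == current_etag:
        # Standard HTTP: 304 Not Modified for GET/HEAD, 412 for other methods.
        return 304 if method in ("GET", "HEAD") else 412
    return None
```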


XXX TODO: use of headers, versus returning seqnum/changeid in the response body?


GET <collection-url>/records/<key>

Request headers: If-Match, If-None-Match

Response headers: ETag


Get the specific record stored under the given key:

   >  GET <collection-url>/records/<key>
   >  Authorization:  <hawk auth parameters>
   .
   <  200 OK
   <  Content-Type: application/json
   <  ETag: 123-HASH1
   <  {
   <   "key": <key>,
   <   "seqnum": 123,
   <   "changeid": "HASH1",
   <   "payload": "payload1"
   <  }

This request supports standard etag behaviour to ensure that a consistent view of the collection is being read.


GET <collection-url>/changes

Query parameters: since, limit.

Get the sequence of changes that have been made to the collection. If the number of changes to be returned is small, they will be returned all at once like so:

   >  GET <collection-url>/changes
   >  Authorization:  <hawk auth parameters>
   .
   <  200 OK
   <  Content-Type: application/json
   <  {
   <   "changes": [
   <     { "seqnum": 0, "changeid": "HASH1", "signature": "sig1", "key": "key1", "payload": "payload1" },
   <     { "seqnum": 1, "changeid": "HASH2", "signature": "sig2", "key": "key2", "payload": "payload2" }
   <   ]
   <  }

The changeids and signatures on these changes form a hash chain which can be verified by the client.
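Client-side verification of that chain can be sketched in Python. The hash and HMAC scheme here is hypothetical (this page does not specify one), as is the assumption that consecutive changes have consecutive seqnums; `verify_chain` takes the last (seqnum, changeid) the client already trusts and walks forward from it.

```python
import base64
import hashlib
import hmac
import struct
from typing import Iterable, Optional, Tuple

# Hypothetical scheme (not specified by the proposal): SHA-256 over the 8-byte
# seqnum, previous changeid, key and payload with length-prefixed fields;
# HMAC-SHA256 signatures; unpadded urlsafe-base64 encoding.


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _field(data: bytes) -> bytes:
    return struct.pack(">I", len(data)) + data


def compute_changeid(seqnum: int, prev_changeid: str, key: str,
                     payload: Optional[str]) -> str:
    h = hashlib.sha256(struct.pack(">Q", seqnum))
    h.update(_field(prev_changeid.encode("utf-8")))
    h.update(_field(key.encode("utf-8")))
    # A null payload (deletion) hashes differently from an empty string.
    h.update(b"\x00" if payload is None else b"\x01" + _field(payload.encode("utf-8")))
    return _b64(h.digest())


def sign_changeid(secret: bytes, changeid: str) -> str:
    return _b64(hmac.new(secret, changeid.encode("utf-8"), hashlib.sha256).digest())


def verify_chain(changes: Iterable[dict], seqnum: int, changeid: str,
                 secret: bytes) -> Tuple[int, str]:
    """Check change objects against the state just before the first one.

    Returns the verified (seqnum, changeid) after the last change, or raises
    ValueError at the first broken link.
    """
    for change in changes:
        if change["seqnum"] != seqnum + 1:
            raise ValueError(f"expected seqnum {seqnum + 1}, got {change['seqnum']}")
        expected = compute_changeid(change["seqnum"], changeid,
                                    change["key"], change["payload"])
        if change["changeid"] != expected:
            raise ValueError(f"changeid mismatch at seqnum {change['seqnum']}")
        good_sig = sign_changeid(secret, expected).encode("ascii")
        if not hmac.compare_digest(change["signature"].encode("utf-8"), good_sig):
            raise ValueError(f"bad signature at seqnum {change['seqnum']}")
        seqnum, changeid = change["seqnum"], change["changeid"]
    return seqnum, changeid
```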

If there is a large number of changes to be fetched, the server may choose to paginate the result, returning only some of the changes in the initial response. It will include the key "next" in the output to indicate that more changes are available:

   >  GET <collection-url>/changes
   >  Authorization:  <hawk auth parameters>
   .
   <  200 OK
   <  Content-Type: application/json
   <  {
   <   "next": 3,
   <   "changes": [
   <     <change1>,
   <     <change2>
   <   ]
   <  }

Clients can request the next batch using the 'since' query parameter:

   >  GET <collection-url>/changes?since=3
   >  Authorization:  <hawk auth parameters>
   .
   <  200 OK
   <  Content-Type: application/json
   <  {
   <   "changes": [
   <     <change3>,
   <     <change4>
   <   ]
   <  }

Changes are always batched in sequence number order. Clients are free to request changes starting at an arbitrary sequence number, which is useful for pulling in just the things that have changed since a previous sync.

Clients may also choose to batch their requests by using the 'limit' query parameter. As with server-driven batching, the output key "next" will be used to indicate that more data is available:

   >  GET <collection-url>/changes?since=2&limit=2
   >  Authorization:  <hawk auth parameters>
   .
   <  200 OK
   <  Content-Type: application/json
   <  {
   <   "next": 4,
   <   "changes": [
   <     <change2>,
   <     <change3>
   <   ]
   <  }
   .
   .
   >  GET <collection-url>/changes?since=4&limit=2
   >  Authorization:  <hawk auth parameters>
   .
   <  200 OK
   <  Content-Type: application/json
   <  {
   <   "changes": [
   <     <change4>
   <   ]
   <  }

The server is not required to keep the full change history from seqnum zero, and may periodically compact and garbage-collect the stored data. If the client requests changes since a seqnum that is no longer known to the server, it will receive an error:

   >  GET <collection-url>/changes?since=1
   >  Authorization:  <hawk auth parameters>
   .
   <  416 Requested Range Not Satisfiable
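The since/limit/next behaviour and the compaction error can be sketched as a server-side Python function. Reading the examples, since appears to name the first seqnum returned, and next the value to pass as since on the following request; the sketch adopts that reading as an assumption, and also assumes the retained log has consecutive seqnums. `select_changes`, `RangeNotSatisfiable` and `DEFAULT_PAGE_SIZE` are hypothetical.

```python
from typing import List, Optional

DEFAULT_PAGE_SIZE = 100  # hypothetical cap the server applies to every page


class RangeNotSatisfiable(Exception):
    """History before the requested seqnum has been compacted away (416 above)."""


def select_changes(log: List[dict], since: Optional[int] = None,
                   limit: Optional[int] = None) -> dict:
    """Return one page of change objects in seqnum order, plus "next" if more remain.

    Assumptions: `log` holds the retained changes with consecutive seqnums,
    `since` names the first seqnum to return, and "next" is the value the
    client should pass as `since` on its following request.
    """
    if limit is not None and limit < 1:
        raise ValueError("limit must be positive")
    if not log:
        return {"changes": []}
    oldest = log[0]["seqnum"]
    start = oldest if since is None else since
    if start < oldest:
        raise RangeNotSatisfiable(f"changes before seqnum {oldest} were compacted")
    offset = start - oldest
    page_size = DEFAULT_PAGE_SIZE if limit is None else min(limit, DEFAULT_PAGE_SIZE)
    page = log[offset:offset + page_size]
    body: dict = {"changes": page}
    if offset + page_size < len(log):
        body["next"] = log[offset + page_size]["seqnum"]
    return body
```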


XXX TODO: seriously, is there a good error code for this, or should we just tunnel errors in the body?


POST <collection-url>/records

Request headers: If-Match, If-None-Match

Response headers: ETag

Update or delete records in the collection. The request body must contain an array of change objects with properly-formed sequence numbers and changeids, and it must be preconditioned with an If-Match or If-None-Match header:

   >  POST <collection-url>/records
   >  Authorization:  <hawk auth parameters>
   >  If-Match: 125-HASH1
   >  {
   >    "changes": [
   >      {"key": "key1", "payload": "newpayload1", "seqnum": 126, "changeid": "NEWHASH1", "signature": "newsig1"},
   >      {"key": "key2", "payload": null, "seqnum": 127, "changeid": "NEWHASH2", "signature": "newsig2"}
   >    ]
   >  } 
   .
   <  204 No Content
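Since the client computes the new seqnums and changeids itself, preparing a batch like the one above is pure local computation. A Python sketch in which the changeid/signature scheme and the `build_batch` name are hypothetical; sending the request and Hawk-signing it are left out.

```python
import base64
import hashlib
import hmac
import json
import struct
from typing import Dict, List, Optional, Tuple

# Hypothetical changeid/signature scheme (not specified by the proposal):
# SHA-256 over length-prefixed fields, HMAC-SHA256, unpadded urlsafe-base64.


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _field(data: bytes) -> bytes:
    return struct.pack(">I", len(data)) + data


def compute_changeid(seqnum: int, prev_changeid: str, key: str,
                     payload: Optional[str]) -> str:
    h = hashlib.sha256(struct.pack(">Q", seqnum))
    h.update(_field(prev_changeid.encode("utf-8")))
    h.update(_field(key.encode("utf-8")))
    h.update(b"\x00" if payload is None else b"\x01" + _field(payload.encode("utf-8")))
    return _b64(h.digest())


def sign_changeid(secret: bytes, changeid: str) -> str:
    return _b64(hmac.new(secret, changeid.encode("utf-8"), hashlib.sha256).digest())


def build_batch(secret: bytes, seqnum: int, changeid: str,
                updates: List[Tuple[str, Optional[str]]]
                ) -> Tuple[Dict[str, str], str, Tuple[int, str]]:
    """Chain (key, payload-or-None) updates onto the last collection state seen.

    Returns the precondition header, the JSON request body, and the
    (seqnum, changeid) the collection will have if the server accepts it.
    """
    headers = {"If-Match": f"{seqnum}-{changeid}"}  # ETag format from the examples
    changes = []
    for key, payload in updates:
        seqnum += 1
        changeid = compute_changeid(seqnum, changeid, key, payload)
        changes.append({"key": key, "payload": payload, "seqnum": seqnum,
                        "changeid": changeid,
                        "signature": sign_changeid(secret, changeid)})
    return headers, json.dumps({"changes": changes}), (seqnum, changeid)
```

The returned state is exactly what the collection's ETag will reflect after a successful POST, which is why the 204 response can be empty.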

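The server will apply each change in turn, checking that the seqnum and changeid hash chains are properly formed. If they are not, or if the precondition does not match the collection's current state, an error will be reported. In the following example the client's If-Match value is stale, so the request fails and the server returns the collection's current ETag: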

   >  POST <collection-url>/records
   >  Authorization:  <hawk auth parameters>
   >  If-Match: 120-OLD-HASH
   >  {
   >    "changes": [
   >      {"key": "key1", "payload": "newpayload1", "seqnum": 121, "changeid": "NEWHASH1", "signature": "newsig1"},
   >      {"key": "key2", "payload": null, "seqnum": 122, "changeid": "NEWHASH2", "signature": "newsig2"}
   >    ]
   >  } 
   .
   <  412 Precondition Failed
   <  ETag: 125-HASH1
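One way a server might enforce these rules is sketched below in Python. Several choices are assumptions not made by this page: the whole batch is validated before anything is changed (the text says only that changes are applied in turn); a malformed chain gets a 400 (the page shows only the 412 case); only If-Match is handled, not If-None-Match; and the change log needed by the /changes resource is not recorded. The changeid scheme and all names are hypothetical.

```python
import base64
import hashlib
import struct
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical changeid scheme (not specified by the proposal). The server can
# recompute changeids because they are plain hashes; it cannot check the HMAC
# signatures, which need the client's secret key.


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _field(data: bytes) -> bytes:
    return struct.pack(">I", len(data)) + data


def compute_changeid(seqnum: int, prev_changeid: str, key: str,
                     payload: Optional[str]) -> str:
    h = hashlib.sha256(struct.pack(">Q", seqnum))
    h.update(_field(prev_changeid.encode("utf-8")))
    h.update(_field(key.encode("utf-8")))
    h.update(b"\x00" if payload is None else b"\x01" + _field(payload.encode("utf-8")))
    return _b64(h.digest())


@dataclass
class Collection:
    seqnum: int
    changeid: str
    records: Dict[str, dict] = field(default_factory=dict)

    @property
    def etag(self) -> str:
        return f"{self.seqnum}-{self.changeid}"


def apply_batch(coll: Collection, if_match: Optional[str],
                changes: List[dict]) -> Tuple[int, str]:
    """Validate a whole batch, then apply it; returns (HTTP status, ETag)."""
    if if_match != coll.etag:
        return 412, coll.etag  # stale view of the collection
    seqnum, changeid = coll.seqnum, coll.changeid
    for ch in changes:
        expected = compute_changeid(ch["seqnum"], changeid, ch["key"], ch["payload"])
        if ch["seqnum"] != seqnum + 1 or ch["changeid"] != expected:
            return 400, coll.etag  # malformed chain; this status is an assumption
        seqnum, changeid = ch["seqnum"], ch["changeid"]
    for ch in changes:  # only reached if every link checked out
        if ch["payload"] is None:
            coll.records.pop(ch["key"], None)  # null payload deletes the record
        else:
            coll.records[ch["key"]] = dict(ch)
    coll.seqnum, coll.changeid = seqnum, changeid
    return 204, coll.etag
```

Validating first keeps a rejected batch from leaving the collection half-updated, at the cost of holding the whole batch in memory before applying it.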


No content is returned in response to a POST. The client has already calculated the new seqnum and changeid for the collection, so there is no more useful information that the server can provide.

XXX TODO: since we're posting "change" objects, does it make more sense to direct this POST at <collection-url>/changes rather than at the records resource?

POST <collection-url>/records/<key>

Update or delete a specific record in the collection. The request body must contain a change object with properly-formed sequence number and changeid, and it must be preconditioned with an If-Match or If-None-Match header:

   >  POST <collection-url>/records/<key>
   >  Authorization:  <hawk auth parameters>
   >  If-Match: 125-HASH1
   >  {
   >    "payload": "newpayload1",
   >    "seqnum": 126,
   >    "changeid": "NEWHASH1",
   >    "signature": "newsig1"
   >  } 
   .
   <  204 No Content


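The server will check that the seqnum and changeid hash chains are properly formed before applying the change. If they are not, or if the precondition does not match the collection's current state, an error will be reported. Here the If-Match value is stale: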


   >  POST <collection-url>/records/<key>
   >  Authorization:  <hawk auth parameters>
   >  If-Match: 120-OLD-HASH
   >  {
   >    "payload": "newpayload1",
   >    "seqnum": 126,
   >    "changeid": "NEWHASH1",
   >    "signature": "newsig1"
   >  }
   .
   <  412 Precondition Failed
   <  ETag: 125-HASH1

No content is returned in response to a POST. The client has already calculated the new seqnum and changeid for the collection, so there is no more useful information that the server can provide.