Identity/CryptoIdeas/03-ID-Attached-Data
ID-Attached Data
- Brian Warner, Chris Karlof, 05-Feb-2013
Summary: a design to extend the ideas in BrowserID Key Wrapping and Identity/CryptoIdeas/02-Recoverable-Keywrapping to include three classes of data (recoverable-by-assertion, recoverable-by-password, paired-device-key-only). A new "Key Server" is introduced, which holds (wrapped) encryption keys for the user, so Storage Servers only hold ciphertext encrypted with a full-strength per-user per-service key. Clients who wish to share data with storage servers can either reveal their per-service key, or store plaintext. Revocation is discussed.
Data Protection Classes
We describe three classes of data, providing a spectrum of confidentiality and recoverability. All data is encrypted with a full-strength key, but this key is made available (or not) to different people depending upon which class the data is in:
- class-A "available": this key is available to anyone who can produce a valid BrowserID assertion, so this includes the end user, their IdP, and anyone who can spoof whatever the IdP uses to identify the user (typically an email challenge, so this also includes network attackers who can redirect or snoop on SMTP traffic). Any user who can still convince their IdP to let them log in will be able to recover their class-A data, without remembering any other secrets or having any devices remaining.
- class-B "brute-forceable": this key is wrapped with a (derivative of a) user-memorized "master password". A limited set of parties (anyone who can read the class-A data) will have enough information to attempt a brute-force attack on the password, but storage servers and the rest of the world will not. The user must remember their master password and be able to log into their IdP to get this data back.
- class-C "confidential": this key is created by the user's first client, and transferred to their other clients with a pairing protocol (PAKE), facilitated by a central server but not vulnerable to it. No external parties will be able to read this data. The user must have at least one functional paired device (or a manual key backup) to recover this data.
The idea is that users can choose which browser data goes into each class. Sensible defaults would probably put Password Manager data into class-B, and bookmarks into class-A, but users should have the option of putting everything into class-C if they like (to behave like current Firefox Sync).
These classes can be subdivided for other properties. For example, class-A can be split into "A+", in which the data is encrypted by the assertion-protected key before it is sent to the storage server, and "A-", in which the data is given to storage servers in the clear, and the server only provides access to readers who present an assertion (or equivalent). In both cases, the end user can recover their data with just an assertion. In A+, the server doesn't see plaintext, so the user's reliance set (the list of parties who can see the user's data) includes just the IdP and the Key Server. In A-, the storage server can manipulate the plaintext (perhaps to provide merge/reconciliation or search features), in exchange for which the reliance set grows to include the storage server. "A-" can also be accomplished on a user-by-user basis by delivering a decryption key to the storage server.
User Options
If users never want to use class-B data, they should not be required to come up with a master password. If they never use class-C data, they should not be required to pair new devices with existing ones.
The system should provide some way to revoke access from stolen devices. This revocation may not be immediate.
The user should be able to grant access to subsets of encrypted data to any service they please, without also granting access to all their encrypted data.
Big Picture
Web content from various domains, as well as internal services (addons with synthetic "resource://" domains) will get an API (maybe "navigator.id.data"?) with which they can obtain keys, tokens, encrypt plaintext, and decrypt ciphertext. Web content can only obtain keys/tokens when the user has signed into that domain with BrowserID (to prevent unauthorized linkability). The API provides separate keys/tokens for the three classes of data (kA, kB, kC).
When the API is used for the first time, the browser needs to create an account or connect to an existing account. It prompts the user to provide an email address and obtains a BrowserID certificate for that address. It then presents an assertion to the Key Server to see if the account already exists, creating it if necessary. The Key Server will create a random "kA" and return it to the browser (this message is only protected by TLS, as we have no other shared secrets to work with). Subsequent accesses will use the same process: submit an assertion, get back "kA".
If class-B keys are desired, the browser will ask the user for a master password. This password will be stretched using the PBKDF2/scrypt/PBKDF2 scheme defined in Identity/CryptoIdeas/01-PBKDF-scrypt to obtain a "master key". This key is then used to derive some additional keys, which are used in a "key retrieval" step (probably using SRP) to safely obtain a wrapped copy of "kB", which is then unwrapped with a different derivative of the master key. The key retrieval step can use the shared session key to prevent eavesdroppers (even those who break TLS) from learning anything about the password or kB. This kB is a full-strength random key, created on the client, and never revealed (except in wrapped form) to the Key Server. As a result, class-B data is fully protected against everyone but the Key Server, and even the Key Server only gets a brute-force attack against the user's master password.
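The stretch-then-unwrap flow can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the way the three stretching stages are chained, the (deliberately weak) iteration counts, the purpose labels, and the XOR-plus-HMAC wrapping, which stands in for whatever authenticated key-wrap the real design would choose. The actual stretching scheme is the one defined in Identity/CryptoIdeas/01-PBKDF-scrypt, and the SRP-based retrieval step is not shown.

```python
import hashlib
import hmac
import os

KEY_LEN = 32  # bytes; assumed key size for this sketch


def stretch_password(email: str, password: str) -> bytes:
    """PBKDF2 -> scrypt -> PBKDF2 with illustrative, far-too-weak parameters."""
    salt = email.encode("utf-8")
    pw = password.encode("utf-8")
    k1 = hashlib.pbkdf2_hmac("sha256", pw, salt, 1000, dklen=KEY_LEN)
    k2 = hashlib.scrypt(k1, salt=salt, n=2**10, r=8, p=1, dklen=KEY_LEN)
    # Hypothetical final stage: mix both intermediate values.
    return hashlib.pbkdf2_hmac("sha256", k1 + k2, salt, 1000, dklen=KEY_LEN)


def derive(master_key: bytes, purpose: bytes) -> bytes:
    """Derive an independent sub-key from the master key for one purpose."""
    return hmac.new(master_key, purpose, hashlib.sha256).digest()


def wrap_kb(master_key: bytes, kb: bytes) -> bytes:
    """Wrap kB on the client; only this wrapped form goes to the Key Server."""
    if len(kb) != KEY_LEN:
        raise ValueError("kB must be %d bytes" % KEY_LEN)
    enc_key = derive(master_key, b"kB-wrap-enc")
    mac_key = derive(master_key, b"kB-wrap-mac")
    ciphertext = bytes(a ^ b for a, b in zip(kb, enc_key))
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def unwrap_kb(master_key: bytes, wrapped: bytes) -> bytes:
    """Check the tag, then recover kB; raises ValueError on a wrong key."""
    ciphertext, tag = wrapped[:KEY_LEN], wrapped[KEY_LEN:]
    mac_key = derive(master_key, b"kB-wrap-mac")
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong master key or corrupted wrap(kB)")
    enc_key = derive(master_key, b"kB-wrap-enc")
    return bytes(a ^ b for a, b in zip(ciphertext, enc_key))
```

In this sketch kB itself is random (for example `os.urandom(KEY_LEN)`), so the password only protects the wrapping, not the data-encryption key.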
For each class of data, the API provides both a raw encryption key "kA[domain]", and a "tokenA[domain]", both of which are scoped to the user and the domain which served the code that invoked the API. The recommended way to manage the ciphertext is as follows:
- deliver a BrowserID Assertion and the token to the storage server
- the storage server records a database row with the assertion's email address, the token, and a slot where ciphertext will be stored
- discard the assertion. The API retains kA/kB/kC and thus the ability to regenerate the token and encryption keys.
Later, when the browser code wants to fetch or modify the data, it submits the retained token to the storage server to prove its right to access that data, and encrypts or decrypts the data with the key. The assertion (which has long-since expired) is not needed to access the storage-server data; the token is sufficient.
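As a rough illustration of how a per-class key such as kA could yield a domain-scoped encryption key and token, the following Python sketch uses HKDF-SHA256 (RFC 5869) built from the standard library. The use of HKDF, the `info` label layout, and the function names are assumptions for this example; the design above does not specify a derivation function.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand."""
    if not salt:
        salt = b"\x00" * hashlib.sha256().digest_size
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def _info(kind: str, data_class: str, domain: str) -> bytes:
    # Hypothetical label layout: kind, class name, and domain, NUL-separated.
    return b"\x00".join(s.encode("utf-8") for s in (kind, data_class, domain))


def domain_key(class_key: bytes, data_class: str, domain: str) -> bytes:
    """Data-encryption key, e.g. kA[domain]; never sent to the storage server."""
    return hkdf_sha256(class_key, _info("key", data_class, domain))


def domain_token(class_key: bytes, data_class: str, domain: str) -> bytes:
    """Access token, e.g. tokenA[domain]; presented to the storage server."""
    return hkdf_sha256(class_key, _info("token", data_class, domain))
```

Because the token and the key come from different labels, a storage server that learns the token gains nothing that helps it decrypt the ciphertext.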
Storage Servers distinguish between data stored/encrypted under different classes: they should not confuse class-A data with class-B data. If it makes sense to allow multiple classes in a single service (perhaps the service holds bookmark data, and the user can choose which class they want), the server API will need to be able to list or discover what data is available. It's probably best to require a class-A token to make this query, to protect user privacy.
Storage Servers should reset their tokens, or delete user data entirely, upon receipt of a valid BrowserID Assertion with a matching email address. This allows less-verified parties to delete user data, but also allows users to regain control of their data (or delete it) even after they forget their password or lose their paired device keys.
No Key Server Certificates
The Key Server does not issue certificates. Storage Servers rely on tokens (derived from browser keys which can be held for long periods of time) for most access, and accept IdP-derived BrowserID certificates for setup, recovery, and teardown. This keeps the Key Server small (no public keys to publish and rotate, fewer asymmetric crypto operations, less confusion about expiration times which wouldn't match the IdP's choices). It also affects revocation.
IdP certificates have expiration times under the control of the IdP, and are likely to expire well before a background-data-synchronization tool would want them to (requiring users to re-sign-in every few weeks would be a drag). So IdP certs aren't a good choice for storage-server access control. Using Key Server certs instead would allow us to have much longer-lived certificates, but it sounds like "forever until revoked" is the desired lifetime for this component. So using tokens instead of certificates is simpler and achieves the same goals.
Token Versions
For each account, the Key Server remembers an integer "version number", which is incremented each time a user wants to revoke access (e.g. when they change their master password, or click a "revoke access from all devices" button on a control panel). This is delivered to the browser along with kA and wrap(kB).
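A minimal in-memory Python sketch of the per-account state held by the Key Server might look like the following. It assumes the BrowserID assertion has already been verified and reduced to an email address, and the class and method names (`KeyServer`, `login`, `revoke`) are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Account:
    kA: bytes                          # random key created by the Key Server
    wrapped_kB: Optional[bytes] = None  # opaque wrap(kB) uploaded by a client
    version: int = 0                   # bumped on every revocation


class KeyServer:
    def __init__(self) -> None:
        self._accounts: Dict[str, Account] = {}

    def login(self, verified_email: str) -> Tuple[bytes, Optional[bytes], int]:
        """Return (kA, wrap(kB), version), creating the account if needed."""
        account = self._accounts.get(verified_email)
        if account is None:
            account = Account(kA=os.urandom(32))
            self._accounts[verified_email] = account
        return account.kA, account.wrapped_kB, account.version

    def revoke(self, verified_email: str) -> int:
        """Increment the version number; kA and wrap(kB) are left unchanged."""
        account = self._accounts[verified_email]
        account.version += 1
        return account.version
```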
The version number is included in the key derivation function for tokens, but not for the data-encryption key. The data-encryption key remains constant forever (or until we design a more complicated re-encryption-based revocation scheme). Browsers that have just received the master keys kA/kB/kC will be able to compute any version of any token they wish, but once they have successfully talked to their storage server, they will forget kA/kB/kC and all older tokens, retaining only the most recent token version and the domain-specific data-encryption key. In this post-login state, if the browser needs to compute a newer token, it must "re-login" by submitting an assertion to the Key Server to re-fetch the master keys.
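The split between versioned tokens and a version-independent data key can be sketched in Python as follows. The HMAC-based derivation, the labels, and the `PostLoginState` record are assumptions for illustration; the point is only that the version feeds the token and not the encryption key, and that the retained state contains no master key.

```python
import hashlib
import hmac
from dataclasses import dataclass


def kdf(key: bytes, *parts: bytes) -> bytes:
    """HMAC-SHA256 over length-prefixed parts (hypothetical derivation)."""
    message = b"".join(len(p).to_bytes(2, "big") + p for p in parts)
    return hmac.new(key, message, hashlib.sha256).digest()


def data_key(class_key: bytes, domain: str) -> bytes:
    # No version number here: revocation does not change the data key.
    return kdf(class_key, b"data-key", domain.encode("utf-8"))


def token(class_key: bytes, domain: str, version: int) -> bytes:
    # The version number is mixed in, so each revocation yields a new token.
    return kdf(class_key, b"token", domain.encode("utf-8"), str(version).encode("ascii"))


@dataclass(frozen=True)
class PostLoginState:
    """What the browser keeps after it has talked to the storage server."""
    domain: str
    version: int
    token: bytes
    data_key: bytes


def finish_login(class_key: bytes, domain: str, version: int) -> PostLoginState:
    """Derive the retained values; the caller then discards class_key."""
    return PostLoginState(
        domain=domain,
        version=version,
        token=token(class_key, domain, version),
        data_key=data_key(class_key, domain),
    )
```

A browser holding only a `PostLoginState` cannot compute the token for a later version, which is what forces the re-login described above.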
Storage Servers remember just one token for each class of data.
Storage Server Access Rules
For each class, a Storage Server will hold a row of data with (email, token, ciphertext). Requests to read or write data must be accompanied by the matching token, delivered over a confidential channel: Read(token) or Write(token, new-ciphertext). If the token does not match any known row, an "UnknownToken" error is returned.
Account creation and post-revocation updates are managed with a second API: UpdateToken(assertion, new-token, old-tokens). The storage server should check whether any of the old-tokens match the stored row: if so, the stored token is updated to new-token and "Success" is returned. Otherwise the server validates the assertion and checks whether any row matches the included email address, in which case it returns "KnownUserUnknownToken". If not, the server creates a new row (with the email from the assertion and new-token) and returns "SlotCreated".
The third API is DeleteData(assertion), which validates the assertion and deletes any data with a matching email address. This allows users to delete their data even if they cannot remember a password. This may need more discussion: it seems like a useful feature, but it obviously allows IdPs to clobber data they cannot read, which could be surprising.
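The three storage-server APIs can be sketched as a small in-memory Python class. This is an illustration under several assumptions: one row per email (a single data class), an injected `verify_assertion` callable that returns the email address or `None`, and a toy `fake_verify` that is not real BrowserID verification.

```python
import hmac
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


class UnknownToken(Exception):
    """Raised when a presented token matches no stored row."""


@dataclass
class Row:
    email: str
    token: bytes
    ciphertext: bytes = b""


def fake_verify(assertion: str) -> Optional[str]:
    # Toy stand-in for BrowserID verification: "ok:<email>" is "valid".
    return assertion[3:] if assertion.startswith("ok:") else None


class StorageServer:
    def __init__(self, verify_assertion: Callable[[str], Optional[str]]) -> None:
        self._verify = verify_assertion
        self._rows: Dict[str, Row] = {}  # keyed by email

    def _find(self, token: bytes) -> Row:
        for row in self._rows.values():
            if hmac.compare_digest(row.token, token):
                return row
        raise UnknownToken()

    def read(self, token: bytes) -> bytes:
        return self._find(token).ciphertext

    def write(self, token: bytes, new_ciphertext: bytes) -> None:
        self._find(token).ciphertext = new_ciphertext

    def update_token(self, assertion: str, new_token: bytes,
                     old_tokens: List[bytes]) -> str:
        # First try the old tokens; the assertion is only needed if none match.
        for old in old_tokens:
            try:
                row = self._find(old)
            except UnknownToken:
                continue
            row.token = new_token
            return "Success"
        email = self._verify(assertion)
        if email is None:
            raise PermissionError("invalid assertion")
        if email in self._rows:
            return "KnownUserUnknownToken"
        self._rows[email] = Row(email=email, token=new_token)
        return "SlotCreated"

    def delete_data(self, assertion: str) -> None:
        email = self._verify(assertion)
        if email is None:
            raise PermissionError("invalid assertion")
        self._rows.pop(email, None)
```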
If the server is eager to be RESTful and needs a distinct per-user identifier to go into the URL for the Read() and Write() APIs, the best choice is to use a hash of the token. This can be safe (even if we assume that URLs are not secret) because tokens are derived from full-entropy keys, and thus not vulnerable to dictionary attacks. When the token is set with UpdateToken(), the server can compute the UID and record it in an index. The server must check that both the UID and the token match the recorded data. (Less eager server designs should just use a common POST URL for all APIs and put the one token in the request body, omitting any sort of UID).
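A short Python sketch of the hash-of-token identifier follows. The choice of SHA-256, the `uid:` prefix, and the `UidIndex` helper are assumptions for illustration only.

```python
import hashlib
import hmac
from typing import Dict


def uid_for_token(token: bytes) -> str:
    """URL-safe per-user identifier derived from the (full-entropy) token."""
    return hashlib.sha256(b"uid:" + token).hexdigest()


class UidIndex:
    """Maps UIDs to tokens; updated whenever UpdateToken() sets a token."""

    def __init__(self) -> None:
        self._token_by_uid: Dict[str, bytes] = {}

    def set_token(self, token: bytes) -> str:
        uid = uid_for_token(token)
        self._token_by_uid[uid] = token
        return uid

    def check(self, uid: str, token: bytes) -> bool:
        # Both the UID from the URL and the token from the request must match.
        stored = self._token_by_uid.get(uid)
        return stored is not None and hmac.compare_digest(stored, token)
```

Knowing the UID alone is not enough to pass `check`, since the hash cannot be inverted to recover the token.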
Revocation
When an activated device is lost or stolen, users will want to revoke its access, and are likely to express this by changing their master password (if any), and/or by going to a Key Server control panel of some sort and hitting a "revoke devices" button (which will require a BrowserID assertion). This will be implemented by incrementing the version number and updating tokens on all storage servers, so all devices must construct a new token to access the ciphertext. The user will have to re-log-in on all their devices (with a current assertion, and the new password) to access the post-revocation data.
Browsers are expected to discard MK (the password-derived master key) immediately after obtaining kB, to ensure that nothing is left in the browser that could let an attacker brute-force the master password without online help. Browsers also discard kA/kB/kC after constructing valid domain-specific tokens and keys for each known service (i.e. all add-ons that have registered to get domain-specific keys). They will retain the domain-specific keys, and domain-specific tokens for the current version number, so that periodic data sync can continue to occur in the background without user intervention (until revoked).
When a browser uses the "revoke device" button and increments the version number, it will immediately re-login to all known services. As the storage server will still have the old token, the Read(token) call will get an UnknownToken error, prompting the client to use UpdateToken with the last 5 or 10 versions of the token. Since the user has just re-logged in, we're sure to have an active BrowserID certificate, so the assertion generation should not require additional user interaction. The first old-token (vernum-1) will probably match, but if not (KnownUserUnknownToken) we'll try older and older tokens until we run down to version=0, at which point we'll give up. When we get Success, the storage-server has been updated to the newest token, and all other devices will be unable to access ciphertext until they are updated too.
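The re-login loop described above can be sketched in Python. The batch size, the HMAC-based token derivation, and the single-row `FakeStorageServer` are assumptions for this example; the assertion is represented by an already-verified email address to keep the sketch short.

```python
import hashlib
import hmac
from typing import List, Optional


class UnknownToken(Exception):
    pass


def make_token(class_key: bytes, domain: str, version: int) -> bytes:
    # Hypothetical versioned token derivation.
    message = ("token|%s|%d" % (domain, version)).encode("utf-8")
    return hmac.new(class_key, message, hashlib.sha256).digest()


class FakeStorageServer:
    """Single-row stand-in for a storage server."""

    def __init__(self) -> None:
        self.email: Optional[str] = None
        self.token: Optional[bytes] = None

    def read(self, token: bytes) -> bytes:
        if self.token is None or not hmac.compare_digest(self.token, token):
            raise UnknownToken()
        return b""

    def update_token(self, email: str, new_token: bytes,
                     old_tokens: List[bytes]) -> str:
        if self.token is not None and any(
                hmac.compare_digest(self.token, t) for t in old_tokens):
            self.token = new_token
            return "Success"
        if self.email == email:
            return "KnownUserUnknownToken"
        self.email, self.token = email, new_token
        return "SlotCreated"


def relogin(server, email: str, class_key: bytes, domain: str,
            version: int, batch: int = 5) -> str:
    """Bring the server up to the current token version, or give up."""
    new_token = make_token(class_key, domain, version)
    try:
        server.read(new_token)
        return "AlreadyCurrent"
    except UnknownToken:
        pass
    high = version
    while True:
        low = max(high - batch, 0)
        # Newest old versions first: version-1 is the most likely match.
        old = [make_token(class_key, domain, v) for v in range(high - 1, low - 1, -1)]
        result = server.update_token(email, new_token, old)
        if result != "KnownUserUnknownToken":
            return result
        if low == 0:
            return "GaveUp"  # ran down to version 0 without a match
        high = low
```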
On a device that did not perform the revocation, the periodic background poll will suddenly get an UnknownToken error. This will prompt it to re-log-in to the Key Server, which requires an assertion, which needs user interaction (if the IdP credentials have expired). Once it learns the new version number, it computes the latest token, and tries the Read() again. If that succeeds, it forgets kA/kB/kC as usual. If it fails, the storage server may not yet be updated (perhaps the original browser didn't hit all the necessary services), and it uses UpdateToken() to update the server.
When background polls get an UnknownToken error, they should probably display an error indication to the user ("unable to sync") and offer a button to re-login, rather than spontaneously popping a login dialog.
We should consider a mechanism by which browsers can poll to learn when vernum has changed, and then proactively forget their domain-specific decryption keys. This reduces the window for an attacker to extract keys from a stolen device: the moment it learns that a revocation has taken place, it wipes the decrypted plaintext and decryption keys. Then, if it can still obtain an assertion, it can re-acquire the keys and start polling again. This "am I revoked" mechanism should not depend upon having an assertion (since these expire before the polling should stop), so perhaps it should use another token held by the Key Server which can only be used for this sort of poll.
Using server-data-replacement for revocation isn't the most pleasant scheme. One alternative would have the Key Server issue certs which expire after some limited time, but which can be renewed by a token retained in the browser. When a client tells the Key Server to revoke other devices, this token is deleted, and the old device will eventually lose the ability to create certs that will be accepted by Storage Servers. However, uncooperative devices will still know valid kA/kB (so they could collude with a storage server or IdP to obtain ciphertext), which can only be handled with re-encryption. It would also require the Key Server to publish a public key (just like primary IdPs would), which would be a centralized attack target, and would require Storage Servers to do frequent public-key crypto operations (instead of cheap token checks). However, the similarities between a cert-issuing Key Server and a primary IdP (or the Persona secondary/fallback IdP itself) are worth exploring.
Pairing
TBD. The basic idea is to use Sync's JPAKE scheme, or the more modern SPAKE2/PAKE2 protocol, with the Key Server providing the rendezvous point, but require an email address and BrowserID assertion. Since the email uniquely identifies the channel, we can use a shorter code (perhaps 4 characters). In fact, since the assertion protects the channel against arbitrary attackers, we can probably use a short PIN (three or four digits): the only attackers are the user's IdP and the Key Server itself.
The Key Server also assists with device management: it can tell the user which devices have reported completing the pairing process, and which are in-progress. It can also avoid the situation where two devices both think they are the first to use class-C data and thus both responsible for creating kC (creating incompatible keys).
Jetpack Modules
The first implementation will probably be a cluster of Jetpack modules. The "browserid" module must, for now, open a visible tab from an externally-hosted site, which can use include.js to present the BrowserID signin dialog. Once BrowserID accepts non-http/https schemes for the audience= field, this can change to opening a visible tab from an internally-provided resource: URL. Later, the module can use chrome UI for the dialog, rather than opening and closing a regular browser tab. The API should remain the same.
The KDF module is where all the protocol work happens. It should probably have a register-callback method, then a "go" method (just like navigator.id.watch() and navigator.id.request()). The callback should receive the data-encryption keys, functions to encrypt/decrypt data with those keys, and something to access tokens. We need a way to inform the KDF module that we've successfully communicated with the storage server, and we no longer need old tokens (so it can discard kA/kB/kC safely).
The application-specific module then talks to whatever internal resource is being synchronized, and makes HTTPS requests to its storage server.