Identity/AttachedServices/DeploymentPlanning/ScryptHelperService
Overview
This is a working proposal for the backend architecture and deployment of the Scrypt Helper service.
The immediate and only goal of this service is to let New Sync devices outsource some of the computational cost of the Firefox Accounts authentication process. It is a stateless but computationally expensive service.
Goals and Milestones
The goal for Q3 2013 is to have this service Production Ready.
This does not mean having a fully-deployed production environment! With the implementation of the storage component still outstanding, there's no point in standing up an authentication service all by itself. It does mean that we need the ability to do automated deployments that pass loadtests, meet operational and security criteria, and generally inspire confidence that we could roll things out to production without major hiccups.
These are the individual milestones on our way to said goal, broken down into weekly chunks:
- Aug 09:
- usable manual-deploy dev environment tooling
- using awsboxen to stand up a single box, with a simple nginx+gunicorn+python-app setup
- Aug 16:
- defined testable "success criteria" for Scrypt Helper:
- Target number of concurrent users.
- Target number of scrypt operations per second.
- Total cost of AWS resources.
- Aug 23:
- (reserved for work on Firefox Accounts)
- Aug 30:
- loadtesting code written and debugged.
- this will use loads and loads.js to hook into the MozSvc loadtesting cluster
- the client API is so thin, we will just write custom tests to exercise it
- Dependencies: stable server API
- Sep 6:
- automated single-region loadtest deployment
- built using cloudformation and awsboxen
- including shipping logs and performance data to a Heka/ES/Kibana box for analysis
- loadtests run against Scrypt Helper loadtest environment.
- Dependencies: stable and performant loads cluster infrastructure, usable log analysis tools
- Sep 13:
- fixed any load-related issues
- (mostly reserved for work on Firefox Accounts)
- Sep 20:
- tweaking, tweaking, always tweaking...
- Sep 27:
- security review signoff.
- svcops signoff.
- Sep 30:
- Production Ready!
There are likely a lot of SvcOps details missing from this plan, e.g. monitoring and infrasec things. We'll do what we can to get those in place, but I'm OK with an initial loadtest environment that's lacking some of that stuff.
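The three Aug 16 success criteria (concurrent users, scrypt operations per second, AWS cost) are linked by simple arithmetic, which can be sketched as below. This is only an illustration: the function names, the headroom policy, and every number in the usage note are hypothetical placeholders, not measurements or agreed targets.

```python
import math


def nodes_needed(target_ops_per_sec, ops_per_sec_per_core, cores_per_node,
                 headroom=0.5):
    """Estimate how many nodes sustain a target scrypt throughput.

    `headroom` is the fraction of each node's capacity kept spare for
    bursts and failover (a hypothetical policy, not from the plan).
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_per_node = ops_per_sec_per_core * cores_per_node * (1 - headroom)
    return max(1, math.ceil(target_ops_per_sec / usable_per_node))


def monthly_cost(nodes, hourly_price, hours_per_month=730):
    """Rough monthly instance cost, ignoring ELB and bandwidth charges."""
    return nodes * hourly_price * hours_per_month
```

For example, with placeholder inputs of 100 target operations per second, 5 operations per second per core, 8 cores per node and 50% headroom, `nodes_needed(100, 5, 8, headroom=0.5)` gives 5 nodes. Real inputs would come from the benchmarking and loadtest results.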
Dev Deployment
Development deployments are done using awsboxen - plain awsbox is not suitable since this is not a nodejs app. :rfkelly will take responsibility for a basic awsboxen script that just stands up a single box.
Loadtest Deployment
To begin, we will script this in the scrypt-helper repo, using awsboxen/cloudformation. That should suffice for initial QA and loadtesting purposes. If and when we need to migrate to other SvcOps tools, the cloudformation stuff will be a good starting point.
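The actual loadtests are planned to run through loads and loads.js. As a stand-in that needs no cluster, the following sketch shows the kind of measurement the "scrypt operations per second" criterion implies, using only the Python standard library. The helper name `measure_throughput` and its parameters are hypothetical; `operation` would be whatever callable performs one request against the loadtest environment.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(operation, total_calls, concurrency):
    """Call `operation` total_calls times using `concurrency` threads.

    Returns (operations per second, list of per-call latencies in seconds).
    """
    def timed_call(_):
        start = time.perf_counter()
        operation()
        return time.perf_counter() - start

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(total_calls)))
    elapsed = time.perf_counter() - started
    return total_calls / elapsed, latencies
```

A single-machine driver like this is only useful for smoke-testing the test code itself; it will not generate the load that a distributed loads cluster can.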
Architecture
This will be a multi-region high-availability deployment. Since it is a prerequisite for use of the Firefox Accounts service by low-powered devices, it should have at least the same availability as the Firefox Accounts service itself.
This is a very skinny and simple service, consisting of a single URL endpoint. We'll run an autoscale cluster of machines behind a simple public ELB:
    client       +------------+    +-----------------------+
    requests --> | Public ELB |--->| Scrypt Helper Cluster |
                 +------------+    +-----------------------+
Since the work of this service is compute-bound, we'll probably run a small number of beefy compute nodes. :warner has some benchmarking results on the appropriate machines to use here.
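To make the "single URL endpoint" and "compute-bound" points concrete, here is a minimal sketch of what one node's application could look like as a WSGI callable. It is not the service's actual API: the JSON shape (`input` and `output` as hex strings), the salt, and the scrypt parameters are all assumptions chosen so the example runs quickly. It relies on `hashlib.scrypt`, which needs Python 3.6+ built against OpenSSL 1.1 or later.

```python
import hashlib
import io
import json

# Hypothetical parameters, kept small so the sketch is fast to run.
SCRYPT_N = 2 ** 10
SCRYPT_R = 8
SCRYPT_P = 1
SCRYPT_DKLEN = 32
SCRYPT_SALT = b"example-salt"  # placeholder, not the real salt


def stretch(data):
    """The compute-bound step: one scrypt derivation per request."""
    return hashlib.scrypt(data, salt=SCRYPT_SALT, n=SCRYPT_N,
                          r=SCRYPT_R, p=SCRYPT_P, dklen=SCRYPT_DKLEN)


def application(environ, start_response):
    """Stateless WSGI app with a single POST endpoint."""
    def reply(status, payload):
        body = json.dumps(payload).encode("utf-8")
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(body)))])
        return [body]

    if environ.get("REQUEST_METHOD") != "POST":
        return reply("405 Method Not Allowed", {"error": "POST only"})
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        request = json.loads(environ["wsgi.input"].read(length))
        data = bytes.fromhex(request["input"])
    except (ValueError, KeyError, TypeError):
        return reply("400 Bad Request",
                     {"error": "expected JSON with a hex 'input' field"})
    return reply("200 OK", {"output": stretch(data).hex()})


def call_locally(payload):
    """Invoke the app in-process, without a server, for quick checks."""
    raw = json.dumps(payload).encode("utf-8")
    environ = {"REQUEST_METHOD": "POST",
               "CONTENT_LENGTH": str(len(raw)),
               "wsgi.input": io.BytesIO(raw)}
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], json.loads(body)
```

Because the handler keeps no state between requests, any node behind the ELB can serve any request, which is what makes a plain autoscale cluster workable. Saved as a module (say, a hypothetical `scrypt_sketch.py`), it could be served with `gunicorn scrypt_sketch:application`, matching the nginx+gunicorn setup mentioned in the milestones.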
Security
We need some good abuse-detection and abuse-prevention strategies in here; it's a great big DoS waiting to happen.
It's also a high-value target, in that the machines will be handling cheap verifiers of user passwords, rather than the more expensive scrypt verifiers held by the Firefox Accounts server.
So, what do we do to keep these secure?
TBD.
Supporting Infrastructure
Each machine will run a local Heka agent to aggregate logs. The logs will be shipped to a stand-alone Heka router associated with the region, which will in turn forward them to the shared ElasticSearch/Kibana thingy.