CloudServices/SimplePushServer

From MozillaWiki
Obsolete!

Please refer to the Push Service docs.

SimplePush Server

Overview

Provide a service that allows Third Party Application servers to notify their Web Apps that an event has occurred and action may be required, without requiring a web page to be constantly present and connected to the Third Party Application Server.

Project Contacts

Principal Point of Contact - Doug Turner dougt@mozilla

IRC - #push

Group Email - TBD

Goals

Provide a scalable, fast server for the SimplePush protocol as defined by https://wiki.mozilla.org/WebAPI/SimplePush.

In brief, SimplePush is a nearly dataless method to remotely wake a client application so that it can call "home" and determine what actions are needed. It addresses the power and wasted-bandwidth cost of having dozens of applications hold connections open when no action is required.

The server will provide endpoints for both WebSocket clients and PUTs from third party servers.

Use Cases

Use cases are defined here

Definitions

Requirements

  • APP requests an ENDPOINT from the PUSH CLIENT and shall register two callback functions, one for receipt of the ENDPOINT, and a second for handling of a VERSION EVENT
  • If not already present, PUSH CLIENT shall generate a unique UUID4 Identifier for the UserAgent (UAID)
  • PUSH CLIENT shall generate a unique UUID4 Identifier for the APP (APPID)
  • PUSH CLIENT shall send UAID, APPID and any additional information required for proprietary KICK to the PUSH SERVER
  • PUSH SERVER shall create an ENDPOINT for the UAID and APPID and return it to the PUSH CLIENT.
  • If a KICK driver is present, PUSH SERVER shall relay appropriate PUSH CLIENT provided information to the KICK driver.
  • PUSH CLIENT tenders the ENDPOINT to APP via callback.
  • APP sends ENDPOINT to the APP SERVER
  • On VERSION EVENT, APP SERVER PUTs version value to ENDPOINT
  • If a PUSH CLIENT is currently connected to the PUSH SERVER, the PUSH SERVER relays an UPDATE containing currently pending VERSION EVENTS.
  • If a PUSH CLIENT is NOT currently connected, an optional, proprietary KICK driver may be called to wake devices associated with the corresponding ENDPOINT UAID.
  • If a PUSH SERVER is unable to immediately deliver a VERSION EVENT, the VERSION EVENT is logged to short term storage.
  • PUSH CLIENT connects to the PUSH SERVER and shall identify a list of one or more UAIDs it is responsible for.
  • If there are VERSION EVENTS pending for the requested UAIDs, PUSH SERVER sends an UPDATE packet (in this template, the capitalized names UAID, APPID and VERSION would be replaced by actual values):
{ UAID: {
   {APPID: VERSION}, 
   ... },
  ... }
  • If no VERSION EVENTS are pending for the requested UAIDs, PUSH SERVER may return a status indicating no data available (for REST implementations) or simply not return content (for WebSocket)
  • During the transmission of the UPDATE, a PUSH SERVER may wish to return a 503 (Service Unavailable) error to APP SERVERS for any VERSION EVENT associated with an in progress UAID, so as to prevent potential race conditions.
  • On receipt of UPDATE, PUSH CLIENT shall return an ACK to the PUSH SERVER.
  • The ACK shall contain a list of UAIDs for which all APPIDs have been properly received.
  • The PUSH SERVER shall then clear APPID version information from short term storage, and re-allow version updates for those UAIDs if currently blocked.
  • The PUSH CLIENT shall then notify APPs of the VERSION EVENT using the appropriate callback, passing the VERSION.

NOTE: a PUSH RELAY may be created by combining the polling aspects of the PUSH CLIENT with the data management and KICK driver of the PUSH SERVER. This would allow a VERSION EVENT system to enter protected networks or use restricted means to communicate to USER AGENTs. It is important to note that once a PUSH SERVER has received an ACK for a given UAID, the PUSH SERVER is under no obligation to retain that data, and proper relay of the VERSION EVENT is the PUSH RELAY's problem.

Get Involved

Call to action for folks who want to help.

Design

Points of Contact

Server Engineer - JR Conlin jrconlin@mozilla

The protocol is defined here

Platform Requirements

This system runs on Linux systems as a Go executable.

Go executables are mostly self-contained; however, the following external systems are strongly recommended:

  • a memcached server cluster
  • heka logger

Note that Go's SSL implementation is surprisingly CPU intensive as of Go 1.1.2. Because PUTs require more setup and teardown than longer-lived WebSocket connections, our implementation uses AWS ELB SSL termination to handle the secure PUTs. If peak user load is not expected to exceed roughly 100K, this may not be required.

Proprietary Ping Requirements

Code Repository

Previously (as referenced in other parts of this page), https://github.com/mozilla-services/pushgo/.

Currently, https://github.com/mozilla-services/autopush/ is used.

Release Schedule

1.4

  • Target date: 20/11/2014
  • Released: 20/11/2014

Common Changes:

  • Fixes to critical etcd routing bug
  • Fixes to Travis testing
  • Convert to toml configuration system
  • Integrated smoke tests
  • Various optimizations.

Loop Push

  • Create system that does not store data

Simple Push

  • No system specific changes made

1.4.2

Common Changes:

1.5

  • Target Date: TBD
  • Released: Unreleased

Common Changes:

  • Include support for "data" to connected devices only

QA

Points of Contact

  • Primary - kthiessen@
  • Backup - rpappalardo@

Test Framework

There are several test frameworks in place. Most systems are stand alone test suites so that they may be applied both to the current server and any externally created system.

https://github.com/mozilla-services/simplepush-testpod - provides an end-to-end stress test of the system.

https://github.com/jrconlin/simplepush_test - provides a quick "smoke test" as well as a thorough API test using bad or malicious input.

Notes

Security and Privacy

wiki page: https://wiki.mozilla.org/Security/Reviews/SimplePushSrv

Points of Contact

Review Status

Bugzilla Tracking # - https://bugzilla.mozilla.org/show_bug.cgi?id=897454

https://wiki.mozilla.org/Security/Reviews/SimplePushSrv

Issues and Resolutions

Operations

Points of Contact

Current Ops-Engineers are oremj@ and bwong@

Deployment Architecture

Bugzilla Tracking # -

https://mana.mozilla.org/wiki/display/SVCOPS/SimplePush#SimplePush-Deployments

Deployment Request Template

Currently, deployments are created from Github Releases. Releases for Stable and Production must be signed off by QA before deployment.

Bugzilla Subject: Please deploy Tag name to Product Platform

Bug content:

  Version: 
   Release Title (e.g. Simple Push Server 1.4.2 Release Candidate 1)
   Release URL (e.g. https://github.com/mozilla-services/pushgo/releases/tag/1.4.2rc1) 
  
  Product:
   What platform to deploy the target to (e.g. push-web, push-loop) 
  
  Config Changes:
   +[handlers]: handlers section added.
   ~[default]: default section options changed.
   -[propping]: propping section removed.
   ...
   
  Deployment Notes:
   Please note any other modifications which may cause the server to fail to start. 

Escalation Paths

Lifespan Support Plans

Logging and Metrics

Current logging and metrics are being filtered into the Heka system. Final logging and metrics are TBD, depending on the sorts of data that need to be detected.

Points of Contact

Tracking Element Definitions

Data Retention Plans

Dashboard URL

The following are mozilla private URLs.

Usage metrics are viewable at: https://graphite.shared.us-west-2.prod.mozaws.net/grafana/#/dashboard/file/default.json and use the following view template: https://dl.dropboxusercontent.com/u/361111/Push%20_%20SimplePush%20Stats-1421198257367

Customer Support

External Libraries & Users

Deployments

Loop

Hello (aka Loop) is a WebRTC based video chat program that is available to desktop and mobile devices. It uses a specialized version of SimplePush that does not have back end storage, since there is no need to alert connections that are offline.

Configuration

All systems are deployed to AWS and are set to autoscale within clusters. The clusters themselves are not yet configured to scale; ops is working to address this.

There are two non-production networks of systems:

Stable
This is a non-production environment that hosts a stable development version of SimplePush for development and integration tests. This version may be auto-updated from the "dev" branch of https://github.com/mozilla-services/pushgo.
QA
This is a non-production environment that hosts the stable, pre-release version of SimplePush for load and QA testing. This version updates from explicit releases generated by https://github.com/mozilla-services/pushgo.

It should be noted that effort is currently being made to ensure that the push service used by Hello is not significantly different (API wise) from the standard Push service.

Deployment Architectures

The stable environment is currently configured for push-loop-dev.stage.mozaws.net (for long-lived socket connections) and updates-push-loop-dev.stage.mozaws.net (to receive the REST version PUTs).

While the client has the ability to retry a connection if a given machine is not responsive, Ops has requested that Push Protocol Redirects be re-enabled and that a separate suite of machines be created to do connection load balancing. Clients would first connect to the central server and then be redirected to a machine with available resources.

Monitoring And Metrics

Push currently provides metrics using logstash-compatible reporting mechanisms (e.g. stackdriver). In addition, logs are scraped and the information is displayed via kibana. Monitoring and metrics are currently collected and displayed on two different systems. Effort will be made to simplify this.

TODO: Dev & Ops need to identify a list of key health metrics to monitor for this system.

Points of Contact

In the event of significant events, operations notifies members of development teams that actions are required.

TODO: dev will provide contact information as well as a "jiggle list" of actions which may alleviate issues.

Find My Device

Configuration

Deployment Architectures

Monitoring And Metrics

Points of Contact