ReleaseEngineering/Mozpool

From MozillaWiki

Overview

Mozilla needs to run its applications on various mobile devices, such as Tegras, Pandas, and even full smartphones. These devices do not act much like the servers that fill the rest of Mozilla's datacenters: they have limited resources, no redundancy, and are comparatively unreliable. With the advent of Firefox OS, Mozilla also needs the ability to automatically reinstall the entire OS on devices.

Mozpool is a system for managing these devices. Users (automated or human) who need a device matching certain specifications can request one from Mozpool, and Mozpool will find such a device, installing a new operating system if necessary. The middle layer of the system (Lifeguard) handles such reinstalls reliably, and also detects and investigates device failure, removing problematic devices from the pool. System administrators can examine these failed devices and repair them, returning them to the pool. The lowest level, Black Mobile Magic (BMM), handles low-level hardware details: automatic power control via IP-addressable power switches; a network-hosted Linux environment for performing software installations; and pinging, logging, and so forth.

Because continued operation of this system is business-critical, it is designed to be resilient to failure not only of individual devices, but also of the servers running Mozpool itself.

Policies and Procedures

Available Device Images

panda-android-4.0.4_v3.2
Added SUTAgent 1.20 to base image
panda-android-4.0.4_v3.3
Added Adobe Flash 11.1.115.81 to base image
panda-android-4.0.4_v3.1
todo
android
todo
repair-boot
todo
b2g
obsolete

How-To's

Links

Architectural Description

See http://hg.mozilla.org/build/mozpool/file/default/README.md for the most up-to-date architectural description of the system.

Source

The source is at http://hg.mozilla.org/build/mozpool

User Interface

The Mozpool user interface is available through a web browser. The home page shows the three layers of the system (Mozpool, Lifeguard, and BMM). Clicking on any of them opens a UI specific to that layer. The BMM UI allows direct control of device power, as well as manual PXE booting; this layer is of most interest to datacenter operations staff. The Lifeguard UI allows managed PXE boots and power cycles, as well as forced state transitions.

Deployment

Mozpool is a Python daemon that runs on multiple imaging servers. It uses a database backend and HTTP API for communication between servers. Its frontend is a dynamic web application. The BMM equipment - TFTP servers, syslog daemons, and so on - runs on the same systems.

Mozpool is designed to be deployed in multiple "pools" within Mozilla. The first and likely largest is release engineering.

Release Engineering

In the scl3 datacenter, we have an initial deployment of 10 racks of Pandaboards. Each rack holds about 80 Pandas, grouped in custom-built chassis, for a total of about 800 Pandas. Each rack also contains seven "foopies" (which proxy between the Pandas and Buildbot) and one imaging server. Each rack has a dedicated VLAN, keeping most network traffic local to the rack. The database backend is MySQL. See the puppet modules, linked above, for more details of the deployment.
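The approximate totals implied by these figures can be worked out with a few lines of shell arithmetic. This is only a sketch: the per-foopy figure assumes Pandas are split evenly among the foopies in a rack, which the text above does not state.

```shell
#!/bin/sh
# Approximate figures from the description above.
racks=10
pandas_per_rack=80
foopies_per_rack=7

total_pandas=$((racks * pandas_per_rack))
total_foopies=$((racks * foopies_per_rack))
total_imaging=$racks   # one imaging server per rack

# ASSUMPTION: Pandas are divided evenly among the foopies in a rack.
pandas_per_foopy=$((pandas_per_rack / foopies_per_rack))
leftover=$((pandas_per_rack % foopies_per_rack))

echo "pandas: $total_pandas, foopies: $total_foopies, imaging servers: $total_imaging"
echo "about $pandas_per_foopy pandas per foopy ($leftover left over per rack)"
```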

At the BMM and Lifeguard levels, each imaging server is responsible for the Pandas in its rack, as assigned in inventory. At the Mozpool level, each imaging server is responsible for all requests that were initiated locally. When Mozpool needs to reserve a non-local device, it uses HTTP to communicate with Lifeguard on the imaging server that owns that device.
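The local versus non-local decision described above can be illustrated with a small shell sketch. Everything in it is hypothetical: the device names, the imaging server names, and the hard-coded mapping that stands in for inventory are invented for illustration and do not reflect the real deployment or the actual Mozpool code.

```shell
#!/bin/sh
# Illustrative only: device names, server names, and this mapping are
# hypothetical stand-ins for the real inventory.

# Print the imaging server that owns a device; fail if the device is unknown.
imaging_server_for() {
    case "$1" in
        panda-00[0-9][0-9]) echo "imaging1" ;;
        panda-01[0-9][0-9]) echo "imaging2" ;;
        *) return 1 ;;
    esac
}

# Decide how Mozpool running on server $1 reaches Lifeguard for device $2.
lifeguard_route() {
    local_server=$1
    owner=$(imaging_server_for "$2") || { echo "unknown"; return 1; }
    if [ "$owner" = "$local_server" ]; then
        echo "local"            # device is in this server's own rack
    else
        echo "http://$owner"    # non-local: go through the owning server over HTTP
    fi
}
```

For example, under this made-up mapping, `lifeguard_route imaging1 panda-0142` reports that the request has to go to `imaging2` over HTTP.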

Mozpool Client

In Release Engineering, we use the Mozpool client to talk to the Mozpool servers and request Panda boards. To do this, we install the Python package inside a virtual environment. The package is stored in PyPI:

To create a new packaged version, check out the mozpool repo and do the following:

  1. Make your code changes
  2. Update the version in setup.py
  3. Add a new line to CHANGES.txt with the new version, the date, and a description of the change
  4. cd mozpoolclient && python setup.py sdist
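
Steps 2 and 3 are easy to get out of sync, so a small pre-flight check before step 4 can help. The following is a minimal sketch, not part of the Mozpool repo: the function names are hypothetical, and it assumes setup.py declares the version on a line of the form `version='0.1.6',`.

```shell
#!/bin/sh
# Hypothetical pre-flight check to run before `python setup.py sdist`.
# ASSUMPTION: setup.py contains a line like:  version='0.1.6',

# Print the version string declared in the given setup.py.
setup_version() {
    sed -n "s/^[[:space:]]*version[[:space:]]*=[[:space:]]*['\"]\([^'\"]*\)['\"].*/\1/p" "$1" | head -n 1
}

# Succeed only if the given CHANGES.txt mentions the version from setup.py.
changes_mentions_version() {
    version=$(setup_version "$1")
    [ -n "$version" ] && grep -qF -- "$version" "$2"
}

# Usage, from the mozpoolclient directory:
#   changes_mentions_version setup.py CHANGES.txt && python setup.py sdist
```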

To deploy to our PyPI setup, follow these instructions.

There is also a "fork" of the client code that lives in the tools repo: http://hg.mozilla.org/build/tools/lib/python/vendor/mozpoolclient-0.1.6

To update this version run the following commands:

OLD=0.1.5
NEW=0.1.6
cd tools/lib/python/vendor
hg move mozpoolclient-${OLD} mozpoolclient-${NEW}
# Assuming mozpool is checked out at the same level as your tools repo.
rsync --recursive --delete ../../../../mozpool/mozpoolclient/* mozpoolclient-${NEW}
# Bump the version in vendorlibs.pth (http://mxr.mozilla.org/build/source/tools/lib/python/vendorlibs.pth)
vi ../vendorlibs.pth
hg commit -m "Bumping mozpool client vendor version from ${OLD} to ${NEW}"
hg push
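
The manual `vi ../vendorlibs.pth` step above could also be done non-interactively. The sketch below is one possible way, with a hypothetical helper name; it assumes the .pth file refers to the vendored directory by name (for example a line ending in `mozpoolclient-0.1.5`), so check the actual file contents before relying on it.

```shell
#!/bin/sh
# Hypothetical, non-interactive stand-in for editing vendorlibs.pth by hand.
# ASSUMPTION: the file names the vendored directory, e.g. "mozpoolclient-0.1.5".

# Rewrite mozpoolclient-OLD to mozpoolclient-NEW in the given file.
bump_pth() {
    old=$1 new=$2 file=$3
    # Escape dots so "0.1.5" is matched literally rather than as a regex.
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    sed "s/mozpoolclient-$old_re/mozpoolclient-$new/" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Usage, from tools/lib/python/vendor:
#   bump_pth "$OLD" "$NEW" ../vendorlibs.pth
```

Reviewing the result with `hg diff` before committing is still worthwhile, since the sketch rewrites every matching line.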

NOTE: if you're making API changes to the mozpool client, you'll need to update the consumers in the tools repo as well before committing.

If you're the pypi package maintainer (armenzg or dustin), you can follow these [??? instructions].