Auto-tools/Projects/MozPool


NOTE: The MozPool design has been changed to be more tightly integrated with LifeGuard and BlackMobileMagic. See http://hg.mozilla.org/build/mozpool/file/default/README.md for the whole system's high-level design.

Goals

MozPool is a simple service to manage a pool of devices. The main goal is to integrate it into buildbot for running mobile/B2G unit tests, as part of the future foopy-less system. MozPool can, however, be used by any system, such as AutoPhone.

MozPool itself is just a reservation system that allows machine and human users to check out and check in devices. It can be used to obtain specific devices, devices meeting specific criteria, or any and all devices. MozPool will use Lifeguard to maximize the probability that an assigned device is in a good state and working properly.

MozPool will have both an HTML interface, for humans, and a RESTful JSON API for machines (the latter will probably also be used internally by the former).

Non-Goals

By itself, MozPool does not do any error checking; that is the purpose of Lifeguard. Lifeguard may, however, be fairly tightly coupled with MozPool. MozPool is only responsible for assigning devices matching the user's criteria that are known to be in a good state.

It would be nice, but not crucial, to have MozPool run on any platform. Initially, Linux should be sufficient. Similarly, it will target Python 2.7 but may run on earlier versions.

Requirements

  1. MozPool should provide a mechanism for devices to register themselves with it. This may be through the SUTAgent registration protocol, or by querying an inventory server.
  2. Once a device is registered with MozPool, it should track its state.
  3. MozPool should provide a web UI that users can use to see the status of connected devices, check them out, and check them back in again.
  4. MozPool should provide an API with which remote components can interact with it (via TCP sockets or HTTP), and should include the following. Note that these may be proxied to Lifeguard or a similar service.
    1. an API to request a device for testing. This API should accept some parameters: processor (armv6 vs armv7), hardware type (panda, ...), pool (b2g vs mobile), and potentially Android version. It should return the identifier of a device that has a valid recent ping, and then mark the status of the device accordingly (e.g., 'checked_out').
    2. an API to return a checked_out device to the pool. This API should accept a device identifier. After the device is returned to the pool, MozPool should ask Lifeguard to reboot it (?) and verify that it is alive, after which the status should be updated to online.
    3. an API to set the status of the device (offline or online) and to attempt rebooting or resetting its power.
      1. If a user marks a device as online, Lifeguard should attempt to bring the device online; if it fails, it should return the status to offline.
    4. an API to ping a device, to see if it's online.
    5. an API to flash a given B2G build on a device (do we need to be able to flash Fennec boards as well?).
    6. an API to reboot a device, given its identifier.
    7. an API to reset the power on a device, given its identifier.
    8. an API to get the current status of a device, given its identifier. The status should include device state, and any other details that another process would need in order to initiate a remote flash of the device.
    9. an API to set the current state of a device, given its identifier.
    10. additional APIs needed to support the Web UI above
  5. MozPool should scale to handle a large number of devices (several hundred, exact number still TBD)
  6. MozPool should have unit tests that we can run before committing changes.
  7. MozPool should have a staging environment and integration tests that we can run in a live environment.
  8. MozPool should maintain a detailed log including, among other things, details on device registrations, device state transitions, and all API requests.
  9. MozPool should be well-documented.
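
The check-out and check-in behaviour in requirements 4.1 and 4.2 can be illustrated with a minimal in-memory sketch. This is not MozPool's actual implementation or API: the class name DevicePool, its method names, the max_ping_age freshness window, and the verify_alive callable (a stand-in for whatever Lifeguard provides) are all assumptions made for illustration.

```python
import time


class DevicePool(object):
    """Hypothetical in-memory pool; not MozPool's actual implementation."""

    def __init__(self, max_ping_age=300):
        # device_id -> {"attrs": dict, "state": str, "last_ping": float}
        self.devices = {}
        # Assumed freshness window (seconds) for a "valid recent ping".
        self.max_ping_age = max_ping_age

    def register(self, device_id, **attrs):
        """Record a device and its attributes, and start tracking its state."""
        self.devices[device_id] = {
            "attrs": attrs,
            "state": "online",
            "last_ping": time.time(),
        }

    def request_device(self, **criteria):
        """Check out the first online, recently pinged device matching criteria."""
        now = time.time()
        for device_id in sorted(self.devices):
            dev = self.devices[device_id]
            if dev["state"] != "online":
                continue
            if now - dev["last_ping"] > self.max_ping_age:
                continue
            if all(dev["attrs"].get(k) == v for k, v in criteria.items()):
                dev["state"] = "checked_out"
                return device_id
        return None  # nothing suitable is available

    def return_device(self, device_id, verify_alive):
        """Check a device back in; verify_alive stands in for a Lifeguard call."""
        dev = self.devices[device_id]
        if dev["state"] != "checked_out":
            raise ValueError("%s is not checked out" % device_id)
        if verify_alive(device_id):
            dev["state"] = "online"
            dev["last_ping"] = time.time()
        else:
            dev["state"] = "offline"
        return dev["state"]
```

A caller might register a device with attributes such as hardware="panda" and pool="b2g", then call request_device(pool="b2g") to obtain its identifier. A real service would also need persistent storage and locking so that two concurrent requests cannot check out the same device.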

Open Questions

  • What is the best way to handle the API calls that just pass through to Lifeguard? Note that it's important to use MozPool for these actions (e.g. pinging, rebooting, etc.) so that MozPool doesn't assign the device to someone else while Lifeguard is working on it.
  • If Lifeguard/BMM is responsible for installing a particular build of Fennec or B2G on a device before returning it (which makes sense, as it abstracts this operation away from the test harnesses), it may take some time for a check-out operation to complete. Is it acceptable to leave the connection open during this time, or should we use a callback mechanism? If the latter, we'll definitely need a Python client library to simplify the procedure.
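
As a rough illustration of what a Python client library could hide for the second question, the sketch below waits for a slow check-out without holding a single connection open. It uses polling rather than a true callback, purely to stay self-contained. The function name wait_for_request, the get_status callable, and the status strings "pending", "ready", and "failed" are hypothetical and are not part of any documented MozPool API.

```python
import time


def wait_for_request(get_status, request_id, timeout=600, interval=10,
                     sleep=time.sleep, clock=time.time):
    """Poll until a check-out request finishes or the timeout expires.

    get_status is a caller-supplied function (hypothetical) that returns
    the current status string for request_id, for example by making one
    short HTTP request per call.
    """
    deadline = clock() + timeout
    while True:
        status = get_status(request_id)
        if status in ("ready", "failed"):
            return status
        if clock() >= deadline:
            return "timeout"
        # Wait between polls so the server is not flooded with requests.
        sleep(interval)
```

The sleep and clock parameters exist only so the helper can be exercised in unit tests without real delays.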

API

The API is documented along with that of LifeGuard at http://hg.mozilla.org/build/mozpool/file/default/API.txt

Testing

See Auto-tools/Projects/MozPool/LocalTesting for a walk-through on testing with a local server and fake data.