User:Mconnor/Current/Project Sisyphus
Overview
Fixing busted trees, and the productivity-eating context switches to stare at webpages that come with them: that is the stone we've been rolling up the hill for years. We have the ability to fix this, using technology (omg!) and some leveraged approaches. This is not about solving problems for any particular group; it is about removing a major productivity sink across the project.
Image via fouro on Flickr
What success looks like
mozilla-central is almost always open for business
- 90% of pushes to m-c are green with no regressions
- tree bustage is resolved in less than 30 minutes
- tree is open 95% of the time
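To make these targets checkable by a machine rather than by eye, they can be written down as thresholds. The following is a minimal sketch only: the `TreeStats` record, its field names, and `check_targets` are hypothetical, and it assumes that push results and closure times are already being collected somewhere.

```python
from __future__ import annotations

from dataclasses import dataclass

# Targets taken from the list above.
GREEN_PUSH_TARGET = 0.90      # 90% of pushes green with no regressions
MAX_BUSTAGE_MINUTES = 30      # bustage resolved in less than 30 minutes
OPEN_TIME_TARGET = 0.95       # tree open 95% of the time


@dataclass
class TreeStats:
    """Hypothetical summary of one reporting period for mozilla-central."""
    total_pushes: int
    green_pushes: int              # pushes that were green with no regressions
    bustage_minutes: list[float]   # minutes taken to resolve each bustage
    period_minutes: float          # length of the reporting period
    closed_minutes: float          # total time the tree was closed


def check_targets(stats: TreeStats) -> dict[str, bool]:
    """Report, per target, whether the period met it."""
    green_ratio = stats.green_pushes / stats.total_pushes
    open_ratio = 1 - stats.closed_minutes / stats.period_minutes
    return {
        "green_pushes": green_ratio >= GREEN_PUSH_TARGET,
        "bustage_resolved_quickly": all(
            m < MAX_BUSTAGE_MINUTES for m in stats.bustage_minutes
        ),
        "tree_open": open_ratio >= OPEN_TIME_TARGET,
    }
```

A week is 10080 minutes, so the 95% target allows at most 504 minutes of closure per week.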
Computers watch the tree, not humans
- developers are notified of problems in their push automatically, with no need to poll the tree
- current sheriff(s) are also notified, as is dev-tree-management
- All regression reports are tracked automatically, and someone (the sheriffs) ensures resolution
- If regressions are not addressed, the patch gets backed out
- Known random oranges do not turn the tree orange unless they get worse
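The last point, that a known random orange should only turn the tree orange when it gets worse, needs some definition of "worse". One possible reading, sketched below, is to compare the recent failure rate of a test against its historical rate. The `KnownOrange` record, the function name, and the `tolerance` factor of 2 are all assumptions made for illustration, not part of any existing tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class KnownOrange:
    """Hypothetical record of a known intermittent failure."""
    test_name: str
    baseline_rate: float  # historical fraction of runs that fail


def should_turn_tree_orange(
    failed_test: str,
    known: dict[str, KnownOrange],
    recent_failures: int,
    recent_runs: int,
    tolerance: float = 2.0,  # assumed: "worse" means more than 2x the baseline
) -> bool:
    """Decide whether a failure should change the tree colour."""
    orange = known.get(failed_test)
    if orange is None:
        # Not a known random orange: always surface it.
        return True
    if recent_runs == 0:
        # No recent data to compare against the baseline.
        return False
    recent_rate = recent_failures / recent_runs
    return recent_rate > orange.baseline_rate * tolerance
```

With a baseline of 5%, one failure in the last 20 runs stays quiet, while five failures in 20 runs would turn the tree orange.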
Managing the tree is easy
- It is easy to close and reopen multiple trees
- All closures are logged, data is easily available on reasons/type/duration
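As an illustration of what logged closures could look like, the sketch below records each closure with its tree, reason, and timestamps, and totals the closed time per reason. The `Closure` record and its fields are hypothetical; no existing tree status tool is assumed.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Closure:
    """Hypothetical log entry for one tree closure."""
    tree: str
    reason: str            # e.g. "bustage", "infrastructure"
    closed_at: datetime
    reopened_at: datetime

    @property
    def duration(self) -> timedelta:
        return self.reopened_at - self.closed_at


def closed_time_by_reason(closures: list[Closure]) -> dict[str, timedelta]:
    """Total time closed, grouped by the logged reason."""
    totals: dict[str, timedelta] = defaultdict(timedelta)
    for closure in closures:
        totals[closure.reason] += closure.duration
    return dict(totals)
```

Keeping the raw timestamps rather than a precomputed duration means the same log can also answer questions about when closures tend to happen.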
Sheriffs exist to help solve problems, not find them
- Sheriff duty is a full-time job (when you're on duty) so that everyone else can focus on code, not tree state
- Focus shifts from finding problems to being point on dealing with them
- Sheriff coverage has expanded to 80% of pushes
How we get there
There is no silver bullet, just lots of hard work over a long period of time. But every bit helps.
Keeping mozilla-central green
Implement mozilla-inbound
Optional for now, will revisit later
Resolve backouts quickly
Note: raised on dev.planning, new proposal coming soon
Continue to encourage teams to adopt a project branch model
- Some movement here.
Switch on-change builds to non-PGO so that problems are caught dramatically sooner
Automating tree watching
Extend the perf regression finder to mail the pusher and the sheriff
Use Pulse to notify the pusher and the sheriff on failures
Automatically verify that suspected intermittent oranges really are intermittent
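One way the intermittent check might work is to rerun the failed job on the same revision: if any rerun passes, the same code has both failed and passed, so the failure is intermittent; if every rerun fails, it is more likely a real regression. The sketch below assumes a hypothetical `rerun` callable that triggers the job again and returns `True` on a pass. How jobs are actually retriggered is not specified here.

```python
from __future__ import annotations

from typing import Callable


def classify_failure(rerun: Callable[[], bool], retries: int = 5) -> str:
    """Classify a failed job by rerunning it on the same revision.

    `rerun` is a hypothetical hook that reruns the job and returns
    True if the rerun passed. The retry count of 5 is an assumption.
    """
    for _ in range(retries):
        if rerun():
            # The same revision has now both failed and passed.
            return "intermittent"
    # Every rerun failed: treat as a persistent failure.
    return "persistent"
```

Stopping at the first passing rerun keeps the extra load on the build machines as low as possible.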
Tree management
Build a better tool for managing tree status
Build a regression dashboard to ensure that all perf/test regressions are tracked and addressed
Sheriff Evolution
Broader coverage
- Get sheriff tool online
- Get multiple shifts per day, to cover across timezones
Changed role
- Sheriff is now point to:
  - merge mozilla-inbound
  - ensure regressions are backed out or bugs are filed, as appropriate
  - address bustage on the main tree