Releases/Firefox 5/Risk mitigation strategies
From MozillaWiki
Why?
We need to assess where we stand on the various risk factors for Firefox 5 and identify ways to mitigate that risk if we aren't comfortable with its level.
This page and the planning behind it do not mean that we NEED to do any of these things or that Firefox 5 isn't ready to release. It is merely prudent to discuss where we stand, what's in our control, and how to mitigate risks before mitigation is needed.
Current risk profile
Add-ons
Mobile
- AMO has no compatibility bumping for mobile
- About 50% of mobile add-ons are compatible
- Lower than we would like to see it
- We should manually look at the recommended add-ons and bump them
- Not very many binary add-ons
- Don't think we should hold the release if we don't increase the percentage
Desktop
- 78% compatible with Firefox 5 for the add-ons on AMO
- Large portion of the remaining percentage is the .NET Framework Assistant
- Talked with the developer at Microsoft, who said he would update his add-on. We don't have a timeframe for the update, though
- Risk: LOW for AMO add-ons. HIGH for non-AMO add-ons
- Most have updated, and the ones that haven't are waiting for release
- https://addons.mozilla.org/en-US/firefox/compatibility
- 78% of add-ons compatible
- .NET Framework Assistant: ETA?
- AVG, Symantec, McAfee, Kaspersky should be ready
- Haven't heard back from Google about the toolbar, which is not currently compatible
- From the add-on side we should do a very gradual rollout so add-on authors have time to update before the bulk of our users are affected by incompatibilities
Stability
Mobile
- Mobile crash data is close to or the same as 4.0.1
- We see higher crash rates in beta; as the ADUs grow, the crash rate goes down a bit
- The number of users on beta is small (but the best we've ever had)
- Not watching any particular bug to see if it flares up after the release
- Looks good for release
Date | 5.0 crashes | 5.0 ADUs | 5.0 throt | 5.0 crash/100 | 4.0.1 crashes | 4.0.1 ADUs | 4.0.1 throt | 4.0.1 crash/100
2011-06-16 | 49 | 7,114 | 100% | 0.69% | 809 | 165,235 | 100% | 0.49%
2011-06-15 | 40 | 6,583 | 100% | 0.61% | 771 | 165,907 | 100% | 0.46%
2011-06-14 | 43 | 6,069 | 100% | 0.71% | 785 | 163,882 | 100% | 0.48%
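The crash/100 figures are crashes divided by ADUs (active daily users), scaled to 100 users. A minimal Python sketch that reproduces them from the mobile crash data above; the function name `crashes_per_100_adu` is illustrative and not part of any Mozilla tooling:

```python
def crashes_per_100_adu(crashes: int, adus: int) -> float:
    """Return the number of crashes per 100 active daily users (ADUs)."""
    if adus <= 0:
        raise ValueError("ADU count must be positive")
    return 100.0 * crashes / adus


# (date, version) -> (crashes, ADUs), copied from the mobile crash data above.
MOBILE_CRASH_DATA = {
    ("2011-06-16", "5.0"): (49, 7_114),
    ("2011-06-16", "4.0.1"): (809, 165_235),
    ("2011-06-15", "5.0"): (40, 6_583),
    ("2011-06-15", "4.0.1"): (771, 165_907),
    ("2011-06-14", "5.0"): (43, 6_069),
    ("2011-06-14", "4.0.1"): (785, 163_882),
}

for (date, version), (crashes, adus) in MOBILE_CRASH_DATA.items():
    rate = crashes_per_100_adu(crashes, adus)
    print(f"{date}  {version:>5}  {rate:.2f} crashes per 100 ADUs")
```

With so few 5.0 ADUs, a handful of crashes moves the rate noticeably, which is consistent with the note that the rate drops as ADUs grow.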
Desktop
- 1.7 million users on 5.0 overall.
- Crash rate fairly low at 1.36 crashes per 100 ADU.
- Distribution of users scattered across all betas - http://test.kairo.at/socorro/2011-06-16.buildcrashes.html.
- Risks
- No good data right now on any one beta for 1 million+ users.
- b7: 89K users, 6.799 crashes per 100 ADU.
- b6: 295K users, 1.436 crashes per 100 ADU.
- The last several betas have never increased much beyond 250K users.
- We know from 4.0 experience that the crash landscape changes above 1 million, 2 million, 5 million. We had over 2 million beta users for pre 4.0 builds.
- Not enough data to really understand if there is a top crasher.
- 2 Flash releases in the last week and a half.
- From the stability side we need automatic updates to reach 6-10 million users before we can be confident releasing to the rest. That calls for a release method that reaches that many ADUs, pauses while we interpret the data, and then opens up to everyone
Security
- bug 659349 has details released prematurely
- Filed May 24th. Got a fix into Firefox 5. They talked about the details 5 days early
- Not the most significant bug we are fixing in this release
- Screenscraping bug that will affect 40-50% of our users that have machines that can run WebGL
- People expect us to talk about it during the release
- If we are doing a slow rollout we might have to delay the security advisories
- Very uncomfortable with a slow rollout because people will go look
- Other than bug 659349, we're in good shape for a Tuesday release
- Most were found internally
- The external bugs are both sg:moderate
- At this point it's really too late to do a 4.0.2, unless we think Firefox 5's uptake is going to be terrible
- If we are going to roll out slowly in the future, we need to start discussing a possible 5.0.1
- From the security side we want to release as quickly as possible
Web compatibility
- WebGL now disables cross-domain textures
- setTimeout clamping in background tabs has potential negative consequences, though none that we know of
- A throttled roll out may help us find issues before the whole audience is exposed to it
- From the web compatibility side a gradual rollout will let us know if these web compatibility issues affect our userbase before exposing the entire userbase
Dials we can adjust
Advertised vs unadvertised update
- We could offer an advertised (major) update rather than an unadvertised (minor) one
Pros
- Gives users more notice / lets them opt-in
- Ability to speak directly to users via the billboard
- Users may be more tolerant of add-on incompatibility due to better mental preparation
Cons
- Slows uptake
- If the user chooses "Never", there is no point release a month later to re-prompt them
- More users exposed for longer if we announce security vulnerability details
- Requires webpage creation, copy creation, and localization--none of which has been done
- Requires RelEng to touch the updates manually; some small QA impact, TBD
Manual-only update
- We could offer only a manual download from Mozilla.com. Users would get the in-product update only if they manually check for updates
Pros
- Minimizes risk to userbase while still being technically released
- Gives users more notice / lets them opt-in (either from mozilla.com or checking for updates manually)
- Users may be more tolerant as they explicitly looked for and installed the release
- Press around release may prompt add-on makers to update their add-ons
Cons
- Slows uptake considerably
- Do we disclose security vulnerability details?
- Some may not view it as a release if it is only available when manual action is taken
Throttled automatic update offers
- Release as normal but have some percentage of update pings return no update available
Pros
- Lowers risk across the entire userbase
- Gives add-on developers additional time to increase compatibility
Cons
- Gives some users more risk, others less
- May be harder to see crash spikes as the user ramp is gradual
- May be harder to get initial feedback as the volume could be too low to determine if something is a major issue
- More users exposed for longer if we announce security vulnerability details
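One way to picture a throttled offer: each update ping is answered with either the new release or "no update available", chosen at random so that roughly the configured share of pings gets the offer. The sketch below is illustrative only. `should_offer_update`, `simulate_pings`, and their parameters are hypothetical names, and this is not a description of how Mozilla's update server is implemented.

```python
import random


def should_offer_update(throttle_percent: float, rng: random.Random) -> bool:
    """Decide whether a single update ping is offered the new release.

    throttle_percent is the share of pings (0-100) that receive the offer;
    every other ping is told that no update is available.
    """
    if not 0 <= throttle_percent <= 100:
        raise ValueError("throttle_percent must be between 0 and 100")
    # rng.random() is in [0, 1), so 0 never offers and 100 always offers.
    return rng.random() * 100 < throttle_percent


def simulate_pings(ping_count: int, throttle_percent: float, seed: int = 0) -> int:
    """Count how many of ping_count simulated pings receive the offer."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    return sum(should_offer_update(throttle_percent, rng) for _ in range(ping_count))


offered = simulate_pings(ping_count=100_000, throttle_percent=33)
print(f"{offered} of 100000 simulated pings were offered the update")
```

In this sketch each ping is decided independently, so a client that pings repeatedly will eventually be offered the update. The share of updated users therefore keeps growing over time even at a fixed throttle percentage.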
Outcome
- clooney/mfinkle will take point on getting all featured mobile add-ons compatible or removing them from the featured list
- Mobile doesn't need to throttle
- No one wanted to do a prompted update for desktop
- lmesa liked Manual-only the best for desktop
- We decided it didn't get us where we needed to be testing-wise
- Not the best from a security standpoint
- Discounted
- Debated two throttling approaches: offer to 100% and cut it off once we hit enough of an audience, or offer to some percentage and later increase to 100%
- Decided to throttle automatic updates to 25-33% for a maximum of 51 hours (48 + 3 hours to get us to a regular PDT time)
- Asked for 72 hours; the security team was more comfortable with 48 hours
- Staying throttled (or turning off updates entirely) after 51 hours needs to have clear justification and signoff from the security team
- clegnitto and joduinn decided on 33% (based on some WAG numbers) as they would rather overshoot than undershoot
- clegnitto will work with metrics to get hourly ADU reports
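The 51-hour limit is simple date arithmetic: 48 hours plus 3 more to land on a regular PDT time. A small sketch, assuming a hypothetical start time (the notes above do not state one) and naive datetimes that are all read as PDT:

```python
from datetime import datetime, timedelta

# 48 hours agreed with the security team, plus 3 to reach a regular PDT time.
THROTTLE_HOURS = 48 + 3


def throttle_deadline(start: datetime) -> datetime:
    """Return the time after which staying throttled needs security signoff."""
    return start + timedelta(hours=THROTTLE_HOURS)


# Hypothetical start time for illustration only; not taken from the notes.
example_start = datetime(2011, 6, 21, 7, 0)
print("Throttle must be lifted or re-justified by:", throttle_deadline(example_start))
```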