Releases/Firefox 5/Risk mitigation strategies

Why?

We need to assess where we stand on the various risk factors for Firefox 5, and identify ways to mitigate a risk if we aren't comfortable with its level.

This page, and the planning behind it, does not mean that we NEED to do any of these or that Firefox 5 isn't ready to release. It is merely prudent to discuss where we stand, what is in our control, and how to mitigate risks before mitigation is needed.

Current risk profile

Add-ons

Mobile

  • AMO has no compatibility bumping for mobile
  • About 50% of mobile add-ons are compatible
    • Lower than we would like to see
  • We should manually look at the recommended add-ons and bump them
  • Not very many binary add-ons
  • We don't think we should hold the release if the percentage doesn't increase

Desktop

  • 78% of the add-ons on AMO are compatible with Firefox 5
  • A large portion of the remaining percentage is the .NET Framework Assistant
    • Talked with the developer at Microsoft, who said he would update his add-on. We don't have a timeframe for the update, though
  • Risk: LOW for AMO add-ons. HIGH for non-AMO add-ons
  • Most have updated, and the ones that haven't are waiting for the release
  • AVG, Symantec, McAfee, and Kaspersky should be ready
  • Haven't heard back from Google about the toolbar, which is not currently compatible
  • From the add-on side we should do a very gradual rollout so add-on authors have time to update before the bulk of our users are affected by incompatibilities

Stability

Mobile

  • Mobile crash data is close to, or the same as, 4.0.1
  • We see higher crash rates in beta; as the ADUs grow, the crash rate goes down a bit
  • The number of users on beta is small (but the best we've ever had)
  • Not watching any particular bug to see if it flares up after the release
  • Looks good for release
              Firefox 5.0                            Firefox 4.0.1
Date          crashes  ADUs     throttle  crash/100  crashes  ADUs     throttle  crash/100
2011-06-16    49       7,114    100%      0.69%      809      165,235  100%      0.49%
2011-06-15    40       6,583    100%      0.61%      771      165,907  100%      0.46%
2011-06-14    43       6,069    100%      0.71%      785      163,882  100%      0.48%
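
The crash/100 column in the table above can be reproduced from the crash and ADU counts. The following is a minimal sketch in Python; the function name and data layout are illustrative assumptions and not part of any Mozilla tooling, and only the numbers come from the table.

```python
# Crash counts and ADUs copied from the mobile table above.
# Layout (an assumption for this sketch): date -> version -> (crashes, ADUs).
MOBILE_DATA = {
    "2011-06-16": {"5.0": (49, 7_114), "4.0.1": (809, 165_235)},
    "2011-06-15": {"5.0": (40, 6_583), "4.0.1": (771, 165_907)},
    "2011-06-14": {"5.0": (43, 6_069), "4.0.1": (785, 163_882)},
}


def crashes_per_100_adu(crashes: int, adus: int) -> float:
    """Return the number of crashes per 100 active daily users."""
    if adus <= 0:
        raise ValueError("ADUs must be positive")
    return 100.0 * crashes / adus


if __name__ == "__main__":
    for day in sorted(MOBILE_DATA, reverse=True):
        rates = {
            version: round(crashes_per_100_adu(crashes, adus), 2)
            for version, (crashes, adus) in MOBILE_DATA[day].items()
        }
        print(day, rates)
```

With so few 5.0 ADUs, a handful of crashes moves the rate noticeably, which is consistent with the note that the rate drops as the ADUs grow.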

Desktop

  • 1.7 million users on 5.0 overall.
  • Crash rate fairly low at 1.36 crashes per 100 ADU.
  • Distribution of users is scattered across all betas: http://test.kairo.at/socorro/2011-06-16.buildcrashes.html
  • Risks
    • No good data right now on any one beta for 1 million+ users.
    • b7: 89K users, 6.799 crashes per 100 ADU.
    • b6: 295K users, 1.436 crashes per 100 ADU.
    • The last several betas have never increased much beyond 250K users.
    • We know from 4.0 experience that the crash landscape changes above 1 million, 2 million, 5 million. We had over 2 million beta users for pre 4.0 builds.
    • Not enough data to really understand if there is a top crasher.
    • 2 Flash releases in the last week and a half.
  • From the stability side we need automatic updates to reach 6-10 million users to be confident releasing to the rest. That calls for a release method that gets us that many ADUs, pauses while we interpret the data, and then opens the update to everyone

Security

  • Details of bug 659349 were released prematurely
    • Filed May 24th. A fix got into Firefox 5. The details were talked about 5 days early
    • Not the most significant bug we are fixing in this release
    • Screenscraping bug that will affect 40-50% of our users that have machines that can run WebGL
    • People expect us to talk about it during the release
  • If we are doing a slow rollout we might have to delay the security advisories
  • Very uncomfortable with a slow rollout because people will go look
  • Other than bug 659349, we're in good shape for a Tuesday release
    • Most were found internally
    • The external bugs are both sg:moderate
  • At this point it's really too late to do a 4.0.2, unless we think Firefox 5's uptake is going to be terrible
  • If we are going to roll out slowly in the future, we need to start discussing a possible 5.0.1
  • From the security side we want to release as quickly as possible

Web compatibility

  • WebGL now disables cross-domain textures
  • setTimeout background time clamping has potential negative consequences, though none that we know about
  • A throttled rollout may help us find issues before the whole audience is exposed to them
  • From the web compatibility side a gradual rollout will let us know if these web compatibility issues affect our userbase before exposing the entire userbase

Dials we can adjust

Advertised vs unadvertised update

  • We could offer an advertised (major) update rather than an unadvertised (minor) one

Pros

  1. Gives users more notice / lets them opt-in
  2. Ability to speak directly to users via the billboard
  3. Users may be more tolerant of add-on incompatibility due to better mental preparation

Cons

  1. Slows uptake
  2. If the user chooses "Never", we don't have a point release a month later to reprompt them
  3. More users exposed for longer if we announce security vulnerability details
  4. Requires webpage creation, copy creation, and localization, none of which has been done
  5. Requires manual RelEng handling of the updates; some small QA impact, TBD

Manual-only update

  • We could offer only a manual download from Mozilla.com. Users would get the in-product update only if they manually check for updates

Pros

  1. Minimizes risk to userbase while still being technically released
  2. Gives users more notice / lets them opt-in (either from mozilla.com or checking for updates manually)
  3. Users may be more tolerant as they explicitly looked for and installed the release
  4. Press around release may prompt add-on makers to update their add-ons

Cons

  1. Slows uptake considerably
  2. Do we disclose security vulnerability details?
  3. Some may not view it as a release if it is only available when manual action is taken

Throttled automatic update offers

  • Release as normal but have some percentage of update pings return no update available

Pros

  1. Lowers risk across the entire userbase
  2. Gives add-on developers additional time to increase compatibility

Cons

  1. Gives some users more risk, others less
  2. May be harder to see crash spikes as the user ramp is gradual
  3. May be harder to get initial feedback as the volume could be too low to determine if something is a major issue
  4. More users exposed for longer if we announce security vulnerability details
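
The throttling idea in this section, where only some percentage of update pings are told that an update is available, can be sketched as follows. This is a hypothetical illustration in Python and not the actual update server implementation; `should_offer_update`, `simulate_pings`, and the per-ping random draw are all assumptions made for the example.

```python
import random


def should_offer_update(offer_percent: float, rng: random.Random) -> bool:
    """Decide whether a single update ping is told an update is available.

    offer_percent is the share of pings (0-100) that receive the offer;
    every other ping gets a "no update available" response.
    """
    if not 0 <= offer_percent <= 100:
        raise ValueError("offer_percent must be between 0 and 100")
    # rng.random() is uniform on [0, 1), so 0 never offers and 100 always offers.
    return rng.random() * 100 < offer_percent


def simulate_pings(n_pings: int, offer_percent: float, seed: int = 0) -> int:
    """Count how many of n_pings would be offered the update."""
    rng = random.Random(seed)
    return sum(should_offer_update(offer_percent, rng) for _ in range(n_pings))


if __name__ == "__main__":
    offered = simulate_pings(100_000, 33)
    print(f"{offered} of 100000 pings offered the update")
```

Because the decision in this sketch is made per ping rather than per user, a user who pings repeatedly will eventually be offered the update, which is one reason the ramp is gradual rather than a fixed cohort.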

Outcome

  • clooney/mfinkle will take point on getting all featured mobile add-ons compatible or removing them from the featured list
  • Mobile doesn't need to throttle
  • No one wanted to do a prompted update for desktop
  • lmesa liked Manual-only the best for desktop
    • We decided it didn't get us where we needed to be testing-wise
    • Not the best from a security standpoint
    • Discounted
  • Debated whether to offer updates to 100% and cut them off once we hit a large enough audience, or to throttle at some percentage and later increase to 100%
  • Decided to throttle automatic updates to 25-33% for a maximum of 51 hours (48 + 3 hours to get us to a regular PDT time)
    • Asked for 72 hours, security team was more comfortable with 48 hours
    • Staying throttled (or turning off updates entirely) after 51 hours needs to have clear justification and signoff from the security team
  • clegnitto and joduinn decided on 33% (based on some WAG numbers) as they would rather overshoot than undershoot
  • clegnitto will work with metrics to get hourly ADU reports
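
As a rough cross-check of the 25-33% figures against the 6-10 million user target from the stability section, the arithmetic can be sketched as below. This is a back-of-the-envelope Python sketch under stated assumptions: each user is treated as getting a single offer decision during the window and as accepting the update when offered. The function name is hypothetical, and none of the resulting figures come from the meeting itself.

```python
def users_needed_in_window(target_updated: float, offer_fraction: float) -> float:
    """Users who must check for updates in the window so that
    target_updated of them are offered (and, by assumption, take) the update."""
    if not 0 < offer_fraction <= 1:
        raise ValueError("offer_fraction must be in (0, 1]")
    return target_updated / offer_fraction


if __name__ == "__main__":
    for target in (6_000_000, 10_000_000):      # stability target range
        for fraction in (0.25, 0.33):           # throttle range discussed
            needed = users_needed_in_window(target, fraction)
            print(f"{target:,} updated at {fraction:.0%} offered "
                  f"needs about {needed:,.0f} users checking in")
```

Under these assumptions a higher offer percentage reaches the target with fewer users checking in, which is in line with the preference to overshoot rather than undershoot.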