Releases/Firefox 3.6.14/Tactical planning

Meeting details

  • 10:30 am PST - 11:00 am PST
  • No conf room, phone meeting (couldn't book a room)
  • Phone conference is x297
  • Using #planning for backchannel

Overview

  • Dan caught a crash spike in js_DestroyScriptsToGC (bug 631105)
  • The spike was in 3.6 only, but it looked like it happened right after install and on startup
  • Unclear if affected users can use Firefox again; there are a fair number of consecutive crash reports from the same user, which suggests Firefox is completely busted for them
  • We backed out bug 599610 and built build #2
  • The spike went down in js_DestroyScriptsToGC (yay!) but up in js_Enumerate (boo!). We tracked the js_Enumerate stuff in bug 633869
  • Dan noticed most / all of the crashes in both had frankenfoxes. He doesn't think the backout did anything, merely moved the signature
  • We started pulling crash data in bug 634343
    • We wanted to know:
      1. Whether the issue was new (looking all the way back to 3.6, since we took updater changes in 3.6.9)
      2. If the issue wasn't new, whether we made it worse
      3. If the issue wasn't new, what crash-stats curve we could expect during a release, so we could turn off updates if the issue looked unexpectedly worse
    • We brought down the Hadoop cluster running the first job, delaying crash processing across Mozilla (whoops)
  • On top of all this, on 2011-02-11 a CSRF issue (fixed in 3.6.14) was made public. Adobe is worried about it being a 0day and wants us to ship quickly

What we know

  • We can't reproduce any of this
  • Frankenfoxes have existed for a while (which answers question #1 above)
  • It looks like all 3.6 frankenfoxes have newer DLLs and an older Firefox.exe. It looks like all 3.5 frankenfoxes have older DLLs and a newer Firefox.exe
  • If we rebuild 3.6.14, Thunderbird needs to rebuild
  • If we rebuild 3.5.17, Seamonkey needs to rebuild
  • We need to ship soon
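
The DLL/exe mismatch described above could be checked mechanically when going through crash reports. What follows is only a minimal sketch, not the tooling actually used in bug 634343. It assumes each report has already been reduced to a mapping of Firefox's own module names to dotted version strings; the function names, the category labels, and the module names and version numbers in the usage note are all hypothetical.

```python
from typing import Mapping, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '1.9.2.14' into a tuple of ints
    so that versions compare numerically instead of as strings."""
    return tuple(int(part) for part in text.split("."))


def classify_install(modules: Mapping[str, str],
                     exe_name: str = "firefox.exe") -> str:
    """Classify one install by comparing DLL versions to the exe version.

    Assumption: `modules` contains only modules shipped with Firefox
    (system DLLs already filtered out) and includes `exe_name`.
    The returned labels are made up for this sketch.
    """
    exe_version = parse_version(modules[exe_name])
    dll_versions = [
        parse_version(version)
        for name, version in modules.items()
        if name.lower().endswith(".dll")
    ]
    if all(v == exe_version for v in dll_versions):
        return "consistent"
    if all(v >= exe_version for v in dll_versions):
        return "newer-dlls"   # the pattern reported for 3.6 frankenfoxes
    if all(v <= exe_version for v in dll_versions):
        return "older-dlls"   # the pattern reported for 3.5 frankenfoxes
    return "mixed"
```

For example, classify_install({"firefox.exe": "1.9.2.13", "xul.dll": "1.9.2.14"}) returns "newer-dlls". Counting these labels per release would be one way to see whether the share of frankenfox crashes changed between versions.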

What we don't know

  • If the issue wasn't new, whether we made it worse. We think the rate is elevated, but don't have the data yet. Also, we have no clue which fix would have caused the crash rate to go up
  • If the issue wasn't new, what crash-stats curve we could expect during a release, so we could turn off updates if the issue looked unexpectedly worse

Options available to us

  1. Ship 3.6.14 build #2 and 3.5.17 build #1 (aka do nothing)
    • Pro: Less work
    • Pro: Ship quickly
    • Con: Risk of losing Firefox users if the problem has been made worse by any changes
    • Con: Now that we already have build #3, we're not sure the assorted systems can easily go back to an older build
  2. Ship 3.6.14 build #3 and 3.5.17 build #1 (aka do slightly less than nothing)
    • Pro: Less work
    • Pro: Ship quickly
    • Pro: Less risk of making the problem worse as the updater changes have been backed out
    • Con: Still risk of losing Firefox users if the problem has been made worse by non-updater changes
  3. Create 3.6.14 build #4 off of the 3.6.13 relbranch, only containing the CSRF fix. Create 3.5.17 build #2 off the 3.5.16 relbranch, only containing the CSRF fix
    • Pro: Almost no risk of making the problem worse
    • Pro: Additional time to get the information we need from crash-stats
    • Pro: Make Adobe happy
    • Pro: Protect users from a possible 0day
    • Con: Lots more work
    • Con: Thunderbird has to rebuild
    • Con: Seamonkey has to rebuild
    • Con: Longer to ship
    • Con: May get 0day if external reporters are confused about content
    • Con: Kicking the can down the road
    • Con: Other security fixes wait another month
  4. Create 3.6.14 build #4 off of the 3.6.13 relbranch, only containing the CSRF fix. Ship 3.5.17 build #1 as-is w/o disclosing the other fixed issues
    • Pro: Almost no risk of making the problem worse
    • Pro: Additional time to get the information we need from crash-stats
    • Pro: Make Adobe happy
    • Pro: Protect users from a possible 0day
    • Pro: Protect 3.5 users completely
    • Con: Lots more work
    • Con: Thunderbird has to rebuild
    • Con: Longer to ship
    • Con: Kicking the can down the road
    • Con: May get 0day if external reporters are confused about content
    • Con: Other security fixes wait another month

Outcome

  • It was decided the CSRF issue isn't serious enough to build for explicitly, which killed options #3 and #4
  • To be safe, it was decided to go with option #2 (even though it is more work and means it takes a bit to release)
  • New schedule created
    • Qualify today, 2011-02-22
    • Go to beta tomorrow 2011-02-23
    • Release to everyone on Tuesday, 2011-03-01
  • Dan verified that a manually created frankenfox crashes the same way, with the same signatures, on 3.6.13 and 3.6.14
    • This further validates the decision for option #2, as we can be confident we aren't taking any updater changes that would put people into this state more often