Firefox/Channels/Postmortem/61

From MozillaWiki

Notes for 61 post mortem

10:00am PDT Tuesday July 17 (after the Channel Meeting)
Vidyo channel: release coordination
IRC: #release-drivers

Some stats

What went well?

  • Shortest post-Dawn Beta cycle (and an All Hands thrown in for good measure), went very smoothly!
  • Good responsiveness from engineering teams on picking up unassigned bugs and getting patches landed before the last minute.
  • Good feedback from Marketing on release notes
  • We shipped RDL! \o/

What's new page

  • Deployment issues meant that the live WNP wasn't ready for testing until 4am UTC on the 26th (i.e. release day)
  • Confusion with QA about expected vs. actual results (compounded by the rollout issues)
    • when should it be ready/deployed? Does it need to be on merge day after the merge or can it be tested earlier during RC week?
    • during RC week on release-localtest
      • We skipped localtest update testing to move faster (going directly to cdntest channel testing)
    • can we better clarify the timeline and QA expectations?
    • Let's take this back to the WNP team and figure out why the timelines are not being met
    • (liz) I will check with them and add items to the RC week/release checklists and the milestones doc

Avast/AVG TLS 1.3 issue (bug 1468892)

  • Reported against 61.0b14 on June 14 (12 days prior to release)
  • Engineering involved June 18
  • Tracking requested on June 26 by Philipp (!!)
  • Normandy recipe disabling TLS 1.3 for all Win7 users and Avast/AVG users on Win8+ deployed June 28 (still active)
  • Antivirus software affects Firefox on Windows almost every release (for FF 60 it was Kaspersky), and it has been a constant pain over the last 9 years. Can Product Integrity proactively test AV products with RC builds and work with AV folks somehow? <-- [roland] I know it's a hard problem; how can SUMO help?
    • QA doing some testing now, complicated due to number of configurations
    • Can we automate some of this?
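
The targeting rule described above (all Win7 users, plus Avast/AVG users on Win8+) can be sketched as a simple predicate. This is only an illustration of the population logic, not the actual Normandy recipe or its filter syntax; the `Client` type, its fields, and the product identifiers are hypothetical. The Windows NT version numbers (6.1 for Windows 7, 6.2 for Windows 8) are the standard ones.

```python
from dataclasses import dataclass, field

# Windows NT version numbers: Windows 7 is 6.1, Windows 8 is 6.2.
WIN7 = (6, 1)
WIN8 = (6, 2)

# Hypothetical product identifiers, for illustration only.
AFFECTED_AV = {"avast", "avg"}


@dataclass
class Client:
    """Hypothetical description of a client; not a real Normandy type."""
    os_name: str                  # e.g. "Windows", "Linux"
    os_version: tuple             # (major, minor); NT version on Windows
    av_products: set = field(default_factory=set)


def should_disable_tls13(client: Client) -> bool:
    """Return True if the client is in the population described in the notes."""
    if client.os_name != "Windows":
        return False
    if client.os_version == WIN7:
        return True  # all Win7 users, regardless of AV
    if client.os_version >= WIN8:
        # Win8+ users only when Avast or AVG is detected
        detected = {name.lower() for name in client.av_products}
        return bool(detected & AFFECTED_AV)
    return False  # anything older than Win7 is outside the described rule
```

Writing the rule down as a testable predicate like this is one way to catch misconfigured targeting before a recipe reaches QA, though the real filter would have to be expressed in whatever form the recipe system expects.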

Fennec AllocInfo::Get<T> topcrash (bug 1468541)

  • First spiked in 61.0b12
  • Linked to Snapdragon 820/821 CPU quickly
  • Difficult to reproduce and assign an owner to
  • Temporarily throttled updates @ 1% before eventually going to 99% (was blocking Galaxy S8 crash fix)
  • Still unclear what happens next, no signs of crash on Beta62 so far
    • [marcia] We should be sure that the signature didn't morph into something else in 62
      • new sdk work may affect this, signatures may have shifted (for 63)
  • Getting affected device directly to the developer is key (in this case nchen)

Late uplifts

  • Successful fix for the Galaxy S8 crashes (bug 1460989) \m/
  • Last-minute Android networking feature disabling due to download manager regression (bug 1467755)
    • Bug reported Friday before SF All Hands, investigation delayed due to it
  • WebRTC sec bug (bug 1458048) needed a late uplift and an RC respin because the fix landed upstream without coordination and uplift wasn't requested in a timely fashion

61.0.1 drivers

  • Bug 1471375 - Reports about missing activity stream content on new tab page and about:preferences#home panel
    • Reported on 6/26 (go-live day)
    • The root cause of this was users with corrupted IndexedDB databases
    • The patch that landed allows AS to keep functioning, but doesn't fix the underlying IDB bug
    • The AS team did their own post-mortem for this: Meeting Notes
  • Bug 1472127 - Update to Firefox 61.0 killed all my bookmarks and also the backups are unreadable
    • Reported on 6/29 (post go-live)
    • Root cause was new migration code shipped in 61 removing bookmarks with wrong parents (which is an erroneous condition from the start)
    • Can we talk to Mak/QA about doing more ongoing testing in later release cycles?
    • This is an issue that has occurred in the past. Tom has offered to look into smoke testing this more thoroughly going forward, if possible.
    • Examples from previous releases: 1388584 (in 55.0.1), 1206376 (41.0.1), 1206376 (44.0.2)
  • Bug 1472137 - Crash in [@ IPCError-browser | ShutDownKill] in mozilla::mscom::Interceptor::~Interceptor()
    • Reported on 6/29 (post go-live)
    • Also manifested for Chinese users as having an unusable browser (bug 1471824)
      • Regressed by bug 1364624 uplifted to beta (along with a couple of other shutdown hangs - see below)

61.0.1 notable ride alongs

  • Various crash fixes from Windows SRWLOCK change (landed in 61.0b6, not noticed until after we shipped)
  • Fix for Windows download issues exposed by sec uplift (bug 1465458)
  • Twitch 1080p playback fix (bug 1469257)
    • Filed during RC week, didn't have a verified patch ready in time for RC uplift

Normandy hotfix deployments/rollouts

  • Avast TLS 1.3 issue (bug 1471672)
    • Down to 50% now
    • Technical problems with the recipe - should developers get training on how to create recipes?
    • TomGrab mentioned issues with QA testing due to recipe misconfigurations (how to target Windows versions, etc.)
    • Pros/cons of default vs. user branch rollouts
  • OSX 10.9 OMTP crashes (bug 1472308)
    • Still troubleshooting issues with this one
  • HTTP throttling v2 algorithm (bug 1462906)
  • RDL was also rolled out to release via Normandy
  • TLS 1.3 fallback
    • (Ritu) We may need to refine, review, share the Normandy rollout process (where to file the bug, who creates the recipe, reviews it, intent to ship, QA testing etc)
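
Several items above refer to percentage-based rollouts (updates throttled at 1% and later 99%, a recipe "down to 50%"). As a minimal sketch of how such a percentage can be applied consistently per client, the function below hashes a client identifier into a stable bucket. This is a generic technique shown for illustration, not a description of how Normandy or the update server actually samples; `in_rollout`, `client_id`, and `recipe_id` are hypothetical names.

```python
import hashlib

BUCKETS = 2 ** 64  # size of the bucket space taken from the hash


def in_rollout(client_id: str, recipe_id: str, rate: float) -> bool:
    """Return True if this client falls inside the first `rate` share of buckets.

    The same (client_id, recipe_id) pair always maps to the same bucket, so
    raising the rate only adds clients and lowering it only removes them.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    key = f"{recipe_id}:{client_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    return bucket < int(rate * BUCKETS)
```

Including the recipe identifier in the hash input means that different rollouts select different subsets of users, so the same clients are not always the first to receive every change.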