Firefox/Channels/Postmortem/57
Post-mortem for the Firefox and Fennec 57 release
Release Coordination Vidyo room, Tuesday, Dec 5th, 2017
Attendees: Sheila, Erin, Kevin, Marcia, Julien, Callek, RyanVM, TomGrab, Nicole, Ritu
Summary
List issues (good and bad): What went well? What didn't? What could we do better?
- (syl) First run page super slow on old systems (3 fps) - bug 1417888
- Wasn't tested on slow systems
- Not just old systems: the bug report says 100% of the Linux population + about 20% of the Windows population
- Erin/Nicole can help follow-up, what kind of QA support do we need here? Can we add testing on slow systems to the existing test plan?
- This was tested by SV as part of the onboarding flow: https://public.etherpad-mozilla.org/p/Onboarding_Experience
- Additional testing would need to be added.
- We might incorporate an update plan into the 58/59 time frame
- (Ritu) Too many uplifts during Beta57 cycle - 576
- 56 had 365 uplifts
- 55 had 320
- Great release quality despite the huge code churn
- Was justified due to the scope of Quantum release!
- Good to see more self-awareness
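For scale, the uplift counts above can be compared as percentage increases. This is a minimal sketch using only the three counts listed in these notes; the helper name `percent_increase` is made up for illustration.

```python
# Beta-cycle uplift counts as recorded in these notes.
uplifts = {55: 320, 56: 365, 57: 576}


def percent_increase(old: int, new: int) -> float:
    """Relative growth from old to new, expressed in percent."""
    return (new - old) / old * 100.0


for release in (55, 56):
    growth = percent_increase(uplifts[release], uplifts[57])
    print(f"57 vs {release}: +{growth:.1f}%")
# prints:
# 57 vs 55: +80.0%
# 57 vs 56: +57.8%
```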
- (Ritu) Nightly and Beta milestones helped teams plan better
- Soft code freeze milestone was appreciated by eng teams
- Add a WNP/first run related milestone
- (Ritu) Each RC uplift was reviewed for second opinion by eng managers/component owners
- may not be a scalable process
- (Ritu) More aggressive tracking via blocking flag
- Good support from all on reviewing, investigating, fixing blockers
- (Ritu) Amazing efforts to keep untriaged bug backlog within acceptable limits
- may not be a scalable process beyond 57
- Emma's dashboard will help us monitor this going forward
- (Ritu) QA team's feature doc which contained list of features, status, blockers was very useful
- (Ritu) Fennec triage could be improved
- It's hard to make quick progress on blocking bugs
- Difficult to get a second opinion on RC uplifts
- Hard to get developer time to fix issues on the frontend
- Romania, Taipei timezone cycle can be a bit slow, needing nudging from PST owners
- need a resource that is a decision maker for bugs filed (crashes, blocking, etc.)
- Fennec team could benefit from aligning better with Firefox processes
- Fennec triage/component ownership needs to be shared with Firefox team
- Need awareness of who knows what: a resource list.
- For now, ping snorp for platform, nevin for front end. Don't know? Defer to snorp
- (Ritu) Awesome effort in planning and getting WNP ready during RC week
- In the past this was a huge challenge
- Hoping to make this new process repeatable for 58/59
- (Ritu) Releng team's infra code freeze helped keep things smooth and predictable
- Callek offered to ping catlee/jlund on whether this needs to happen going forward
- This code freeze (CF) should have been better communicated to the release-drivers mailing list
- (marcia) Engineering lead for the release worked well. Jim Mathies was very responsive and worked to keep the Wednesday triage under control. We should keep a lead in place for each future release.
- Do we need a Fennec release eng lead as well?
- (marcia) Some challenges around tracking Fennec features since they don't use the same process as Desktop - https://docs.google.com/spreadsheets/d/1Rn-F3Kg_1_VznIxxXkAGGL8mVMSAdamZZI4f1O2r8HA/edit#gid=152116571.
- (kbrosnan) Laser focus on 57 caused a loss of focus on 58/59. Maybe there should have been dedicated people focused on those releases (outside release management)
- (marcia) To add to that, we need to be careful we don't carry Nightly regressions into Beta. In Fennec we had https://bugzilla.mozilla.org/show_bug.cgi?id=1413500 carried from Nightly into Beta
- (Ryan) Part of that issue was late landings due to not communicating out the milestones widely enough
- More EPM support on future release planning and tracking
- Cross-functional milestones are needed from the relman team for future releases, to keep the focus going
- (sheila) - Dot release process - decision on dot release and timing really needs to be a joint decision from Product and Engineering. Fine for Release Management to make a recommendation. Document summarizing issues and status was great.
- Relman team should continue reviewing dot release plan, fixes, user impact with senior management
- (sheila) - More formal review process post release. Communication of non-decisions.
- Relman team can build more visibility on issues that were reviewed before we decide on unthrottling
- (Ryan) Early rollout to DevEdition went well++
- This was done to speed up the beta staged rollout
- This helps weed out issues sooner and ship desktop beta builds faster
- (kbrosnan) a lot of confusion about how Google Play does throttled releases
- New installs are not guaranteed to get the most recent version. They have the same % chance to get the new version as previous installs
- [julien] should we push the release candidate out earlier at a low rollout percentage?++
- Fennec staged rollout at 10% on launch day was deemed too slow
- Product recommended going at 25%. This helps us mimic the Desktop release throttling
- Before the dot release is pushed out, the previous release should be at 100% staged rollout.
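The throttling behaviour described above (new installs get the new version with the same probability as existing installs) can be illustrated with a toy model. This is only a sketch under stated assumptions: the hash-based bucketing, the function names, and the user ids are hypothetical and are not Google Play's actual mechanism.

```python
import hashlib


def rollout_bucket(user_id: str) -> float:
    """Map a user id to a stable value in [0, 1). Hypothetical bucketing scheme."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") / 2**32


def gets_new_version(user_id: str, rollout_fraction: float) -> bool:
    """A user is served the new version only if their bucket falls under the rollout."""
    return rollout_bucket(user_id) < rollout_fraction


def share_on_new_version(user_ids, rollout_fraction: float) -> float:
    """Fraction of the given users who would be served the new version."""
    ids = list(user_ids)
    return sum(gets_new_version(u, rollout_fraction) for u in ids) / len(ids)


# Made-up populations: the model does not distinguish new from existing installs.
existing_installs = [f"existing-{i}" for i in range(10_000)]
new_installs = [f"new-{i}" for i in range(10_000)]

for fraction in (0.10, 0.25, 1.00):
    print(
        f"rollout {fraction:.0%}: "
        f"existing {share_on_new_version(existing_installs, fraction):.1%}, "
        f"new {share_on_new_version(new_installs, fraction):.1%}"
    )
```

In this model, at a 10% rollout roughly nine out of ten new installs still receive the previous release, which is one way to read the point that the previous release should reach 100% before a dot release is pushed.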
- [elan] Shield
- We can cover this in my 57 retro, but ideas are welcome here:
- Need a single dashboard!