Unified Telemetry/Status reports/July 17 2015
Unified Telemetry status report July 17, 2015
Overall Project Health
Green - r41 is the go-live release for Unified Telemetry. All issues are triaged and assigned to milestones. The dev team continues to focus on data validation.
Exec Summary
- Client work delayed this week by sick time. Send logic and a few other changes planned for uplift to Aurora & Beta next week. Remaining work for 41 waiting for reviews.
- July 30 milestone for first complete pass of data validation, deployment of pipeline scaling work
- Testing plan is up on the wiki: Telemetry/Testing
- Ongoing planning on the FHR V2/V3 historic pipeline migration; link to status here.
Risks/Issues
Description of Risks/Issues | State | Owner | Plan to Resolve/Mitigation | Target Date |
---|---|---|---|---|
Data integrity between V2/V4 and V4 internal data consistency | Open | Brendan/Sam | Investigation in progress. Added resources (Sam). https://etherpad.mozilla.org/fhr-v4-validation | 7/30 |
Data continuity across V2/V4 | Open | Katie/Mark/Trink | Plan, Metabug | 7/23 |
Legal review | Open | BDS/Legal | Meeting between groups | 8/04 |
QA sign off (functional, load) | Open | Stuart | Telemetry/Testing | 8/04 |
Operations - data retention requirements | Open | Travis/Katie | Eng team owes ops a doc defining ping types and data retention requirements | 8/04 |
Operations - analysis tools & microservices | Open | Travis/Mark/Roberto | Architecture/Data flow diagram | 8/04 |
Data loss incident | Fixed | mreid/whd/trink | Tee server needs to return error status from old or new. Added Ops resources (Daniel Thornton). | 7/15 |
Remote about:healthreport content | Open | Katie/Georg | Made a request to Laura Thomson for help | 8/04 |
Budget, size of UT pings | Open | Mark/BDS | https://bugzilla.mozilla.org/show_bug.cgi?id=1182693 | 8/04 |
Analysis difficulty | Open | Katie/tbd | No plan yet, aside from ongoing work on tools | 8/04 |
Accomplished for Last Period
Engineering & Ops
- Heka 0.10.0 beta released
- Client work: Spreadsheet
- Not uplifting recent send logic changes to Beta (needs more bake time for confidence)
- Uplifting a few patches around the send-logic ([uplift2], http://bit.ly/1Je45UA) to Aurora as soon as the send-logic impact is verified
- Remaining client work ([uplift3], http://bit.ly/1TCl4r8) for 41 is manageable and either blocked by info requests or review
- Data validation
- Generated v4 data set with complete set of pings from all clients seen on nightly: https://bugzilla.mozilla.org/show_bug.cgi?id=1171265#c24
- Work on missing subsessions analysis (hints at a client bug): https://bugzilla.mozilla.org/show_bug.cgi?id=1171268
- Pipeline scaling work
- Finished distributed aggregation work started at workweek: https://github.com/mozilla-services/data-pipeline/pull/93
- Deployed next round of changes
- Telemetry tools and microservices
- Work on memory footprint of the Spark jobs: https://bugzilla.mozilla.org/show_bug.cgi?id=1182499
- Kickoff meeting for deployment plan for telemetry tools and microservices: Architecture flow diagram
QA
- test cases, bug closing
Project management
- meeting, emails, hand waving
Planned for Upcoming Period
Engineering
- Client
- Do code reviews for deletion pings and choices info bar
- Pending ping cleanup
- Investigate count discrepancies between "main" pings and "saved session" pings
- Pipeline
- Continue with scaling work
- Monitoring work for Telemetry data
- Investigate executive stream discrepancies
- Bug fixes
- Data validation
- Join corresponding v2 data to v4 nightly clients data set
- Continue writing callbacks that look at other measures
- Breadth first, do a first pass at most validations and flag big issues
- Deep dive on missing subsessions as it may indicate a client bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1171268
- Data continuity
- Document strategy for executive dashboards with v2 + v4 data
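As an illustration of the missing-subsessions deep dive listed above, here is a minimal sketch of one way to flag gaps in a client's subsession sequence. It assumes each ping has been flattened to a record with hypothetical `clientId` and `profileSubsessionCounter` keys, and that the counter increases by one per subsession; it is not the analysis used in the bug, only an example of the idea.

```python
from collections import defaultdict


def find_missing_subsessions(pings):
    """Return {client_id: [missing counter values]} for clients with gaps.

    `pings` is an iterable of dicts with the (assumed) keys
    "clientId" and "profileSubsessionCounter".
    """
    seen = defaultdict(set)
    for ping in pings:
        seen[ping["clientId"]].add(ping["profileSubsessionCounter"])

    gaps = {}
    for client_id, counters in seen.items():
        # Every value between the lowest and highest counter seen
        # should be present if no subsession ping was lost.
        expected = set(range(min(counters), max(counters) + 1))
        missing = sorted(expected - counters)
        if missing:
            gaps[client_id] = missing
    return gaps


# Made-up sample records, for illustration only.
pings = [
    {"clientId": "a", "profileSubsessionCounter": 1},
    {"clientId": "a", "profileSubsessionCounter": 2},
    {"clientId": "a", "profileSubsessionCounter": 5},
    {"clientId": "b", "profileSubsessionCounter": 7},
    {"clientId": "b", "profileSubsessionCounter": 8},
]
print(find_missing_subsessions(pings))  # {'a': [3, 4]}
```

A check like this only finds gaps between the first and last counter observed for a client, so pings missing at either end of the range would need a separate comparison.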
Ops
- Databricks investigation (big jobs on big clusters): cost, resourcing, etc.
QA
- closing bugs
- test suite creation
- finalizing long-term QA engagement (Softvision engagement, tooling asks for CI-loop-based testing)
Project Management
- Finish triage of bugs
- remainder of release tasks scheduled
Outstanding requests not yet roadmapped into a release
Description | State | Owner | Plan to Resolve/Mitigation | Target Date |
---|---|---|---|---|
Firefox OS - app pings | Open | Katie | Need to schedule and understand impact on project | TBD |
Histograms for Loop/Hello | Open | Katie | Need to schedule and understand impact on project | TBD |