Unified Telemetry/Status reports/July 24 2015
Unified Telemetry status report July 24, 2015
Overall Project Health
Last week: Green
This week: Yellow - r41 is the go-live release for unified Telemetry. All issues have been triaged and assigned milestones. The dev team continues to focus on data validation; client-side pings are the current blocker and are under investigation.
Exec Summary
- Validation work has hit ping bugs, which are the current blocking issue
- July 30 milestone: first complete pass of data validation and deployment of pipeline scaling work
- Testing plan is up on the wiki: Telemetry/Testing
- Planning for the FHR V2/V3 historic pipeline migration is ongoing; link to status here.
Risks/Issues
Description of Risks/Issues | State | Owner | Plan to Resolve/Mitigation | Target Date |
---|---|---|---|---|
Investigate gaps in pings | Open | Stuart/Alessio | https://bugzilla.mozilla.org/show_bug.cgi?id=1185123, working doc | 8/04 |
Data integrity between V2/V4 and V4 internal data consistency | Open | Brendan/Sam | Investigation in progress. Added resources (Sam). https://etherpad.mozilla.org/fhr-v4-validation | 7/30 |
Data continuity across V2/V4 | Open | Katie/Mark/Trink | Plan, Metabug | 7/30 |
Legal review | Open | BDS/Legal | Meeting between groups | 8/04 |
QA sign off (functional, load) | Open | Stuart | Telemetry/Testing | 8/04 |
Operations - data retention requirements | Open | Travis/Katie | Eng team owes ops a doc defining ping types and data retention requirements | 8/04 |
Operations - analysis tools & microservices | Open | Travis/Mark/Roberto | Architecture/Data flow diagram | 8/04 |
Data loss incident | Fixed | mreid/whd/trink | Tee server needs to return error status from old or new. Added Ops resources (Daniel Thornton). | 7/15 |
Remote about:healthreport content | Open | Katie/Georg | Made a request to Laura Thomson for help | 8/04 |
Budget, size of UT pings | Open | Mark/BDS | https://bugzilla.mozilla.org/show_bug.cgi?id=1182693 | 8/04 |
Analysis difficulty | Open | Katie/tbd | No plan yet, aside from ongoing work on tools | 8/04 |
Accomplished for Last Period
Engineering & Ops
- Initial Databricks investigation: not useful to the Perf team; the metrics team/Katie will decide next week whether it suits our purpose.
- Aggregation work is up in staging; needs testing
- Client work: Spreadsheet
- Data validation
  - Missing pings doc
  - Generated v4 data set with complete set of pings from all clients seen on nightly: https://bugzilla.mozilla.org/show_bug.cgi?id=1171265#c24
  - Work on missing subsessions analysis (hints at a client bug): https://bugzilla.mozilla.org/show_bug.cgi?id=1171268
- Pipeline scaling work
  - Finished distributed aggregation work started at workweek: https://github.com/mozilla-services/data-pipeline/pull/93
  - Deployed next round of changes
- Telemetry tools and microservices
  - Work on memory footprint of the Spark jobs: https://bugzilla.mozilla.org/show_bug.cgi?id=1182499
  - Kickoff meeting for deployment plan for telemetry tools and microservices: Architecture flow diagram
QA
- Investigate client QA automated test scripts
- Update test wiki
- Work with Softvision to prepare for RC pass
Project management
- Meetings, emails, hand waving
Planned for Upcoming Period
Engineering
- Client
  - Do code reviews for deletion pings and choices info bar
  - Continue pending ping cleanup
  - Continue investigating count discrepancies between "main" pings and "saved session" pings
- Pipeline
  - Continue with scaling work
  - Monitoring work for Telemetry data
  - Investigate executive stream discrepancies
  - Bug fixes
- Data validation
  - Work on 100k-client paired v2/v4 pings from early June to early July
  - Validation efforts (main vs. saved-session pings, ending subsession pings, broken chaining)
  - Deep dive on missing subsessions, as it may indicate a client bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1171268
- Data continuity
  - Document strategy for executive dashboards with v2 + v4 data
Ops
- Aggregate pipeline is available in staging; needs testing
QA
- Close bugs
- Continue test suite creation
- Finalize long-term QA engagement (Softvision engagement, tooling asks for CI-loop-based testing)
Project Management
- Finish triage of bugs
- Schedule the remainder of release tasks
Outstanding requests not yet roadmapped into a release
Description | State | Owner | Plan to Resolve/Mitigation | Target Date |
---|---|---|---|---|
Firefox OS - app pings | Open | Katie | Need to schedule and understand impact on project | TBD |
Histograms for Loop/Hello | Open | Katie | Need to schedule and understand impact on project | TBD |