Breakpad/Status Meetings/2015-08-19
Meeting Info
Breakpad status meetings occur on Wed at 11:00am Pacific Time.
Conference numbers:
Vidyo: Stability 650-903-0800 x92 conf 98200# 800-707-2533 (pin 369) conf 98200#
IRC backchannel: #breakpad
Mountain View: Dancing Baby (3rd floor)
Operations Updates
- stage is working again
- fair and balanced submitter is running
- 10% of what we receive is now going to stage
- adi is coming in now
- JSON problems recurred in prod
- stage has a patch that filters keys as well as values
- manually submitting a problematic crash leads us to believe stage is fixed
- write out the null bytes
- updating supersearchfields is not working
- adrian working on a patch
- datadog runs on UDP
- we've been losing windows of time
- webapp has been registering as down to pingdom/nagios
- high response time requests from the webapp, but they are serving requests in those windows
- can we get newrelic to look?
- talk to Travis
- we need 8 hosts for a while
- moztrap was seeing this
- cache was acclimating pingdom to fast responses
- whole cache was invalidating at once
- we are not actually down in these times, may just have higher response times above the timeouts
- the only thing more dangerous than no alerting is noisy alerting
- pingdom accounts for the team?
- going to take approx 1 month to set us up for pingdom on the mozilla account
- sentry is down
- cannot ingest
- should be back by the end of the day
- loggly
- django errors don't end up in syslog, they don't go to stdout
- maybe we don't want to continue with loggly
- the features it has over other log aggregators are not that useful
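For reference, the key-and-value filtering mentioned in the JSON item above could look roughly like the sketch below. This is not the actual stage patch: it assumes the fix amounts to removing NUL characters from a raw crash, and `strip_nulls` is a hypothetical helper name.

```python
def strip_nulls(value):
    """Recursively remove NUL characters from dict keys and string values.

    Hypothetical helper for illustration; not the patch deployed on stage.
    """
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, dict):
        # Filter the keys as well as the values.
        return {strip_nulls(k): strip_nulls(v) for k, v in value.items()}
    if isinstance(value, list):
        return [strip_nulls(item) for item in value]
    # Numbers, booleans and None pass through unchanged.
    return value


raw_crash = {"Pro\x00duct": "Fire\x00fox", "nested": {"k\x00": ["a\x00b", 1]}}
clean_crash = strip_nulls(raw_crash)
```

One caveat with this approach: two keys that differ only by NUL characters collapse into a single key, and the later one wins.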
Other
- How many EC2 webhead nodes do we have in prod?
- Wanna scale that down?
Project Updates
Socorro Bug Tracker
this week's bugs
Deployment Triage
PR Triage
QA
- Help tracking down an intermittent failure
- ReadTimeout: HTTPSConnectionPool(host='crash-stats.mozilla.com', port=443): Read timed out. (read timeout=10)
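While the intermittent failure is being tracked down, one possible stopgap on the test side is to retry a timed-out read a few times with backoff. The sketch below is an assumption about how that could be done, not something agreed in the meeting; `fetch_with_retry` and its parameters are hypothetical names. With the `requests` library, the exception to pass as `retry_on` would be `requests.exceptions.ReadTimeout`.

```python
import time


def fetch_with_retry(fetch, attempts=3, backoff=0.5,
                     retry_on=(TimeoutError,), sleep=time.sleep):
    """Call fetch() and retry on timeout errors with exponential backoff.

    Hypothetical helper for illustration. `fetch` is any zero-argument
    callable; `retry_on` is a tuple of exception types treated as retryable.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except retry_on as exc:
            last_error = exc
            # Wait before the next try, but not after the final one.
            if attempt < attempts - 1:
                sleep(backoff * 2 ** attempt)
    raise last_error
```

Retrying hides the symptom rather than the cause, so it is worth logging each retry to keep the underlying slow responses visible.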
other business
- We're ok with exposing ADI
- The Raw Crash JSON has weeeeird keys - https://crash-stats.mozilla.com/admin/supersearch-fields/missing/
- how many EC2 webheads do we have, and do we want to scale it down?
- pausing until we figure out the downtime reporting stuff
- later this week there will be a PR for collector2015
- changes collector system to use rule sets like the processor
- will make it easier to accept non-binary crashes, ravenjs style, etc
Travel, etc
- adrian AFK next week
- lars is hiding in the water