QA/Loop/E2E Coverage
End-to-End Test Coverage
This wiki page documents the end-to-end test coverage of Loop. Making sure TokBox, Loop-server, MSISDN, and Push can all handle the load is critical, because one weak link can bring the whole system down.
Meetings
- Triage: need visibility into incoming product/server bugs
Performance Under Load
- how fast is call creation under load?
- is detection and use of TURN/STUN slow?
- how many bytes/sec are we getting through TokBox?
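One way to put numbers on the first question is to time the same operation from several concurrent workers and report percentiles rather than an average. The following is only a minimal sketch: `fake_create_call` is a hypothetical stand-in for a real call-creation request, and the worker and call counts are arbitrary.

```python
import math
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed(fn):
    """Run fn() once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def measure_under_load(fn, total_calls, workers):
    """Invoke fn total_calls times across `workers` threads; return the latencies."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda _: timed(fn), range(total_calls)))


def summarize(latencies):
    """Return the median, nearest-rank 95th percentile, and worst-case latency."""
    ordered = sorted(latencies)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "p50": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }


def fake_create_call():
    # Hypothetical placeholder: a real test would issue the call-creation request here.
    time.sleep(0.01)


if __name__ == "__main__":
    print(summarize(measure_under_load(fake_create_call, total_calls=50, workers=10)))
```

Tracking p95 and the maximum alongside the median makes it easier to see whether a small fraction of calls degrades badly as concurrency rises.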
Server
- Code Complete Status: The server is still getting new features and will keep getting them as the software matures, but things have slowed down. We have a /version/ in the URL in case we push a backward-incompatible version, but we try as hard as we can to keep the server backward compatible to reduce the number of possible server/client combinations. (Tarek @ Mozilla)
Actions
- need a checklist confirming that everyone has load tested to target (PM numbers will come after beta, so we should set our own target)
- coordinate with TokBox on writing tests that cover interactions between our Loop server and their infrastructure
- We may want to consider doing 1-2 week trains as work has slowed down. This would create a cadence and predictable schedule for other dependent teams to expect server updates. We can always push hot patches as needed.
Automation Coverage
- OWNER: dmose owns the E2E test creation
- STATUS: only set up to work with localhost
- NEXT:
  - need to get it working with stage/prod.
  - need to investigate whether the e2e test can be run as a load test
- https://bugzil.la/1064429 - [Loop] Performance logger
- A functional test has already been written (and works). It makes a call between two tabs in the same browser on the same computer, going through a local Loop server and using the production push, TokBox, and FxA infrastructure. It requires that you have already built and configured the Loop server locally on your computer (http://mxr.mozilla.org/mozilla-central/source/browser/components/loop/README.txt#25).
- This might be enough to make smoketesters' lives easier today, if they can configure (or get help configuring) the server on their local machine.
- Many different pieces need to happen to make this more useful in other circumstances, both those that are more realistic for testing the live infrastructure and those that are more isolated for running at check-in time.
- Soon (if I'm lucky, tomorrow, but we'll see...), I'll be collating the bugs we've got, filing the ones that are missing, and getting that info to Maire, so that it's then possible to coherently drive the different pieces forward.
Tokbox (STUN/TURN/Call Management)
- We meet with the TokBox QA manager weekly.
- We need some sort of checklist on both sides to confirm we’re load tested up to expected requests per sec for the first quarter.
- Requirements: https://bugzilla.mozilla.org/show_bug.cgi?id=1056250
- Status Page: http://status.tokbox.com/ (lacks historical data)
- I believe they have their own load testing system, but we should confirm
- Loop-server load tests exercise TokBox servers, not STUN/TURN
- They’ve done a lot of load testing and are confident that they can scale elastically by adding more TokBox servers. We can get specific numbers: if we say we want 200-400 RPS in the first quarter, they can provide detail on how they tested. This is what they do, so they’re just adding another client to their existing infrastructure. One could also argue that if TURN goes down, the 10-20% of users affected due to firewalls may not be a ‘blocker’.
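The load-test checklist discussed above could be as simple as comparing each component's measured throughput against the target range. This is a rough sketch only: every RPS figure in `MEASURED_RPS` is invented for illustration and is not a real measurement, and the component names are just labels.

```python
# Hypothetical measurements in requests per second; NOT real results.
# Replace with numbers from actual load tests of each component.
MEASURED_RPS = {"loop-server": 450, "tokbox": 380, "msisdn": 520, "push": 410}


def checklist(measured_rps, target_rps):
    """Map each component to True if its measured RPS meets the target."""
    return {name: rps >= target_rps for name, rps in sorted(measured_rps.items())}


def weakest_link(measured_rps):
    """Return the component with the lowest measured RPS."""
    return min(measured_rps, key=measured_rps.get)


if __name__ == "__main__":
    # Check both ends of the 200-400 RPS range mentioned for the first quarter.
    for target in (200, 400):
        print(target, checklist(MEASURED_RPS, target))
    print("weakest link:", weakest_link(MEASURED_RPS))
```

Since one weak component can bring the whole system down, the lowest figure is the one that bounds what the system as a whole can be expected to sustain.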
Cloud Services
With the Beta channel release, we'll get real numbers on the percentage of users that end up using STUN/TURN servers. We project 10-20% of users: firewalled users and users whose P2P connections are blocked (a net neutrality issue). Please consult this wiki for more information.
WebRTC
Steeplechase is used to programmatically simulate firewalls in order to exercise STUN/TURN, which is used by TokBox streaming servers when WebRTC and P2P connections fail. However, it is focused on the platform code and never tests directly against TokBox. It runs in its own environment/CI for testing in various network configurations.
Nils Ohlmeier and Syd Polk can provide more details.
STUN
A STUN server is a public server that clients behind a NAT can reach to learn their public external IP address. That's the first step in setting up peer-to-peer communication, because you give peers your public IP, not your IP local to the network.
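To make this concrete, here is a minimal sketch of the message a client sends to a STUN server and how it reads its public address back out of the reply, following the wire format in RFC 5389. It handles only IPv4 and only the XOR-MAPPED-ADDRESS attribute; it is an illustration, not the code Firefox or TokBox uses.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """Return a 20-byte STUN Binding Request with no attributes."""
    transaction_id = transaction_id or os.urandom(12)
    return struct.pack("!HHI12s", BINDING_REQUEST, 0, MAGIC_COOKIE, transaction_id)


def parse_xor_mapped_address(response):
    """Extract (ip, port) from an IPv4 XOR-MAPPED-ADDRESS attribute, or None."""
    msg_type, length, cookie = struct.unpack("!HHI", response[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    offset, end = 20, 20 + length          # attributes follow the 20-byte header
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[offset:offset + 4])
        value = response[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and attr_len >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            # Port and address are XORed with the magic cookie on the wire.
            port = xport ^ (MAGIC_COOKIE >> 16)
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        offset += 4 + attr_len + (-attr_len % 4)   # attributes are padded to 4 bytes
    return None
```

In practice the request would be sent over UDP (STUN's default port is 3478) with `socket.sendto`, and the reply passed to `parse_xor_mapped_address`. The address that comes back is the one the NAT exposes, which is what the client then shares with its peer.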
NAT
NATs support techniques like UDP hole punching that allow peers to communicate directly, but some NATs don't allow this.
TURN
A TURN server is a full relay used when peers cannot negotiate a direct link through the NAT. The whole stream goes through the TURN server, which can be a huge amount of data.
An in-depth explanation: http://www.html5rocks.com/en/tutorials/webrtc/infrastructure/?
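For a rough feel of the relay load, a back-of-the-envelope estimate can combine the projected 10-20% of relayed users with an assumed bitrate. In the sketch below, the number of concurrent calls (1000) and the per-stream bitrate (1000 kbit/s) are pure assumptions chosen for illustration, and each two-party call is assumed to carry one stream in each direction.

```python
def turn_ingress_mbps(concurrent_calls, relayed_fraction, stream_kbps, streams_per_call=2):
    """Estimate the traffic arriving at the TURN relay, in Mbit/s.

    The relay forwards everything it receives, so outbound traffic is
    roughly the same amount again.
    """
    total_kbps = concurrent_calls * relayed_fraction * streams_per_call * stream_kbps
    return total_kbps / 1000.0


if __name__ == "__main__":
    # Assumed inputs: 1000 concurrent calls at 1000 kbit/s per stream.
    for fraction in (0.10, 0.20):
        mbps = turn_ingress_mbps(1000, fraction, 1000)
        print(f"{fraction:.0%} relayed: about {mbps:.0f} Mbit/s in, same again out")
```

Under these assumptions the estimate scales linearly with both the relayed fraction and the bitrate, so real Beta channel numbers can be substituted directly once they are available.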
Mozilla's Roll-out Plan
In general, Mozilla plans to roll out Loop starting in Firefox 34 with a "soft start". Most of this has been implemented on the client side (i.e., in Firefox code). Further deployment will occur as load capability is evaluated. Please consult this page for more information on this roll-out plan.