User:Armenzg/Proposals/Mozharness changes
DISCLAIMERS:
- Running mozharness from the tree does not increase Hg load since we package it inside of the test zips
- There are various options considered
- This is a draft
Introduction
mozharness has many valuable strengths and has made contributing many times easier than it was with buildbotcustom. However, I believe there are still many things we can do to make everything much easier. Please bear with me as we visit the various options we have.
This page stems from https://etherpad.mozilla.org/easier-ci. On this page we focus on executing jobs rather than scheduling them, and mainly on improving mozharness.
Problems
- In order to add new suites we have to touch code in many different repositories, e.g. the mozharness in-tree configs, the mozharness configs and scripts, and at times other releng repositories.
- Not all job definition information (e.g. run_file_name, target unzip dirs, etc.) is stored in the mozharness in-tree configs. This makes it difficult to experiment on try.
- Writing a mozharness script requires learning to write scripts in a non-Pythonic manner. This adds a steep learning curve.
- Having the mozharness scripts in the mozharness repo prevents experimenting on try.
- Mozharness has great libraries that cannot be used in-tree.
- Mozharness has great libraries that cannot easily be re-used in a Python-like manner.
Story telling time
This narrative should help explain some of the problems.
Imagine that mozharness called a script in the tree and we took it from there ("that's nice!"). However, there is a big issue with that: we don't have in the tree the libraries that make mozharness great (reliable downloads, reliable checkouts, virtualenv creation, CI-needed data). It would be nice if mozharness were in the tree, no? No, I don't think so. The reason is that mozharness is not written in a way that we can cleanly extend: it has not been written in a Python-esque way that allows functionality to be re-used.
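To make "Python-esque re-use" concrete, here is a minimal sketch of a retry helper written as a plain function that any in-tree script could import and call. The name `retry` and its parameters are hypothetical and chosen for illustration; this is not an existing mozharness API, only one possible shape for the reliable download and checkout logic mentioned above.

```python
import time


def retry(action, attempts=3, sleep_seconds=0.0, retry_on=(Exception,)):
    """Call action() until it succeeds or the attempts run out.

    Hypothetical helper for illustration only; not part of mozharness.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on as error:
            last_error = error
            # Only wait if another attempt is coming.
            if attempt < attempts:
                time.sleep(sleep_seconds)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

A caller would wrap any flaky step, for example `retry(lambda: fetch(url), attempts=5, sleep_seconds=30)`, where `fetch` stands for whatever download function the script already has. Nothing here requires subclassing a script class, which is the kind of re-use the narrative is asking for.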
Proposals
Moved to the Mozharness project page.
Create dummy/generic try support
The dummy/generic mozharness script reads a dummy/generic in-tree script (e.g. talos.json) from hg web. This in-tree script can trigger anything we need. This would require a mozharness script that follows this logic. We would need to add some try chooser syntax. We would also need to use the API for triggering arbitrary jobs, since we probably don't want to add a dummy build just to trigger jobs.
- Perhaps we do; I don't know
Value gained:
- It allows for experimentation in the CI without releng intervention
- It can help us test local code in production faster
- We could even experiment with new mozharness scripts once we have it working locally
Cons:
- We need a try push in order to see the job
- We would have to hand-craft a request to trigger an arbitrary job
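The generic try job described in this section could be sketched roughly as follows: given the text of an in-tree JSON job definition, look up one suite and turn it into a command line. The JSON layout (`suites`, `options`) and the function name `build_command` are assumptions made for this sketch; `run_file_name` is borrowed from the job definition information mentioned under Problems, and the real talos.json schema may well differ. Fetching the file from hg web for the pushed revision is left out.

```python
import json


def build_command(job_definition_text, suite):
    """Turn one suite entry of an in-tree JSON job definition into argv.

    The layout {"suites": {name: {"run_file_name": ..., "options": [...]}}}
    is assumed for this sketch; it is not the real talos.json schema.
    """
    definition = json.loads(job_definition_text)
    suites = definition.get("suites", {})
    if suite not in suites:
        raise KeyError("unknown suite: %s" % suite)
    entry = suites[suite]
    # The in-tree file decides which script runs and with which options.
    return ["python", entry["run_file_name"]] + list(entry.get("options", []))


# Hypothetical job definition, as it might live in the tree.
EXAMPLE_DEFINITION = """
{
  "suites": {
    "example-suite": {
      "run_file_name": "run_example.py",
      "options": ["--verbose"]
    }
  }
}
"""

command = build_command(EXAMPLE_DEFINITION, "example-suite")
```

Because the command comes entirely from the in-tree file, a try push that edits that file changes what the job runs without any change to the mozharness repository.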
Timeline
For Q4 2014 we would only like to tackle 3.1 (move more configs to the tree) and 3.2 (use a different output parser). At the completion of these two projects we would like to measure again and re-evaluate the gain from the following proposals. During Q4 we might have time to experiment with the following projects just to get a sense of the difficulty and value gained.
Contribution opportunities
In order for mach and mozharness to use similar configuration files, we have to define the right configuration format. I doubt that straight mozharness configs would be good enough. I believe ahal had some ideas in this area. If you would like to find stakeholders and drive this conversation, that would be ideal.
Considerations
Whatever solution we move towards, we have to treat the needs of Release Engineering with the utmost importance. What are some of those needs? (Please add more and try to make them specific.)
- Reliability of the running jobs
- No impact on release scripts
- No impact on merge day scripts
- Not losing the output needed to communicate between various systems
- Think of Pulse, builds-4hr and others
- Output parsing and proper re-scheduling if needed
- Ease of transitioning to new world
- Does it hinder future technology changes? (e.g. switching from buildbot to TaskCluster)
- Do not break older release branches
- Do not create more burden on critical services (e.g. hg/ftp)
We also need buy-in from other Auto Tools members.
- Do we see the value to our day-to-day work?
- What are we clearly gaining?
Another consideration is how to split up the project so that we have milestones we can accomplish in a timely manner. This project could grow too much in scope, drag on, and not give us the value that we wanted.
Concerns/ideas
- Should we have any special concerns with regard:
- to Pulse?
- to output parsing?
- to requesting re-triggers from the machine?
- Should pulse messages be generated from the machines?