Sheriffing/How To/Retrigger Jobs


Sometimes builds and tests need to be retriggered. For some classes of automation/infrastructure failures, this happens automatically and the job is marked in the Treeherder UI as dark blue. For other cases, or if you're doing investigative work (e.g. testing for an intermittent failure), you'll need to retrigger the job manually. Retriggering a job causes that job to run again.

Tasks that belong to the release process, i.e. that actually ship a build to users, must not be retriggered or backfilled; otherwise the task chain will break. Such tasks are found e.g. on the beta and release trees. Examples are the "UV" or "Snap" tasks. If the task name shown at the bottom left of Treeherder starts with "release-", it is a release task. If you are unsure, ask in Firefox CI on Matrix.
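
The naming rule above can be sketched as a small shell check. This is only an illustration: the function name is_release_task and the task names in the usage comment are hypothetical. A name without the "release-" prefix is not proof that a task is safe to retrigger, so ask when in doubt.

```shell
# Hypothetical helper: succeeds when a task name starts with "release-".
is_release_task() {
  case "${1:-}" in
    release-*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage with made-up task names:
#   is_release_task "release-example-task" && echo "do not retrigger or backfill"
#   is_release_task "test-example-task" || echo "no release- prefix"
```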


Manual retriggers

  • Select a job result in Treeherder and click on it.
  • This will display a results pane in the bottom left corner with information like Job, Machine, Task, etc.
  • To retrigger this job/test, click on the circular arrow with the mouseover text "Repeat the selected job" at the top of the results pane. You can also press "r" in the results pane. Both require that you are logged in to Treeherder.
Retrigger ll.jpg
  • If you want to manually retrigger multiple jobs, you can add them to the pinboard and click on retrigger all:
Rt all.jpg

Backfills

Backfilling runs the job on previous pushes (currently 5) where it often did not run, either because it was regarded as unnecessary or to save resources. To backfill, select a job and, in the panel, click on the “...” next to the retrigger button and choose the first option:

Backfill.jpg

This comes in handy when you want to determine which push a certain failure started with (e.g. for backouts).
If you want to backfill a specific number of pushes, click on “...” and then on Custom Action:

Custom action.jpg
  • Choose backfill and change the depth to the number of your choice:
Backfill details.jpg
  • And trigger.

Retriggering Nightly Builds

If new Nightlies have to be requested, whether for a backout or because the merge happened later than expected, wait for the normal 'Gecko Decision Task' to finish and request the Nightlies after that. Thanks to the 'shippable' builds, the Nightlies create far fewer jobs and reuse the shippable builds that are already running.


Nightly builds run at 10:00 and 22:00 UTC (12:00 or 01:00 AM/PM Romanian time, depending on daylight saving time). If the merges to central have not landed before that, Nightly builds are scheduled automatically for the last push to central before that time, unless that push already got Nightly builds (from the run 12 hours earlier).
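
To judge how much time is left for a merge, the schedule above can be turned into a small calculation. The following is a minimal sketch that assumes only the 10:00 and 22:00 UTC start times stated here; the function name minutes_until_nightly is hypothetical.

```shell
# Print the minutes until the next scheduled Nightly (10:00 or 22:00 UTC).
# $1 = current UTC hour (00-23), $2 = current UTC minute (00-59).
minutes_until_nightly() {
  hour=${1#0}; hour=${hour:-0}        # strip a leading zero so "08" is not read as octal
  minute=${2#0}; minute=${minute:-0}
  now=$(( hour * 60 + minute ))       # minutes since 00:00 UTC
  if [ "$now" -lt 600 ]; then
    next=600                          # 10:00 UTC today
  elif [ "$now" -lt 1320 ]; then
    next=1320                         # 22:00 UTC today
  else
    next=2040                         # 10:00 UTC tomorrow (600 + 1440)
  fi
  echo $(( next - now ))
}

# Usage with the current time:
#   minutes_until_nightly "$(date -u +%H)" "$(date -u +%M)"
```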

Note: We only respin nightlies if we miss them by a few minutes or if we need to get something into the next nightly (for example: fixes for crashes). If they have been running for more than half an hour, we won’t respin them.

Steps
Cancel running undesired Nightly tasks on older push
  • Open mozilla-central and type "nightly" in the search box at the upper right, then select only the running jobs (gray) and deselect the other states
Nightly filter.jpg
  • Scroll down to the last merge, you will see "N" builds running
Nightly running.jpg
  • Pin all the jobs and cancel them
Nightly pin.jpg
  • Pin all nightlies and select “Clear”
Request new Nightlies
Retrigger running.jpg
  • Scroll down and click Trigger hook. When the pop-up is displayed, click Trigger Hook again
Trigger hook.jpg

CAVEAT: there are implications to triggering too many Nightly builds in a single day or in quick succession. Please talk with a sheriff first before retriggering Nightly builds.

How to bulk retrigger build bustages a push at a time

Please note that this will run all failed jobs again, not only build bustages!

Prerequisites:

Step 1: Run pip install taskcluster to install the taskcluster package from pip

      If you get the message "The program 'pip' is currently not installed" then you have to install it by running:
           a.  sudo apt install -y python-pip 
           b.  when python-pip install is completed, run pip install taskcluster 

Step 2: Download tc-filter.py and make it executable by running the command: sudo wget https://hg.mozilla.org/build/braindump/raw-file/default/taskcluster/tc-filter.py -P /usr/local/sbin/ && sudo chmod +x /usr/local/sbin/tc-filter.py

Step 3: If you were already signed in with the taskcluster tool, sign in again
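
Before running the bulk rerun, it can help to confirm that the prerequisites are in place. The sketch below is hypothetical (the function name check_prereqs is not part of any Mozilla tooling) and assumes the install location used in Step 2. It cannot tell whether the taskcluster Python package from Step 1 is installed; it only looks for the pip and taskcluster commands and for the script itself.

```shell
# Hypothetical preflight check for the prerequisites above.
# Prints one line per missing item and prints nothing when all are found.
# $1 = path to tc-filter.py (defaults to the location used in Step 2).
check_prereqs() {
  filter_path=${1:-/usr/local/sbin/tc-filter.py}
  command -v pip >/dev/null 2>&1 || echo "missing: pip"
  command -v taskcluster >/dev/null 2>&1 || echo "missing: taskcluster CLI"
  [ -x "$filter_path" ] || echo "missing or not executable: $filter_path"
}

# Usage:
#   check_prereqs
```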

How to use:

  1. Set the url of the taskcluster instance in which the failing tasks ran (there is also a community taskcluster instance which doesn't get sheriffed): export TASKCLUSTER_ROOT_URL=https://firefox-ci-tc.services.mozilla.com
  2. Sign in with the taskcluster tool if you are not already signed in: eval $(taskcluster signin)
  3. Run tc-filter.py --state failed --action rerun --graph-id geckoDecisionTaskTaskId

Note: Replace geckoDecisionTaskTaskId with the task id being shown on the bottom left when you click on the gecko decision task for the push with the failures.
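
The steps above can be wrapped in a small shell function. This is a sketch, not an official tool: the function name bulk_rerun_failed and the DRY_RUN switch are hypothetical, while the root URL and the tc-filter.py options are the ones given above. With DRY_RUN left at its default of 1, the function only prints the command so it can be reviewed first.

```shell
# Hypothetical wrapper around the "How to use" steps.
# $1 = task id of the Gecko Decision Task for the push with the failures.
# DRY_RUN=1 (default in this sketch) prints the command instead of running it.
bulk_rerun_failed() {
  decision_task_id=${1:-}
  if [ -z "$decision_task_id" ]; then
    echo "usage: bulk_rerun_failed DECISION_TASK_ID" >&2
    return 2
  fi
  # Instance in which the sheriffed tasks run (taken from step 1 above).
  export TASKCLUSTER_ROOT_URL=https://firefox-ci-tc.services.mozilla.com
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "tc-filter.py --state failed --action rerun --graph-id $decision_task_id"
  else
    tc-filter.py --state failed --action rerun --graph-id "$decision_task_id"
  fi
}

# Usage (prints the command only):
#   bulk_rerun_failed geckoDecisionTaskTaskId
# Usage (actually reruns all failed jobs of that push):
#   DRY_RUN=0 bulk_rerun_failed geckoDecisionTaskTaskId
```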

Rerunning build bustages

How to install taskcluster CLI

This tool is needed in order to retrigger some build jobs, especially nightly builds. Download Taskcluster CLI on Ubuntu from https://github.com/taskcluster/taskcluster-cli

From your /home/user folder (or the location where mozilla-unified is stored), run the following commands:

  1. sudo wget https://index.taskcluster.net/v1/task/project.taskcluster.taskcluster-cli.latest/artifacts/public/linux-amd64/taskcluster -P /usr/local/sbin/
  2. sudo chmod +x /usr/local/sbin/taskcluster

The tool is now installed and made executable in /usr/local/sbin/.

How to use Taskcluster CLI
  1. From the terminal, run the command: eval $(taskcluster signin). The credentials are stored as environment variables, so the tool only works for as long as that terminal remains open.
  2. When the browser page opens, log in using LDAP
  3. Click Create a new clientId and go to the end of the page, then click Create Client.
  4. Wait a few seconds, then close the browser.
  5. In the console, the following message should appear: Credentials output as environment variables.
  6. Run taskcluster task rerun TASK_ID (take the TASK_ID from the job summary: go to Treeherder, click the job, and read the Task field on the left side of the window)
  7. After following these steps, the console should output either running or pending.
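
When several build jobs need a rerun, step 6 can be repeated in a loop. The sketch below uses hypothetical function names (have_credentials, rerun_tasks) and a hypothetical DRY_RUN switch. It also assumes that taskcluster signin exports the variables TASKCLUSTER_CLIENT_ID and TASKCLUSTER_ACCESS_TOKEN; verify this against your CLI version.

```shell
# Succeeds only if the variables assumed to be exported by
# `eval $(taskcluster signin)` are present in this terminal.
have_credentials() {
  [ -n "${TASKCLUSTER_CLIENT_ID:-}" ] && [ -n "${TASKCLUSTER_ACCESS_TOKEN:-}" ]
}

# Rerun every task id given as an argument.
# DRY_RUN=1 (default in this sketch) prints the commands instead of running them.
rerun_tasks() {
  for task_id in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "taskcluster task rerun $task_id"
    else
      have_credentials || { echo "sign in first: eval \$(taskcluster signin)" >&2; return 1; }
      taskcluster task rerun "$task_id"
    fi
  done
}

# Usage (TASK_ID_1 and TASK_ID_2 are placeholders for ids from Treeherder):
#   rerun_tasks TASK_ID_1 TASK_ID_2
#   DRY_RUN=0 rerun_tasks TASK_ID_1 TASK_ID_2
```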