ReleaseEngineering/Buildduty/SVMeetings/Oct19-Oct23
2015-10-19 - 2015-10-20
[alin] Phil noticed that “t-w864-ix-192” and “t-w864-ix-193” were failing every talos job because they hadn’t been added to GraphServer, so he disabled them. The logs show errors like:
CRITICAL - FAIL: Graph server unreachable (5 attempts)
INFO - RETURN:send failed, graph server says:
CRITICAL - RETURN:No machine_name called 'T-W864-IX-192.e' can be found
Did some investigation and noticed that the records in the data.sql file are not up to date. What seems odd to me is that talos jobs don’t fail on other slaves that are also not added to GraphServer.
Q: should I create a patch to add the remaining Windows slaves to GraphServer?
Yes please.
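A minimal sketch of how the affected machines could be collected from talos logs, assuming the log is available as plain text lines and the error message always has the form quoted above. The function name `missing_machines` and the lower-casing of names (to compare them with slave names such as t-w864-ix-192) are assumptions made for illustration, not part of any existing tool.

```python
import re

# Matches the GraphServer error quoted in the notes above.
MISSING_MACHINE = re.compile(r"No machine_name called '([^']+)' can be found")


def missing_machines(log_lines):
    """Return the sorted, lower-cased machine names GraphServer rejected."""
    found = set()
    for line in log_lines:
        match = MISSING_MACHINE.search(line)
        if match:
            # Lower-casing is an assumption so names compare with slave names.
            found.add(match.group(1).lower())
    return sorted(found)


sample_log = [
    "CRITICAL - FAIL: Graph server unreachable (5 attempts)",
    "INFO - RETURN:send failed, graph server says:",
    "CRITICAL - RETURN:No machine_name called 'T-W864-IX-192.e' can be found",
]
print(missing_machines(sample_log))
```

Running this over the logs of recent talos failures would give a quick list of names to check against data.sql.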
2. https://treeherder.mozilla.org/#/jobs?repo=try&revision=f57ad0d4fd2a
Compared the results from yesterday with the results Vlad obtained last week (https://hg.mozilla.org/try/rev/6b4a05498515):
→ debug test marionette: orange on both
→ [opt & debug] test mochitest-gl: orange on both
→ several [opt & debug] test mochitest-devtools-chrome jobs are still orange
→ debug test web-platform-tests[1-8]: all green on my push, we had several orange ones last week
→ most of the test web-platform-tests-e10s jobs are green on the latest push
My push: total jobs: 100 success: 86 warnings: 12 exceptions: 2
Vlad’s push: total jobs: 100 success: 71 warnings: 29
3. Today I was notified in the #buildduty channel that the build API was having problems.
Checked https://secure.pub.build.mozilla.org/buildapi/ and https://api.pub.build.mozilla.org/clobberer/ ; both seemed to be working. At one point https://secure.pub.build.mozilla.org/buildapi/ became inaccessible, but it recovered after that.
Ryan Watson (:w0ts0n) said that the load balancer is stable at the moment and that he will do some investigation.
→ ask in #moc or #it
Q: what is the approach for us in these cases? Is there something we can do to solve the issue, and who should we address it to?
4. https://bugzilla.mozilla.org/show_bug.cgi?id=1204970 (modify check_pending_builds to report more granularly on pending builds/tests)
Armen is working to produce a new allthethings.json file.
Noticed that the current version of the file already differs from last week's (15202 builders right now vs 10014 last week); it also seems to be up to date at the moment.
allthethings.json contains 64 different platforms: https://pastebin.mozilla.org/8849877 so I’ll need to group these into larger and more relevant categories (if you have suggestions, please feel free to mention them).
The pending counts we really care about are here: https://secure.pub.build.mozilla.org/builddata/reports/slave_health/ We just need to have them as an alert. From your pastebin you can strip out all of the text after the platform, i.e. linux64-asan-debug is just linux64-asan-debug, android-api-9-debug is just android-api-9, and so on.
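The grouping suggested above can be sketched as follows, assuming the per-platform pending counts are already available as a dict. Only the android-api-9-debug → android-api-9 example comes from the notes; the suffix list, the function names and the sample counts are hypothetical and would need to be adjusted to the real platform list in the pastebin.

```python
from collections import Counter

# Assumption: build-type suffixes to strip; extend after reviewing the pastebin.
BUILD_TYPE_SUFFIXES = ("-debug", "-opt")


def platform_group(platform):
    """Strip a trailing build-type suffix so variants share one group."""
    for suffix in BUILD_TYPE_SUFFIXES:
        if platform.endswith(suffix):
            return platform[: -len(suffix)]
    return platform


def pending_by_group(pending_counts):
    """Sum pending counts per platform group."""
    grouped = Counter()
    for platform, count in pending_counts.items():
        grouped[platform_group(platform)] += count
    return dict(grouped)


# Hypothetical counts, for illustration only.
example = {"android-api-9-debug": 4, "android-api-9": 3, "win32": 5}
print(pending_by_group(example))
```

An alert could then compare each group total against a threshold instead of tracking all 64 platforms separately.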
5. https://bugzilla.mozilla.org/show_bug.cgi?id=1214529
Gregor Wagner (:gwagner) asked for help with this: once the pine repo is reset, he needs to run all the b2g jobs. On try, I noticed that I can restrict the builders like this:
try: -b do -p emulator,emulator-kk,emulator-x86-kk -u xpcshell -t none
I am not sure how I can do this on pine, as I did not find those annotations (for example “emulator-kk” for the B2G KK Emulator [opt & debug] builders).
2015-10-21 - 2015-10-22
[alin] Opened bug 1216904 to add the missing Windows slaves to GraphServer and uploaded the patches.
I looked over bug 1209669 and noticed that many of the slaves listed in graphserver.txt have already been added to the graph DB in that bug. Also noticed that both the simple values and the e10s values of the slaves have been added (for example: "t-w732-ix-195" and "t-w732-ix-195.e"), while data.sql contains only the e10s values.
I computed a difference txt file that only adds the missing e10s slaves (I haven’t uploaded it yet).
If you take a look at the txt file on bug 1209669, the SQL statement for the last slave (“t-w864-ix-194”) does not end properly (missing “;” at the end), so I don’t know whether that slave has been added or not.
→ talk to db team and get them to add missing entries
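A rough sketch of the two checks described above: computing which e10s names (slave name plus ".e") are still missing, and spotting statements that lack the closing ";". It assumes the machine names already in the graph DB are available as a plain list and that the txt file holds one statement per line. Both function names are hypothetical, and the placeholder statements below do not reflect the real GraphServer schema.

```python
def missing_e10s_names(slaves, known_machine_names):
    """Return the '<slave>.e' names that are not among the known names."""
    known = {name.lower() for name in known_machine_names}
    return [f"{slave}.e" for slave in slaves if f"{slave}.e".lower() not in known]


def unterminated_statements(sql_lines):
    """Return non-empty lines that do not end with ';'.

    Assumption: one SQL statement per line.
    """
    return [
        line
        for line in sql_lines
        if line.strip() and not line.rstrip().endswith(";")
    ]


slaves = ["t-w732-ix-195", "t-w864-ix-192"]
known = ["t-w732-ix-195", "t-w732-ix-195.e"]
print(missing_e10s_names(slaves, known))

# Placeholder text only; these are not real GraphServer statements.
statements = ["<statement for t-w864-ix-193>;", "<statement for t-w864-ix-194>"]
print(unterminated_statements(statements))
```

The second check only flags suspicious lines; whether the last slave actually made it into the DB still has to be confirmed with the db team.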
2. https://bugzilla.mozilla.org/show_bug.cgi?id=1217064
Started working on this bug with Pete Moore. Pete was able to trigger a decision task.
Bug marked as resolved.
3. allthethings.json seems out-of-date
Number of pending builds: 6283
Total number of builders: 15677
THAT IS ALL FOR NOW!!!!
Computed number of pending jobs 6281
b2g_ash_macosx64_gecko nightly
b2g_ash_win32_gecko nightly
Contor: 2
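The output above looks like a comparison between the pending jobs and the builders known to allthethings.json, with two pending builders that could not be matched. A minimal sketch of such a check, assuming the pending builder names and the known builder names have already been loaded into plain lists. The function name and the builder-a / builder-b entries are hypothetical; only the two b2g_ash builder names are taken from the notes.

```python
def unmatched_pending(pending_buildernames, known_builders):
    """Return pending builder names that are absent from the known builders."""
    known = set(known_builders)
    return [name for name in pending_buildernames if name not in known]


# Hypothetical sample data; one list entry per pending job.
pending = [
    "builder-a",
    "b2g_ash_macosx64_gecko nightly",
    "builder-a",
    "b2g_ash_win32_gecko nightly",
]
known = ["builder-a", "builder-b"]

unmatched = unmatched_pending(pending, known)
print("Number of pending builds:", len(pending))
print("Computed number of pending jobs", len(pending) - len(unmatched))
for name in unmatched:
    print(name)
print("Unmatched:", len(unmatched))
```

A non-empty unmatched list is a hint that allthethings.json is stale relative to the builders that are actually scheduling jobs.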
[vlad]
https://wiki.mozilla.org/ReleaseEngineering/Buildduty/Nagios#How_do_I_interact_with_the_nagios_IRC_bot.3F
http://nagios1.private.releng.scl3.mozilla.com/releng-scl3/cgi-bin/status.cgi?host=all&servicestatustypes=28&hoststatustypes=15&serviceprops=8202&hostprops=8202
Kim will talk to Amy about how to acquire access for Vlad/Alin.
Resolved ticket https://github.com/mozilla/build-relengapi/issues/281
Re-imaged the following slaves for bug https://bugzilla.mozilla.org/show_bug.cgi?id=1200180 :
t-w864-ix-158
t-w864-ix-043
t-w864-ix-025
Update 10.10.5 configs with patch to enable r7 on trunk and disable r5 on trunk; Kim will update the bug with an example.
catlee signed the graph server gpg files for you :-)