[Azure Pipelines] reduce Edge parallel jobs from 20 to 10 #18448
Sent this PR because I spotted the outdated comment while doing other things today, but I think it's probably best to resolve #18397 before landing this, as it might introduce new ways of failing.
9ad77ff to e31aa28
I've started a full run of Edge Dev and Canary in https://dev.azure.com/web-platform-tests/wpt/_build/results?buildId=27491 to see how long it takes and if the results are affected.
Diff between the runs with 20 and 10 jobs:
There are some shared regressions there that probably aren't caused by flakiness, but rather by test order dependence. Some tests are fixed by using fewer shards, but not as many as regress. This makes sense, as more jobs means more isolation and less chance of state interference. More interestingly, the overall run time is slower now. It looks like we actually have 15 agents now, so more capacity than I thought. @mustjab what amount of parallelism do you think we should use? Just leave it at 20 and update the comment?
e31aa28 to 79c01fd
@foolip we have now allocated 20 VMs for the Windows pipeline and can increase the number to 20, if you think it will improve job stability.
@mustjab the number of jobs is already 20. In this PR I tried to reduce the number, since 20 seemed unnecessary and was based on the needs of EdgeHTML. Since we're now running both Edge Dev and Canary, the number of VMs needed to run them all at once would be 40. I checked https://dev.azure.com/web-platform-tests/wpt/_build/results?buildId=27966 and it looks like what happened is that first all the Canary jobs ran, and then the Dev jobs started as the Canary jobs finished and made VMs available. The net effect is that Dev started and ended about an hour later than Canary. Reducing the number to 10 would mean that they all start at the same time, but take about twice as long to finish. In the end I don't think it matters all that much, but decreasing the number of jobs to match the available VMs probably makes more sense. So, consider this open for review.
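To make the trade-off above concrete, here is a rough back-of-the-envelope sketch. It assumes a deliberately simple model that is not taken from the pipeline itself: every job of a browser does an equal share of the work, every job pays the same fixed overhead, and jobs run in full waves across the available VMs. The function name and the numbers fed into it (10 hours of total work per browser, 0.25 hours of overhead per job) are made up for illustration and are not measurements from these runs.

```python
import math


def wall_clock_hours(browsers, jobs_per_browser, vms, work_hours, overhead_hours):
    """Estimate the wall-clock time of a run under a simplified wave model.

    Assumptions (hypothetical, not measured):
    - each browser needs `work_hours` of total test time, split evenly
      across `jobs_per_browser` jobs;
    - each job adds a fixed `overhead_hours` (setup, checkout, and so on);
    - jobs are scheduled in waves of at most `vms` concurrent jobs.
    """
    total_jobs = browsers * jobs_per_browser
    waves = math.ceil(total_jobs / vms)
    job_duration = work_hours / jobs_per_browser + overhead_hours
    return waves * job_duration


# Two browsers (Edge Dev and Canary) on 20 VMs, with made-up work/overhead.
with_20_jobs = wall_clock_hours(2, 20, 20, work_hours=10.0, overhead_hours=0.25)
with_10_jobs = wall_clock_hours(2, 10, 20, work_hours=10.0, overhead_hours=0.25)
print(f"20 jobs per browser: {with_20_jobs:.2f}h in 2 waves")
print(f"10 jobs per browser: {with_10_jobs:.2f}h in 1 wave")
```

Under this model, 20 jobs per browser need two waves of shorter jobs while 10 jobs per browser need one wave of longer jobs, so the totals differ only by the per-job overhead that the second wave pays again. Real runs will deviate from this, since jobs are not equally sized and VMs are picked up as they become free rather than in strict waves.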
20 was chosen to make each job of a full EdgeHTML run fast enough, but with Chromium-based Edge each job now finishes in <1h. Each job has some overhead, so decrease the number of jobs to 10.
79c01fd to 9674814
Started https://dev.azure.com/web-platform-tests/wpt/_build/results?buildId=28010 to see how this will look now with 20 VMs.
These are the runs: https://wpt.fyi/results/?run_id=306270015&run_id=273340005
They took 1.7 and 1.8 hours to run, so looking good. I'll merge this and check if the scheduled runs then also look good.
The first set of aligned runs after this change: Edge took 1.7/1.8h, which is also the time Safari took, so this seems pretty good.
Diffing the master runs before/after this change also shows no dramatic change in results:
However, in the second scheduled run after this landed, Edge Canary failed. I've filed #18583 and suggest reverting this if it happens again.