[Azure Pipelines] reduce Edge parallel jobs from 20 to 10 #18448

Merged
merged 1 commit into from
Aug 20, 2019

Conversation

foolip
Member

@foolip foolip commented Aug 15, 2019

20 was chosen to make each job of a full EdgeHTML run fast enough, but
with Chromium-based Edge each job now finishes in <1h. Each job has some
overhead, so decrease the number of jobs to 10.
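The trade-off in this description can be put in numbers. The following is a minimal sketch, assuming that tests split evenly across jobs and that each job pays a fixed setup cost. The function names and the values of `TOTAL_TEST_HOURS` and `OVERHEAD_HOURS` are hypothetical and were not measured on the wpt pipeline.

```python
def job_wall_time_hours(total_test_hours, num_jobs, overhead_hours):
    """Wall time of one job when the tests are split evenly across num_jobs."""
    return overhead_hours + total_test_hours / num_jobs


def total_vm_hours(total_test_hours, num_jobs, overhead_hours):
    """VM time consumed by all jobs together, including per-job overhead."""
    return total_test_hours + num_jobs * overhead_hours


# Hypothetical numbers for illustration only.
TOTAL_TEST_HOURS = 14.0
OVERHEAD_HOURS = 0.25

for jobs in (20, 10):
    per_job = job_wall_time_hours(TOTAL_TEST_HOURS, jobs, OVERHEAD_HOURS)
    total = total_vm_hours(TOTAL_TEST_HOURS, jobs, OVERHEAD_HOURS)
    print(f"{jobs} jobs: {per_job:.2f} h per job, {total:.2f} VM-hours in total")
```

Under these assumptions, halving the number of jobs roughly doubles the time per job while reducing the total VM time by the overhead of the jobs that no longer run.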

@wpt-pr-bot wpt-pr-bot requested a review from jgraham August 15, 2019 07:55
@foolip foolip requested review from mustjab and thejohnjansen and removed request for jgraham August 15, 2019 10:08
@foolip
Member Author

foolip commented Aug 15, 2019

Sent this PR because I spotted the outdated comment while doing other things today, but I think it's probably best to resolve #18397 before landing this, as it might introduce new ways of failing.

@wpt-pr-bot wpt-pr-bot requested a review from jgraham August 15, 2019 10:10
@foolip foolip force-pushed the foolip/azure-edge-parallelism branch from 9ad77ff to e31aa28 Compare August 16, 2019 03:58
@foolip
Member Author

foolip commented Aug 16, 2019

I've started a full run of Edge Dev and Canary in https://dev.azure.com/web-platform-tests/wpt/_build/results?buildId=27491 to see how long it takes and if the results are affected.

@foolip
Member Author

foolip commented Aug 16, 2019

Diff between the runs with 20 and 10 jobs:

There are some shared regressions there that probably aren't because of flakiness, but rather because of test order dependence. There are tests that are fixed by fewer shards, but not as many as the regressions. This makes sense as more jobs means more isolation and less chance for state interference.

More interestingly, the overall run time is slower now. It looks like we actually have 15 agents now, so more capacity than I thought. @mustjab what amount of parallelism do you think we should use? Just leave it at 20 and update the comment?

@mustjab
Contributor

mustjab commented Aug 19, 2019

@foolip we have now allocated 20 VMs for the Windows pipeline and can increase the number of jobs to 20 if you think it will improve job stability.

@foolip
Member Author

foolip commented Aug 20, 2019

@mustjab the number of jobs is already 20. In this PR I tried to reduce the number, since 20 seemed unnecessary and was based on the needs of EdgeHTML.

Since we're now running both Edge Dev and Canary, the number of VMs to run them all at once would be 40. I checked https://dev.azure.com/web-platform-tests/wpt/_build/results?buildId=27966 and it looks like what happened is that first all the Canary jobs ran, and then the Dev jobs started as the Canary jobs finished and made VMs available. The net effect is that Dev started and ended about an hour later than Canary.

Reducing the number to 10 would mean that they all start at the same time, but take about twice as long to finish.

In the end I don't think it matters all that much, but decreasing the number of jobs to match the available VMs probably makes more sense. So, consider this open for review.
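The scheduling behavior described in this comment can be reproduced with a small simulation. This is a sketch under stated assumptions: jobs are queued with all Canary jobs first and then all Dev jobs, each job takes the first VM that becomes free, and every job in a run has the same duration. The `simulate` function and the job durations are hypothetical and do not model the actual Azure Pipelines scheduler.

```python
import heapq


def simulate(num_vms, jobs_per_channel, job_hours, channels=("canary", "dev")):
    """Greedy FIFO scheduling: each queued job takes the VM that frees up first.

    Returns {channel: (first_start, last_end)} with times in hours.
    """
    vm_free_at = [0.0] * num_vms
    heapq.heapify(vm_free_at)
    spans = {}
    for channel in channels:
        for _ in range(jobs_per_channel):
            start = heapq.heappop(vm_free_at)  # earliest available VM
            end = start + job_hours
            heapq.heappush(vm_free_at, end)
            first, last = spans.get(channel, (start, end))
            spans[channel] = (min(first, start), max(last, end))
    return spans


# Hypothetical durations: 1 h per job with 20 jobs, 2 h per job with 10 jobs.
print(simulate(num_vms=20, jobs_per_channel=20, job_hours=1.0))
print(simulate(num_vms=20, jobs_per_channel=10, job_hours=2.0))
```

With 20 jobs per channel and 20 VMs, Canary occupies every VM from hour 0 to hour 1 and Dev runs from hour 1 to hour 2. With 10 jobs per channel, both channels start at hour 0 and finish at hour 2, so the total time is the same in this simplified model but the two runs are aligned.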

@foolip
Member Author

foolip commented Aug 20, 2019

Started https://dev.azure.com/web-platform-tests/wpt/_build/results?buildId=28010 to see how this will look now with 20 VMs.

@foolip
Member Author

foolip commented Aug 20, 2019

These are the runs: https://wpt.fyi/results/?run_id=306270015&run_id=273340005

They took 1.7 and 1.8 hours to run, so looking good. I'll merge this and check if the scheduled runs then also look good.

@foolip foolip merged commit 4b0d632 into master Aug 20, 2019
@foolip foolip deleted the foolip/azure-edge-parallelism branch August 20, 2019 21:02
@foolip
Member Author

foolip commented Aug 21, 2019

The first set of aligned runs after this change:
https://wpt.fyi/results/?run_id=300520016&run_id=290970009&run_id=296660009&run_id=285010009&run_id=290980002&run_id=283450002&run_id=276780005&run_id=283400010

Edge took 1.7/1.8h, which is also the time Safari took, so this seems pretty good.

@foolip
Member Author

foolip commented Aug 21, 2019

Diffing the master runs before/after this change also shows no dramatic change in results:
https://wpt.fyi/results/?diff&filter=ADC&run_id=286810005&run_id=296660009
https://wpt.fyi/results/?diff&filter=ADC&run_id=276760011&run_id=285010009

@foolip
Member Author

foolip commented Aug 21, 2019

However, in the second scheduled run after this landed, Edge Canary failed. I've filed #18583 and suggest reverting this if it happens again.

@thejohnjansen
Contributor

@foolip thanks for the heads-up. @mustjab is on vacation right now, but he'll take a look when he gets back if this continues to fail.

natechapin pushed a commit to natechapin/wpt that referenced this pull request Aug 23, 2019
…rm-tests#18448)
