-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x/build/cmd/relui: advisory TryBots can handle or avoid timeouts better #60443
Comments
CC @prattmic. |
Change https://go.dev/cl/503516 mentions this issue: |
This reverts CL 503758. This newly added longtest builder isn't as fast as the others, and ends up running into the same problem as advisory trybots did in issue 60443, that there may be periods without logging that exceed the watchdog limit of 10 mins. Rather than bumping the watchdog timer to a value we can't predict would be enough, revert this and try it again later. For golang/go#60443. Change-Id: I42e9eaee8643cc2eb2e077bc3010663b62b94f2b Reviewed-on: https://go-review.googlesource.com/c/build/+/503516 Reviewed-by: Heschi Kreinick <[email protected]> Auto-Submit: Dmitri Shuralyov <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Run-TryBot: Dmitri Shuralyov <[email protected]> Reviewed-by: Dmitri Shuralyov <[email protected]>
Change https://go.dev/cl/504523 mentions this issue: |
Longtest builders are known to apply a scale factor on the default go test timeout (10 minutes). Do the same in relui. For golang/go#60443. Change-Id: If0e7c21fdf7916745ce65db0c73a730091a506f7 Reviewed-on: https://go-review.googlesource.com/c/build/+/504523 Auto-Submit: Dmitri Shuralyov <[email protected]> Reviewed-by: Dmitri Shuralyov <[email protected]> Run-TryBot: Dmitri Shuralyov <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Heschi Kreinick <[email protected]>
It's interesting to check if this case still happens with any regularity by now. It might not be since we started scaling the watchdog delay on longtest builders (CL 504523). |
Change https://go.dev/cl/541380 mentions this issue: |
Give up on distinguishing between infrastructure failures and test failures. At the end of the day someone has to look at the errors either way. At the same time, don't treat context errors as successes. We can't afford to assume that timeouts and stuff aren't related to the tests we're running. Even if we could, we'd be missing coverage. For golang/go#63147. For golang/go#60443. Change-Id: I5fe78616b4c8f85b6b1e901b509d291884cf9a27 Reviewed-on: https://go-review.googlesource.com/c/build/+/541380 Auto-Submit: Heschi Kreinick <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]> Reviewed-by: Dmitri Shuralyov <[email protected]> Reviewed-by: Dmitri Shuralyov <[email protected]>
Advisory TryBots are causing much less friction compared to previously now that #57725 has been resolved. Filing this for some followup improvements that we can apply to how they handle timeouts.
Consider the following logs from
linux-amd64-longtest-race
anddarwin-amd64-12_0
advisory trybot runs that timed out after all 3 tries:Timeout after one slow package test
Timeout after one slow buildlet creation
There are at least two areas for improvement visible.
First, it timed out because the single "index/suffixarray" tests happen to sometimes take longer than the default watchdog delay (currently 10 minutes), and so the watchdog mechanism stopped it. This can be avoided by injecting our own watchdog reset every 5 minutes as we do in some other tasks that run for a long time. (Increasing the watchdog delay is also a viable workaround.)
Second, the trybot attempts 2 and 3 don't end up actually trying because they start out with a canceled context. So there's effectively only one attempt. Issue #60444 may be relevant to this.
Filing for tracking purposes; it's not a high priority. CC @golang/release.
The text was updated successfully, but these errors were encountered: