runtime: test frequently times out after 10 minutes on linux-amd64-longtest builder #25886
Comments
I bumped the timeout to 6 minutes in CL 115016. Either I did it wrong or it's not deployed yet. |
p.s. I think this should be labeled "builders". |
Ah, I forgot to remove the Then yes, this seems like it should be labelled Builders. Do you want to keep this issue open to track the deployment/debugging of your fix? |
I guess we could do this to avoid someone else opening another one : ) Once the version with the new time limit is deployed we can close. |
We're closer - now it only fails most of the time. When it does fail, it seems like the usual culprit is:
|
Those timeouts seem to be fairly frequent. Two recent examples:
|
Runtime folks (@aclements @rsc @RLH @randall77): can we make it a priority to get the More examples (three in a row!): |
By "long test" do you mean just "go test"? Historically there has not been a requirement that the non-short tests complete in any particular time, and I'm kind of disappointed to see this requirement imposed. But even ignoring that, 'go test runtime' runs in 1.5 minutes on my laptop. If the builders are 6X slower than my laptop, let's focus on that problem. |
It looks like That header says "-quick" (I forget how that's different from -short). How does the longtest builder affect that test? |
It's not the build system imposing it. It's cmd/go:
We can have the coordinator explicitly set it much higher if you'd like. |
Actually, that's not quite accurate. While cmd/go's default is 10 minutes, cmd/dist test still sets a limit, adjusted by GO_TEST_TIMEOUT_SCALE. Instead I sent https://go-review.googlesource.com/c/build/+/167638 to just crank up the longtest's timeout scale. |
I wrote that paragraph before I checked how long the runtime test actually took. The real problem seems to be that the "longtest" builder machines are |
Change https://golang.org/cl/167638 mentions this issue: |
It's using the default GCE VM type we use for all other container-based builds: The VM is single-use & unshared by other tests. We stopped using Kubernetes a long time ago specifically because of people (often rightfully) blaming Kubernetes' isolation quality for flaky tests. |
The -cpu=1,2,4 test takes a fair bit longer. On my workstation: ok runtime 422.956s It doesn't surprise me that this goes over the 10 minute mark on a slower machine. |
Updates golang/go#25886 Change-Id: I5168e291ab77cbd3843bdc39e319a68dfa65aedd Reviewed-on: https://go-review.googlesource.com/c/build/+/167638 Reviewed-by: Russ Cox <[email protected]>
If the problem is the combined time for |
Well, we don't run the longtest builder sharded. It's not a trybot so we don't particularly care about latency. We also bound it to 1 builder per commit at a time, which means it can fill in the dashboard slowly, but that's fine. |
It's still timing out at 10 minutes. Has the dashboard change been deployed? |
@aclements, no, not yet. Every time I've gone to do it, people were using gomote or just starting trybots. Btw, you can check this yourself at https://farmer.golang.org/ where it says at the top:
I'll do it now. |
The longtest builder appears to be happy now. Shall we close this? |
Closing. |
Change https://golang.org/cl/192679 mentions this issue: |
CL 167638 made a change to add more CPU resources to the longtest builder, with the intention of going from 4 vCPUs, 15 GB RAM to 16 vCPUs, 14.4 GB RAM. It used n1-highcpu-8 GCE machine type, which actually has 8 vCPUs and 7.2 GB RAM.¹ Having less RAM than before wasn't the intention. Fix that by changing n1-highcpu-8 to n1-highcpu-16, which matches the comment. ¹ https://cloud.google.com/compute/docs/machine-types Updates golang/go#32831 Updates golang/go#33986 Updates golang/go#25886 Change-Id: I8426867fe33b3bf86576cb13d0d6113cd87f30c1 Reviewed-on: https://go-review.googlesource.com/c/build/+/192679 Reviewed-by: Bryan C. Mills <[email protected]>
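The arithmetic behind the mix-up in that commit message can be checked with a short sketch. The 0.9 GB-per-vCPU ratio is an inference from the two figures quoted above (7.2 GB for 8 vCPUs, 14.4 GB for 16 vCPUs), and the function name is hypothetical; consult the linked machine-types page for authoritative values.

```go
package main

import "fmt"

// memTenthsGBPerVCPU is 0.9 GB per vCPU, stored in tenths of a GB to
// avoid floating-point rounding. Assumption: inferred from the figures
// in the commit message (7.2 GB / 8 vCPUs and 14.4 GB / 16 vCPUs).
const memTenthsGBPerVCPU = 9

// highcpuMemTenthsGB returns the RAM, in tenths of a GB, that an
// n1-highcpu machine with the given vCPU count would have under the
// assumed ratio. Hypothetical helper for illustration only.
func highcpuMemTenthsGB(vcpus int) int {
	return vcpus * memTenthsGBPerVCPU
}

func main() {
	for _, vcpus := range []int{8, 16} {
		m := highcpuMemTenthsGB(vcpus)
		fmt.Printf("n1-highcpu-%d: %d vCPUs, %d.%d GB RAM\n", vcpus, vcpus, m/10, m%10)
	}
}
```

Because the number in the machine type name is the vCPU count, n1-highcpu-8 delivers half of both the CPU and the RAM that the comment intended.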
For example: https://build.golang.org/log/9b5c6ae2c5be87e49c9893e9ea6dcc2aa6198200
The builder has been consistently broken for this and other reasons since mid-May, so it's hard to pinpoint when this failure started happening. I tried looking for an existing issue about this, but couldn't find any.
If I run
go test -v runtime
on my laptop, I get the result:
Perhaps the three-minute timeout per package is too low for the runtime package, seeing as how my run barely fit under 180s. Or perhaps some of the long or expensive tests should be made faster.
/cc @aclements @ianlancetaylor @bradfitz
Tangential question - should this be labelled Testing or Builders? Perhaps the label descriptions could be clarified.