x/build: OpenBSD 7.0 386 builder is slow #49666

Open
heschi opened this issue Nov 18, 2021 · 13 comments
Labels
Builders x/build issues (builders, bots, dashboards) NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@heschi
Contributor

heschi commented Nov 18, 2021

I am about 80% sure that the OpenBSD builders, both 6.8 and 7.0, are timing out running all.bash. We have a hard 30 minute limit on build VMs, and the OpenBSD builders take slightly longer than that to run all.bash from scratch. However, they write a snapshot of the distribution after it's built, and reusing that saves enough time for the build to succeed upon retry -- that's why the dashboard is relatively clean-looking.

This is wasteful. We should probably increase the timeout so that the build finishes, prevent the retry, or modify the OpenBSD tests somehow to run in a more reasonable amount of time.

cc @golang/release @4a6f656c

@heschi heschi added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 18, 2021
@heschi heschi modified the milestones: Backlog, Unreleased Nov 18, 2021
@gopherbot gopherbot added the Builders x/build issues (builders, bots, dashboards) label Nov 18, 2021
@bcmills
Contributor

bcmills commented Dec 7, 2021

> This is wasteful. We should probably increase the timeout so that the build finishes, prevent the retry, or …

#42699 makes a similar observation about TryBot stalls and retries.

@bcmills
Contributor

bcmills commented Jan 26, 2022

The openbsd-386-70 builder has been struggling for quite a long time now, and it has only completed a couple of builds for the many CLs that were merged over the past few days:
image

It sounds like the plan is just to raise the timeout — is this blocked on anything in particular?

@heschi
Contributor Author

heschi commented Feb 3, 2022

I said I'd do it and then disappeared in a puff of smoke for two weeks. Mailing now.

@gopherbot
Contributor

Change https://golang.org/cl/382978 mentions this issue: dashboard: allow 45 minutes for 386 OpenBSD builds

@heschi
Contributor Author

heschi commented Feb 4, 2022

Maybe there was more than one thing going on. OpenBSD 6.8 386 is clean, perhaps because of this change or perhaps not. 7.0, however, is still failing even after (I believe) the change has taken effect. See attached.

builder openbsd-386-70.txt

It seems that cgo_test is hanging. Retitling.

@heschi heschi changed the title from "x/build: OpenBSD builders are timing out running tests on the main repo" to "runtime/go: cgo_test times out on OpenBSD 7.0 386" Feb 4, 2022
@heschi heschi changed the title from "runtime/go: cgo_test times out on OpenBSD 7.0 386" to "runtime/cgo: cgo_test times out on OpenBSD 7.0 386" Feb 4, 2022
@heschi
Contributor Author

heschi commented Feb 4, 2022

@ianlancetaylor

I'm going to call this a release blocker until it's investigated.

@heschi heschi modified the milestones: Unreleased, Go1.18 Feb 4, 2022
@ianlancetaylor
Member

On a gomote the cgo_test takes 22 seconds to run. I don't think there is anything specific to cgo here.

In the log attached above everything seems to be slow. I see things like

  2022-02-04T17:24:20Z run_tests_multi 10.128.15.199:80: [go_test:container/heap go_test:container/list go_test:container/ring]
  2022-02-04T17:24:29Z finish_run_tests_multi after 8.61s; 10.128.15.199:80: [go_test:container/heap go_test:container/list go_test:container/ring]

...

ok  	container/heap	0.058s
ok  	container/list	0.047s
ok  	container/ring	0.049s

The log suggests that it takes 8.61 seconds to run the container/heap, container/list, and container/ring tests. But when reporting the actual test times, the total is 0.15 seconds. The latter time probably doesn't include the time that it takes to build the tests, but of course that shouldn't be very long either; on a gomote each takes about 1 second to build.

I don't know what is going on here but I don't think it's cgo related.

CC @golang/release

@ianlancetaylor ianlancetaylor changed the title from "runtime/cgo: cgo_test times out on OpenBSD 7.0 386" to "builders: OpenBSD 7.0 386 builder is slow" Feb 4, 2022
@bcmills
Contributor

bcmills commented Feb 4, 2022

> But when reporting the actual test times, the total is 0.15 seconds. The latter time probably doesn't include the time that it takes to build the tests, but of course that shouldn't be very long either; on a gomote each takes about 1 second to build.

In #49343, @millerresearch noted that

> on the raspberry pi builders each invocation of go tool dist test adds an overhead of >30 seconds, even for a test which takes a small fraction of a second.

Is it possible that that's what's going on here too?

@heschi
Contributor Author

heschi commented Feb 4, 2022

cc @4a6f656c

@millerresearch
Contributor

> Is it possible that that's what's going on here too?

Yes, although in this case the overhead seems to be about 8 seconds for each invocation of dist test. Still way more than the time taken by the test itself.

The overhead consists mainly of multiple unnecessary calls to checkNotStale, which thrash the file system on operating systems that don't cache directories and inodes as aggressively as Linux does.

The problem is magnified by the unnecessary partitioning of tests into separate invocations of dist test, in many cases only running one test per dist test.

@gopherbot
Contributor

Change https://go.dev/cl/384294 mentions this issue: dashboard: allow 60 minutes for 386 OpenBSD builds

gopherbot pushed a commit to golang/build that referenced this issue Feb 9, 2022
Either 45 minutes wasn't enough, or these things are hanging forever.
Give them even more time so we can figure out which.

For golang/go#49666.

Change-Id: Ia8da50f06d828d508226bf01a3d53e5790a648e6
Reviewed-on: https://go-review.googlesource.com/c/build/+/384294
Trust: Heschi Kreinick
Run-TryBot: Heschi Kreinick
Reviewed-by: Dmitri Shuralyov
TryBot-Result: Gopher Robot
Reviewed-by: Carlos Amedee
@heschi
Contributor Author

heschi commented Feb 10, 2022

Builds are succeeding with a 60-minute timeout. Given the above hypothesis about expensive filesystem reads, this doesn't seem like a release blocker.

@dmitshur dmitshur changed the title from "builders: OpenBSD 7.0 386 builder is slow" to "x/build: OpenBSD 7.0 386 builder is slow" Feb 12, 2022
@dmitshur dmitshur modified the milestones: Go1.18, Unreleased Mar 4, 2022
@gopherbot
Contributor

Change https://go.dev/cl/406216 mentions this issue: cmd/coordinator: consolidate and increase global VM deletion timeout

gopherbot pushed a commit to golang/build that referenced this issue May 16, 2022
We had a lot of flexibility over timeouts, making their maintenance
harder. Consolidate it to a single timeout in the pool package, and
modify it from 45 minutes to 2 hours.

There's room for improvement in how we maintain this timeout,
but I'm leaving that for future work (with a tracking issue).

Fixes golang/go#52591.
Updates golang/go#52929.
Updates golang/go#49666.
Updates golang/go#42699.

Change-Id: I2ad92648d89a714397bd8b0e1ec490fc9f6d6790
Reviewed-on: https://go-review.googlesource.com/c/build/+/406216
Run-TryBot: Dmitri Shuralyov
TryBot-Result: Gopher Robot
Reviewed-by: Dmitri Shuralyov
Reviewed-by: Carlos Amedee
Reviewed-by: Heschi Kreinick
6 participants