Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/build: add a longtest+race builder #54630

Closed
bcmills opened this issue Aug 23, 2022 · 17 comments
Closed

x/build: add a longtest+race builder #54630

bcmills opened this issue Aug 23, 2022 · 17 comments
Assignees
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. new-builder
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Aug 23, 2022

We have -race builders that run tests with the race detector enabled, and -longtest builders that run tests without -short.
However, most non-trivial races occur in concurrent code, and tests with nontrivial concurrency tend to take longer. So the tests run by the -race builders miss a lot of the actual races.

It would be nice to have at least one builder that runs the full test suite (without -short) under the race detector.

@bcmills bcmills added Builders x/build issues (builders, bots, dashboards) new-builder labels Aug 23, 2022
@gopherbot gopherbot added this to the Unreleased milestone Aug 23, 2022
@heschi heschi added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Aug 29, 2022
@heschi
Copy link
Contributor

heschi commented Aug 29, 2022

cc @golang/release

@heschi heschi moved this to Planned in Go Release Sep 27, 2022
@mknyszek
Copy link
Contributor

mknyszek commented Nov 9, 2022

The race detector builders currently have GO_DISABLE_OUTBOUND_NETWORK=1 while longtest builders have needsGoProxy: true. These seem in conflict, and I don't have a good idea of how they'll actually resolve. I guess I can just try and see what happens?

@heschi
Copy link
Contributor

heschi commented Nov 9, 2022

The long tests require network so you'll want to leave the network enabled.

@mknyszek
Copy link
Contributor

mknyszek commented Nov 9, 2022

Got it. Also @bcmills do you want this to run for subrepos? Or are you just concerned about the main repo at the moment?

(I figure I'll just start with the main repo since I suspect there will be some surprises...)

@bcmills
Copy link
Contributor Author

bcmills commented Nov 9, 2022

Ideally subrepos too (especially x/sys and x/net), but the main repo is a good starting point.

@mknyszek
Copy link
Contributor

I redeployed the coordinator and the builders are online! (For subrepos as well.) Hopefully the timeout scale is sufficient. We'll see soon.

@mknyszek
Copy link
Contributor

I think the UI could also use some work. It's in the race section but it just says "linux" twice. I'll poke at that.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/449196 mentions this issue: dashboard: add linux-amd64-longtest-race builder

@mknyszek
Copy link
Contributor

Hm, the builds seem stuck on... what I have to guess is the index/suffixarray tests. On longtest those already take 47 seconds... I wonder if it's ballooning out on longtest x race.

@mknyszek
Copy link
Contributor

For comparison, image/gif longtest seems to be around 6x slower with the race detector. So, assuming that slowdown applies here as well (maybe not) 47 seconds balloons out to around 5 minutes. That seems like a lot.

@mknyszek
Copy link
Contributor

Worse than that actually, 600+ seconds. The runtime tests took 700+ seconds. Unfortunately the builds are incredibly slow. :( How should we proceed? Leave the builder but start disabling tests for this configuration? Is it fine to be this slow?

@bcmills
Copy link
Contributor Author

bcmills commented Nov 10, 2022

If the tests pass reliably (with an appropriate timeout scale factor), then I guess the relevant question is just how slow are they?

Given that the race detector dumps exact stack traces for any race it detects, and given that it's for a common platform and architecture (so not hard to find a local machine to run the tests on), I'd say that if the builder takes, say, twice as long as the longtest builder overall, that's probably fine. But if it's going to take, say, 10 hours per run, we probably ought to prune things back in some way.

The log you linked seems to have disappeared, but looking at the index/suffixarray tests it looks like they're running exhaustive checks over huge input spaces (2²¹ and 3¹⁴ cases, respectively). Given that those spaces have already been checked exhaustively, I wonder if those tests would be more cost-effective as fuzz tests sampling fewer elements, perhaps from an even larger space. 🤔

@mknyszek
Copy link
Contributor

It's definitely not 10 hours but I don't have complete data. It does actually seem to be mostly keeping up with tip, so maybe it's fine.

Maybe let's just leave it up and we can work on improving run times more incrementally.

@bcmills
Copy link
Contributor Author

bcmills commented Nov 10, 2022

It does look like x/tools/go/pointer and x/benchmarks/sweet/cmd/sweet might need either some test skips or a longer timeout scale. I see consistent timeouts in those.

@mknyszek
Copy link
Contributor

x/benchmarks/sweet/cmd/sweet is just one test and yeah, it should probably just be disabled for this builder.

Repository owner moved this from Planned to Done in Go Release Nov 10, 2022
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/449518 mentions this issue: go/pointer: break TestInput into subtests and update skips

gopherbot pushed a commit to golang/tools that referenced this issue Nov 11, 2022
TestInput is fiendishly slow and memory-hungry. Breaking it into
subtests reveals that the vast majority of that cost can be attributed
to a single test case: "testdata/a_test.go".

Update the address-space-based skip to skip only that one input, skip
it under the race detector (it is timing out on the new
"linux-amd64-longtest-race" builder), and run the remaining inputs in
parallel (to reduce test latency now that we've better identified the
culprit).

Updates golang/go#14113.
Updates golang/go#54630.

Change-Id: I105cfa173edc74692eaa41efd50ae08eeacf0d7d
Reviewed-on: https://go-review.googlesource.com/c/tools/+/449518
TryBot-Result: Gopher Robot <[email protected]>
gopls-CI: kokoro <[email protected]>
Auto-Submit: Bryan Mills <[email protected]>
Run-TryBot: Bryan Mills <[email protected]>
Reviewed-by: Alan Donovan <[email protected]>
@golang golang locked and limited conversation to collaborators Nov 11, 2023
@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Mar 27, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. new-builder
Projects
Archived in project
Development

No branches or pull requests

5 participants