x/build/cmd/release{,bot}: include long tests in pre-release testing #29252

Closed
bcmills opened this issue Dec 14, 2018 · 28 comments
Labels
Builders x/build issues (builders, bots, dashboards) early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. release-blocker

Comments

@bcmills
Contributor

bcmills commented Dec 14, 2018

I tested CL 154101 and the subsequent security patches using a combination of go test -run TestScript/[…] and all.bash. Unfortunately, significant parts of go get (including path-to-repository-resolution) are only exercised in non-short tests, and all.bash by default only runs the short tests, despite the name. (I remember that latter point occasionally — but apparently not frequently enough.)

Even more unfortunately, releasebot suggests all.bash for security releases as well, and release runs the same all.bash commands as the regular builders.

As a result, a significant regression (#29241) made it all the way through development, code review, and release building without running the existing tests that should have caught it.

We should ensure that the commands release executes and the instructions releasebot prints for both kinds of releases include the non-short tests on at least one platform.

(CC @bradfitz @dmitshur @FiloSottile)
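To make the gap concrete, here is a minimal sketch of how short mode hides tests. In real Go tests the guard is `if testing.Short() { t.Skip(...) }`. The `testCase` type, the `runnable` helper, and the test names below are hypothetical stand-ins used only for illustration, not code from cmd/go or x/build.

```go
package main

import "fmt"

// testCase is a hypothetical stand-in for a Go test function.
// LongOnly models a test that begins with a guard such as:
//
//	if testing.Short() { t.Skip("skipping in short mode") }
type testCase struct {
	Name     string
	LongOnly bool
}

// runnable returns the names of the tests that actually execute for a
// given value of the -short flag. LongOnly tests are skipped in short mode.
func runnable(tests []testCase, short bool) []string {
	var names []string
	for _, tc := range tests {
		if short && tc.LongOnly {
			continue
		}
		names = append(names, tc.Name)
	}
	return names
}

func main() {
	tests := []testCase{
		{Name: "TestFastPath"},                       // hypothetical name
		{Name: "TestRepoResolution", LongOnly: true}, // hypothetical name
	}
	fmt.Println("short mode:", runnable(tests, true))
	fmt.Println("long mode: ", runnable(tests, false))
}
```

A skipped test does not count as a failure, so a short-mode run can pass while the code that only the long tests exercise is broken.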

@gopherbot gopherbot added this to the Unreleased milestone Dec 14, 2018
@gopherbot gopherbot added the Builders x/build issues (builders, bots, dashboards) label Dec 14, 2018
@bradfitz
Contributor

Additionally: normally we also look at build.golang.org for a happy row of "ok" before doing a release, but because this was a security release without any of our normal infrastructure, we didn't have build.golang.org and didn't have the existing linux-amd64-longtest builder.

@bradfitz
Contributor

The longtest builder is highlighted here and shows the breakage, which we would've been much more likely to see using the normal infrastructure:

[Screenshot, 2018-12-14: build.golang.org dashboard with the longtest builder highlighted]

@bcmills
Contributor Author

bcmills commented Dec 14, 2018

Yep, there were a lot of factors at play. This one seems like a relatively easy win, though: we don't cut all that many releases, so “run all of the tests one last time to be sure” seems like a reasonable step in the release process.

@bcmills
Contributor Author

bcmills commented Aug 21, 2019

CC @toothrot

@bcmills bcmills modified the milestones: Unreleased, Go1.14 Aug 21, 2019
@bcmills
Contributor Author

bcmills commented Aug 21, 2019

Marking release-blocker for 1.14. We really ought to be running all of the tests before we cut a release.

@toothrot
Contributor

@bcmills Would it be sufficient for release to query the dashboard/coordinator for longtest status? On one hand, I would love to run longtest at build time, but on the other hand I am hesitant to increase the duration of our release process significantly.

@bradfitz
Contributor

I also vote we just query the dashboard to gate releases. cmd/release is extra slow in all.bash mode as-is (without adding long tests) because it doesn't shard test execution over N machines. Adding long tests just makes a slow situation even worse.

Now that the build system has a scheduler, we can even tweak the scheduler to make sure that release-branch HEADs are highest priority. (It might already be doing close to that as-is, actually.)

We could even go a step further and have cmd/release not even run make.bash and instead just pick up the artifacts from the previous build (which are already known to be good artifacts if all the tests passed). But that's for another day. (Even further: run cmd/release on every release and then releasebot just downloads them)

@bcmills
Contributor Author

bcmills commented Dec 11, 2019

Querying the dashboard seems fine for regular releases, as long as we're checking the result for the actual commit that we're about to release.

I think that still leaves a testing gap for security releases, though.
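A rough sketch of the gating check described above: refuse to release unless each required longtest builder has a passing result for exactly the commit being released. The `buildResult` type, the `checkLongtest` function, and the commit identifiers are hypothetical; the real dashboard and coordinator APIs are not shown in this thread and are not modeled here.

```go
package main

import (
	"fmt"
	"strings"
)

// buildResult is a hypothetical record of one builder's outcome for one commit.
type buildResult struct {
	Builder string
	Commit  string
	OK      bool
}

// checkLongtest returns an error unless every required builder has at least
// one passing result for exactly the given commit. Results for other commits
// (for example, an older HEAD of the release branch) are ignored.
func checkLongtest(results []buildResult, commit string, required []string) error {
	passed := make(map[string]bool)
	for _, r := range results {
		if r.Commit == commit && r.OK {
			passed[r.Builder] = true
		}
	}
	var missing []string
	for _, b := range required {
		if !passed[b] {
			missing = append(missing, b)
		}
	}
	if len(missing) > 0 {
		return fmt.Errorf("no passing result at %s for: %s", commit, strings.Join(missing, ", "))
	}
	return nil
}

func main() {
	// Hypothetical data: the builder passed at abc123 but failed at def456.
	results := []buildResult{
		{"linux-amd64-longtest", "abc123", true},
		{"linux-amd64-longtest", "def456", false},
	}
	required := []string{"linux-amd64-longtest"}
	fmt.Println(checkLongtest(results, "abc123", required))
	fmt.Println(checkLongtest(results, "def456", required))
}
```

Keying the check on the commit rather than on the branch means a missing result is treated the same as a failing one. That is also why security releases remain a gap: a commit that never reached the public builders has no result to query.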

@bcmills
Contributor Author

bcmills commented Dec 11, 2019

FWIW, it looks like we released 1.12.14 and 1.13.5 with failing longtest builds again. 😞

https://golang.org/cl/205438 and https://golang.org/cl/205439 need to be reviewed and merged before the next point releases.

@toothrot
Contributor

toothrot commented Jan 9, 2020

@bcmills As of today, this is still a manual step in our process. I've noticed another possible brittle test appear in the longtest failures for 1.12, and we'll look into addressing that.

We still plan to have our release automation query the branch status before tagging a release.

The build dashboards for 1.13 and 1.12 have some ports that are consistently failing. It seems like we should reconsider the validity of some ports based on their builder status.

@gopherbot
Contributor

Change https://golang.org/cl/214433 mentions this issue: dashboard: upsize freebsd-amd64-race builder to 7.2 GB RAM

gopherbot pushed a commit to golang/build that referenced this issue Jan 14, 2020
Start using n1-highcpu-8 machine type instead of n1-highcpu-4
for the freebsd-amd64-race builder.

The freebsd-amd64-race builder has produced good test results
for the x/tools repo for a long time, but by now it has started
to consistently fail for reasons that seem connected to it having
only 3.6 GB memory. The Windows race builders needed to be bumped
from 7.2 GB to 14.4 GB to run successfully, so this change makes
a small incremental step to bring freebsd-amd64-race closer in
line with other builders. If memory-related problems continue to
occur with 7.2 GB, the next step will be to go up to 14.4 GB.

The freebsd-amd64-race builder is using an older version of FreeBSD.
We may want to start using a newer one for running tests with -race,
but that should be a separate change so we can see the results of
this change without another confounding variable.

Also update all FreeBSD builders to use https in buildletURLTmpl,
because it's expected to work fine and will be more consistent.

Updates golang/go#36444
Updates golang/go#34621
Updates golang/go#29252
Updates golang/go#33986

Change-Id: Idfcefd1c91bddc9f70ab23e02fcdca54fda9d1ac
Reviewed-on: https://go-review.googlesource.com/c/build/+/214433
Run-TryBot: Carlos Amedee
TryBot-Result: Gobot Gobot
Reviewed-by: Carlos Amedee
@cagedmantis cagedmantis self-assigned this Feb 7, 2020
@cagedmantis
Contributor

Assigned the issue to myself since I will be tracking progress.

@cagedmantis
Contributor

cagedmantis commented Feb 25, 2020

Moving this to the next major milestone. The long tests have been run manually.

@cagedmantis cagedmantis modified the milestones: Go1.14, Go1.15 Feb 25, 2020
@gopherbot
Contributor

Change https://golang.org/cl/227859 mentions this issue: dashboard: upsize freebsd-amd64-race builder to 16 vCPUs, 14.4 GB mem

@gopherbot
Contributor

Change https://golang.org/cl/227859 mentions this issue: dashboard: upsize freebsd-amd64-race builder to 16 vCPUs, 14.4 GB RAM

gopherbot pushed a commit to golang/build that referenced this issue Apr 10, 2020
This is a followup to CL 214433. Start using n1-highcpu-16 machine type
instead of n1-highcpu-8 for the freebsd-amd64-race builder.

Increasing the RAM from 3.6 GB to 7.2 GB has helped golang/go#36444
significantly: the builder stopped failing consistently on x/tools
and resulted in many data races being uncovered in golang/go#36605.

However, by now, it has started to fail consistently again. This
time it seems to be due to low performance, causing the tests in
golang.org/x/tools/internal/lsp/regtest package to fail with
"context deadline exceeded" errors.

FreeBSD is one of the ports that stays visible when "show only first-
class ports" is checked on build.golang.org. The other -race builders
have all been upgraded to the n1-highcpu-16 machine type by now out
of necessity.

It seems fair to provide the FreeBSD port with an equal amount of
resources, even if the increased memory isn't strictly required yet.
Once this change is applied, if the failures persist, we can be more
confident that the problem is due to the code or the port, rather
than due to this -race builder having 2𝗑 less CPU and RAM resources
compared to other -race builders.

An alternative is to increase timeout for this builder type, but I'm
opting to defer exploring that until after equalizing the machine type.

For golang/go#36444.
For golang/go#34621.
For golang/go#29252.
For golang/go#33986.

Change-Id: I41f149365128c7bc6f576c778ac07618acc04612
Reviewed-on: https://go-review.googlesource.com/c/build/+/227859
Reviewed-by: Alexander Rakoczy
@dmitshur dmitshur self-assigned this May 12, 2020
@dmitshur dmitshur modified the milestones: Go1.15, Go1.16 Jul 6, 2020
gopherbot pushed a commit that referenced this issue Aug 18, 2020
There were two places where the -short flag was added in order to
speed up tests when run in short mode, in CL 178399 and CL 177417.

It appears viable to re-use the GO_TEST_SHORT value so that -short
flag is not used when the tests are executed on a longtest builder,
where it is not a goal to skip slow tests for improved performance.

Do so, in order to make the testing configurations simpler and more
predictable.

Factor the flag name out of the string returned by short, so that
it can be used in context of 'go test' which can accept a -short flag,
and a test binary which requires the use of a longer -test.short flag.

For #39054.
For #29252.

Change-Id: I52dfbef73cc8307735c52e2ebaa609305fb05933
Reviewed-on: https://go-review.googlesource.com/c/go/+/233898
Run-TryBot: Dmitri Shuralyov
TryBot-Result: Gobot Gobot
Reviewed-by: Ian Lance Taylor
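The flag factoring described in the commit message above can be sketched as follows. This is an illustration based on the description, not the actual code from the CL: the `shortFlag` function and its signature are assumptions, and the default of short mode when GO_TEST_SHORT is unset is inferred from the earlier note that all.bash runs only the short tests by default.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// shortFlag returns the bare flag name ("short" or "short=false") for a given
// GO_TEST_SHORT value. An empty value is assumed to mean short mode.
// Callers add the prefix: "-" for 'go test', "-test." for a test binary.
func shortFlag(env string) (string, error) {
	if env != "" {
		short, err := strconv.ParseBool(env)
		if err != nil {
			return "", fmt.Errorf("invalid GO_TEST_SHORT %q: %v", env, err)
		}
		if !short {
			return "short=false", nil
		}
	}
	return "short", nil
}

func main() {
	name, err := shortFlag(os.Getenv("GO_TEST_SHORT"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("go test flag:    ", "-"+name)
	fmt.Println("test binary flag:", "-test."+name)
}
```

Returning only the flag name keeps a single source of truth for the short/long decision while letting each call site choose the prefix it needs.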
@dmitshur dmitshur added the okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 label Dec 10, 2020
@toothrot toothrot removed the okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 label Dec 17, 2020
@dmitshur
Contributor

I've made progress on investigating and understanding what's needed to complete this issue during the Go 1.16 release timeframe. It is now well understood (by me), but resolving it will need some discussion and collaboration with the cmd/go team. It became too late to start this work in the 1.16 cycle, but I plan to resume it early in the Go 1.17 cycle instead. (We will continue to do the manual long test verification via go test std cmd until then.)

@dmitshur dmitshur modified the milestones: Go1.16, Go1.17 Jan 20, 2021
@dmitshur dmitshur added the early-in-cycle A change that should be done early in the 3 month dev cycle. label Jan 20, 2021
@dmitshur dmitshur changed the title x/build/cmd/release{,bot}: include long tests in pre-release testing x/build/cmd/release{,bot}: include long tests in pre-release testing (windows-amd64-longtest is done, linux-amd64-longtest remains) Jan 20, 2021
@gopherbot
Contributor

This issue is currently labeled as early-in-cycle for Go 1.17.
That time is now, so a friendly reminder to look at it again.

@dmitshur
Contributor

After backports #45240 and #45239 are made, we'll be able to start using linux-amd64-longtest and linux-386-longtest builders during release testing.

CL 304949 adds those two targets to be used in future releases. Once that's done, this issue will be resolved.

@dmitshur dmitshur changed the title x/build/cmd/release{,bot}: include long tests in pre-release testing (windows-amd64-longtest is done, linux-amd64-longtest remains) x/build/cmd/release{,bot}: include long tests in pre-release testing (windows-amd64-longtest is done, linux-{386,amd64}-longtest remains) Mar 25, 2021
@gopherbot
Contributor

Change https://golang.org/cl/304949 mentions this issue: cmd/releasebot, cmd/release: add 2 more longtest test-only targets

@dmitshur dmitshur changed the title x/build/cmd/release{,bot}: include long tests in pre-release testing (windows-amd64-longtest is done, linux-{386,amd64}-longtest remains) x/build/cmd/release{,bot}: include long tests in pre-release testing Mar 29, 2021
@golang golang locked and limited conversation to collaborators Mar 29, 2022
@heschi heschi moved this to Done in Go Release Sep 27, 2022
@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jun 12, 2024