x/build/cmd/release{,bot}: include long tests in pre-release testing #29252
Comments
Additionally: normally we also look at build.golang.org for a happy row of "ok" before doing a release, but because this was a security release without any of our normal infrastructure, we didn't have build.golang.org and didn't have the existing |
Yep, there were a lot of factors at play. This one seems like a relatively easy win, though: we don't cut all that many releases, so “run all of the tests one last time to be sure” seems like a reasonable step in the release process. |
CC @toothrot |
Marking release-blocker for 1.14. We really ought to be running all of the tests before we cut a release. |
@bcmills Would it be sufficient for release to query the dashboard/coordinator for longtest status? On one hand, I would love to run longtest at build time, but on the other hand I am hesitant to increase the duration of our release process significantly. |
I also vote we just query the dashboard to gate releases. cmd/release is extra slow in all.bash mode as-is (without adding long tests) because it doesn't shard test execution over N machines. Adding long tests just makes a slow situation even worse. Now that the build system has a scheduler, we can even tweak the scheduler to make sure that release-branch HEADs are highest priority. (It might already be doing close to that as-is, actually.) We could even go a step further and have cmd/release not even run make.bash and instead just pick up the artifacts from the previous build (which are already known to be good artifacts if all the tests passed). But that's for another day. (Even further: run cmd/release on every release and then releasebot just downloads them.) |
Querying the dashboard seems fine for regular releases, as long as we're checking the result for the actual commit that we're about to release. I think that still leaves a testing gap for security releases, though. |
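
To make the "actual commit" requirement concrete, here is a minimal sketch of such a gate. Everything in it is an assumption for illustration: the BuildResult type, the gateRelease function, the builder names, and the commit identifiers are hypothetical and do not reflect the real dashboard or coordinator API. It only shows the rule that a passing result on a different commit must not count.

```go
package main

import "fmt"

// BuildResult is a hypothetical record of one builder's outcome for one
// commit. It does not mirror the real build dashboard's data model.
type BuildResult struct {
	Builder string // illustrative name, e.g. "linux-amd64-longtest"
	Commit  string // commit the builder tested
	OK      bool
}

// gateRelease returns nil only if every required builder has reported a
// passing result for exactly the commit that is about to be released.
func gateRelease(commit string, required []string, results []BuildResult) error {
	passed := make(map[string]bool)
	for _, r := range results {
		if r.Commit != commit {
			continue // a result for another commit says nothing about this one
		}
		// If a builder reported more than once, any failure counts as a failure.
		if prev, seen := passed[r.Builder]; seen {
			passed[r.Builder] = prev && r.OK
		} else {
			passed[r.Builder] = r.OK
		}
	}
	for _, b := range required {
		ok, seen := passed[b]
		if !seen {
			return fmt.Errorf("%s: no result for commit %s", b, commit)
		}
		if !ok {
			return fmt.Errorf("%s: failed at commit %s", b, commit)
		}
	}
	return nil
}

func main() {
	results := []BuildResult{
		{Builder: "linux-amd64", Commit: "abc123", OK: true},
		// The longtest builder passed, but only on an older commit.
		{Builder: "linux-amd64-longtest", Commit: "def456", OK: true},
	}
	required := []string{"linux-amd64", "linux-amd64-longtest"}
	if err := gateRelease("abc123", required, results); err != nil {
		fmt.Println("release blocked:", err)
		return
	}
	fmt.Println("release may proceed")
}
```

A gate like this does nothing for the security-release gap mentioned above: if the dashboard never saw the commit, the check can only report a missing result, so the long tests would still have to be run some other way.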
FWIW, it looks like we released 1.12.14 and 1.13.5 with failing longtest builds again. 😞 https://golang.org/cl/205438 and https://golang.org/cl/205439 need to be reviewed and merged before the next point releases. |
@bcmills As of today, this is still a manual step in our process. I've noticed another possibly brittle test appear in the longtest failures for 1.12, and we'll look into addressing that. We still plan to have our release automation query the branch status before tagging a release. The build dashboards for 1.13 and 1.12 have some ports that are consistently failing. It seems like we should reconsider the validity of some ports based on their builder status. |
Change https://golang.org/cl/214433 mentions this issue: |
Start using n1-highcpu-8 machine type instead of n1-highcpu-4 for the freebsd-amd64-race builder. The freebsd-amd64-race builder has produced good test results for the x/tools repo for a long time, but by now it has started to consistently fail for reasons that seem connected to it having only 3.6 GB memory. The Windows race builders needed to be bumped from 7.2 GB to 14.4 GB to run successfully, so this change makes a small incremental step to bring freebsd-amd64-race closer in line with other builders. If memory-related problems continue to occur with 7.2 GB, the next step will be to go up to 14.4 GB. The freebsd-amd64-race builder is using an older version of FreeBSD. We may want to start using a newer one for running tests with -race, but that should be a separate change so we can see the results of this change without another confounding variable. Also update all FreeBSD builders to use https in buildletURLTmpl, because it's expected to work fine and will be more consistent. Updates golang/go#36444 Updates golang/go#34621 Updates golang/go#29252 Updates golang/go#33986 Change-Id: Idfcefd1c91bddc9f70ab23e02fcdca54fda9d1ac Reviewed-on: https://go-review.googlesource.com/c/build/+/214433 Run-TryBot: Carlos Amedee <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Carlos Amedee <[email protected]>
Assigned the issue to myself since I will be tracking progress. |
Moving this to the next major milestone. The long tests have been run manually. |
Change https://golang.org/cl/227859 mentions this issue: |
This is a followup to CL 214433. Start using n1-highcpu-16 machine type instead of n1-highcpu-8 for the freebsd-amd64-race builder. Increasing the RAM from 3.6 GB to 7.2 GB has helped golang/go#36444 significantly: the builder stopped failing consistently on x/tools and resulted in many data races being uncovered in golang/go#36605. However, by now, it has started to fail consistently again. This time it seems to be due to low performance, causing the tests in golang.org/x/tools/internal/lsp/regtest package to fail with "context deadline exceeded" errors. FreeBSD is one of the ports that stays visible when "show only first-class ports" is checked on build.golang.org. The other -race builders have all been upgraded to the n1-highcpu-16 machine type by now out of necessity. It seems fair to provide the FreeBSD port with an equal amount of resources, even if the increased memory isn't strictly required yet. Once this change is applied, if the failures persist, we can be more confident that the problem is due to the code or the port, rather than due to this -race builder having half the CPU and RAM resources of other -race builders. An alternative is to increase the timeout for this builder type, but I'm opting to defer exploring that until after equalizing the machine type. For golang/go#36444. For golang/go#34621. For golang/go#29252. For golang/go#33986. Change-Id: I41f149365128c7bc6f576c778ac07618acc04612 Reviewed-on: https://go-review.googlesource.com/c/build/+/227859 Reviewed-by: Alexander Rakoczy <[email protected]>
There were two places where the -short flag was added in order to speed up tests when run in short mode, in CL 178399 and CL 177417. It appears viable to re-use the GO_TEST_SHORT value so that the -short flag is not used when the tests are executed on a longtest builder, where it is not a goal to skip slow tests for improved performance. Do so, in order to make the testing configurations simpler and more predictable. Factor the flag name out of the string returned by short, so that it can be used in the context of 'go test', which can accept a -short flag, and of a test binary, which requires the use of a longer -test.short flag. For #39054. For #29252. Change-Id: I52dfbef73cc8307735c52e2ebaa609305fb05933 Reviewed-on: https://go-review.googlesource.com/c/go/+/233898 Run-TryBot: Dmitri Shuralyov <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Ian Lance Taylor <[email protected]>
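
The factoring described in that commit message can be sketched roughly as follows. This is not the code from the CL: the shortFlag function is a hypothetical stand-in, and treating an unset GO_TEST_SHORT as short mode is an assumption made here. The point is only that returning the flag without a prefix lets one value serve both 'go test' (-short) and a compiled test binary (-test.short).

```go
package main

import (
	"fmt"
	"strconv"
)

// shortFlag returns the short-mode flag without any prefix, such as "short"
// or "short=false", derived from the value of GO_TEST_SHORT. An empty value
// is treated as unset and, by assumption, means short mode.
// Hypothetical helper for illustration; not the cmd/dist implementation.
func shortFlag(env string) (string, error) {
	if env == "" {
		return "short", nil
	}
	short, err := strconv.ParseBool(env)
	if err != nil {
		return "", fmt.Errorf("invalid GO_TEST_SHORT %q: %v", env, err)
	}
	if !short {
		return "short=false", nil
	}
	return "short", nil
}

func main() {
	for _, env := range []string{"", "1", "0"} {
		flag, err := shortFlag(env)
		if err != nil {
			fmt.Println(err)
			continue
		}
		// The caller adds the prefix that fits the tool being invoked.
		fmt.Printf("GO_TEST_SHORT=%q: go test -%s | test binary -test.%s\n", env, flag, flag)
	}
}
```

With this shape, a longtest configuration only has to set the environment variable once, and every call site picks the matching prefix.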
I've made progress on investigating and understanding what's needed to complete this issue during the Go 1.16 release timeframe. It is now well understood (by me), but resolving it will need some discussion and collaboration with the cmd/go team. It became too late to start this work in the 1.16 cycle, but I plan to resume it early in Go 1.17 cycle instead. (We will continue to do the manual long test verification via |
This issue is currently labeled as early-in-cycle for Go 1.17. |
Change https://golang.org/cl/304949 mentions this issue: |
I tested CL 154101 and the subsequent security patches using a combination of go test -run TestScript/[…] and all.bash. Unfortunately, significant parts of go get (including path-to-repository resolution) are only exercised in non-short tests, and all.bash by default only runs the short tests, despite the name. (I remember that latter point occasionally, but apparently not frequently enough.)
Even more unfortunately, releasebot suggests all.bash for security releases as well, and release runs the same all.bash commands as the regular builders.
As a result, a significant regression (#29241) made it all the way through development, code review, and release building without running the existing tests that should have caught it.
We should ensure that the commands release executes and the instructions releasebot prints for both kinds of releases include the non-short tests on at least one platform.
(CC @bradfitz @dmitshur @FiloSottile)