roachtest: autoupgrade failed #51776

Closed
cockroach-teamcity opened this issue Jul 22, 2020 · 5 comments · Fixed by #51893
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
Milestone

Comments

@cockroach-teamcity
Member

(roachtest).autoupgrade failed on master@e9a4f83e3eee59510f97db2c6e0df9b57cf6b944:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/autoupgrade/run_1
	test_runner.go:804: test timed out (10h0m0s)


Artifacts: /autoupgrade
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Jul 22, 2020
@cockroach-teamcity cockroach-teamcity added this to the 20.2 milestone Jul 22, 2020
@cockroach-teamcity
Member Author

(roachtest).autoupgrade failed on master@b8a50cc4d062293915969cdc83e3ec4d057cede5:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/autoupgrade/run_1
	test_runner.go:804: test timed out (10h0m0s)

	autoupgrade.go:192,autoupgrade.go:265,test_runner.go:757: context canceled


Artifacts: /autoupgrade
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).autoupgrade failed on master@bfa6307c292ef4dfed4a53cb99f506e6dab26533:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/autoupgrade/run_1
	autoupgrade.go:141,autoupgrade.go:265,test_runner.go:757: determining cluster version: write tcp 172.17.0.3:55188->35.238.81.230:26257: write: broken pipe
		(1) attached stack trace
		  | main.registerAutoUpgrade.func1.3
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/autoupgrade.go:94
		  | main.registerAutoUpgrade.func1.4
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/autoupgrade.go:108
		  | main.registerAutoUpgrade.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/autoupgrade.go:140
		  | main.registerAutoUpgrade.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/autoupgrade.go:265
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:757
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (2) determining cluster version
		Wraps: (3) write tcp 172.17.0.3:55188->35.238.81.230:26257
		Wraps: (4) write
		Wraps: (5) broken pipe
		Error types: (1) *withstack.withStack (2) *errutil.withMessage (3) *net.OpError (4) *os.SyscallError (5) syscall.Errno


Artifacts: /autoupgrade
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@irfansharif
Contributor

16:24:15 test.go:208: test worker status: sleeping
16:24:45 test.go:208: test worker status: decommission
16:24:45 cluster.go:2190: [w0] > ./cockroach node decommission 3 --insecure --port={pgport:3}
16:25:21 test.go:208: test worker status: stop
16:25:21 test.go:190: test status: stopping cluster
16:25:21 cluster.go:380: > /Users/irfansharif/Software/src/github.com/cockroachdb/cockroach/bin/roachprod stop local:3
local: stopping and waiting.
16:25:22 test.go:208: test worker status: sleeping

Hm, is this me?

@irfansharif
Contributor

I'm not entirely sure what's happening here. Running it locally I've occasionally run into the following:

        autoupgrade.go:200,autoupgrade.go:273,test_runner.go:757: pq: cannot set cluster.preserve_downgrade_option to 20.1 (cluster version is 20.1-13)

Which, looking at the test, basically means the cluster has autoupgraded despite the test not expecting it to (because a non-decommissioned node is still down). But that's a different failure mode from the 10h0m0s timeout that originally created this issue.
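
To make the version comparison behind that error concrete, here is a rough sketch. It assumes version strings have the form `major.minor` with an optional `-internal` suffix, as in `20.1` and `20.1-13`, and that a higher internal number means the cluster has already moved past the version being preserved. The type and function names (`clusterVersion`, `parseVersion`, `hasUpgradedPast`) are hypothetical and are not CockroachDB's actual implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// clusterVersion is a hypothetical, simplified version triple.
type clusterVersion struct {
	Major, Minor, Internal int
}

// parseVersion parses "20.1" or "20.1-13" into its components.
func parseVersion(s string) (clusterVersion, error) {
	var v clusterVersion
	parts := strings.SplitN(s, "-", 2)
	majMin := strings.SplitN(parts[0], ".", 2)
	if len(majMin) != 2 {
		return v, fmt.Errorf("malformed version %q", s)
	}
	var err error
	if v.Major, err = strconv.Atoi(majMin[0]); err != nil {
		return v, fmt.Errorf("malformed major in %q: %w", s, err)
	}
	if v.Minor, err = strconv.Atoi(majMin[1]); err != nil {
		return v, fmt.Errorf("malformed minor in %q: %w", s, err)
	}
	if len(parts) == 2 {
		if v.Internal, err = strconv.Atoi(parts[1]); err != nil {
			return v, fmt.Errorf("malformed internal in %q: %w", s, err)
		}
	}
	return v, nil
}

// less orders versions by major, then minor, then internal.
func (a clusterVersion) less(b clusterVersion) bool {
	if a.Major != b.Major {
		return a.Major < b.Major
	}
	if a.Minor != b.Minor {
		return a.Minor < b.Minor
	}
	return a.Internal < b.Internal
}

// hasUpgradedPast reports whether the active version is strictly newer
// than the version the test wants to preserve.
func hasUpgradedPast(active, preserve string) (bool, error) {
	a, err := parseVersion(active)
	if err != nil {
		return false, err
	}
	p, err := parseVersion(preserve)
	if err != nil {
		return false, err
	}
	return p.less(a), nil
}

func main() {
	upgraded, err := hasUpgradedPast("20.1-13", "20.1")
	if err != nil {
		panic(err)
	}
	fmt.Println("cluster already upgraded:", upgraded)
}
```

Under these assumptions, `20.1-13` compares as newer than `20.1`, which matches the reading above that the cluster had already auto-upgraded by the time the test tried to set the option.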

Hm, actually, looking at the logs for those, I see that under 3.logs we have ctx tags for n5, which tells me it's the same thing as #51497 (comment).

@irfansharif
Contributor

n3 also plays a central role in this test, and so does n5. Crossing those wires seems bad, and should be fixed anyway. I'm going to go do that now, and not think too hard about how mixing up the two nodes may have resulted in this failure. Given this started flaking only recently, it's probably the same fallout (introduced in #51329).

irfansharif added a commit to irfansharif/cockroach that referenced this issue Jul 24, 2020
..and the setting of cluster settings for single node clusters.
`roachprod start --sequential` was broken in cockroachdb#51329, and the broken-ness
outlined in TODOs in cockroachdb#51790. This PR just addresses those TODOs.

Fixes cockroachdb#51497
Fixes cockroachdb#51721
Fixes cockroachdb#51738
Fixes cockroachdb#51768
Fixes cockroachdb#51769
Fixes cockroachdb#51776

Release note: None
craig bot pushed a commit that referenced this issue Jul 25, 2020
51893: roachprod: fixup `roachprod --sequential` r=irfansharif a=irfansharif
Co-authored-by: irfan sharif <[email protected]>
@craig craig bot closed this as completed in #51893 Jul 25, 2020
@craig craig bot closed this as completed in 6d6706b Jul 25, 2020