roachtest: tpccbench/nodes=9/cpu=4/chaos/partition failed #45820

Closed
cockroach-teamcity opened this issue Mar 6, 2020 · 4 comments
Labels: branch-master, C-test-failure, O-roachtest, O-robot, release-blocker

@cockroach-teamcity

(roachtest).tpccbench/nodes=9/cpu=4/chaos/partition failed on master@954fe69d554162aec0fbc001aad1fe5103d8df13:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20200306-1790595/tpccbench/nodes=9/cpu=4/chaos/partition/run_1
	test_runner.go:756: test timed out (10h0m0s)

	tpcc.go:858,tpcc.go:570,test_runner.go:741: error with attached stack trace:
		    main.(*monitor).WaitE
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2356
		    main.runTPCCBench.func3
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:835
		    github.com/cockroachdb/cockroach/pkg/util/search.searchWithSearcher
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/search/search.go:43
		    github.com/cockroachdb/cockroach/pkg/util/search.(*lineSearcher).Search
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/search/search.go:173
		    main.runTPCCBench
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:746
		    main.registerTPCCBenchSpec.func1
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:570
		    main.(*testRunner).runTest.func2
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:741
		    runtime.goexit
		    	/usr/local/go/src/runtime/asm_amd64.s:1357
		  - monitor failure:
		  - error with attached stack trace:
		    main.(*monitor).wait.func2
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2412
		    runtime.goexit
		    	/usr/local/go/src/runtime/asm_amd64.s:1357
		  - monitor task failed:
		  - context canceled

Artifacts: /tpccbench/nodes=9/cpu=4/chaos/partition
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity added the branch-master, C-test-failure, O-roachtest, O-robot, and release-blocker labels on Mar 6, 2020
@cockroach-teamcity added this to the 20.1 milestone on Mar 6, 2020
@cockroach-teamcity

(roachtest).tpccbench/nodes=9/cpu=4/chaos/partition failed on master@752dea867f3aeb142e98c22f8d320ce19041aa8d:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20200307-1793930/tpccbench/nodes=9/cpu=4/chaos/partition/run_1
	cluster.go:1410,context.go:135,cluster.go:1399,test_runner.go:778: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1793930-1583568484-26-n10cpu4 --oneshot --ignore-empty-nodes: exit status 1 10: skipped
		6: 17883
		9: 17271
		3: 19021
		4: 19581
		2: 21270
		8: 17759
		7: dead
		5: 20540
		1: 17553
		Error:  7: dead

@cockroach-teamcity

(roachtest).tpccbench/nodes=9/cpu=4/chaos/partition failed on master@c473f40078994551cebcbe00fdbf1fa388957658:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20200309-1796240/tpccbench/nodes=9/cpu=4/chaos/partition/run_1
	cluster.go:1410,context.go:135,cluster.go:1399,test_runner.go:778: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1796240-1583738442-28-n10cpu4 --oneshot --ignore-empty-nodes: exit status 1 10: skipped
		2: dead
		9: 21628
		3: 18371
		8: 21995
		7: 21153
		5: 18239
		1: 19738
		4: 17700
		6: 17209
		Error:  2: dead

@cockroach-teamcity

(roachtest).tpccbench/nodes=9/cpu=4/chaos/partition failed on master@793a9200c16693aff32aa6a4dd9d8bbcbddb30aa:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20200312-1804460/tpccbench/nodes=9/cpu=4/chaos/partition/run_1
	cluster.go:2368,tpcc.go:729,tpcc.go:570,test_runner.go:747: error with attached stack trace:
		    main.(*monitor).WaitE
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2356
		    main.(*monitor).Wait
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2364
		    main.runTPCCBench
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:729
		    main.registerTPCCBenchSpec.func1
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:570
		    main.(*testRunner).runTest.func2
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:747
		    runtime.goexit
		    	/usr/local/go/src/runtime/asm_amd64.s:1357
		  - monitor failure:
		  - unexpected node event: 1: dead

	cluster.go:1410,context.go:135,cluster.go:1399,test_runner.go:801: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1804460-1583998462-18-n10cpu4 --oneshot --ignore-empty-nodes: exit status 1 10: skipped
		1: dead
		4: 21803
		2: 22372
		7: 20181
		6: 21454
		5: 23615
		9: 19954
		8: 22284
		3: 19400
		Error:  1: dead

@nvanbenschoten

#45820 (comment) and #45820 (comment) were both:

F200309 13:15:32.536477 602779 kv/kvserver/concurrency/concurrency_manager.go:295  [n2,s2,r5880/2:/Table/57/1/{0-666}] caller violated contract
goroutine 602779 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0x744c101, 0xed5f83874, 0x0, 0x773189)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/get_stacks.go:25 +0xb8
github.com/cockroachdb/cockroach/pkg/util/log.(*loggerT).outputLogEntry(0x7448ea0, 0xc000000004, 0x69a9670, 0x2e, 0x127, 0xc00bdbd9e0, 0x3c)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:211 +0xa0c
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x4b41220, 0xc00f836db0, 0x4, 0x2, 0x0, 0x0, 0xc006064598, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:66 +0x2c9
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x4b41220, 0xc00f836db0, 0x1, 0x4, 0x0, 0x0, 0xc006064598, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:44 +0x8c
github.com/cockroachdb/cockroach/pkg/util/log.Fatal(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:164
github.com/cockroachdb/cockroach/pkg/kv/kvserver/concurrency.(*managerImpl).OnLockUpdated(0xc000c7b280, 0x4b41220, 0xc00f836db0, 0xc0149ed8b0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/concurrency/concurrency_manager.go:295 +0xe4

This was fixed by #41912.

The last failure is a lot more interesting.

F200312 12:49:34.767817 115 kv/kvserver/store_raft.go:488  [n1,s1,r9/1:/Table/1{3-4}] while committing batch: IO error: While pread offset 9474048 len 20341: /mnt/data1/cockroach/000939.sst: Input/output error
while committing batch
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleRaftReadyRaftMuLocked
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_raft.go:657
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).handleRaftReady
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica_raft.go:389
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).processReady
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store_raft.go:487
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftScheduler).worker
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/scheduler.go:226
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*raftScheduler).Start.func2
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/scheduler.go:166
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:198

We've seen this in the past in #42876 (comment) and #38772 (comment). Again, we see in the dmesg output on the failing node:

[16867.458270] print_req_error: critical medium error, dev nvme0n1, sector 385099680
[16867.472849] print_req_error: critical medium error, dev nvme0n1, sector 385099848
[16867.483245] print_req_error: critical medium error, dev nvme0n1, sector 385099848

It's unclear from a search on a few online forums whether this is a kernel issue or a hardware issue, but threads like https://www.reddit.com/r/DataHoarder/comments/bps9fu/my_nvme_ssd_started_having_read_errors/ point towards the latter. Either way, there's nothing here to indicate that this is a CockroachDB issue. Since everything else here has been fixed, I'm going to close.
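
For anyone triaging a similar dead-node failure, here is a rough sketch of how the dmesg check above could be scripted. It is not part of roachtest or roachprod; the function names (`find_medium_errors`, `summarize`) are made up for illustration, and the regular expression is an assumption tied to the exact `print_req_error` lines quoted above. Other kernel versions may word this message differently, so the pattern would likely need adjusting.

```python
import re

# Assumed pattern, modeled only on the dmesg lines quoted in this thread, e.g.:
# [16867.458270] print_req_error: critical medium error, dev nvme0n1, sector 385099680
MEDIUM_ERROR_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+print_req_error: critical medium error, "
    r"dev (?P<dev>[^,\s]+), sector (?P<sector>\d+)"
)


def find_medium_errors(dmesg_text):
    """Return a (timestamp, device, sector) tuple for each medium error line."""
    hits = []
    for line in dmesg_text.splitlines():
        m = MEDIUM_ERROR_RE.match(line.strip())
        if m:
            hits.append(
                (float(m.group("ts")), m.group("dev"), int(m.group("sector")))
            )
    return hits


def summarize(hits):
    """Map each device to the sorted list of distinct failing sectors."""
    sectors = {}
    for _, dev, sector in hits:
        sectors.setdefault(dev, set()).add(sector)
    return {dev: sorted(s) for dev, s in sectors.items()}


# The three lines from the failing node, copied from the comment above.
SAMPLE = """\
[16867.458270] print_req_error: critical medium error, dev nvme0n1, sector 385099680
[16867.472849] print_req_error: critical medium error, dev nvme0n1, sector 385099848
[16867.483245] print_req_error: critical medium error, dev nvme0n1, sector 385099848
"""

if __name__ == "__main__":
    print(summarize(find_medium_errors(SAMPLE)))
```

Repeated errors on the same sector of one device, as in the sample, are at least consistent with a failing disk rather than a CockroachDB bug, though the script only surfaces the lines and does not prove the cause.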
