roachtest: tpccbench/nodes=9/cpu=4/multi-region failed #105970

Closed
cockroach-teamcity opened this issue Jul 2, 2023 · 4 comments
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-testeng TestEng Team

cockroach-teamcity commented Jul 2, 2023

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ df82b5306e945d396269f28874d98fa567f4ac32:

(test_runner.go:1075).runTest: test timed out (7h0m0s)
(monitor.go:137).Wait: monitor failure: monitor task failed: COMMAND_PROBLEM: exit status 137
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1

Parameters: ROACHTEST_arch=amd64, ROACHTEST_cloud=gce, ROACHTEST_cpu=4, ROACHTEST_encrypted=false, ROACHTEST_fs=ext4, ROACHTEST_localSSD=true, ROACHTEST_ssd=0
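
A note on reading "exit status 137" in the monitor failure: under the usual POSIX shell convention, a status above 128 means the process was terminated by signal (status - 128), so 137 corresponds to signal 9. The sketch below only decodes that convention. The helper name signal_from_exit_status is hypothetical and is not part of roachtest, and the report itself does not say what sent the signal.

```python
import signal


def signal_from_exit_status(status: int):
    """Return the signal name implied by a shell-style exit status, or None.

    Assumes the POSIX shell convention that a process killed by signal N
    is reported with exit status 128 + N.
    """
    if 128 < status < 256:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            # No signal with that number is defined on this platform.
            return None
    return None


if __name__ == "__main__":
    # On Linux this prints SIGKILL for the status seen in the report.
    print(signal_from_exit_status(137))
```

A SIGKILL cannot be caught by the process, which is consistent with the command dying without a more specific error, but the decoded signal alone does not identify the sender.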

Help

See: roachtest README

See: How To Investigate (internal)

/cc @cockroachdb/test-eng

This test on roachdash | Improve this report!

Jira issue: CRDB-29296

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Jul 2, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.2 milestone Jul 2, 2023
@blathers-crl blathers-crl bot added the T-testeng TestEng Team label Jul 2, 2023
@cockroach-teamcity

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ aacba20d325e5702836e9a76be646b5f1bd922af:

(test_runner.go:1075).runTest: test timed out (7h0m0s)
(monitor.go:137).Wait: monitor failure: monitor task failed: COMMAND_PROBLEM: exit status 137
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1

Parameters: ROACHTEST_arch=amd64, ROACHTEST_cloud=gce, ROACHTEST_cpu=4, ROACHTEST_encrypted=false, ROACHTEST_fs=ext4, ROACHTEST_localSSD=true, ROACHTEST_ssd=0



tbg commented Jul 5, 2023

This doesn't look like #105797 to me. Here, the breaker stays tripped:

E230704 01:45:04.004186 1104750 kv/kvserver/replica_circuit_breaker.go:175 ⋮ [T1,n9,s9,r1017/4:‹/Table/108/2/17{31/10…-64/6/…}›] 12185  breaker: tripped with error: replica unavailable: (n9,s9):4VOTER_DEMOTING_LEARNER unable to serve request to r1017:‹/Table/108/2/17{31/10/"CALLYEINGOUGHT"/"KHOCNGwQM3geus"/792-64/6/"PRESCALLYOUGHT"/"3geusdC49w1xBXoJ"/1839}› [(n6,s6):6, (n1,s1):5, (n9,s9):4VOTER_DEMOTING_LEARNER, (n4,s4):8VOTER_INCOMING, next=9, gen=65, sticky=1688413724.720700769,0]: closed timestamp: 1688411380.133533729,0 (2023-07-03 19:09:40); raft status: {"id":"4","term":8,"vote":"4","commit":71,"lead":"0","raftState":"StatePreCandidate","applied":71,"progress":{},"leadtransferee":"0"}: operation ‹"probe"› timed out after 1m0s (given timeout 1m0s): result is ambiguous: after 60.00s of attempting command: context deadline exceeded

Note the StatePreCandidate. This looks more like real loss of quorum: we are in a joint configuration where n9 is demoting and n4 is incoming, so the config is (n6 n1 n9) && (n6 n1 n4). Part of this could be explained by #104567, but the cluster also looks overloaded:

W230704 01:43:36.353939 3833 kv/kvserver/store_raft.go:318 ⋮ [T1,n6] 48261  raft receive queue for r438 is full
W230704 01:43:37.373878 887721 util/quotapool/config.go:75 ⋮ [T1,n6,s6,r4/6:‹/System{/tsd-tse}›] 48262  have been waiting 15.000445236s attempting to acquire raft proposal quota

I would have suspected #104861, but that PR just merged a few hours ago, and post-dates this test failure.

I think more investigation would be needed to determine what went wrong in this issue.
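
For readers less familiar with joint configurations, the (n6 n1 n9) && (n6 n1 n4) notation above can be sketched as a quorum check. This is a minimal illustration of the standard Raft joint-consensus rule (a decision needs a majority in both the outgoing and the incoming voter set), not CockroachDB's or etcd/raft's actual code. The function and constant names are hypothetical, and the node sets are taken from the replica descriptor in the log line above.

```python
def has_majority(votes, voters):
    """True if the votes cover a strict majority of the given voter set."""
    return len(votes & voters) > len(voters) // 2


def has_joint_quorum(votes, outgoing, incoming):
    """Joint consensus: a majority is required in BOTH voter sets."""
    return has_majority(votes, outgoing) and has_majority(votes, incoming)


# Voter sets read off the descriptor in the log line:
# n9 is VOTER_DEMOTING_LEARNER (outgoing only), n4 is VOTER_INCOMING.
OUTGOING = frozenset({"n6", "n1", "n9"})
INCOMING = frozenset({"n6", "n1", "n4"})

if __name__ == "__main__":
    # n9 on its own (the pre-candidate in the log) cannot form a quorum.
    print(has_joint_quorum({"n9"}, OUTGOING, INCOMING))
    # n6 and n1 together satisfy both halves of the joint config.
    print(has_joint_quorum({"n6", "n1"}, OUTGOING, INCOMING))
```

Under this rule n9 needs at least one of n6 or n1 to respond before any election can succeed, and n9 plus n4 alone is not enough, because n9 does not count toward the incoming set and n4 does not count toward the outgoing one.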

@cockroach-teamcity

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ dbe8511fae8fca21562fdde5c240b1f7d06ef582:

(test_runner.go:1075).runTest: test timed out (7h0m0s)
(monitor.go:137).Wait: monitor failure: monitor task failed: COMMAND_PROBLEM: exit status 137
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1

Parameters: ROACHTEST_arch=amd64, ROACHTEST_cloud=gce, ROACHTEST_cpu=4, ROACHTEST_encrypted=false, ROACHTEST_fs=ext4, ROACHTEST_localSSD=true, ROACHTEST_ssd=0


@srosenberg

Sadly artifacts are already gone, so we can't triage the old failure.
