
roachtest: tpcc/mixed-headroom/n5cpu16 failed [OOM during import while running 21.2] #74892

Closed
cockroach-teamcity opened this issue Jan 15, 2022 · 37 comments
Labels: C-test-failure (broken test, automatically or manually discovered), O-roachtest, O-robot (originated from a bot), T-disaster-recovery


@cockroach-teamcity
Member

cockroach-teamcity commented Jan 15, 2022

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 78419450178335b31f542bd1b14fefdf4ecee0e8:

		  |  1169.0s        0            0.0            9.4      0.0      0.0      0.0      0.0 stockLevel
		  |  1170.0s        0            0.0            9.3      0.0      0.0      0.0      0.0 delivery
		  |  1170.0s        0           42.0           95.1  31138.5 103079.2 103079.2 103079.2 newOrder
		  |  1170.0s        0            6.0            9.4  21474.8 103079.2 103079.2 103079.2 orderStatus
		  |  1170.0s        0           48.1           93.7  30064.8 103079.2 103079.2 103079.2 payment
		  |  1170.0s        0            6.0            9.4     83.9 103079.2 103079.2 103079.2 stockLevel
		  |  1171.0s        0            5.0            9.3  64424.5 103079.2 103079.2 103079.2 delivery
		  |  1171.0s        0           44.9           95.1  36507.2 103079.2 103079.2 103079.2 newOrder
		  |  1171.0s        0            4.0            9.4  12884.9 103079.2 103079.2 103079.2 orderStatus
		  |  1171.0s        0           45.9           93.7  36507.2 103079.2 103079.2 103079.2 payment
		  |  1171.0s        0            5.0            9.3   6174.0  85899.3  85899.3  85899.3 stockLevel
		Wraps: (8) COMMAND_PROBLEM
		Wraps: (9) Node 5. Command with error:
		  | ``````
		  | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=2112 --pprofport=33333  {pgurl:1-4}
		  | ``````
		Wraps: (10) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError

	mixed_version_jobs.go:73,versionupgrade.go:208,tpcc.go:414,test_runner.go:780: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:115
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*backgroundStepper).wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/mixed_version_jobs.go:69
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:208
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:414
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:780
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	/usr/local/go/src/runtime/proc.go:6498
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:238
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError
Help

See: roachtest README

See: How To Investigate (internal)

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

Jira issue: CRDB-12308

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Jan 15, 2022
@cockroach-teamcity
Member Author

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 5ad21e3896ee809e9c3ebc28bb22166f1275acca:

		  |   882.0s        0            0.0            9.0      0.0      0.0      0.0      0.0 stockLevel
		  |   883.0s        0            2.0            9.1  36507.2  38654.7  38654.7  38654.7 delivery
		  |   883.0s        0           34.0           92.3  66572.0 103079.2 103079.2 103079.2 newOrder
		  |   883.0s        0            3.0            9.1  38654.7 103079.2 103079.2 103079.2 orderStatus
		  |   883.0s        0           32.0           90.6  42949.7 103079.2 103079.2 103079.2 payment
		  |   883.0s        0            0.0            9.0      0.0      0.0      0.0      0.0 stockLevel
		  |   884.0s        0            5.0            9.1 103079.2 103079.2 103079.2 103079.2 delivery
		  |   884.0s        0           38.0           92.2  81604.4 103079.2 103079.2 103079.2 newOrder
		  |   884.0s        0            5.0            9.1  36507.2 103079.2 103079.2 103079.2 orderStatus
		  |   884.0s        0           49.0           90.6  42949.7 103079.2 103079.2 103079.2 payment
		  |   884.0s        0            5.0            9.0 103079.2 103079.2 103079.2 103079.2 stockLevel
		Wraps: (8) COMMAND_PROBLEM
		Wraps: (9) Node 5. Command with error:
		  | ``````
		  | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=2112 --pprofport=33333  {pgurl:1-4}
		  | ``````
		Wraps: (10) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError

	mixed_version_jobs.go:73,versionupgrade.go:208,tpcc.go:414,test_runner.go:780: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:115
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*backgroundStepper).wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/mixed_version_jobs.go:69
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:208
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:414
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:780
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	/usr/local/go/src/runtime/proc.go:6498
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:238
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

@cockroach-teamcity
Member Author

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 4b41789120e019ab015e6dbb924df763897ebadb:

		  |   960.0s        0            3.0           10.9  90194.3 103079.2 103079.2 103079.2 delivery
		  |   960.0s        0           34.0          110.2  73014.4 103079.2 103079.2 103079.2 newOrder
		  |   960.0s        0            2.0           10.9  45097.2  45097.2  45097.2  45097.2 orderStatus
		  |   960.0s        0           35.0          108.0  73014.4 103079.2 103079.2 103079.2 payment
		  |   960.0s        0            3.0           10.9   4831.8  90194.3  90194.3  90194.3 stockLevel
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |   961.0s        0            1.0           10.9 103079.2 103079.2 103079.2 103079.2 delivery
		  |   961.0s        0           37.0          110.1  90194.3 103079.2 103079.2 103079.2 newOrder
		  |   961.0s        0            5.0           10.9  25769.8 103079.2 103079.2 103079.2 orderStatus
		  |   961.0s        0           40.0          107.9  81604.4 103079.2 103079.2 103079.2 payment
		  |   961.0s        0            1.0           10.9  40802.2  40802.2  40802.2  40802.2 stockLevel
		Wraps: (8) COMMAND_PROBLEM
		Wraps: (9) Node 5. Command with error:
		  | ``````
		  | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=2112 --pprofport=33333  {pgurl:1-4}
		  | ``````
		Wraps: (10) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError

	mixed_version_jobs.go:73,versionupgrade.go:208,tpcc.go:414,test_runner.go:780: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:115
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*backgroundStepper).wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/mixed_version_jobs.go:69
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:208
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:414
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:780
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	/usr/local/go/src/runtime/proc.go:6498
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:238
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

@cockroach-teamcity
Member Author

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 912964e02ddd951c77d4f71981ae18b3894e9084:

		  |  1253.0s        0            4.0            8.8  98784.2 103079.2 103079.2 103079.2 stockLevel
		  |  1254.0s        0            7.0            8.7  47244.6 103079.2 103079.2 103079.2 delivery
		  |  1254.0s        0           36.0           89.5  77309.4 103079.2 103079.2 103079.2 newOrder
		  |  1254.0s        0            3.0            8.9   2684.4  64424.5  64424.5  64424.5 orderStatus
		  |  1254.0s        0           31.0           88.6  28991.0 103079.2 103079.2 103079.2 payment
		  |  1254.0s        0            4.0            8.8   1140.9 103079.2 103079.2 103079.2 stockLevel
		  |  1255.0s        0            5.0            8.7 103079.2 103079.2 103079.2 103079.2 delivery
		  |  1255.0s        0           50.9           89.5  47244.6 103079.2 103079.2 103079.2 newOrder
		  |  1255.0s        0            4.0            8.9  77309.4 103079.2 103079.2 103079.2 orderStatus
		  |  1255.0s        0           40.0           88.6  49392.1 103079.2 103079.2 103079.2 payment
		  |  1255.0s        0            3.0            8.8  45097.2 103079.2 103079.2 103079.2 stockLevel
		Wraps: (8) COMMAND_PROBLEM
		Wraps: (9) Node 5. Command with error:
		  | ``````
		  | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=2112 --pprofport=33333  {pgurl:1-4}
		  | ``````
		Wraps: (10) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError

	mixed_version_jobs.go:73,versionupgrade.go:208,tpcc.go:414,test_runner.go:780: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:115
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*backgroundStepper).wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/mixed_version_jobs.go:69
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:208
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:414
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:780
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	/usr/local/go/src/runtime/proc.go:6498
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:238
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

@tbg
Member

tbg commented Jan 20, 2022

Error: error in newOrder: ERROR: restart transaction: TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE - failed preemptive refresh due to a conflict: intent on key /Table/166/1/669/0): "sql txn" meta={id=e741a752 key=/Table/168/1/669/8/0 pri=0.07912940 epo=17 ts=1642609332.165110310,2 min=1642609036.453450943,0 seq=20} lock=true stat=PENDING rts=1642609326.485867639,0 wto=false gul=1642609036.953450943,0 (SQLSTATE 40001)

It looks like a transaction retry error is somehow bubbling up to here:

if _, err := tx.run(context.Background(), warehouseID); err != nil {
	w.counters[txInfo.name].error.Inc()
	return errors.Wrapf(err, "error in %s", txInfo.name)
}

@tbg
Member

tbg commented Jan 20, 2022


The "last good" run before the failing streak is https://teamcity.cockroachdb.com/viewLog.html?buildId=4115910 (d6b99e9), and the first failure in the streak is 7841945.

$ git log  --no-merges 78419450178335b31f542bd1b14fefdf4ecee0e8 --not d6b99e92bf55b6f4a0d79800d67924e04d0b2a6d --oneline
ca66a18fa4 execinfrapb: remove ScanVisibility
b37e13d74f sql: clean up unnamed struct in scanColumnsConfig
00912544a5 sql: remove privilege checks at scanNode init time
9dc76f064a sql: remove index flags logic from scanNode
0845c8a2cb sql: simplify scanColumnsConfig
5ac83d9070 sql: add regression tests inserting decimals in scientific notation
48f2808616 sql: don't check column visibility when initializing scanNode
1770c214f9 sql: remove unused scanColumnsConfig field
3afbdb0f50 sql: implement ON CONFLICT ON CONSTRAINT
2490224168 colexechash: combine two conditionals into one in distinct mode
6998af348e colexechash: remove some dead code
0bb31ff1dc colexectestutils: increase test coverage by randomizing batch length
bb2fc51a42 colexechash: cleanup the previous commit
13b4e48afe colexechash: fix an internal error with distinct mode
74b6e343ac tree,parser: add support for ON CONFLICT ON CONSTRAINT
b3877b8775 cdc: Allow webhook sink to provide client certificates to the remote webhook server
afb8dbe096 streampb: delete `stream.pb.go`
5c3e798c08 bazel: upgrade `rules_go` to pull in new changes
785af465ac sql,server: add VIEWACTIVITYREDACTED role
9653dd13ce build: add <release branch> to nightly and latest tag values
6664d0c34d kv: circuit-break requests to unavailable replicas
ad59351e4b echotest: add testing helper
055a55f52c authors: add natelong to authors
19d12a63e7 roachtest: update 22.1 version map to v21.2.4
7577c4e6df cloud: bump orchestrator to v21.2.4

Starting a 3x stress run of b3877b8 here: https://teamcity.cockroachdb.com/viewLog.html?buildId=4163457&buildTypeId=Cockroach_Nightlies_RoachtestStress&tab=buildResultsDiv&branch_Cockroach_Nightlies=%3Cdefault%3E

If this passes, a SQL/colexec change is likely to blame for this change in behavior, since every commit that landed after b3877b8 in this range is a SQL/colexec change.

cc @yuzefovich in case you have an immediate idea what could have changed in the propagation of txn retry errors.

@tbg tbg changed the title roachtest: tpcc/mixed-headroom/n5cpu16 failed roachtest: tpcc/mixed-headroom/n5cpu16 failed [retry err bubbles up] Jan 20, 2022
@cockroach-teamcity
Member Author

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ da01e4c0545f191a0573e1d097ff0366769e0d6b:

		  |  1376.0s        0            3.0            7.4 103079.2 103079.2 103079.2 103079.2 stockLevel
		  |  1377.0s        0            5.0            7.4 103079.2 103079.2 103079.2 103079.2 delivery
		  |  1377.0s        0           26.0           75.7 103079.2 103079.2 103079.2 103079.2 newOrder
		  |  1377.0s        0            1.0            7.5  42949.7  42949.7  42949.7  42949.7 orderStatus
		  |  1377.0s        0           18.0           74.4 103079.2 103079.2 103079.2 103079.2 payment
		  |  1377.0s        0            6.0            7.4 103079.2 103079.2 103079.2 103079.2 stockLevel
		  |  1378.0s        0            9.0            7.4 103079.2 103079.2 103079.2 103079.2 delivery
		  |  1378.0s        0           19.0           75.7  81604.4 103079.2 103079.2 103079.2 newOrder
		  |  1378.0s        0            2.0            7.5    159.4  45097.2  45097.2  45097.2 orderStatus
		  |  1378.0s        0           25.9           74.4 103079.2 103079.2 103079.2 103079.2 payment
		  |  1378.0s        0            6.0            7.4 103079.2 103079.2 103079.2 103079.2 stockLevel
		Wraps: (8) COMMAND_PROBLEM
		Wraps: (9) Node 5. Command with error:
		  | ``````
		  | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=2112 --pprofport=33333  {pgurl:1-4}
		  | ``````
		Wraps: (10) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError

	mixed_version_jobs.go:73,versionupgrade.go:208,tpcc.go:414,test_runner.go:780: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:115
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*backgroundStepper).wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/mixed_version_jobs.go:69
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:208
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:414
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:780
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	/usr/local/go/src/runtime/proc.go:6498
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:238
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

@yuzefovich
Member

I think it's most likely due to the streamer work (#68430), where we now use leaf txns to issue concurrent requests for index joins in some cases. Notably, I haven't yet implemented the transparent refresh mechanism there, so the number of retryable errors is expected to increase because of that PR. I guess if we run SET CLUSTER SETTING sql.distsql.use_streamer.enabled = false;, these failures will go away.

@tbg
Member

tbg commented Jan 20, 2022

Would you mind making that change? I think the streamer needs to be off by default if it can't properly propagate refresh errors. We're going to catch this in most workloads.

@yuzefovich
Member

Just to make sure I understand things correctly: generally speaking, propagating a txn retryable error to the client is acceptable because the app must have some kind of retry loop; however, in most of our roachtests we don't tolerate the retryable errors and treat them as a failure of the test. Does this sound right?

@tbg
Member

tbg commented Jan 20, 2022

The workload here handles retry errors (unless I'm misreading something about where the error occurs). I think what is happening here is that a retry error bubbles up as a regular error, i.e. it can't have had the proper type. Or at least that's what I think we're seeing? The error is returned from this method:

func (n *newOrder) run(ctx context.Context, wID int) (interface{}, error) {
	atomic.AddUint64(&n.config.auditor.newOrderTransactions, 1)
	rng := rand.New(rand.NewSource(uint64(timeutil.Now().UnixNano())))
	d := newOrderData{
		wID:    wID,
		dID:    int(randInt(rng, 1, 10)),
		cID:    n.config.randCustomerID(rng),
		oOlCnt: int(randInt(rng, 5, 15)),
	}
	d.items = make([]orderItem, d.oOlCnt)
	n.config.auditor.Lock()
	n.config.auditor.orderLinesFreq[d.oOlCnt]++
	n.config.auditor.Unlock()
	atomic.AddUint64(&n.config.auditor.totalOrderLines, uint64(d.oOlCnt))
	// itemIDs tracks the item ids in the order so that we can prevent adding
	// multiple items with the same ID. This would not make sense because each
	// orderItem already tracks a quantity that can be larger than 1.
	itemIDs := make(map[int]struct{})
	// 2.4.1.4: A fixed 1% of the New-Order transactions are chosen at random to
	// simulate user data entry errors and exercise the performance of rolling
	// back update transactions.
	rollback := rng.Intn(100) == 0
	// allLocal tracks whether any of the items were from a remote warehouse.
	allLocal := 1
	for i := 0; i < d.oOlCnt; i++ {
		item := orderItem{
			olNumber: i + 1,
			// 2.4.1.5.3: order has a quantity [1..10]
			olQuantity: rng.Intn(10) + 1,
		}
		// 2.4.1.5.1 an order item has a random item number, unless rollback is true
		// and it's the last item in the items list.
		if rollback && i == d.oOlCnt-1 {
			item.olIID = -1
		} else {
			// Loop until we find a unique item ID.
			for {
				item.olIID = n.config.randItemID(rng)
				if _, ok := itemIDs[item.olIID]; !ok {
					itemIDs[item.olIID] = struct{}{}
					break
				}
			}
		}
		// 2.4.1.5.2: 1% of the time, an item is supplied from a remote warehouse.
		// If we're in localWarehouses mode, keep all items local.
		if n.config.localWarehouses {
			item.remoteWarehouse = false
		} else {
			item.remoteWarehouse = rng.Intn(100) == 0
		}
		item.olSupplyWID = wID
		if item.remoteWarehouse && n.config.activeWarehouses > 1 {
			allLocal = 0
			// To avoid picking the local warehouse again, randomly choose among n-1
			// warehouses and swap in the nth if necessary.
			item.olSupplyWID = n.config.wPart.randActive(rng)
			for item.olSupplyWID == wID {
				item.olSupplyWID = n.config.wPart.randActive(rng)
			}
			n.config.auditor.Lock()
			n.config.auditor.orderLineRemoteWarehouseFreq[item.olSupplyWID]++
			n.config.auditor.Unlock()
		} else {
			item.olSupplyWID = wID
		}
		d.items[i] = item
	}
	// Sort the items in the same order that we will require from batch select queries.
	sort.Slice(d.items, func(i, j int) bool {
		return d.items[i].olIID < d.items[j].olIID
	})
	d.oEntryD = timeutil.Now()
	err := crdbpgx.ExecuteTx(
		ctx, n.mcp.Get(), n.config.txOpts,
		func(tx pgx.Tx) error {
			// Select the district tax rate and next available order number, bumping it.
			var dNextOID int
			if err := n.updateDistrict.QueryRowTx(
				ctx, tx, d.wID, d.dID,
			).Scan(&d.dTax, &dNextOID); err != nil {
				return err
			}
			d.oID = dNextOID - 1
			// Select the warehouse tax rate.
			if err := n.selectWarehouseTax.QueryRowTx(
				ctx, tx, wID,
			).Scan(&d.wTax); err != nil {
				return err
			}
			// Select the customer's discount, last name and credit.
			if err := n.selectCustomerInfo.QueryRowTx(
				ctx, tx, d.wID, d.dID, d.cID,
			).Scan(&d.cDiscount, &d.cLast, &d.cCredit); err != nil {
				return err
			}
			// 2.4.2.2: For each o_ol_cnt item in the order, query the relevant item
			// row, update the stock row to account for the order, and insert a new
			// line into the order_line table to reflect the item on the order.
			itemIDs := make([]string, d.oOlCnt)
			for i, item := range d.items {
				itemIDs[i] = fmt.Sprint(item.olIID)
			}
			rows, err := tx.Query(
				ctx,
				fmt.Sprintf(`
				SELECT i_price, i_name, i_data
				FROM item
				WHERE i_id IN (%[1]s)
				ORDER BY i_id`,
					strings.Join(itemIDs, ", "),
				),
			)
			if err != nil {
				return err
			}
			iDatas := make([]string, d.oOlCnt)
			for i := range d.items {
				item := &d.items[i]
				iData := &iDatas[i]
				if !rows.Next() {
					if err := rows.Err(); err != nil {
						return err
					}
					if rollback {
						// 2.4.2.3: roll back when we're expecting a rollback due to
						// simulated user error (invalid item id) and we actually
						// can't find the item. The spec requires us to actually go
						// to the database for this, even though we know earlier
						// that the item has an invalid number.
						atomic.AddUint64(&n.config.auditor.newOrderRollbacks, 1)
						return errSimulated
					}
					return errors.New("missing item row")
				}
				err = rows.Scan(&item.iPrice, &item.iName, iData)
				if err != nil {
					rows.Close()
					return err
				}
			}
			if rows.Next() {
				return errors.New("extra item row")
			}
			if err := rows.Err(); err != nil {
				return err
			}
			rows.Close()
			stockIDs := make([]string, d.oOlCnt)
			for i, item := range d.items {
				stockIDs[i] = fmt.Sprintf("(%d, %d)", item.olIID, item.olSupplyWID)
			}
			rows, err = tx.Query(
				ctx,
				fmt.Sprintf(`
				SELECT s_quantity, s_ytd, s_order_cnt, s_remote_cnt, s_data, s_dist_%02[1]d
				FROM stock
				WHERE (s_i_id, s_w_id) IN (%[2]s)
				ORDER BY s_i_id`,
					d.dID, strings.Join(stockIDs, ", "),
				),
			)
			if err != nil {
				return err
			}
			distInfos := make([]string, d.oOlCnt)
			sQuantityUpdateCases := make([]string, d.oOlCnt)
			sYtdUpdateCases := make([]string, d.oOlCnt)
			sOrderCntUpdateCases := make([]string, d.oOlCnt)
			sRemoteCntUpdateCases := make([]string, d.oOlCnt)
			for i := range d.items {
				item := &d.items[i]
				if !rows.Next() {
					if err := rows.Err(); err != nil {
						return err
					}
					return errors.New("missing stock row")
				}
				var sQuantity, sYtd, sOrderCnt, sRemoteCnt int
				var sData string
				err = rows.Scan(&sQuantity, &sYtd, &sOrderCnt, &sRemoteCnt, &sData, &distInfos[i])
				if err != nil {
					rows.Close()
					return err
				}
				if strings.Contains(sData, originalString) && strings.Contains(iDatas[i], originalString) {
					item.brandGeneric = "B"
				} else {
					item.brandGeneric = "G"
				}
				newSQuantity := sQuantity - item.olQuantity
				if sQuantity < item.olQuantity+10 {
					newSQuantity += 91
				}
				newSRemoteCnt := sRemoteCnt
				if item.remoteWarehouse {
					newSRemoteCnt++
				}
				sQuantityUpdateCases[i] = fmt.Sprintf("WHEN %s THEN %d", stockIDs[i], newSQuantity)
				sYtdUpdateCases[i] = fmt.Sprintf("WHEN %s THEN %d", stockIDs[i], sYtd+item.olQuantity)
				sOrderCntUpdateCases[i] = fmt.Sprintf("WHEN %s THEN %d", stockIDs[i], sOrderCnt+1)
				sRemoteCntUpdateCases[i] = fmt.Sprintf("WHEN %s THEN %d", stockIDs[i], newSRemoteCnt)
			}
			if rows.Next() {
				return errors.New("extra stock row")
			}
			if err := rows.Err(); err != nil {
				return err
			}
			rows.Close()
			// Insert row into the orders and new orders table.
			if _, err := n.insertOrder.ExecTx(
				ctx, tx,
				d.oID, d.dID, d.wID, d.cID, d.oEntryD.Format("2006-01-02 15:04:05"), d.oOlCnt, allLocal,
			); err != nil {
				return err
			}
			if _, err := n.insertNewOrder.ExecTx(
				ctx, tx, d.oID, d.dID, d.wID,
			); err != nil {
				return err
			}
			// Update the stock table for each item.
			if _, err := tx.Exec(
				ctx,
				fmt.Sprintf(`
				UPDATE stock
				SET
					s_quantity = CASE (s_i_id, s_w_id) %[1]s ELSE crdb_internal.force_error('', 'unknown case') END,
					s_ytd = CASE (s_i_id, s_w_id) %[2]s END,
					s_order_cnt = CASE (s_i_id, s_w_id) %[3]s END,
					s_remote_cnt = CASE (s_i_id, s_w_id) %[4]s END
				WHERE (s_i_id, s_w_id) IN (%[5]s)`,
					strings.Join(sQuantityUpdateCases, " "),
					strings.Join(sYtdUpdateCases, " "),
					strings.Join(sOrderCntUpdateCases, " "),
					strings.Join(sRemoteCntUpdateCases, " "),
					strings.Join(stockIDs, ", "),
				),
			); err != nil {
				return err
			}
			// Insert a new order line for each item in the order.
			olValsStrings := make([]string, d.oOlCnt)
			for i := range d.items {
				item := &d.items[i]
				item.olAmount = float64(item.olQuantity) * item.iPrice
				d.totalAmount += item.olAmount
				olValsStrings[i] = fmt.Sprintf("(%d,%d,%d,%d,%d,%d,%d,%f,'%s')",
					d.oID,            // ol_o_id
					d.dID,            // ol_d_id
					d.wID,            // ol_w_id
					item.olNumber,    // ol_number
					item.olIID,       // ol_i_id
					item.olSupplyWID, // ol_supply_w_id
					item.olQuantity,  // ol_quantity
					item.olAmount,    // ol_amount
					distInfos[i],     // ol_dist_info
				)
			}
			if _, err := tx.Exec(
				ctx,
				fmt.Sprintf(`
				INSERT INTO order_line(ol_o_id, ol_d_id, ol_w_id, ol_number, ol_i_id, ol_supply_w_id, ol_quantity, ol_amount, ol_dist_info)
				VALUES %s`,
					strings.Join(olValsStrings, ", "),
				),
			); err != nil {
				return err
			}
			// 2.4.2.2: total_amount = sum(OL_AMOUNT) * (1 - C_DISCOUNT) * (1 + W_TAX + D_TAX)
			d.totalAmount *= (1 - d.cDiscount) * (1 + d.wTax + d.dTax)
			return nil
		})
	if errors.Is(err, errSimulated) {
		return d, nil
	}
	return d, err
}

You can see by inspection that this implies that an error is returned from this block:

err := crdbpgx.ExecuteTx(

and that will certainly do proper retries?

So my reading was that something in the code is doing some (probably less obviously wrong) version of:

err := something() // retry err
err = errors.Errorf("oops messing it up %s", err)
return err

@yuzefovich
Member

Hm, I'm confused. The Streamer doesn't do anything with the errors other than calling GoError:

w.s.setError(err.GoError())

No wrapping / error modification is done on the newly-introduced TxnKVStreamer either.


Trying to deconstruct the error message:

error in newOrder: ERROR: restart transaction: TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn

error in newOrder comes from

return errors.Wrapf(err, "error in %s", txInfo.name)

then ERROR is likely because of pgerror.DefaultSeverity being set in
Severity: GetSeverity(err),

then restart transaction is
resErr.Message = TxnRetryMsgPrefix + ": " + resErr.Message

then TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn probably is
retErr := roachpb.NewTransactionRetryWithProtoRefreshError(

Then, because TransactionRetryWithProtoRefreshError implements pgerror.ClientVisibleRetryError, the error should carry the 40001 code, which is then used to determine that the error is indeed retryable:
https://github.com/cockroachdb/cockroach-go/blob/7a4e30224f1a484982a53f29cd65eebba4d40b92/crdb/tx.go#L192

@tbg
Member

tbg commented Jan 20, 2022

It does say "(SQLSTATE 40001)" in the error from newOrder above. I think this really really means SQL "did everything right"? Flummoxed by what is going wrong here then.

@yuzefovich
Member

Yeah, that's what puzzles me too.

@yuzefovich
Member

I'll kick off this roachtest with the streamer disabled on #75257.

@tbg
Member

tbg commented Jan 20, 2022

If we're looking for crackpot theories, could it be that we're getting the retry error on a BEGIN?

https://github.dev/cockroachdb/cockroach-go/blob/7a4e30224f1a484982a53f29cd65eebba4d40b92/crdb/tx.go#L158

@yuzefovich
Member

Lol I hope not.

@yuzefovich
Member

Hm, all 5 builds failed. I think I kicked them off in a correct way (from the https://github.com/cockroachdb/cockroach/tree/disable-streamer branch), so maybe the streamer work is not to blame after all.

@tbg
Member

tbg commented Jan 21, 2022

That looks correct. Ugh, another bisection.

@cockroach-teamcity
Member Author

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 58ceac139a7e83052171121b28026a7366f16f7e:

		  |  1024.0s        0            7.0            9.5  85899.3 103079.2 103079.2 103079.2 delivery
		  |  1024.0s        0           31.0           96.0 103079.2 103079.2 103079.2 103079.2 newOrder
		  |  1024.0s        0            6.0            9.4  85899.3 103079.2 103079.2 103079.2 orderStatus
		  |  1024.0s        0           36.0           93.9  94489.3 103079.2 103079.2 103079.2 payment
		  |  1024.0s        0            6.0            9.4  66572.0 103079.2 103079.2 103079.2 stockLevel
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |  1025.0s        0            4.0            9.5  11274.3 103079.2 103079.2 103079.2 delivery
		  |  1025.0s        0           33.0           96.0 103079.2 103079.2 103079.2 103079.2 newOrder
		  |  1025.0s        0            3.0            9.4 103079.2 103079.2 103079.2 103079.2 orderStatus
		  |  1025.0s        0           36.0           93.8 103079.2 103079.2 103079.2 103079.2 payment
		  |  1025.0s        0            4.0            9.4  38654.7 103079.2 103079.2 103079.2 stockLevel
		Wraps: (8) COMMAND_PROBLEM
		Wraps: (9) Node 5. Command with error:
		  | ``````
		  | ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json  --ramp=5m0s --duration=2h0m0s --prometheus-port=2112 --pprofport=33333  {pgurl:1-4}
		  | ``````
		Wraps: (10) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError

	mixed_version_jobs.go:73,versionupgrade.go:208,tpcc.go:414,test_runner.go:780: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:115
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*backgroundStepper).wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/mixed_version_jobs.go:69
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:208
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:414
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:780
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	/usr/local/go/src/runtime/proc.go:6498
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:238
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError
Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@tbg
Member

tbg commented Jan 21, 2022

FWIW it failed on b3877b8, to my surprise.

b3877b8775 cdc: Allow webhook sink to provide client certificates to the remote webhook server <-- bad
afb8dbe096 streampb: delete `stream.pb.go`
5c3e798c08 bazel: upgrade `rules_go` to pull in new changes
785af465ac sql,server: add VIEWACTIVITYREDACTED role
9653dd13ce build: add <release branch> to nightly and latest tag values
6664d0c34d kv: circuit-break requests to unavailable replicas
ad59351e4b echotest: add testing helper
055a55f52c authors: add natelong to authors
19d12a63e7 roachtest: update 22.1 version map to v21.2.4
7577c4e6df cloud: bump orchestrator to v21.2.4
<-- "good" (probably)
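The list above is ordered newest first, with the probably-good baseline at the bottom. A bisection over such a list can be sketched in Go with sort.Search, which returns the smallest index at which a monotone predicate becomes true; git bisect automates the same search against a real repository. The firstBad helper and the commit names and verdicts in main are hypothetical, not results from this issue, and the input must be ordered oldest to newest (the reverse of the list above).

```go
package main

import (
	"fmt"
	"sort"
)

// firstBad returns the index of the first commit for which isBad reports
// true, assuming commits are ordered oldest to newest and that once a commit
// is bad every later one is bad too. It returns len(commits) if none is bad.
func firstBad(commits []string, isBad func(string) bool) int {
	return sort.Search(len(commits), func(i int) bool { return isBad(commits[i]) })
}

func main() {
	// Hypothetical history and verdicts; not results from this issue.
	commits := []string{"c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"}
	badFrom := map[string]bool{"c5": true, "c6": true, "c7": true}

	probes := 0
	isBad := func(c string) bool {
		probes++ // each probe stands for one or more roachtest runs
		return badFrom[c]
	}

	i := firstBad(commits, isBad)
	fmt.Println(commits[i], probes) // c5 3
}
```

Because this failure is intermittent, a single passing run does not prove a commit good; in practice each probe would need several runs, which is why the baseline above is only "probably" good.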

@tbg
Member

tbg commented Jan 21, 2022

None of this makes sense, going to try 7577c4e (build)

@blathers-crl

blathers-crl bot commented Jan 27, 2022

cc @cockroachdb/bulk-io

@rafiss rafiss added branch-release-21.2 and removed T-sql-foundations labels Jan 27, 2022
@Azhng Azhng removed the release-blocker label Feb 2, 2022
@ajwerner
Contributor

ajwerner commented Feb 8, 2022

Is there any chance this is related to #76230? I don't see an OOM there, but I don't see much of anything there.

@cockroach-teamcity
Member Author

roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on release-21.2 @ 31f167ca5bbe404abcb215f80524770ddc8c0163:

		  | I220514 14:21:42.147031 1 workload/tpcc/tpcc.go:509  [-] 1  check 3.3.2.1 took 257.678751ms
		  | I220514 14:21:54.673612 1 workload/tpcc/tpcc.go:509  [-] 2  check 3.3.2.2 took 12.526485234s
		  | I220514 14:21:57.515815 1 workload/tpcc/tpcc.go:509  [-] 3  check 3.3.2.3 took 2.842140408s
		  | I220514 14:25:35.024080 1 workload/tpcc/tpcc.go:509  [-] 4  check 3.3.2.4 took 3m37.508110259s
		  | I220514 14:25:42.163398 1 workload/tpcc/tpcc.go:509  [-] 5  check 3.3.2.5 took 7.138712008s
		  | Error: check failed: 3.3.2.5: pq: inbox communication error: rpc error: code = Canceled desc = context canceled
		  | Error: COMMAND_PROBLEM: exit status 1
		  | (1) COMMAND_PROBLEM
		  | Wraps: (2) Node 5. Command with error:
		  |   | ``````
		  |   | ./cockroach workload check tpcc --warehouses=909 {pgurl:1}
		  |   | ``````
		  | Wraps: (3) exit status 1
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		  |
		  | stdout:
		Wraps: (4) exit status 20
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *exec.ExitError

	mixed_version_jobs.go:73,versionupgrade.go:207,tpcc.go:444,test_runner.go:777: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:116
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*backgroundStepper).wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/mixed_version_jobs.go:69
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:207
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:444
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:777
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:172
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:81
		  | runtime.doInit
		  | 	/usr/local/go/src/runtime/proc.go:6498
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:238
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError
Reproduce

See: roachtest README

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

@adityamaru
Contributor

This is a very old issue on a branch that is EOL.
