Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: restore2TB/nodes=32 failed [raft recv oom, reproducible] #80155

Closed
cockroach-teamcity opened this issue Apr 19, 2022 · 27 comments
Closed
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. S-3 Medium-low impact: incurs increased costs for some users (incl lower avail, recoverable bad data)

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Apr 19, 2022

roachtest.restore2TB/nodes=32 failed with artifacts on master @ 1fdbb16fa206c5fd77c3aa111ec40916edfb55df:

		  | 	main/pkg/cmd/roachtest/monitor.go:105
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (2) output in run_090427.112984326_n1_cockroach_sql
		Wraps: (3) ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'" returned
		  | stderr:
		  |
		  | stdout:
		Wraps: (4) secondary error attachment
		  | UNCLASSIFIED_PROBLEM: context canceled
		  | (1) UNCLASSIFIED_PROBLEM
		  | Wraps: (2) Node 1. Command with error:
		  |   | ``````
		  |   | ./cockroach sql --insecure -e "
		  |   | 				RESTORE csv.bank FROM
		  |   | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  |   | 				WITH into_db = 'restore2tb'"
		  |   | ``````
		  | Wraps: (3) context canceled
		  | Error types: (1) errors.Unclassified (2) *hintdetail.withDetail (3) *errors.errorString
		Wraps: (5) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

	monitor.go:127,restore.go:453,test_runner.go:875: monitor failure: monitor task failed: dial tcp 35.229.83.139:26257: connect: connection refused
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerRestore.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:453
		  | [...repeated from below...]
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (4) monitor task failed
		Wraps: (5) dial tcp 35.229.83.139:26257
		Wraps: (6) connect
		Wraps: (7) connection refused
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *net.OpError (6) *os.SyscallError (7) syscall.Errno
Help

See: roachtest README

See: How To Investigate (internal)

/cc @cockroachdb/bulk-io

This test on roachdash | Improve this report!

Jira issue: CRDB-15827

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Apr 19, 2022
@dt
Copy link
Member

dt commented Apr 19, 2022

n10 OOMKilled, 09:09:30.

15s prior, at 09_09_15, https://share.polarsignals.com/e12c582/

5s prior, at 09_09_25: https://share.polarsignals.com/926274a/

@blathers-crl blathers-crl bot added the T-kv KV Team label Apr 19, 2022
@dt
Copy link
Member

dt commented Apr 19, 2022

@tbg this looks like another one?

@cockroach-teamcity
Copy link
Member Author

roachtest.restore2TB/nodes=32 failed with artifacts on master @ 7ea67fd5d4966508877fc88ba21bdd498ff162e8:

		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:439
		  | main.(*monitorImpl).Go.func1
		  | 	main/pkg/cmd/roachtest/monitor.go:105
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (2) output in run_095430.114132621_n1_cockroach_sql
		Wraps: (3) ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'" returned
		  | stderr:
		  |
		  | stdout:
		Wraps: (4) secondary error attachment
		  | UNCLASSIFIED_PROBLEM: context canceled
		  | (1) UNCLASSIFIED_PROBLEM
		  | Wraps: (2) Node 1. Command with error:
		  |   | ``````
		  |   | ./cockroach sql --insecure -e "
		  |   | 				RESTORE csv.bank FROM
		  |   | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  |   | 				WITH into_db = 'restore2tb'"
		  |   | ``````
		  | Wraps: (3) context canceled
		  | Error types: (1) errors.Unclassified (2) *hintdetail.withDetail (3) *errors.errorString
		Wraps: (5) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

	monitor.go:127,restore.go:453,test_runner.go:876: monitor failure: monitor command failure: unexpected node event: 3: dead (exit status 137)
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerRestore.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:453
		  | [...repeated from below...]
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func3
		  | 	main/pkg/cmd/roachtest/monitor.go:202
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (4) monitor command failure
		Wraps: (5) unexpected node event: 3: dead (exit status 137)
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *errors.errorString
Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

roachtest.restore2TB/nodes=32 failed with artifacts on master @ 8fe42c18e5e2a82494fa6a34e068f4ab801f87c4:

		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:439
		  | main.(*monitorImpl).Go.func1
		  | 	main/pkg/cmd/roachtest/monitor.go:105
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (2) output in run_085802.987259187_n1_cockroach_sql
		Wraps: (3) ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'" returned
		  | stderr:
		  |
		  | stdout:
		Wraps: (4) secondary error attachment
		  | UNCLASSIFIED_PROBLEM: context canceled
		  | (1) UNCLASSIFIED_PROBLEM
		  | Wraps: (2) Node 1. Command with error:
		  |   | ``````
		  |   | ./cockroach sql --insecure -e "
		  |   | 				RESTORE csv.bank FROM
		  |   | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  |   | 				WITH into_db = 'restore2tb'"
		  |   | ``````
		  | Wraps: (3) context canceled
		  | Error types: (1) errors.Unclassified (2) *hintdetail.withDetail (3) *errors.errorString
		Wraps: (5) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

	monitor.go:127,restore.go:453,test_runner.go:876: monitor failure: monitor command failure: unexpected node event: 29: dead (exit status 137)
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerRestore.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:453
		  | [...repeated from below...]
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func3
		  | 	main/pkg/cmd/roachtest/monitor.go:202
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (4) monitor command failure
		Wraps: (5) unexpected node event: 29: dead (exit status 137)
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *errors.errorString
Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@nvanbenschoten
Copy link
Member

@tbg @dt how are we treating these OOMs during restore? GA blockers? Is this being tracked in a more central location? I that #78408?

@nvanbenschoten
Copy link
Member

Confirming that the last failure hit the same issue. The memprofiles look the same.

@lidorcarmel
Copy link
Contributor

@dt for posterity - how did you find that node 10 OOMed? I saw other crashes where it was obvious in the issue, but here it's not really clear. I guess greping would do the work but I'm wondering whether you have a better trick.

@dt
Copy link
Member

dt commented Apr 25, 2022

this one is less clear from the issue poster; usually the monitor says which node died but here it just has the ip that couldn't connect, but that just added a hop: I opened the roachprod-state json file in the test's artifacts dir and searched for that ip, which revealed it was n10.

From there it was back to the artifacts and check the end of the dmesg file for n10 to see if there was an ommkiller message; in this case there was. If there's not, I go over to the unredacted logs for n10 and scroll to the bottom looking for a stacktrace, then up to the top of that to see who crashed.

@tbg tbg changed the title roachtest: restore2TB/nodes=32 failed roachtest: restore2TB/nodes=32 failed [raft recv oom, reproducible] Apr 26, 2022
@tbg tbg self-assigned this Apr 26, 2022
@nvanbenschoten
Copy link
Member

I ran some tests with different values for kv.bulk_io_write.restore_node_concurrency, which was recently increased from 1 to min(numcpus - 1, 4). This test is using 4 vCPUs, so it was running with 3 restore workers per nodes.

I found that with 3 workers, the failure rate was right around 10%, with 5/50 iterations hitting an OOM.

With 2 workers (min(numcpus - 2, 4)), the failure rate was also right around 10%. However, I've only run 10 iterations so far and seen 1 failure, so that failure rate is imprecise. I'm running more iterations now.

With 1 worker (min(numcpus - 3, 4)), the failure rate appears to be 0%. I've had 0 of 30 iterations without failure.

@lunevalex lunevalex removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. blocks-22.1.0-rc.1 labels Apr 26, 2022
@tbg
Copy link
Member

tbg commented Apr 27, 2022

Additional relevant PR: #80494
Adds a server-wide AddSSTableRequest-send semaphore (default off).

So we have one more escape hatch should we see this kind of OOM in prod.

@cockroach-teamcity
Copy link
Member Author

roachtest.restore2TB/nodes=32 failed with artifacts on master @ 72e485705aa859ef850ebc67568bc29fe9b7cfd6:

		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:439
		  | main.(*monitorImpl).Go.func1
		  | 	main/pkg/cmd/roachtest/monitor.go:105
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (2) output in run_091546.757677286_n1_cockroach_sql
		Wraps: (3) ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'" returned
		  | stderr:
		  |
		  | stdout:
		Wraps: (4) secondary error attachment
		  | UNCLASSIFIED_PROBLEM: context canceled
		  | (1) UNCLASSIFIED_PROBLEM
		  | Wraps: (2) Node 1. Command with error:
		  |   | ``````
		  |   | ./cockroach sql --insecure -e "
		  |   | 				RESTORE csv.bank FROM
		  |   | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  |   | 				WITH into_db = 'restore2tb'"
		  |   | ``````
		  | Wraps: (3) context canceled
		  | Error types: (1) errors.Unclassified (2) *hintdetail.withDetail (3) *errors.errorString
		Wraps: (5) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

	monitor.go:127,restore.go:453,test_runner.go:876: monitor failure: monitor command failure: unexpected node event: 23: dead (exit status 137)
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerRestore.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:453
		  | [...repeated from below...]
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func3
		  | 	main/pkg/cmd/roachtest/monitor.go:202
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (4) monitor command failure
		Wraps: (5) unexpected node event: 23: dead (exit status 137)
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *errors.errorString
Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

roachtest.restore2TB/nodes=32 failed with artifacts on master @ bc1ee7c7c276984fce8ff5ba4fcfcdff335dde50:

		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:439
		  | main.(*monitorImpl).Go.func1
		  | 	main/pkg/cmd/roachtest/monitor.go:105
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (2) output in run_091830.232434909_n1_cockroach_sql
		Wraps: (3) ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'" returned
		  | stderr:
		  |
		  | stdout:
		Wraps: (4) secondary error attachment
		  | UNCLASSIFIED_PROBLEM: context canceled
		  | (1) UNCLASSIFIED_PROBLEM
		  | Wraps: (2) Node 1. Command with error:
		  |   | ``````
		  |   | ./cockroach sql --insecure -e "
		  |   | 				RESTORE csv.bank FROM
		  |   | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  |   | 				WITH into_db = 'restore2tb'"
		  |   | ``````
		  | Wraps: (3) context canceled
		  | Error types: (1) errors.Unclassified (2) *hintdetail.withDetail (3) *errors.errorString
		Wraps: (5) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

	monitor.go:127,restore.go:453,test_runner.go:876: monitor failure: monitor command failure: unexpected node event: 24: dead (exit status 137)
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerRestore.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:453
		  | [...repeated from below...]
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func3
		  | 	main/pkg/cmd/roachtest/monitor.go:202
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (4) monitor command failure
		Wraps: (5) unexpected node event: 24: dead (exit status 137)
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *errors.errorString
Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

roachtest.restore2TB/nodes=32 failed with artifacts on master @ 7dd1c51f6b5802e32bafd82e46747f349836592f:

		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:439
		  | main.(*monitorImpl).Go.func1
		  | 	main/pkg/cmd/roachtest/monitor.go:105
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (2) output in run_090238.375228236_n1_cockroach_sql
		Wraps: (3) ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'" returned
		  | stderr:
		  |
		  | stdout:
		Wraps: (4) secondary error attachment
		  | UNCLASSIFIED_PROBLEM: context canceled
		  | (1) UNCLASSIFIED_PROBLEM
		  | Wraps: (2) Node 1. Command with error:
		  |   | ``````
		  |   | ./cockroach sql --insecure -e "
		  |   | 				RESTORE csv.bank FROM
		  |   | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  |   | 				WITH into_db = 'restore2tb'"
		  |   | ``````
		  | Wraps: (3) context canceled
		  | Error types: (1) errors.Unclassified (2) *hintdetail.withDetail (3) *errors.errorString
		Wraps: (5) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

	monitor.go:127,restore.go:453,test_runner.go:884: monitor failure: monitor command failure: unexpected node event: 18: dead (exit status 137)
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerRestore.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:453
		  | [...repeated from below...]
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func3
		  | 	main/pkg/cmd/roachtest/monitor.go:202
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (4) monitor command failure
		Wraps: (5) unexpected node event: 18: dead (exit status 137)
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *errors.errorString
Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

roachtest.restore2TB/nodes=32 failed with artifacts on master @ 12198a51408e7333cd4f96b221e6734239479765:

		Wraps: (2) output in run_092353.088439413_n1_cockroach_sql
		Wraps: (3) ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'" returned
		  | stderr:
		  | ERROR: -e: driver: bad connection
		  | Failed running "sql"
		  |
		  | stdout:
		Wraps: (4) COMMAND_PROBLEM
		Wraps: (5) Node 1. Command with error:
		  | ``````
		  | ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'"
		  | ``````
		Wraps: (6) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) errors.Cmd (5) *hintdetail.withDetail (6) *exec.ExitError

	monitor.go:127,restore.go:453,test_runner.go:884: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerRestore.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:453
		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:884
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	main/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	GOROOT/src/runtime/proc.go:6498
		  | runtime.main
		  | 	GOROOT/src/runtime/proc.go:238
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError
Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

roachtest.restore2TB/nodes=32 failed with artifacts on master @ e6815947a050e32f21e983aa30dc74ab2a247af3:

		Wraps: (2) output in run_091940.240825277_n1_cockroach_sql
		Wraps: (3) ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'" returned
		  | stderr:
		  | ERROR: job 765937527531077633: node liveness error: restarting in background
		  | Failed running "sql"
		  |
		  | stdout:
		Wraps: (4) COMMAND_PROBLEM
		Wraps: (5) Node 1. Command with error:
		  | ``````
		  | ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'"
		  | ``````
		Wraps: (6) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) errors.Cmd (5) *hintdetail.withDetail (6) *exec.ExitError

	monitor.go:127,restore.go:453,test_runner.go:884: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerRestore.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:453
		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:884
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	main/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	GOROOT/src/runtime/proc.go:6498
		  | runtime.main
		  | 	GOROOT/src/runtime/proc.go:238
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError
Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@tbg tbg added the S-3 Medium-low impact: incurs increased costs for some users (incl lower avail, recoverable bad data) label May 30, 2022
@cockroach-teamcity
Copy link
Member Author

roachtest.restore2TB/nodes=32 failed with artifacts on master @ 662cc5c3070e6d64d155a9cc9f33253ee5d99ee9:

		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:439
		  | main.(*monitorImpl).Go.func1
		  | 	main/pkg/cmd/roachtest/monitor.go:105
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:74
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (2) output in run_092402.306130394_n1_cockroach_sql
		Wraps: (3) ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'" returned
		  | stderr:
		  |
		  | stdout:
		Wraps: (4) secondary error attachment
		  | UNCLASSIFIED_PROBLEM: context canceled
		  | (1) UNCLASSIFIED_PROBLEM
		  | Wraps: (2) Node 1. Command with error:
		  |   | ``````
		  |   | ./cockroach sql --insecure -e "
		  |   | 				RESTORE csv.bank FROM
		  |   | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  |   | 				WITH into_db = 'restore2tb'"
		  |   | ``````
		  | Wraps: (3) context canceled
		  | Error types: (1) errors.Unclassified (2) *hintdetail.withDetail (3) *errors.errorString
		Wraps: (5) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

	monitor.go:127,restore.go:453,test_runner.go:884: monitor failure: monitor command failure: unexpected node event: 24: dead (exit status 137)
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerRestore.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:453
		  | [...repeated from below...]
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func3
		  | 	main/pkg/cmd/roachtest/monitor.go:202
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (4) monitor command failure
		Wraps: (5) unexpected node event: 24: dead (exit status 137)
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *errors.errorString
Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

roachtest.restore2TB/nodes=32 failed with artifacts on master @ 54b4fcaa231abd16260fb09722ed291dbd6a4900:

		Wraps: (2) output in run_094355.953217052_n1_cockroach_sql
		Wraps: (3) ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'" returned
		  | stderr:
		  | ERROR: importing 21888 ranges: splitting key /Table/106/1/820138: change replicas of r189 failed: descriptor changed: [expected] r189:/Table/106/1/8{09556-12829} [(n18,s18):6VOTER_DEMOTING_LEARNER, (n16,s16):5, (n8,s8):4, (n28,s28):7VOTER_INCOMING, next=8, gen=30, sticky=1657363440.029636288,0] != [actual] r189:/Table/106/1/8{09556-12829} [(n28,s28):7, (n16,s16):5, (n8,s8):4VOTER_DEMOTING_LEARNER, (n11,s11):8VOTER_INCOMING, next=9, gen=34, sticky=1657363440.029636288,0]
		  | Failed running "sql"
		  |
		  | stdout:
		Wraps: (4) COMMAND_PROBLEM
		Wraps: (5) Node 1. Command with error:
		  | ``````
		  | ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'"
		  | ``````
		Wraps: (6) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) errors.Cmd (5) *hintdetail.withDetail (6) *exec.ExitError

	monitor.go:127,restore.go:453,test_runner.go:896: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerRestore.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:453
		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:896
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	main/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	GOROOT/src/runtime/proc.go:6498
		  | runtime.main
		  | 	GOROOT/src/runtime/proc.go:238
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

roachtest.restore2TB/nodes=32 failed with artifacts on master @ e4cafeb8b1d586d091fb98e3e570650d7eeea294:

		Wraps: (2) output in run_095611.644988471_n1_cockroach_sql
		Wraps: (3) ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'" returned
		  | stderr:
		  | ERROR: importing 21888 ranges: splitting key /Table/106/1/291311: change replicas of r134 failed: descriptor changed: [expected] r134:/Table/106/1/{147680-301713} [(n4,s4):1, (n25,s25):2, (n24,s24):3VOTER_DEMOTING_LEARNER, (n32,s32):5VOTER_INCOMING, next=6, gen=24, sticky=1658055375.615050547,0] != [actual] r134:/Table/106/1/{147680-301713} [(n4,s4):1, (n25,s25):2VOTER_DEMOTING_LEARNER, (n32,s32):5, (n2,s2):6VOTER_INCOMING, next=7, gen=28, sticky=1658055375.615050547,0]
		  | Failed running "sql"
		  |
		  | stdout:
		Wraps: (4) COMMAND_PROBLEM
		Wraps: (5) Node 1. Command with error:
		  | ``````
		  | ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'"
		  | ``````
		Wraps: (6) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) errors.Cmd (5) *hintdetail.withDetail (6) *exec.ExitError

	monitor.go:127,restore.go:453,test_runner.go:896: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerRestore.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:453
		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:896
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	main/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	GOROOT/src/runtime/proc.go:6498
		  | runtime.main
		  | 	GOROOT/src/runtime/proc.go:238
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

roachtest.restore2TB/nodes=32 failed with artifacts on master @ eabdc49383a19d5731b765dbb0d8b45bd9e24404:

		Wraps: (2) output in run_095805.047293985_n1_cockroach_sql
		Wraps: (3) ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'" returned
		  | stderr:
		  | ERROR: importing 21888 ranges: splitting key /Table/106/1/351278: change replicas of r449 failed: descriptor changed: [expected] r449:/Table/106/1/3{16209-73788} [(n13,s13):1, (n14,s14):2, (n16,s16):3VOTER_DEMOTING_LEARNER, (n3,s3):4VOTER_INCOMING, next=5, gen=46, sticky=1658746689.799530631,0] != [actual] r449:/Table/106/1/3{16209-73788} [(n13,s13):1, (n14,s14):2VOTER_DEMOTING_LEARNER, (n3,s3):4, (n24,s24):5VOTER_INCOMING, next=6, gen=50, sticky=1658746689.799530631,0]
		  | Failed running "sql"
		  |
		  | stdout:
		Wraps: (4) COMMAND_PROBLEM
		Wraps: (5) Node 1. Command with error:
		  | ``````
		  | ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'"
		  | ``````
		Wraps: (6) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) errors.Cmd (5) *hintdetail.withDetail (6) *exec.ExitError

	monitor.go:127,restore.go:453,test_runner.go:896: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerRestore.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:453
		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:896
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	main/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	GOROOT/src/runtime/proc.go:6498
		  | runtime.main
		  | 	GOROOT/src/runtime/proc.go:238
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1581
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

roachtest.restore2TB/nodes=32 failed with artifacts on master @ 8db26695114981eec5c12317f3dc5d770b522a3f:

		Wraps: (2) output in run_094850.561421583_n1_cockroach_sql
		Wraps: (3) ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'" returned
		  | stderr:
		  | ERROR: importing 21888 ranges: splitting key /Table/106/1/246060: change replicas of r155 failed: descriptor changed: [expected] r155:/Table/106/1/{136253-394198} [(n28,s28):1, (n24,s24):2, (n5,s5):5VOTER_DEMOTING_LEARNER, (n11,s11):6VOTER_INCOMING, next=7, gen=36, sticky=1658832545.294384739,0] != [actual] r155:/Table/106/1/1{36253-85360} [(n3,s3):7, (n24,s24):2VOTER_DEMOTING_LEARNER, (n11,s11):6, (n9,s9):9VOTER_INCOMING, next=10, gen=49, sticky=1658832545.294384739,0]
		  | Failed running "sql"
		  |
		  | stdout:
		Wraps: (4) COMMAND_PROBLEM
		Wraps: (5) Node 1. Command with error:
		  | ``````
		  | ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'"
		  | ``````
		Wraps: (6) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) errors.Cmd (5) *hintdetail.withDetail (6) *exec.ExitError

	monitor.go:127,restore.go:453,test_runner.go:896: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerRestore.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:453
		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:896
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	main/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	GOROOT/src/runtime/proc.go:6222
		  | runtime.main
		  | 	GOROOT/src/runtime/proc.go:233
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1571
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

roachtest.restore2TB/nodes=32 failed with artifacts on master @ f4042d47fa8062a612c38d4696eb6bee9cee7c21:

		Wraps: (2) output in run_110236.222577156_n1_cockroach_sql
		Wraps: (3) ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'" returned
		  | stderr:
		  | ERROR: importing 21888 ranges: splitting key /Table/106/1/41378214: refusing to transfer lease to (n3,s3):4VOTER_INCOMING because target may need a Raft snapshot: replica in StateProbe
		  | Failed running "sql"
		  |
		  | stdout:
		Wraps: (4) COMMAND_PROBLEM
		Wraps: (5) Node 1. Command with error:
		  | ``````
		  | ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'"
		  | ``````
		Wraps: (6) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) errors.Cmd (5) *hintdetail.withDetail (6) *exec.ExitError

	monitor.go:127,restore.go:453,test_runner.go:896: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerRestore.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:453
		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:896
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	main/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	GOROOT/src/runtime/proc.go:6240
		  | runtime.main
		  | 	GOROOT/src/runtime/proc.go:233
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1571
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

roachtest.restore2TB/nodes=32 failed with artifacts on master @ b173a16715e71e94115820374da1eb350b3b459d:

		Wraps: (2) output in run_102034.829909688_n1_cockroach_sql
		Wraps: (3) ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'" returned
		  | stderr:
		  | ERROR: importing 21888 ranges: splitting key /Table/106/1/40569611: refusing to transfer lease to (n6,s6):4VOTER_INCOMING because target may need a Raft snapshot: replica in StateProbe
		  | Failed running "sql"
		  |
		  | stdout:
		Wraps: (4) COMMAND_PROBLEM
		Wraps: (5) Node 1. Command with error:
		  | ``````
		  | ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'"
		  | ``````
		Wraps: (6) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) errors.Cmd (5) *hintdetail.withDetail (6) *exec.ExitError

	monitor.go:127,restore.go:453,test_runner.go:896: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerRestore.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:453
		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:896
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	main/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	GOROOT/src/runtime/proc.go:6240
		  | runtime.main
		  | 	GOROOT/src/runtime/proc.go:233
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1571
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

roachtest.restore2TB/nodes=32 failed with artifacts on master @ 5c2c62ecf1bea60c807edc6b4da22d900ad4ae03:

		Wraps: (2) output in run_104239.477103670_n1_cockroach_sql
		Wraps: (3) ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'" returned
		  | stderr:
		  | ERROR: importing 21888 ranges: splitting key /Table/106: refusing to transfer lease to (n23,s23):36VOTER_INCOMING because target may need a Raft snapshot: replica in StateProbe
		  | Failed running "sql"
		  |
		  | stdout:
		Wraps: (4) COMMAND_PROBLEM
		Wraps: (5) Node 1. Command with error:
		  | ``````
		  | ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'"
		  | ``````
		Wraps: (6) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) errors.Cmd (5) *hintdetail.withDetail (6) *exec.ExitError

	monitor.go:127,restore.go:453,test_runner.go:896: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerRestore.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:453
		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:896
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	main/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	GOROOT/src/runtime/proc.go:6240
		  | runtime.main
		  | 	GOROOT/src/runtime/proc.go:233
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1571
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

roachtest.restore2TB/nodes=32 failed with artifacts on master @ cb55144cdec54d2a70f074ad64b4eca5e6c6891a:

		Wraps: (2) output in run_104556.136519580_n1_cockroach_sql
		Wraps: (3) ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'" returned
		  | stderr:
		  | ERROR: importing 21888 ranges: splitting key /Table/106/1/49878081: refusing to transfer lease to (n25,s25):5VOTER_INCOMING because target may need a Raft snapshot: replica in StateProbe
		  | Failed running "sql"
		  |
		  | stdout:
		Wraps: (4) COMMAND_PROBLEM
		Wraps: (5) Node 1. Command with error:
		  | ``````
		  | ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'"
		  | ``````
		Wraps: (6) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) errors.Cmd (5) *hintdetail.withDetail (6) *exec.ExitError

	monitor.go:127,restore.go:453,test_runner.go:896: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerRestore.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:453
		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:896
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	main/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	GOROOT/src/runtime/proc.go:6240
		  | runtime.main
		  | 	GOROOT/src/runtime/proc.go:233
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1571
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

roachtest.restore2TB/nodes=32 failed with artifacts on master @ 80c274877a917580af62be6eb0cd48c8c7ae9c08:

		Wraps: (2) output in run_101743.427781383_n1_cockroach_sql
		Wraps: (3) ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'" returned
		  | stderr:
		  | ERROR: importing 21888 ranges: splitting key /Table/107/1/37214174: refusing to transfer lease to (n22,s22):4VOTER_INCOMING because target may need a Raft snapshot: replica in StateProbe
		  | Failed running "sql"
		  |
		  | stdout:
		Wraps: (4) COMMAND_PROBLEM
		Wraps: (5) Node 1. Command with error:
		  | ``````
		  | ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'"
		  | ``````
		Wraps: (6) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) errors.Cmd (5) *hintdetail.withDetail (6) *exec.ExitError

	monitor.go:127,restore.go:453,test_runner.go:896: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerRestore.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:453
		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:896
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	main/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	GOROOT/src/runtime/proc.go:6240
		  | runtime.main
		  | 	GOROOT/src/runtime/proc.go:233
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1571
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

Same failure on other branches

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

roachtest.restore2TB/nodes=32 failed with artifacts on master @ 003c0360de8b64319b5f0f127b99be91dbdca8a3:

		Wraps: (2) output in run_102549.719080287_n1_cockroach_sql
		Wraps: (3) ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'" returned
		  | stderr:
		  | ERROR: importing 21888 ranges: splitting key /Table/106/1/16777215/PrefixEnd: refusing to transfer lease to (n20,s20):10VOTER_INCOMING because target may need a Raft snapshot: replica in StateProbe
		  | Failed running "sql"
		  |
		  | stdout:
		Wraps: (4) COMMAND_PROBLEM
		Wraps: (5) Node 1. Command with error:
		  | ``````
		  | ./cockroach sql --insecure -e "
		  | 				RESTORE csv.bank FROM
		  | 				'gs://cockroach-fixtures/workload/bank/version=1.0.0,payload-bytes=10240,ranges=0,rows=65104166,seed=1/bank?AUTH=implicit'
		  | 				WITH into_db = 'restore2tb'"
		  | ``````
		Wraps: (6) exit status 1
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) errors.Cmd (5) *hintdetail.withDetail (6) *exec.ExitError

	monitor.go:127,restore.go:453,test_runner.go:896: monitor failure: monitor task failed: t.Fatal() was called
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).WaitE
		  | 	main/pkg/cmd/roachtest/monitor.go:115
		  | main.(*monitorImpl).Wait
		  | 	main/pkg/cmd/roachtest/monitor.go:123
		  | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerRestore.func1
		  | 	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/restore.go:453
		  | main.(*testRunner).runTest.func2
		  | 	main/pkg/cmd/roachtest/test_runner.go:896
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitorImpl).wait.func2
		  | 	main/pkg/cmd/roachtest/monitor.go:171
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.init
		  | 	main/pkg/cmd/roachtest/monitor.go:80
		  | runtime.doInit
		  | 	GOROOT/src/runtime/proc.go:6240
		  | runtime.main
		  | 	GOROOT/src/runtime/proc.go:233
		  | runtime.goexit
		  | 	GOROOT/src/runtime/asm_amd64.s:1571
		Wraps: (6) t.Fatal() was called
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

Same failure on other branches

This test on roachdash | Improve this report!

@erikgrinaker
Copy link
Contributor

Until we can fix this properly in #71805, the test OOMs are mitigated by using highmem instances in #88444.

The later failures here, listed below, were fixed by #85405.

ERROR: importing 21888 ranges: splitting key /Table/106/1/16777215/PrefixEnd: refusing to transfer lease to (n20,s20):10VOTER_INCOMING because target may need a Raft snapshot: replica in StateProbe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. S-3 Medium-low impact: incurs increased costs for some users (incl lower avail, recoverable bad data)
Projects
None yet
Development

No branches or pull requests

9 participants