roachtest: restore2TB/nodes=10 failed #53197
(roachtest).restore2TB/nodes=10 failed on master@ef55609797c92e46eb9efd08facc9db49558291d:
Artifacts: /restore2TB/nodes=10
See this test on roachdash
Looks like the first failure timed out:
(roachtest).restore2TB/nodes=10 failed on master@cd8267ca84f6a0267a898220cfaa6630188b45c5:
Artifacts: /restore2TB/nodes=10
See this test on roachdash
Most Recent Failure
I guess the question here is whether we should be tolerant of such failures.
2nd most recent failure
Full stack
Which reminds me of #52468 (comment). @adityamaru has been trying to reproduce this stuck behaviour while running restore2TB without much luck. I tried looking through the debug.zip to see if there were any logs that might help us uncover the root cause, but my searching didn't turn up anything. (Going to assign to him for now since he's been digging into it.) cc @andreimatei since the symptoms look familiar to #52468 (comment):
Release justification: small change improving logging, hoping to help debug cockroachdb#53197.

This patch moves the logging that the DistSender does when it detects an RPC that is slow such that it kicks in in more cases:
1) if the RPC was successful (but slow)
2) if the lower layer in the DistSender failed to contact any replica and returned a sendError{}. In this case the range descriptor will be refreshed and the RPC will be retried, similar to the case in which a replica returned RangeKeyMismatchError. The RKME case was already covered by the logging, but the sendError case wasn't.

Touches cockroachdb#53197

Release note: None
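To make the intent of that logging change concrete, here is a minimal, hypothetical Go sketch (not the actual DistSender code; `sendWithSlowLogging`, `slowRPCThreshold`, and the `sendError` type are assumptions made for illustration) of logging a slow RPC attempt regardless of whether it succeeded or came back with a send error that triggers a descriptor refresh and retry:

```go
package main

import (
	"errors"
	"log"
	"time"
)

// sendError is a hypothetical stand-in for the DistSender-internal error
// returned when no replica could be contacted.
type sendError struct{ msg string }

func (e *sendError) Error() string { return e.msg }

// slowRPCThreshold is an assumed cutoff for this sketch; the real threshold
// lives in the DistSender and is not reproduced here.
const slowRPCThreshold = 5 * time.Millisecond

// sendWithSlowLogging wraps one send attempt and logs it if it was slow,
// regardless of whether it succeeded, failed with a sendError (which leads to
// a descriptor refresh and retry), or failed with some other error.
func sendWithSlowLogging(attempt func() error) error {
	start := time.Now()
	err := attempt()
	if dur := time.Since(start); dur > slowRPCThreshold {
		var sendErr *sendError
		switch {
		case err == nil:
			log.Printf("slow RPC finished after %s (success)", dur)
		case errors.As(err, &sendErr):
			log.Printf("slow RPC finished after %s; no replica reachable, refreshing descriptor and retrying: %v", dur, err)
		default:
			log.Printf("slow RPC finished after %s: %v", dur, err)
		}
	}
	return err
}

func main() {
	// Simulate a slow-but-successful attempt and a slow attempt that fails
	// with a sendError.
	_ = sendWithSlowLogging(func() error { time.Sleep(10 * time.Millisecond); return nil })
	_ = sendWithSlowLogging(func() error {
		time.Sleep(10 * time.Millisecond)
		return &sendError{msg: "no replica reachable"}
	})
}
```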
I'm not sure what to say. This looks different from #52468 because here, if goroutine 913610 is indeed looping constantly, it's doing it at a higher level than in #52468.
53705: kvclient: improve logging of slow RPCs in DistSender r=andreimatei a=andreimatei

Release justification: small change improving logging, hoping to help debug #53197. This patch moves the logging that the DistSender does when it detects an RPC that is slow such that it kicks in in more cases: 1) if the RPC was successful (but slow); 2) if the lower layer in the DistSender failed to contact any replica and returned a sendError{}. In this case the range descriptor will be refreshed and the RPC will be retried, similar to the case in which a replica returned RangeKeyMismatchError. The RKME case was already covered by the logging, but the sendError case wasn't. Touches #53197. Release note: None

53721: geo: rebrand geospatial to spatial with minor refactors r=sumeerbhola a=otan

* Rename geospatial to spatial (for outwards facing docs).
* Replace all (args[d].(*tree.DType)) with tree.MustBeDType(args[d]).
* Add tests to ensure all geo builtin info has a capital letter and full stop.

Release justification: low risk update to new functionality
Release note: None

53778: backupccl: fixes cluster backup error string r=pbardea,dt a=adityamaru

Fixes: #52018
Release justification: low risk, high benefit changes to existing functionality

53785: opt: remove xfunc references and clean up EnsureKey r=RaduBerinde a=mgartner

#### norm: clean up EnsureKey decorrelation function
This commit replaces a single-case type switch statement in EnsureKey with an `if` with a type assertion. It also adds clarifying comments.
Release justification: This is a low-risk change to new and minor changes to existing functionality.
Release note: None

#### opt: fix comment in xform custom_funcs.go
This commit removes references to the non-existent `xfunc` package.
Release justification: This is a non-production code change.
Release note: None

53793: sql: skip TestCancelQueryPermissions r=solongordon a=solongordon

This test is exposing more data races. Skipping until we can fix them. Refs: #53791
Release justification: non-production code changes
Release note: None

Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Oliver Tan <[email protected]>
Co-authored-by: Aditya Maru <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Solon Gordon <[email protected]>
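As a small aside on the EnsureKey cleanup in 53785 above, here is a self-contained Go illustration (not the actual `opt/norm` code; `Expr`, `ScanExpr`, and the function names are made-up stand-ins) of replacing a single-case type switch with an `if` plus a type assertion:

```go
package main

import "fmt"

// Expr and ScanExpr are hypothetical stand-ins for optimizer expression types;
// they are not the real opt package types.
type Expr interface{ Op() string }

type ScanExpr struct{ Table string }

func (s *ScanExpr) Op() string { return "scan" }

// before: a type switch with a single case.
func handleBefore(e Expr) {
	switch t := e.(type) {
	case *ScanExpr:
		fmt.Println("scan over", t.Table)
	}
}

// after: an if with a type assertion, which reads more directly when only one
// type is interesting.
func handleAfter(e Expr) {
	if scan, ok := e.(*ScanExpr); ok {
		fmt.Println("scan over", scan.Table)
	}
}

func main() {
	e := &ScanExpr{Table: "bank"}
	handleBefore(e)
	handleAfter(e)
}
```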
(roachtest).restore2TB/nodes=10 failed on master@9fae2abb48a00c26ac0344f35cd100df7faa2aec:
Looks like another timeout, got a bit of logs this time.
Hanging adminsplit request stack:
Some of the logs:
Grepping the stuck goroutine above:
And all slow range logs:
@pbardea If these are still hanging
I think that makes sense -- I'm not sure if there's more investigation that we can do on the Bulk IO side. Happy to help if there's anything that Bulk IO can help with.
This reproduced for me in 1/20 runs.
Finally figured it out. Will have a fix tomorrow.
This patch fixes a bug in the DistSender's interaction with the range cache. The DistSender interacts with the cache through an EvictionToken; this token is used to update the lease information for a cache entry and to evict the entry completely if the descriptor is found to be too stale to be useful. The problem was that the eviction did not work as intended if a copy of the token had been used to perform a lease update beforehand. Operations that update the cache take a token and return an updated token, and only the most up-to-date such token can be used to perform subsequent cache updates (in particular, to evict the cache entry). The bug was that the DistSender sometimes lost track of this most up-to-date token and tried to evict using an older copy, which made the eviction a no-op.

Specifically, the bad scenario was the following:
- DistSender.sendPartialBatch made a copy of the token and handed it to sendToReplicas.
- sendToReplicas updated the cached lease through this token (through tok.UpdateLease). The updated token was further used by sendToReplicas but not returned to sendPartialBatch.
- No replica can serve the request because the descriptor is stale, and control flow gets back to sendPartialBatch.
- sendPartialBatch uses its own, stale, copy of the token to attempt a cache eviction. The cache ignores the eviction request, claiming that it has a more up-to-date entry than the one that sendPartialBatch is trying to evict (which is true).
- sendPartialBatch then asks the cache for a new descriptor, and the cache returns the same one as before. It also returns the lease that we added before, but that doesn't help very much.
- All this can happen over and over again in some case that I don't fully understand. In the restore2TB test, what happens is that the node that's supposed to be the leaseholder (say, n1) perpetually doesn't know about the range, and another node perpetually thinks that n1 is the leaseholder. I'm not sure why the latter happens, but this range is probably in an unusual state because the request that's spinning in this way is an ImportRequest.

This patch fixes the problem by changing the token's interface: now the token methods take the receiver by pointer and mutate it, and the DistSender methods share a single token instance between them.

Fixes cockroachdb#54118
Fixes cockroachdb#53197

Release note: None
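To illustrate the failure mode described above, here is a minimal, hypothetical Go sketch (not the real EvictionToken or RangeCache code; the types, fields, and generation counter are assumptions) of how handing a token around by value lets the caller's copy go stale, so its later eviction attempt is silently ignored, while sharing a single token via a pointer keeps caller and callee in sync:

```go
package main

import "fmt"

// cacheEntry is a hypothetical stand-in for a range cache entry.
type cacheEntry struct {
	generation int // bumped on every update (e.g. a lease update)
	lease      string
}

// evictionToken is a hypothetical stand-in for the DistSender's EvictionToken:
// it remembers which generation of the entry it last saw.
type evictionToken struct {
	entry *cacheEntry
	gen   int
}

// updateLease mutates the cache entry and, because the receiver is a pointer,
// also advances the token itself, so whoever shares this token stays in sync.
func (t *evictionToken) updateLease(lease string) {
	t.entry.lease = lease
	t.entry.generation++
	t.gen = t.entry.generation
}

// evict only succeeds if the token is as new as the cached entry; otherwise the
// cache assumes it already has fresher information and ignores the request.
func (t *evictionToken) evict() bool {
	return t.gen >= t.entry.generation
}

func main() {
	// Buggy pattern: hand a *copy* of the token to the callee. The callee's
	// lease update advances only its own copy, so the caller's token goes
	// stale and its eviction attempt becomes a no-op.
	tok := evictionToken{entry: &cacheEntry{}}
	callee := tok
	callee.updateLease("n1")
	fmt.Println("eviction with stale caller copy succeeds?", tok.evict()) // false

	// Fixed pattern: share a single token (pass it by pointer), so the caller
	// sees the update and can still evict.
	tok2 := evictionToken{entry: &cacheEntry{}}
	shared := &tok2
	shared.updateLease("n1")
	fmt.Println("eviction with shared token succeeds?", tok2.evict()) // true
}
```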
54468: kvclient: fix a request routing bug r=andreimatei a=andreimatei

This patch fixes a bug in the DistSender's interaction with the range cache. The DistSender interacts with the cache through an EvictionToken; this token is used to update the lease information for a cache entry and to evict the entry completely if the descriptor is found to be too stale to be useful. The problem was that the eviction did not work as intended if a copy of the token had been used to perform a lease update beforehand. Operations that update the cache take a token and return an updated token, and only the most up-to-date such token can be used to perform subsequent cache updates (in particular, to evict the cache entry). The bug was that the DistSender sometimes lost track of this most up-to-date token and tried to evict using an older copy, which made the eviction a no-op.

Specifically, the bad scenario was the following:
- DistSender.sendPartialBatch made a copy of the token and handed it to sendToReplicas.
- sendToReplicas updated the cached lease through this token (through tok.UpdateLease). The updated token was further used by sendToReplicas but not returned to sendPartialBatch.
- No replica can serve the request because the descriptor is stale, and control flow gets back to sendPartialBatch.
- sendPartialBatch uses its own, stale, copy of the token to attempt a cache eviction. The cache ignores the eviction request, claiming that it has a more up-to-date entry than the one that sendPartialBatch is trying to evict (which is true).
- sendPartialBatch then asks the cache for a new descriptor, and the cache returns the same one as before. It also returns the lease that we added before, but that doesn't help very much.
- All this can happen over and over again in some case that I don't fully understand. In the restore2TB test, what happens is that the node that's supposed to be the leaseholder (say, n1) perpetually doesn't know about the range, and another node perpetually thinks that n1 is the leaseholder. I'm not sure why the latter happens, but this range is probably in an unusual state because the request that's spinning in this way is an ImportRequest.

This patch fixes the problem by changing how descriptor evictions work: instead of comparing a token's pointer to the current cache entry and refusing to do anything if they don't match, evicting a descriptor now compares the descriptor's value with what's in the cache. The cache code generally moves away from comparing pointers anywhere, and an EvictionToken stops storing pointers to the cache's insides, for clarity.

Fixes #54118
Fixes #53197

Release note: None

Co-authored-by: Andrei Matei <[email protected]>
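For the approach the merged PR describes, here is another hedged, hypothetical Go sketch (again not the real range cache code; the types and field names are invented) of evicting by comparing the descriptor's value against what is currently cached, rather than requiring pointer identity with the cached entry:

```go
package main

import "fmt"

// rangeDesc is a hypothetical, simplified range descriptor.
type rangeDesc struct {
	rangeID    int
	generation int
}

// rangeCache is a hypothetical cache keyed by range ID.
type rangeCache struct {
	entries map[int]rangeDesc
}

// evictByValue removes the cached descriptor only if it is still equal (by
// value) to the descriptor the caller believes is stale. A pointer comparison
// would wrongly refuse the eviction whenever the caller holds a copy, even
// though the contents are identical.
func (c *rangeCache) evictByValue(stale rangeDesc) bool {
	cur, ok := c.entries[stale.rangeID]
	if !ok || cur != stale {
		// The cache already has different (presumably fresher) information.
		return false
	}
	delete(c.entries, stale.rangeID)
	return true
}

func main() {
	c := &rangeCache{entries: map[int]rangeDesc{7: {rangeID: 7, generation: 3}}}

	// The caller holds a copy of the descriptor, not a pointer into the cache,
	// yet the eviction still goes through because the values match.
	myCopy := rangeDesc{rangeID: 7, generation: 3}
	fmt.Println("evicted:", c.evictByValue(myCopy)) // true

	// A genuinely stale copy (older generation) is ignored.
	c.entries[7] = rangeDesc{rangeID: 7, generation: 5}
	fmt.Println("evicted:", c.evictByValue(myCopy)) // false
}
```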
reopening until backported
now backported
(roachtest).restore2TB/nodes=10 failed on master@bbbedabbf6ea0b1ff6fc799a0c04a75295a9f4c2:
Artifacts: /restore2TB/nodes=10
Related:
- roachtest: restore2TB/nodes=10 failed #51199 (C-test-failure, O-roachtest, O-robot, branch-provisional_202007081918_v20.2.0-alpha.2, release-blocker)
- roachtest: restore2TB/nodes=10 failed #51127 (C-test-failure, O-roachtest, O-robot, branch-provisional_202007071743_v20.2.0-alpha.2, release-blocker)
- roachtest: restore2TB/nodes=10 failed #49503 (C-test-failure, O-roachtest, O-robot, branch-release-19.1, release-blocker)
- roachtest: restore2TB/nodes=10 failed #49480 (C-test-failure, O-roachtest, O-robot, branch-release-19.2, release-blocker)
- roachtest: restore2TB/nodes=10 failed #49440 (C-test-failure, O-roachtest, O-robot, branch-release-20.1, release-blocker)
See this test on roachdash
powered by pkg/cmd/internal/issues