roachtest: tpce/c=5000/nodes=3 failed #66717

Closed
cockroach-teamcity opened this issue Jun 22, 2021 · 4 comments · Fixed by #66792
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.

@cockroach-teamcity
Member

roachtest.tpce/c=5000/nodes=3 failed with artifacts on release-21.1 @ 6267da0f19b6cdc08e9fdd39899e35212694cb78:

		Reported tpsE :    --   (not between 80% and 100%)
		
		thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', driver/src/customer_emulator.rs:47:52
		thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', driver/src/market_exchange_emulator.rs:48:53

	cluster.go:2668,tpce.go:96,tpce.go:113,test_runner.go:767: monitor failure: unexpected node event: 1: dead
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2656
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2664
		  | main.registerTPCE.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpce.go:96
		  | main.registerTPCE.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpce.go:113
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:767
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 1: dead
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1667,context.go:89,cluster.go:1656,test_runner.go:848: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-3103837-1624342463-75-n4cpu4 --oneshot --ignore-empty-nodes: exit status 1 4: skipped
		1: dead
		2: 10374
		3: 10076
		Error: UNCLASSIFIED_PROBLEM: 1: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1147
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:271
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1852
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:204
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (3) 1: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError
Reproduce

To reproduce, try:

# From https://go.crdb.dev/p/roachstress, perhaps edited lightly.
caffeinate ./roachstress.sh tpce/c=5000/nodes=3

/cc @cockroachdb/kv

This test on roachdash | Improve this report!

@cockroach-teamcity cockroach-teamcity added branch-release-21.1 C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Jun 22, 2021
@cockroach-teamcity cockroach-teamcity added this to the 21.1 milestone Jun 22, 2021
@cockroach-teamcity
Member Author

roachtest.tpce/c=5000/nodes=3 failed with artifacts on release-21.1 @ f60fbf28b0bb6110cf9870cc398ea169b81ff8f0:

		
		thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', driver/src/customer_emulator.rs:47:52
		thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', driver/src/market_exchange_emulator.rs:48:53
		thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', driver/src/data_maintenance_emulator.rs:54:30

	cluster.go:2668,tpce.go:96,tpce.go:113,test_runner.go:767: monitor failure: unexpected node event: 1: dead
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2656
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2664
		  | main.registerTPCE.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpce.go:96
		  | main.registerTPCE.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpce.go:113
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:767
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 1: dead
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1667,context.go:89,cluster.go:1656,test_runner.go:848: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-3108486-1624428499-77-n4cpu4 --oneshot --ignore-empty-nodes: exit status 1 4: skipped
		1: dead
		2: 9625
		3: 10150
		Error: UNCLASSIFIED_PROBLEM: 1: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1147
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:271
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1852
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:204
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (3) 1: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError
Reproduce

To reproduce, try:

# From https://go.crdb.dev/p/roachstress, perhaps edited lightly.
caffeinate ./roachstress.sh tpce/c=5000/nodes=3

Same failure on other branches

/cc @cockroachdb/kv

This test on roachdash | Improve this report!

@nvanbenschoten
Member

This is failing with a panic in Cockroach:

fatal error: concurrent map writes

goroutine 408320 [running]:
runtime.throw(0x4c5ce17, 0x15)
	/usr/local/go/src/runtime/panic.go:1116 +0x72 fp=0xc008485978 sp=0xc008485948 pc=0x48bb32
runtime.mapassign_fast64(0x452d560, 0xc008da0a20, 0x2, 0x5acd5a0)
	/usr/local/go/src/runtime/map_fast64.go:101 +0x33e fp=0xc0084859b8 sp=0xc008485978 pc=0x46707e
github.com/cockroachdb/cockroach/pkg/sql/opt.(*TableMeta).copyScalars(0xc0060bb230, 0xc008485bf0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/table_meta.go:193 +0x2cd fp=0xc008485ad0 sp=0xc0084859b8 pc=0x23770cd
github.com/cockroachdb/cockroach/pkg/sql/opt.(*Metadata).CopyFrom(0xc004024380, 0xc004b84000, 0xc008485bf0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/metadata.go:227 +0x445 fp=0xc008485ba8 sp=0xc008485ad0 pc=0x2370845
github.com/cockroachdb/cockroach/pkg/sql/opt/norm.(*Factory).CopyAndReplace(0xc00bfe6c98, 0x5bbb060, 0xc006593c90, 0xc008f8f180, 0xc008485c48)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/norm/factory.go:222 +0xb2 fp=0xc008485c10 sp=0xc008485ba8 pc=0x253c2b2
github.com/cockroachdb/cockroach/pkg/sql/opt/norm.(*Factory).AssignPlaceholders(0xc00bfe6c98, 0xc004b84000, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/norm/factory.go:269 +0x105 fp=0xc008485c80 sp=0xc008485c10 pc=0x253c585
github.com/cockroachdb/cockroach/pkg/sql.(*optPlanningCtx).reuseMemo(0xc00bfe6c38, 0xc004b84000, 0xc00c5e8960, 0x4c55468, 0x13)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/plan_opt.go:436 +0x8c fp=0xc008485cc0 sp=0xc008485c80 pc=0x35a0a4c
github.com/cockroachdb/cockroach/pkg/sql.(*optPlanningCtx).buildExecMemo(0xc00bfe6c38, 0x5a76280, 0xc00c5e8960, 0x0, 0xc008485e38, 0x4a35ac)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/plan_opt.go:469 +0x84a fp=0xc008485de8 sp=0xc008485cc0 pc=0x35a136a
github.com/cockroachdb/cockroach/pkg/sql.(*planner).makeOptimizerPlan(0xc00bfe64f8, 0x5a76280, 0xc00c5e8960, 0xc0110f5a05, 0x110f5a0505fc8000)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/plan_opt.go:194 +0xde fp=0xc008485ea0 sp=0xc008485de8 pc=0x359fade
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).makeExecPlan(0xc00bfe6000, 0x5a76280, 0xc00c5e8960, 0xc00bfe64f8, 0x1, 0x4e1a76)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:948 +0x5a fp=0xc008485f48 sp=0xc008485ea0 pc=0x34ac6fa
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).dispatchToExecutionEngine(0xc00bfe6000, 0x5a76280, 0xc00c5e8960, 0xc00bfe64f8, 0x7f1698c432b8, 0xc008564680, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:829 +0x18a fp=0xc008486208 sp=0xc008485f48 pc=0x34ab82a
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmtInOpenState(0xc00bfe6000, 0x5a76280, 0xc00c5e8960, 0x5a94c00, 0xc007828ff0, 0xc0075c6005, 0x406, 0xa, 0x9, 0xc0078eb860, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:667 +0xfa6 fp=0xc008486ff0 sp=0xc008486208 pc=0x34a89e6
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmt(0xc00bfe6000, 0x5a761c0, 0xc006f81780, 0x5a94c00, 0xc007828ff0, 0xc0075c6005, 0x406, 0xa, 0x9, 0xc0078eb860, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:123 +0xb14 fp=0xc008487520 sp=0xc008486ff0 pc=0x34a7334
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execPortal(0xc00bfe6000, 0x5a761c0, 0xc006f81780, 0xc0078eb860, 0xc0067fc0a0, 0xa, 0xa, 0xc0189b31b0, 0x4, 0x4, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:210 +0x14e fp=0xc008487658 sp=0xc008487520 pc=0x34a778e
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execCmd.func2(0xc0075c64fa, 0x0, 0x0, 0x110e6f68, 0xed8651d55, 0x0, 0xc00bfe6000, 0xc008487a20, 0xc008487a10, 0x7, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1596 +0x47b fp=0xc008487870 sp=0xc008487658 pc=0x366be9b
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execCmd(0xc00bfe6000, 0x5a761c0, 0xc006f81780, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1598 +0x45c fp=0xc008487cc8 sp=0xc008487870 pc=0x349b7dc
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run(0xc00bfe6000, 0x5a761c0, 0xc005840280, 0xc0013fc8c0, 0x5400, 0x15000, 0xc0013fc960, 0xc013c96b20, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1450 +0x228 fp=0xc008487d68 sp=0xc008487cc8 pc=0x349b108
github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn(0xc001416680, 0x5a761c0, 0xc005840280, 0xc00bfe6000, 0x5400, 0x15000, 0xc0013fc960, 0xc013c96b20, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:484 +0xce fp=0xc008487df0 sp=0xc008487d68 pc=0x3496aee
github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*conn).processCommandsAsync.func1(0xc014bde81b, 0xc007b20020, 0x5a761c0, 0xc005840280, 0xc013c96b20, 0xc001416680, 0xc008564000, 0x5add560, 0xc0029f6500, 0xc0043d5320, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:627 +0x3ea fp=0xc008487f38 sp=0xc008487df0 pc=0x3b6720a

@mgartner it seems like this could be fallout from d17efcd.

That commit has been backported to release-20.2 and release-21.1. We should probably block new patch releases on either of those branches from going out until this is resolved.
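For readers unfamiliar with this class of failure, here is a minimal sketch of how a "copy" of a metadata struct can still share a map with its source. The `tableMeta` type, its `computedCols` field, and `shallowCopy` are hypothetical stand-ins for illustration, not the actual `opt.TableMeta` code. Copying a Go struct by value copies the map header only, so both structs keep pointing at the same underlying hash table.

```go
package main

import "fmt"

// tableMeta is a hypothetical stand-in for per-table optimizer metadata.
// It is not the real opt.TableMeta type.
type tableMeta struct {
	computedCols map[int]string // column ID -> expression text (assumed shape)
}

// shallowCopy copies the struct by value. The map field is copied as a
// reference, so source and copy still share one underlying map.
func shallowCopy(src *tableMeta) *tableMeta {
	dst := *src
	return &dst
}

func main() {
	cached := &tableMeta{computedCols: map[int]string{1: "a + b"}}
	perQuery := shallowCopy(cached)

	// A write through the "copy" is visible through the cached original.
	// If two sessions did this at the same time, both would write to the
	// same map, which the Go runtime reports as "concurrent map writes".
	perQuery.computedCols[2] = "c * 2"
	fmt.Println(len(cached.computedCols)) // prints 2
}
```

The sketch is single-threaded on purpose: it shows the aliasing deterministically, without relying on the runtime's best-effort race detection.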

@nvanbenschoten nvanbenschoten added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jun 23, 2021
@rytaft
Collaborator

rytaft commented Jun 23, 2021

I will take a look since @mgartner is OOO

rytaft added a commit to rytaft/cockroach that referenced this issue Jun 23, 2021
This commit fixes a bug that caused a panic due to concurrent map writes
when copying table metadata. The fix is to make a deep copy of the map
before updating it.

Fixes cockroachdb#66717

Release note (bug fix): Fixed a panic that could occur in the optimizer
when executing a prepared plan with placeholders. This could happen when
one of the tables used by the query had computed columns or a partial
index.
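
As a rough illustration of the "deep copy of the map before updating it" approach described in the commit message, the sketch below gives each goroutine its own map before writing. The `scalarMap` type and `deepCopy` helper are hypothetical names chosen for this example. They are assumptions, not the code from the actual fix.

```go
package main

import (
	"fmt"
	"sync"
)

// scalarMap is a hypothetical stand-in for a map keyed by column ID.
type scalarMap map[int]string

// deepCopy allocates a fresh map and copies every entry into it, so later
// writes to the result never touch the source map.
func deepCopy(src scalarMap) scalarMap {
	dst := make(scalarMap, len(src))
	for k, v := range src {
		dst[k] = v
	}
	return dst
}

func main() {
	cached := scalarMap{1: "a + b", 2: "c * 2"}

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// Concurrent reads of cached are safe as long as nobody writes
			// to it. Each goroutine only writes to its private copy.
			private := deepCopy(cached)
			private[1] = fmt.Sprintf("rewritten by %d", id)
		}(i)
	}
	wg.Wait()

	fmt.Println(cached[1]) // prints "a + b"
}
```

The trade-off is one extra allocation and a pass over the entries per copy, in exchange for not needing a lock around the shared map.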
rytaft added a commit to rytaft/cockroach that referenced this issue Jun 24, 2021
This commit fixes a bug that caused a panic due to concurrent map writes
when copying table metadata. The fix is to make a deep copy of the map
before updating it.

Fixes cockroachdb#66717

Release note (bug fix): Fixed a panic that could occur in the optimizer
when executing a prepared plan with placeholders. This could happen when
one of the tables used by the query had computed columns or a partial
index.
craig bot pushed a commit that referenced this issue Jun 24, 2021
66792: opt: fix panic due to concurrent map writes when copying metadata r=rytaft a=rytaft

This commit fixes a bug that caused a panic due to concurrent map writes
when copying table metadata. The fix is to make a deep copy of the map
before updating it.

Fixes #66717

Release note (bug fix): Fixed a panic that could occur in the optimizer
when executing a prepared plan with placeholders. This could happen when
one of the tables used by the query had computed columns or a partial
index.

66802: backupccl: stream writes of returned SSTs to remote files r=dt a=dt

This changes the backup processor to open remote files for writing and then write the content of returned SSTs as they are returned instead of writing to an in-memory SSTable and then flushing that to cloud storage later.

Release note: none.

Co-authored-by: Rebecca Taft <[email protected]>
Co-authored-by: David Taylor <[email protected]>
@craig craig bot closed this as completed in 990807e Jun 24, 2021
@tbg tbg reopened this Jun 24, 2021
@rytaft
Collaborator

rytaft commented Jun 24, 2021

Fixed by #66833

@rytaft rytaft closed this as completed Jun 24, 2021
JuanLeon1 pushed a commit that referenced this issue Jul 1, 2021
This commit fixes a bug that caused a panic due to concurrent map writes
when copying table metadata. The fix is to make a deep copy of the map
before updating it.

Fixes #66717

Release note (bug fix): Fixed a panic that could occur in the optimizer
when executing a prepared plan with placeholders. This could happen when
one of the tables used by the query had computed columns or a partial
index.
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023