roachtest: scaledata/filesystem_simulator/nodes=3 failed #50687

Closed
cockroach-teamcity opened this issue Jun 26, 2020 · 19 comments · Fixed by #51143
Assignees
Labels
branch-master (Failures and bugs on the master branch); C-test-failure (Broken test, automatically or manually discovered); O-roachtest; O-robot (Originated from a bot); release-blocker (Indicates a release-blocker; use with a branch-release-2x.x label to denote which branch is blocked)
Milestone

Comments

@cockroach-teamcity
Member

(roachtest).scaledata/filesystem_simulator/nodes=3 failed on master@d3791a81c0716478de08d44459d3fcf5b4f3ea1e:

	cluster.go:2484,scaledata.go:112,scaledata.go:49,test_runner.go:753: monitor failure: monitor task failed: output in run_064622.613_n4_filesystemsimulator_: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2041122-1593153685-15-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.229:26257,10.128.0.221:26257,10.128.0.245:26257'  returned: exit status 30
		(1) attached stack trace
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2472
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2480
		  | main.runSqlapp
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:112
		  | main.registerScaleData.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:49
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:753
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  | main.(*monitor).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2528
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  | main.(*cluster).RunE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2115
		  | main.runSqlapp.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:108
		  | main.(*monitor).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2460
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	/home/agent/work/.go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1357
		Wraps: (6) 2 safe details enclosed
		Wraps: (7) output in run_064622.613_n4_filesystemsimulator_
		Wraps: (8) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2041122-1593153685-15-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.229:26257,10.128.0.221:26257,10.128.0.245:26257'  returned
		  | stderr:
		  | 6 06:47:19 RobustDB.RandomDB chose DB at index 0
		  | 2020/06/26 06:47:19 Created file 14_252 with uuid 5264ae30-0531-4e49-9325-cf679047fddb and parent /default
		  | 2020/06/26 06:47:19 RobustDB.RandomDB chose DB at index 0
		  | 2020/06/26 06:47:19 Created file 1_240 with uuid 3cc7195d-f99a-4fca-9097-42719f76a990 and parent /default
		  | 2020/06/26 06:47:19 Consistency Test 7_175 @ 1593154039238851230.0000000000: sizes :- files - 0, childRelations - 1611, stripes - 267
		  | 2020/06/26 06:47:19 Consistency Test 7_175 @ 1593154039238851230.0000000000: ChildRelation {/default 0_0 0820fa9c-31e5-4468-afc8-1e6a424c48c1 default}: /default parent does not exist in files
		  | Error: DEAD_ROACH_PROBLEM: exit status 1
		  | (1) DEAD_ROACH_PROBLEM
		  | Wraps: (2) Node 4. Command with error:
		  |   | ```
		  |   | ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.229:26257,10.128.0.221:26257,10.128.0.245:26257'
		  |   | ```
		  | Wraps: (3) exit status 1
		  | Error types: (1) errors.Cockroach (2) *hintdetail.withDetail (3) *exec.ExitError
		  |
		  | stdout:
		Wraps: (9) exit status 30
		Error types: (1) *withstack.withStack (2) *errutil.withMessage (3) *withstack.withStack (4) *errutil.withMessage (5) *withstack.withStack (6) *safedetails.withSafeDetails (7) *errutil.withMessage (8) *main.withCommandDetails (9) *exec.ExitError

Artifacts: /scaledata/filesystem_simulator/nodes=3
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity added the branch-master, C-test-failure, O-roachtest, O-robot, and release-blocker labels Jun 26, 2020
@cockroach-teamcity added this to the 20.2 milestone Jun 26, 2020
@cockroach-teamcity
Member Author

(roachtest).scaledata/filesystem_simulator/nodes=3 failed on master@8f768ad14cfb3f514db6d40465b2dd60ee1f2890:

		(1) attached stack trace
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2455
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2463
		  | main.runSqlapp
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:112
		  | main.registerScaleData.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:49
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:757
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  | main.(*monitor).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2511
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  | main.(*cluster).RunE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2115
		  | main.runSqlapp.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:108
		  | main.(*monitor).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2445
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	/home/agent/work/.go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1357
		Wraps: (6) 2 safe details enclosed
		Wraps: (7) output in run_061811.715_n4_filesystemsimulator_
		Wraps: (8) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2044000-1593238262-03-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.3:26257,10.128.0.61:26257,10.128.0.29:26257'  returned
		  | stderr:
		  | izes :- files - 1823, childRelations - 1822, stripes - 288
		  | 2020/06/27 06:19:07 RobustDB.RandomDB chose DB at index 0
		  | 2020/06/27 06:19:07 Created file 5_286 with uuid 11bddf86-6b71-4a7b-87d4-c240957eec4f and parent /default
		  | 2020/06/27 06:19:07 RobustDB.RandomDB chose DB at index 1
		  | 2020/06/27 06:19:07 Created file 15_295 with uuid 9f036582-e5ee-4cf6-a0eb-dbbf4442222d and parent /default
		  | 2020/06/27 06:19:07 Consistency Test 7_164 @ 1593238747719012033.0000000000: sizes :- files - 1825, childRelations - 0, stripes - 288
		  | 2020/06/27 06:19:07 Consistency Test 7_164 @ 1593238747719012033.0000000000: 7d3a7553-436a-45d4-9908-1571b4f9868f is parentless
		  | Error: DEAD_ROACH_PROBLEM: exit status 1
		  | (1) DEAD_ROACH_PROBLEM
		  | Wraps: (2) Node 4. Command with error:
		  |   | ```
		  |   | ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.3:26257,10.128.0.61:26257,10.128.0.29:26257'
		  |   | ```
		  | Wraps: (3) exit status 1
		  | Error types: (1) errors.Cockroach (2) *hintdetail.withDetail (3) *exec.ExitError
		  |
		  | stdout:
		Wraps: (9) exit status 30
		Error types: (1) *withstack.withStack (2) *errutil.withMessage (3) *withstack.withStack (4) *errutil.withMessage (5) *withstack.withStack (6) *safedetails.withSafeDetails (7) *errutil.withMessage (8) *main.withCommandDetails (9) *exec.ExitError

Artifacts: /scaledata/filesystem_simulator/nodes=3
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).scaledata/filesystem_simulator/nodes=3 failed on master@c627e3490d30e8ba88f6c7136717a392a054da4e:

		(1) attached stack trace
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2455
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2463
		  | main.runSqlapp
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:112
		  | main.registerScaleData.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:49
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:757
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  | main.(*monitor).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2511
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  | main.(*cluster).RunE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2115
		  | main.runSqlapp.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:108
		  | main.(*monitor).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2445
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	/home/agent/work/.go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1357
		Wraps: (6) 2 safe details enclosed
		Wraps: (7) output in run_061929.155_n4_filesystemsimulator_
		Wraps: (8) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2045486-1593324849-02-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.18:26257,10.128.0.27:26257,10.128.0.43:26257'  returned
		  | stderr:
		  | domDB chose DB at index 2
		  | 2020/06/28 06:20:27 Consistency Test 5_167 @ 1593325226985974184.0000000000: sizes :- files - 1962, childRelations - 1961, stripes - 326
		  | 2020/06/28 06:20:27 RobustDB.RandomDB chose DB at index 0
		  | 2020/06/28 06:20:27 Created file 1_293 with uuid 18a6f273-07be-4e6e-9fa0-aa14479e3c28 and parent /default
		  | 2020/06/28 06:20:27 RobustDB.RandomDB chose DB at index 2
		  | 2020/06/28 06:20:27 Consistency Test 10_172 @ 1593325227007117792.0000000006: sizes :- files - 1962, childRelations - 0, stripes - 327
		  | 2020/06/28 06:20:27 Consistency Test 10_172 @ 1593325227007117792.0000000006: 478be52c-b20f-45b3-9f00-1b5d2ff324d7 is parentless
		  | Error: DEAD_ROACH_PROBLEM: exit status 1
		  | (1) DEAD_ROACH_PROBLEM
		  | Wraps: (2) Node 4. Command with error:
		  |   | ```
		  |   | ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.18:26257,10.128.0.27:26257,10.128.0.43:26257'
		  |   | ```
		  | Wraps: (3) exit status 1
		  | Error types: (1) errors.Cockroach (2) *hintdetail.withDetail (3) *exec.ExitError
		  |
		  | stdout:
		Wraps: (9) exit status 30
		Error types: (1) *withstack.withStack (2) *errutil.withMessage (3) *withstack.withStack (4) *errutil.withMessage (5) *withstack.withStack (6) *safedetails.withSafeDetails (7) *errutil.withMessage (8) *main.withCommandDetails (9) *exec.ExitError

Artifacts: /scaledata/filesystem_simulator/nodes=3
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).scaledata/filesystem_simulator/nodes=3 failed on master@17c8048e80935f8a01477416980d18bf39cba1bb:

	cluster.go:2467,scaledata.go:112,scaledata.go:49,test_runner.go:757: monitor failure: monitor task failed: output in run_062930.869_n4_filesystemsimulator_: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2046708-1593411872-13-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.107:26257,10.128.0.98:26257,10.128.0.97:26257'  returned: exit status 30
		(1) attached stack trace
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2455
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2463
		  | main.runSqlapp
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:112
		  | main.registerScaleData.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:49
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:757
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  | main.(*monitor).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2511
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  | main.(*cluster).RunE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2115
		  | main.runSqlapp.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:108
		  | main.(*monitor).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2445
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	/home/agent/work/.go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1357
		Wraps: (6) 2 safe details enclosed
		Wraps: (7) output in run_062930.869_n4_filesystemsimulator_
		Wraps: (8) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2046708-1593411872-13-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.107:26257,10.128.0.98:26257,10.128.0.97:26257'  returned
		  | stderr:
		  | 0/06/29 06:30:27 Consistency Test 10_159 @ 1593412227431418372.0000000000: sizes :- files - 1766, childRelations - 1765, stripes - 310
		  | 2020/06/29 06:30:27 RobustDB.RandomDB chose DB at index 0
		  | 2020/06/29 06:30:27 Created file 9_294 with uuid fac83490-8d3c-4e99-bc0f-f5b5b18723b8 and parent /default
		  | 2020/06/29 06:30:27 RobustDB.RandomDB chose DB at index 2
		  | 2020/06/29 06:30:27 Consistency Test 8_159 @ 1593412227474824190.0000000000: sizes :- files - 0, childRelations - 0, stripes - 310
		  | 2020/06/29 06:30:27 Consistency Test 8_159 @ 1593412227474824190.0000000000: File uuid 028b2a4e-71aa-4ac2-aea3-5d865d8a2519 corresponding to stripe 0 not found
		  | Error: DEAD_ROACH_PROBLEM: exit status 1
		  | (1) DEAD_ROACH_PROBLEM
		  | Wraps: (2) Node 4. Command with error:
		  |   | ```
		  |   | ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.107:26257,10.128.0.98:26257,10.128.0.97:26257'
		  |   | ```
		  | Wraps: (3) exit status 1
		  | Error types: (1) errors.Cockroach (2) *hintdetail.withDetail (3) *exec.ExitError
		  |
		  | stdout:
		Wraps: (9) exit status 30
		Error types: (1) *withstack.withStack (2) *errutil.withMessage (3) *withstack.withStack (4) *errutil.withMessage (5) *withstack.withStack (6) *safedetails.withSafeDetails (7) *errutil.withMessage (8) *main.withCommandDetails (9) *exec.ExitError

Artifacts: /scaledata/filesystem_simulator/nodes=3
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).scaledata/filesystem_simulator/nodes=3 failed on master@3a03f3843a8cdf04f82c52753c61cf01b0d2ddcd:

		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2455
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2463
		  | main.runSqlapp
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:112
		  | main.registerScaleData.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:49
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:757
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  | main.(*monitor).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2511
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  | main.(*cluster).RunE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2115
		  | main.runSqlapp.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:108
		  | main.(*monitor).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2445
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (6) 2 safe details enclosed
		Wraps: (7) output in run_063216.077_n4_filesystemsimulator_
		Wraps: (8) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2054002-1593584816-11-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.235:26257,10.128.0.234:26257,10.128.0.236:26257'  returned
		  | stderr:
		  | chose DB at index 0
		  | 2020/07/01 06:33:10 RobustDB.RandomDB chose DB at index 1
		  | 2020/07/01 06:33:10 Consistency Test 8_167 @ 1593585190583228438.0000000000: sizes :- files - 1745, childRelations - 1744, stripes - 307
		  | 2020/07/01 06:33:11 RobustDB.RandomDB chose DB at index 0
		  | 2020/07/01 06:33:11 Writing new stripe 0
		  | 2020/07/01 06:33:11 &{b6f64417-8e9e-49eb-805b-843a3fc1be4e 0 default}
		  | 2020/07/01 06:33:11 Consistency Test 4_171 @ 1593585190572019031.0000000000: sizes :- files - 1745, childRelations - 0, stripes - 307
		  | 2020/07/01 06:33:11 Consistency Test 4_171 @ 1593585190572019031.0000000000: 5379b6d4-01bd-4319-af16-fcdce590e9ee is parentless
		  | Error: DEAD_ROACH_PROBLEM: exit status 1
		  | (1) DEAD_ROACH_PROBLEM
		  | Wraps: (2) Node 4. Command with error:
		  |   | ```
		  |   | ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.235:26257,10.128.0.234:26257,10.128.0.236:26257'
		  |   | ```
		  | Wraps: (3) exit status 1
		  | Error types: (1) errors.Cockroach (2) *hintdetail.withDetail (3) *exec.ExitError
		  |
		  | stdout:
		Wraps: (9) exit status 30
		Error types: (1) *withstack.withStack (2) *errutil.withMessage (3) *withstack.withStack (4) *errutil.withMessage (5) *withstack.withStack (6) *safedetails.withSafeDetails (7) *errutil.withMessage (8) *main.withCommandDetails (9) *exec.ExitError

Artifacts: /scaledata/filesystem_simulator/nodes=3
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).scaledata/filesystem_simulator/nodes=3 failed on master@456a07cfc1e53b87abc7709052e54efb1450e758:

		(1) attached stack trace
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2455
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2463
		  | main.runSqlapp
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:112
		  | main.registerScaleData.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:49
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:757
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  | main.(*monitor).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2511
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  | main.(*cluster).RunE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2115
		  | main.runSqlapp.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:108
		  | main.(*monitor).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2445
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (6) 2 safe details enclosed
		Wraps: (7) output in run_061952.966_n4_filesystemsimulator_
		Wraps: (8) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2062841-1593843322-14-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.103:26257,10.128.0.99:26257,10.128.0.93:26257'  returned
		  | stderr:
		  | a6-afca-434b7a605f04 and parent /default
		  | 2020/07/04 06:20:47 RobustDB.RandomDB chose DB at index 0
		  | 2020/07/04 06:20:47 RobustDB.RandomDB chose DB at index 2
		  | 2020/07/04 06:20:48 Consistency Test 3_171 @ 1593843647620875015.0000000000: sizes :- files - 1918, childRelations - 0, stripes - 347
		  | 2020/07/04 06:20:48 Created file 10_278 with uuid a37fd50b-ec01-4ec2-9c27-eb7e2fcf67bd and parent /default
		  | 2020/07/04 06:20:48 Consistency Test 8_171 @ 1593843647608597533.0000000000: sizes :- files - 1918, childRelations - 0, stripes - 347
		  | 2020/07/04 06:20:48 Consistency Test 3_171 @ 1593843647620875015.0000000000: 5241ff98-4934-40e9-a2b9-0e58d0028f43 is parentless
		  | Error: COMMAND_PROBLEM: exit status 1
		  | (1) COMMAND_PROBLEM
		  | Wraps: (2) Node 4. Command with error:
		  |   | ```
		  |   | ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.103:26257,10.128.0.99:26257,10.128.0.93:26257'
		  |   | ```
		  | Wraps: (3) exit status 1
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		  |
		  | stdout:
		Wraps: (9) exit status 20
		Error types: (1) *withstack.withStack (2) *errutil.withMessage (3) *withstack.withStack (4) *errutil.withMessage (5) *withstack.withStack (6) *safedetails.withSafeDetails (7) *errutil.withMessage (8) *main.withCommandDetails (9) *exec.ExitError

Artifacts: /scaledata/filesystem_simulator/nodes=3
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).scaledata/filesystem_simulator/nodes=3 failed on master@3e0de239121813ea4d47873388a2828a66d9edf7:

		(1) attached stack trace
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2455
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2463
		  | main.runSqlapp
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:112
		  | main.registerScaleData.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:49
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:757
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  | main.(*monitor).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2511
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  | main.(*cluster).RunE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2115
		  | main.runSqlapp.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:108
		  | main.(*monitor).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2445
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (6) 2 safe details enclosed
		Wraps: (7) output in run_061641.633_n4_filesystemsimulator_
		Wraps: (8) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2063962-1593929497-06-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.32:26257,10.128.0.21:26257,10.128.0.34:26257'  returned
		  | stderr:
		  | 836, childRelations - 1835, stripes - 350
		  | 2020/07/05 06:17:36 Created file 15_281 with uuid dad1b1e6-f5bb-4f44-bc64-f3df4b075407 and parent /default
		  | 2020/07/05 06:17:36 RobustDB.RandomDB chose DB at index 2
		  | 2020/07/05 06:17:36 RobustDB.RandomDB chose DB at index 0
		  | 2020/07/05 06:17:36 Consistency Test 2_175 @ 1593929855931341452.0000000000: sizes :- files - 1836, childRelations - 0, stripes - 350
		  | 2020/07/05 06:17:36 Consistency Test 5_176 @ 1593929855901674202.0000000000: sizes :- files - 1836, childRelations - 0, stripes - 349
		  | 2020/07/05 06:17:36 Consistency Test 2_175 @ 1593929855931341452.0000000000: 299cbd29-f961-4571-87ea-f5ce6c80e045 is parentless
		  | Error: COMMAND_PROBLEM: exit status 1
		  | (1) COMMAND_PROBLEM
		  | Wraps: (2) Node 4. Command with error:
		  |   | ```
		  |   | ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.32:26257,10.128.0.21:26257,10.128.0.34:26257'
		  |   | ```
		  | Wraps: (3) exit status 1
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		  |
		  | stdout:
		Wraps: (9) exit status 20
		Error types: (1) *withstack.withStack (2) *errutil.withMessage (3) *withstack.withStack (4) *errutil.withMessage (5) *withstack.withStack (6) *safedetails.withSafeDetails (7) *errutil.withMessage (8) *main.withCommandDetails (9) *exec.ExitError

Artifacts: /scaledata/filesystem_simulator/nodes=3
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).scaledata/filesystem_simulator/nodes=3 failed on master@9304ecd70e9f3ba4cb16b5443a10b4e17d7baee0:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2471,scaledata.go:112,scaledata.go:49,test_runner.go:757: monitor failure: monitor task failed: output in run_063731.968_n4_filesystemsimulator_: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2068260-1594103442-02-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.1.36:26257,10.128.1.50:26257,10.128.1.35:26257'  returned: exit status 20
		(1) attached stack trace
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2459
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2467
		  | main.runSqlapp
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:112
		  | main.registerScaleData.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:49
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:757
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  | main.(*monitor).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2515
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  | main.(*cluster).RunE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2119
		  | main.runSqlapp.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:108
		  | main.(*monitor).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2449
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (6) 2 safe details enclosed
		Wraps: (7) output in run_063731.968_n4_filesystemsimulator_
		Wraps: (8) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2068260-1594103442-02-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.1.36:26257,10.128.1.50:26257,10.128.1.35:26257'  returned
		  | stderr:
		  | attempt 1 failed, started at 2020-07-07 06:39:32.099039864 +0000 UTC m=+119.429841680, now = 2020-07-07 06:39:32.099988808 +0000 UTC m=+119.430790641, took 948.961µs
		  | 2020/07/07 06:39:32 Attempt failed with error dial tcp 10.128.1.50:26257: connect: connection refused: ... Retrying after sleeping 5ns
		  | 2020/07/07 06:39:32 Consistency Test 10_305 @ 1594103971783613185.0000000000: sizes :- files - 3990, childRelations - 3989, stripes - 611
		  | 2020/07/07 06:39:32 Consistency Test 6_317 @ 1594103971780569654.0000000000: ChildRelation {/default 0_100 a6a9e9c9-e26e-4642-b078-493398cecea6 default}: a6a9e9c9-e26e-4642-b078-493398cecea6 child does not exist in files
		  | Error: COMMAND_PROBLEM: exit status 1
		  | (1) COMMAND_PROBLEM
		  | Wraps: (2) Node 4. Command with error:
		  |   | ```
		  |   | ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.1.36:26257,10.128.1.50:26257,10.128.1.35:26257'
		  |   | ```
		  | Wraps: (3) exit status 1
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		  |
		  | stdout:
		Wraps: (9) exit status 20
		Error types: (1) *withstack.withStack (2) *errutil.withMessage (3) *withstack.withStack (4) *errutil.withMessage (5) *withstack.withStack (6) *safedetails.withSafeDetails (7) *errutil.withMessage (8) *main.withCommandDetails (9) *exec.ExitError

Artifacts: /scaledata/filesystem_simulator/nodes=3
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@nvanbenschoten
Member

@irfansharif would you mind taking a look at whether this is the same issue as #50175, but on master?

@irfansharif
Contributor

Hm, these don't look like benign setup errors:

2020/07/07 06:39:32 Consistency Test 6_317 @ 1594103971780569654.0000000000: ChildRelation {/default 0_100 a6a9e9c9-e26e-4642-b078-493398cecea6 default}: a6a9e9c9-e26e-4642-b078-493398cecea6 child does not exist in files
2020/07/05 06:17:36 Consistency Test 2_175 @ 1593929855931341452.0000000000: 299cbd29-f961-4571-87ea-f5ce6c80e045 is parentless
Error: COMMAND_PROBLEM: exit status 1

@irfansharif
Contributor

> would you mind taking a look at whether this is the same issue as #50175, but on master?

Not the same issue, but it looks like this failure mode was introduced ~11 days ago. I'll try reproducing it now (and bisecting if the cause is not immediately obvious).

@irfansharif
Contributor

irfansharif commented Jul 7, 2020

Immediately reproducible. As a note to my future self to clean up how scaledata tests are run, here's how to run them locally.


# In your cockroachdb/rksql checkout
cd $GOPATH/src/github.com/cockroachdb/rksql
src/go/BUILD.py
cp src/go/bin/filesystem_simulator $GOPATH/src/github.com/cockroachdb/cockroach

cd $GOPATH/src/github.com/cockroachdb/cockroach
make bin/roachprod; make bin/roachtest
roachprod wipe local; roachprod destroy local
bin/roachtest run scaledata/filesystem_simulator/nodes=3 --wipe=false --cockroach ./cockroach --roachprod bin/roachprod --local

This requires the following diff to be applied to your cockroachdb checkout:

diff --git i/pkg/cmd/roachtest/scaledata.go w/pkg/cmd/roachtest/scaledata.go
index 9fd5d9abf9..4d6a27b11c 100644
--- i/pkg/cmd/roachtest/scaledata.go
+++ w/pkg/cmd/roachtest/scaledata.go
@@ -13,11 +13,8 @@ package main
 import (
 	"context"
 	"fmt"
-	"runtime"
 	"strings"
 	"time"
-
-	"github.com/cockroachdb/cockroach/pkg/util/binfetcher"
 )
 
 func registerScaleData(r *testRegistry) {
@@ -58,21 +55,8 @@ func runSqlapp(ctx context.Context, t *test, c *cluster, app, flags string, dur
 	roachNodes := c.Range(1, roachNodeCount)
 	appNode := c.Node(c.spec.NodeCount)
 
-	if local && runtime.GOOS != "linux" {
-		t.Fatalf("must run on linux os, found %s", runtime.GOOS)
-	}
-	b, err := binfetcher.Download(ctx, binfetcher.Options{
-		Component: "rubrik",
-		Binary:    app,
-		Version:   "LATEST",
-		GOOS:      "linux",
-		GOARCH:    "amd64",
-	})
-	if err != nil {
-		t.Fatal(err)
-	}
-
-	c.Put(ctx, b, app, appNode)
+	// Expects to find the named binary in repo root.
+	c.Put(ctx, app, app, appNode)
 	c.Put(ctx, cockroach, "./cockroach", roachNodes)
 	c.Start(ctx, t, roachNodes)

@irfansharif
Contributor

The bisect script just terminated, definitively pointing to ed02ab5 (#50388) as the culprit. I manually verified that the test does not fail on the commit before it, and does fail on the faulty commit.

@asubiotto, mind taking a look? I'm not familiar with much of the area touched in #50388, or with why it would cause this failure. To understand what this filesystem simulator test is doing, take a look at https://github.com/cockroachdb/rksql/blob/master/src/go/src/rubrik/sqlapp/filesystem_simulator/main.go

After building the right roachtest binary with the patch above, the following is enough to run the test (this also relies on the manually built filesystem_simulator binary from the steps above, from https://github.com/cockroachdb/rksql). It takes a few minutes.

make buildshort
roachprod wipe local; roachprod destroy local
bin/roachtest run scaledata/filesystem_simulator/nodes=3 --wipe=false --cockroach ./cockroach --roachprod bin/roachprod --local

@cockroach-teamcity
Member Author

(roachtest).scaledata/filesystem_simulator/nodes=3 failed on master@e3fb5aa18d0f5064f7ba5d4df3864e94b3abb96d:

	cluster.go:2471,scaledata.go:112,scaledata.go:49,test_runner.go:757: monitor failure: monitor task failed: output in run_063708.339_n4_filesystemsimulator_: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2072412-1594189844-08-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.1.167:26257,10.128.1.164:26257,10.128.1.149:26257'  returned: exit status 20
		(1) attached stack trace
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2459
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2467
		  | main.runSqlapp
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:112
		  | main.registerScaleData.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:49
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:757
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  | main.(*monitor).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2515
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  | main.(*cluster).RunE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2119
		  | main.runSqlapp.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:108
		  | main.(*monitor).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2449
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (6) 2 safe details enclosed
		Wraps: (7) output in run_063708.339_n4_filesystemsimulator_
		Wraps: (8) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2072412-1594189844-08-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.1.167:26257,10.128.1.164:26257,10.128.1.149:26257'  returned
		  | stderr:
		  | 06:38:03 Deleted child_relations for uuid b4d52851-a83e-4b36-8045-42fbadf6db92
		  | 2020/07/08 06:38:03 Deleted &{b4d52851-a83e-4b36-8045-42fbadf6db92 1 0 167 default}
		  | 2020/07/08 06:38:03 RobustDB.RandomDB chose DB at index 0
		  | 2020/07/08 06:38:03 Created file 12_246 with uuid b533cd54-f8c6-4e6e-a095-244bf6d0cab6 and parent /default
		  | 2020/07/08 06:38:03 Consistency Test 14_170 @ 1594190283070739244.0000000000: sizes :- files - 0, childRelations - 1666, stripes - 258
		  | 2020/07/08 06:38:03 Consistency Test 14_170 @ 1594190283070739244.0000000000: ChildRelation {/default 0_105 85d1e48b-3971-4db4-bc1a-27018524d106 default}: /default parent does not exist in files
		  | Error: COMMAND_PROBLEM: exit status 1
		  | (1) COMMAND_PROBLEM
		  | Wraps: (2) Node 4. Command with error:
		  |   | ```
		  |   | ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.1.167:26257,10.128.1.164:26257,10.128.1.149:26257'
		  |   | ```
		  | Wraps: (3) exit status 1
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		  |
		  | stdout:
		Wraps: (9) exit status 20
		Error types: (1) *withstack.withStack (2) *errutil.withMessage (3) *withstack.withStack (4) *errutil.withMessage (5) *withstack.withStack (6) *safedetails.withSafeDetails (7) *errutil.withMessage (8) *main.withCommandDetails (9) *exec.ExitError


Artifacts: /scaledata/filesystem_simulator/nodes=3
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@asubiotto
Contributor

I'll take a look

@asubiotto
Contributor

asubiotto commented Jul 8, 2020

Looks like the test passes with the following diff:

diff --git a/pkg/sql/colexec/materializer.go b/pkg/sql/colexec/materializer.go
index c2a8c701d9..d13fc157cd 100644
--- a/pkg/sql/colexec/materializer.go
+++ b/pkg/sql/colexec/materializer.go
@@ -168,10 +168,21 @@ func NewMaterializer(
                output,
                nil, /* memMonitor */
                execinfra.ProcStateOpts{
-                       InputsToDrain: []execinfra.RowSource{m.drainHelper},
+                       //InputsToDrain: []execinfra.RowSource{m.drainHelper},
                        TrailingMetaCallback: func(ctx context.Context) []execinfrapb.ProducerMetadata {
-                               m.InternalClose()
-                               return nil
+                               var resultMeta []execinfrapb.ProducerMetadata
+                               for {
+                                       row, meta := m.drainHelper.Next()
+                                       if meta != nil {
+                                               resultMeta = append(resultMeta, *meta)
+                                       }
+                                       if row == nil && meta == nil {
+                                               break
+                                       }
+                               }
+                               defer m.InternalClose()
+                               return resultMeta
                        },
                },
        ); err != nil {

The difference is that this patch doesn't swallow ReadWithinUncertaintyInterval errors when draining, which is the previous behavior that the referenced commit fixed. We should swallow these errors when draining, because returning an error like this after results have already been returned doesn't make sense (and the row execution engine swallows them too). The interesting thing is that this test doesn't fail with row execution, which means we're incorrectly swallowing one of these errors in vectorized execution. cc @yuzefovich in case you see anything I don't.

@asubiotto
Contributor

I think the culprit is the vectorized inbox (judging by the plans of the queries that drop this error). If it encounters any metadata during execution, it buffers it and returns it later. This means that the flow first transitions to draining and only then observes the error, which is swallowed because it's a ReadWithinUncertaintyInterval error, so the materializer thinks it was an error we got while draining. The correct behavior would be to propagate that error to the materializer immediately, which would cause the error to be returned, since it was observed during execution. I'm currently trying to verify this.

@asubiotto
Contributor

That was it. PR incoming. Unfortunately, I think the commit that exposed this issue is part of the alpha, so that will need to be restarted.

@cockroach-teamcity
Member Author

(roachtest).scaledata/filesystem_simulator/nodes=3 failed on master@1b5d070c93375d3e14c146241e8bafde349529bd:

		(1) attached stack trace
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2459
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2467
		  | main.runSqlapp
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:112
		  | main.registerScaleData.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:49
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:757
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  | main.(*monitor).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2515
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  | main.(*cluster).RunE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2119
		  | main.runSqlapp.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:108
		  | main.(*monitor).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2449
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (6) 2 safe details enclosed
		Wraps: (7) output in run_061458.316_n4_filesystemsimulator_
		Wraps: (8) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2076266-1594275003-01-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.15.195:26257,10.128.15.210:26257,10.128.0.252:26257'  returned
		  | stderr:
		  | file 13_998 with uuid 4fb36028-0855-460b-a44b-c3872da0d052 and parent /default
		  | 2020/07/09 06:18:23 ExecuteTx retry attempt 1 failed, started at 2020-07-09 06:18:23.523162134 +0000 UTC m=+204.509943392, now = 2020-07-09 06:18:23.879212616 +0000 UTC m=+204.865993920, took 356.050528ms
		  | 2020/07/09 06:18:23 pq error - Error code : 57014, Error class : 57
		  | 2020/07/09 06:18:23 pq error - Error code : 57014, Error class : 57
		  | 2020/07/09 06:18:23 Aborting Retries because this error of type *pq.Error is not retryable : pq: query execution canceled
		  | 2020/07/09 06:18:23 postgres error code is 57014 and class is 57
		  | 2020/07/09 06:18:23 pq: query execution canceled
		  | Error: COMMAND_PROBLEM: exit status 1
		  | (1) COMMAND_PROBLEM
		  | Wraps: (2) Node 4. Command with error:
		  |   | ```
		  |   | ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.15.195:26257,10.128.15.210:26257,10.128.0.252:26257'
		  |   | ```
		  | Wraps: (3) exit status 1
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		  |
		  | stdout:
		Wraps: (9) exit status 20
		Error types: (1) *withstack.withStack (2) *errutil.withMessage (3) *withstack.withStack (4) *errutil.withMessage (5) *withstack.withStack (6) *safedetails.withSafeDetails (7) *errutil.withMessage (8) *main.withCommandDetails (9) *exec.ExitError


Artifacts: /scaledata/filesystem_simulator/nodes=3
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).scaledata/filesystem-simulator/nodes=3 failed on master@e9a4f83e3eee59510f97db2c6e0df9b57cf6b944:

		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2541
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2549
		  | main.runSqlapp
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:119
		  | main.registerScaleData.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:49
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:757
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  | main.(*monitor).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2597
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  | main.(*cluster).RunE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2201
		  | main.runSqlapp.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:115
		  | main.(*monitor).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2531
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (6) 2 safe details enclosed
		Wraps: (7) output in run_062425.182_n4_filesystemsimulator_
		Wraps: (8) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2107908-1595398673-15-n4cpu4:4 -- ./filesystem-simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.15.232:26257,10.128.15.229:26257,10.128.15.231:26257'  returned
		  | stderr:
		  | and parent /default
		  | 2020/07/22 06:26:28 RobustDB.RandomDB chose DB at index 2
		  | 2020/07/22 06:26:28 ExecuteTx retry attempt 1 failed, started at 2020-07-22 06:26:28.523019493 +0000 UTC m=+122.620008318, now = 2020-07-22 06:26:28.745459312 +0000 UTC m=+122.842448193, took 222.439875ms
		  | 2020/07/22 06:26:28 pq error - Error code : 57014, Error class : 57
		  | 2020/07/22 06:26:28 pq error - Error code : 57014, Error class : 57
		  | 2020/07/22 06:26:28 Aborting Retries because this error of type *pq.Error is not retryable : pq: query execution canceled
		  | 2020/07/22 06:26:28 postgres error code is 57014 and class is 57
		  | 2020/07/22 06:26:28 pq: query execution canceled
		  | Error: COMMAND_PROBLEM: exit status 1
		  | (1) COMMAND_PROBLEM
		  | Wraps: (2) Node 4. Command with error:
		  |   | ```
		  |   | ./filesystem-simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.15.232:26257,10.128.15.229:26257,10.128.15.231:26257'
		  |   | ```
		  | Wraps: (3) exit status 1
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		  |
		  | stdout:
		Wraps: (9) exit status 20
		Error types: (1) *withstack.withStack (2) *errutil.withMessage (3) *withstack.withStack (4) *errutil.withMessage (5) *withstack.withStack (6) *safedetails.withSafeDetails (7) *errutil.withMessage (8) *main.withCommandDetails (9) *exec.ExitError


Artifacts: /scaledata/filesystem-simulator/nodes=3
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues


5 participants