
roachtest: disk-stalled/log=true,data=true failed #52181

Closed
cockroach-teamcity opened this issue Jul 31, 2020 · 4 comments · Fixed by #52343
Labels: C-test-failure, O-roachtest, O-robot, release-blocker

Comments

@cockroach-teamcity
Member

(roachtest).disk-stalled/log=true,data=true failed on release-19.1@f02b6abab9b5688fe06c28105f77c4be4f4c2623:

The test failed on branch=release-19.1, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/disk-stalled/log=true_data=true/run_1
	disk_stall.go:130,disk_stall.go:40,test_runner.go:754: unexpected output: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2134079-1596174811-33-n1cpu4:1 -- timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=40ms ./cockroach start-single-node --insecure --logtostderr=INFO --store {store-dir}/faulty --log-dir {store-dir}/faulty/logs: exit status 20 Error: unknown command "start-single-node" for "cockroach"
		Run 'cockroach --help' for usage.
		Failed running "cockroach"
		Error: COMMAND_PROBLEM: exit status 1
		(1) COMMAND_PROBLEM
		Wraps: (2) Node 1. Command with error:
		  | ```
		  | timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=40ms ./cockroach start-single-node --insecure --logtostderr=INFO --store {store-dir}/faulty --log-dir {store-dir}/faulty/logs
		  | ```
		Wraps: (3) exit status 1
		Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError


Artifacts: /disk-stalled/log=true,data=true

See this test on roachdash
powered by pkg/cmd/internal/issues

cockroach-teamcity added the branch-release-19.1, C-test-failure, O-roachtest, O-robot, and release-blocker labels on Jul 31, 2020
cockroach-teamcity added this to the 20.2 milestone on Jul 31, 2020
@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=true,data=true failed on release-19.1@7c03505d8daa19dee7f5f0268c9e728e38d4ba6d:

The test failed on branch=release-19.1, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/disk-stalled/log=true_data=true/run_1
	disk_stall.go:130,disk_stall.go:40,test_runner.go:754: unexpected output: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2137346-1596261039-32-n1cpu4:1 -- timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=40ms ./cockroach start-single-node --insecure --logtostderr=INFO --store {store-dir}/faulty --log-dir {store-dir}/faulty/logs: exit status 20 Error: unknown command "start-single-node" for "cockroach"
		Run 'cockroach --help' for usage.
		Failed running "cockroach"
		Error: COMMAND_PROBLEM: exit status 1
		(1) COMMAND_PROBLEM
		Wraps: (2) Node 1. Command with error:
		  | ```
		  | timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=40ms ./cockroach start-single-node --insecure --logtostderr=INFO --store {store-dir}/faulty --log-dir {store-dir}/faulty/logs
		  | ```
		Wraps: (3) exit status 1
		Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError


Artifacts: /disk-stalled/log=true,data=true


@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=true,data=true failed on release-19.1@86b7271623ad797e9c42d5f7900a5cb424fed436:

The test failed on branch=release-19.1, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/disk-stalled/log=true_data=true/run_1
	disk_stall.go:130,disk_stall.go:40,test_runner.go:754: unexpected output: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2145078-1596520433-32-n1cpu4:1 -- timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=40ms ./cockroach start-single-node --insecure --logtostderr=INFO --store {store-dir}/faulty --log-dir {store-dir}/faulty/logs: exit status 20 Error: unknown command "start-single-node" for "cockroach"
		Run 'cockroach --help' for usage.
		Failed running "cockroach"
		Error: COMMAND_PROBLEM: exit status 1
		(1) COMMAND_PROBLEM
		Wraps: (2) Node 1. Command with error:
		  | ```
		  | timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=40ms ./cockroach start-single-node --insecure --logtostderr=INFO --store {store-dir}/faulty --log-dir {store-dir}/faulty/logs
		  | ```
		Wraps: (3) exit status 1
		Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError


Artifacts: /disk-stalled/log=true,data=true


@petermattis
Collaborator

@irfansharif This looks like fallout from the roachprod start refactor: `unknown command "start-single-node" for "cockroach"`.

@irfansharif
Contributor

irfansharif commented Aug 4, 2020

This test doesn't make use of the `roachtest start` smartness that figures out which sub-command to use; instead, it hard-codes the command:

```go
"./cockroach start-single-node --insecure --logtostderr=INFO --store {store-dir}/%s --log-dir {store-dir}/%s",
```

Given our support policy, I'm just going to bump the minimum version this test expects to run. This is another instance where #51897 seems like the right direction to go in.
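A minimal sketch of the kind of version-aware sub-command selection that the hard-coded command bypasses. It assumes `start-single-node` first shipped in v19.2 and that older binaries start a lone node with plain `start`; `startSubcommand` and `startCmd` are hypothetical helpers, not roachtest's actual API. The format string is the one quoted in the comment, with the sub-command substituted in.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// startSubcommand returns the cockroach sub-command used to start a single
// node. Hypothetical helper: it assumes `start-single-node` first shipped in
// v19.2 and that older binaries use plain `start`.
func startSubcommand(version string) (string, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) < 2 {
		return "", fmt.Errorf("malformed version %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return "", err
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return "", err
	}
	if major > 19 || (major == 19 && minor >= 2) {
		return "start-single-node", nil
	}
	return "start", nil
}

// startCmd fills in the test's hand-built command line, substituting the
// sub-command instead of hard-coding "start-single-node".
func startCmd(version, store, logDir string) (string, error) {
	sub, err := startSubcommand(version)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("./cockroach %s --insecure --logtostderr=INFO --store {store-dir}/%s --log-dir {store-dir}/%s",
		sub, store, logDir), nil
}

func main() {
	for _, v := range []string{"v19.1.0", "v20.2.0"} {
		cmd, err := startCmd(v, "faulty", "faulty/logs")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(v, "->", cmd)
	}
}
```

Keying the choice off the binary's version, rather than the branch the test runs on, keeps a single test definition usable across release branches.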

craig bot pushed a commit that referenced this issue Aug 4, 2020
51959: backupccl: distribute RESTORE work using DistSQL r=dt,yuzefovich a=pbardea

This commit replaces restore's old method of coordinating work on a
single node to instead use DistSQL. It creates a 2 stage DistSQL flow.
The first stage assigns chunks to SplitAndScatter processors which
forward spans that they scattered to the second stage made up of the
RestoreData processors which ingest the data. The SplitAndScatter
processors attempt to send the work to the RestoreData colocated with
the leaseholder of the span after the scattering.

Release note: None.
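A toy model of the routing step described in this commit message, in which each scattered span is sent to the RestoreData processor on the span's leaseholder node. `Span`, `routeSpans`, and the fallback node are hypothetical and serve only to illustrate the idea. The fallback is an assumption about what "attempt to send" means when no processor is colocated, and the sketch does not model DistSQL itself.

```go
package main

import (
	"fmt"
	"sort"
)

// Span is a toy stand-in for a key span that has just been scattered;
// Leaseholder is the ID of the node holding its lease afterwards.
type Span struct {
	Start, End  string
	Leaseholder int
}

// routeSpans assigns each span to a RestoreData processor, preferring the
// one on the span's leaseholder node. If no processor runs there, it falls
// back to a caller-chosen node (an assumption made for this sketch).
func routeSpans(spans []Span, restoreNodes map[int]bool, fallback int) map[int][]Span {
	routed := make(map[int][]Span)
	for _, sp := range spans {
		dest := fallback
		if restoreNodes[sp.Leaseholder] {
			dest = sp.Leaseholder
		}
		routed[dest] = append(routed[dest], sp)
	}
	return routed
}

func main() {
	spans := []Span{
		{"a", "c", 1},
		{"c", "f", 2},
		{"f", "k", 4}, // node 4 has no RestoreData processor in this flow
	}
	routed := routeSpans(spans, map[int]bool{1: true, 2: true, 3: true}, 3)
	nodes := make([]int, 0, len(routed))
	for n := range routed {
		nodes = append(nodes, n)
	}
	sort.Ints(nodes) // map iteration order is random; sort for stable output
	for _, n := range nodes {
		fmt.Printf("node %d ingests %d span(s)\n", n, len(routed[n]))
	}
}
```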

52269: roachtest: correctly start crdb in acceptance/rapid_restart r=andreimatei a=andreimatei

--start-single-node was needed. Without it, I think the test's killing
of a node raced with that node dying by itself, and sometimes the race
resulted in `cockroach stop` first seeing the process but then hitting
`/bin/bash: line 8: kill: (16016) - No such process`.

Fixes #52060

Release note: None

52273: roachtest: remove confusing "--- PASS" log lines r=andreimatei a=andreimatei

Two workloads were printing PASS/FAIL lines from inside the predicates
passed to Searcher.Search. This is really confusing when reading the
test's output, because they don't correspond to the test's disposition.

Release note: None
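A minimal sketch of why per-probe PASS/FAIL lines mislead. It uses Go's standard `sort.Search` as a stand-in for roachtest's `Searcher.Search`, whose exact interface isn't shown here, and the pass threshold of 700 is made up. The search probes values that both pass and fail before it settles on an answer, so no single probe outcome is the test's result.

```go
package main

import (
	"fmt"
	"sort"
)

// searchMaxPassing returns the largest x in [0, n) for which passes(x) is
// true, assuming passes is monotone (true up to some point, then false).
// It returns -1 if nothing passes. It also returns a record of every probe.
func searchMaxPassing(n int, passes func(int) bool) (int, []string) {
	var probes []string
	firstFail := sort.Search(n, func(i int) bool {
		ok := passes(i)
		// Record the probe rather than printing "--- PASS"/"--- FAIL":
		// each probe is an intermediate step, not the test's disposition.
		probes = append(probes, fmt.Sprintf("probe %d ok=%t", i, ok))
		return !ok
	})
	return firstFail - 1, probes
}

func main() {
	best, probes := searchMaxPassing(1000, func(w int) bool { return w <= 700 })
	for _, p := range probes {
		fmt.Println(p)
	}
	fmt.Println("result: max passing =", best)
}
```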

52343: roachtest: reflake (skip) disk-stalled test for release-19.1 r=irfansharif a=irfansharif

Fixes #52181. This test doesn't make use of all the `roachtest start`
smartness to figure out what sub-command to use and instead constructs
the raw start command by hand.

Given our support policy, let's just bump the minimum version this test
expects to run. This feels like another instance where #51897 would be 
nice to have.

Release note: None
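A minimal sketch of a minimum-version gate like the one this PR describes. `parseVersion`, `atLeast`, and the `v19.2.0` cutoff are assumptions for illustration; the cutoff assumes `start-single-node` first shipped in v19.2. Roachtest's actual mechanism for declaring a test's minimum version may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "v19.2.0" into [19 2 0]. Hypothetical helper; any
// pre-release suffix such as "-beta.1" is dropped for brevity.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	core := strings.SplitN(strings.TrimPrefix(v, "v"), "-", 2)[0]
	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("malformed version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("malformed version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// atLeast reports whether version v is greater than or equal to minV,
// comparing major, minor, and patch in order.
func atLeast(v, minV string) (bool, error) {
	a, err := parseVersion(v)
	if err != nil {
		return false, err
	}
	b, err := parseVersion(minV)
	if err != nil {
		return false, err
	}
	for i := range a {
		if a[i] != b[i] {
			return a[i] > b[i], nil
		}
	}
	return true, nil
}

func main() {
	// Assumed cutoff: the first release that ships start-single-node.
	const minVersion = "v19.2.0"
	for _, v := range []string{"v19.1.0", "v19.2.0", "v20.2.0"} {
		ok, err := atLeast(v, minVersion)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		if ok {
			fmt.Println(v, "run disk-stalled")
		} else {
			fmt.Println(v, "skip disk-stalled (below", minVersion+")")
		}
	}
}
```

Gating on the version keeps the test definition unchanged on newer branches while release-19.1 simply skips it.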

Co-authored-by: Paul Bardea <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
@craig craig bot closed this as completed in 4189198 Aug 4, 2020