roachtest: exit status 1 during roachprod start (all nodes actually start) #36963

Closed
cockroach-teamcity opened this issue Apr 19, 2019 · 4 comments
Labels: C-test-failure (Broken test, automatically or manually discovered), O-roachtest, O-robot (Originated from a bot)

@cockroach-teamcity
Member

SHA: https://github.com/cockroachdb/cockroach/commits/9938cb1a2cca4c0350244f76845f0c61391d44a7

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=12/cpu=16 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1249130&tab=buildLog

The test failed on release-19.1:
	cluster.go:1255,tpcc.go:730,search.go:47,search.go:177,tpcc.go:725,cluster.go:1667,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-1249130-tpccbench-nodes-12-cpu-16:1-12 returned:
		stderr:
		
		stdout:
		prod.log;  export ROACHPROD=12 && GOTRACEBACK=crash COCKROACH_SKIP_ENABLING_DIAGNOSTIC_REPORTING=1 COCKROACH_ENABLE_RPC_COMPRESSION=false ./cockroach start --insecure --store=path=/mnt/data1/cockroach --log-dir=${HOME}/logs --background --cache=25% --max-sql-memory=25% --port=26257 --http-port=26258 --locality=cloud=gce,region=us-east1,zone=us-east1-b --join=35.227.124.3:26257 >> ${HOME}/logs/cockroach.stdout.log 2>> ${HOME}/logs/cockroach.stderr.log || (x=$?; cat ${HOME}/logs/cockroach.stderr.log; exit $x)
		
		github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.Cockroach.Start.func7
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:397
		github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).Parallel.func1.1
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1420
		runtime.goexit
			/usr/local/go/src/runtime/asm_amd64.s:1333: 
		I190419 19:28:16.660868 1 cluster_synced.go:1502  command failed
		: exit status 1
	cluster.go:1688,tpcc.go:842,tpcc.go:543,test.go:1237: Goexit() was called
	cluster.go:953,context.go:89,cluster.go:942,asm_amd64.s:522,panic.go:397,test.go:785,test.go:771,cluster.go:1688,tpcc.go:842,tpcc.go:543,test.go:1237: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1249130-tpccbench-nodes-12-cpu-16 --oneshot --ignore-empty-nodes: exit status 1 13: skipped
		12: dead
		5: 6621
		2: 6460
		3: 6195
		9: 6595
		1: 6416
		10: 6977
		4: 6905
		8: 6617
		6: 7066
		11: 6624
		7: 6387
		Error:  12: dead
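
For anyone triaging similar failures, the monitor output above can be reduced to the list of dead nodes mechanically. The following is only a sketch: the `node: status` line format is inferred from this single log rather than from roachprod's source, and `deadNodes` is a hypothetical helper, not part of roachprod.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// deadNodes scans lines of the form "<node>: <status>" and returns the
// node IDs whose status is "dead", sorted ascending. The format is an
// assumption based on the log quoted in this issue.
func deadNodes(output string) []int {
	var dead []int
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		id, status, ok := strings.Cut(line, ":")
		if !ok {
			continue // no "node: status" separator on this line
		}
		n, err := strconv.Atoi(strings.TrimSpace(id))
		if err != nil {
			// Skips summary lines such as "Error:  12: dead",
			// where the text before the first colon is not a node ID.
			continue
		}
		if strings.TrimSpace(status) == "dead" {
			dead = append(dead, n)
		}
	}
	sort.Ints(dead)
	return dead
}

func main() {
	// Abbreviated copy of the monitor output from this report.
	sample := "13: skipped\n12: dead\n5: 6621\n2: 6460\nError:  12: dead\n"
	fmt.Println(deadNodes(sample)) // prints [12]
}
```

Lines whose status is a number (apparently a process ID) or `skipped` are treated as not dead, which matches how the report itself summarizes the run.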

@cockroach-teamcity cockroach-teamcity added this to the 19.1 milestone Apr 19, 2019
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. labels Apr 19, 2019
@tbg
Member

tbg commented Apr 23, 2019

@ajwerner could you close this once we capture more details as announced in #36961?

@tbg
Member

tbg commented Apr 23, 2019

cc #37001

@tbg tbg changed the title roachtest: tpccbench/nodes=12/cpu=16 failed roachtest: exit status 1 during roachprod start (all nodes actually start) Apr 23, 2019
@tbg
Member

tbg commented Apr 23, 2019

cc @bobvawter

@solongordon solongordon mentioned this issue Apr 23, 2019
18 tasks
@tbg tbg added C-test-failure Broken test (automatically or manually discovered). and removed C-test-failure Broken test (automatically or manually discovered). labels Apr 30, 2019
@ajwerner
Contributor

ajwerner commented May 7, 2019

We've put in a few patches to help mitigate and debug this in the future, namely #37001 and #37148. I'm closing this until it pops back up, as there is no further action to take with the information we have.

@ajwerner ajwerner closed this as completed May 7, 2019
tbg added a commit to tbg/cockroach that referenced this issue May 20, 2019
I added verbose logging in cockroachdb#37483 but made the poor decision to spew it
to stderr when the command exited nonzero. As a consequence, many test
failures are now obfuscated by high-verbosity ssh logs.

Change the behavior so that the error message is wrapped with a path to
the logs. We grab the contents of `~/.roachprod` in CI, so we'll be able
to access the logs in the artifacts (`roachprod_state`).

```
$ ./bin/roachprod ssh tobias-test -- echo hi
hi
$ ls ~/.roachprod/debug/
$ ./bin/roachprod ssh tobias-test -- false
Error:  ssh verbose log retained in /Users/tschottdorf/.roachprod/debug/ssh_35.231.83.24_2019-05-20T12:53:49Z: exit status 1
$ ls ~/.roachprod/debug/
ssh_35.231.83.24_2019-05-20T12:53:49Z
```

Touches cockroachdb#36963.

Release note: None
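
The behavior described in the commit message (keep the verbose log on disk and mention only its path in the error) can be sketched in Go as below. This is an illustration under assumptions, not the code from the patch: `wrapWithLogPath` and `debugLogPath` are hypothetical names, the `ssh_<host>_<timestamp>` layout is copied from the example session, and the sketch uses the standard library's `%w` wrapping, which may differ from what roachprod does.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
)

// errExit stands in for the error returned by a failed remote command.
var errExit = errors.New("exit status 1")

// debugLogPath builds a per-invocation log path. The naming scheme
// mirrors the example session in the commit message (assumption).
func debugLogPath(debugDir, host, timestamp string) string {
	return filepath.Join(debugDir, "ssh_"+host+"_"+timestamp)
}

// wrapWithLogPath prefixes err with the location of the retained
// verbose log instead of dumping that log to stderr. A nil error is
// passed through unchanged.
func wrapWithLogPath(err error, logPath string) error {
	if err == nil {
		return nil
	}
	return fmt.Errorf("ssh verbose log retained in %s: %w", logPath, err)
}

func main() {
	p := debugLogPath("/home/user/.roachprod/debug", "35.231.83.24", "2019-05-20T12:53:49Z")
	err := wrapWithLogPath(errExit, p)
	fmt.Println("Error: ", err)
	// The original error stays reachable through the wrapper.
	fmt.Println(errors.Is(err, errExit))
}
```

Because the original error is wrapped rather than replaced, callers that inspect the underlying failure can still do so with `errors.Is` or `errors.As`.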
craig bot pushed a commit that referenced this issue May 20, 2019
37594: roachprod: don't pollute logs with verbose ssh logs r=nvanbenschoten,ajwerner a=tbg

Co-authored-by: Tobias Schottdorf <[email protected]>