roachtest: schemachange/index/tpcc/w=1000 failed #36094

Closed
cockroach-teamcity opened this issue Mar 25, 2019 · 47 comments · Fixed by #38579
Labels
C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot.
Milestone

Comments

@cockroach-teamcity
Member

SHA: https://github.com/cockroachdb/cockroach/commits/5a746073c3f8ede851f37dd895cf1a91d6dcc3cf

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/index/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1195714&tab=buildLog

The test failed on master:
	cluster.go:1267,tpcc.go:130,cluster.go:1605,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1195714-schemachange-index-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		    0.0      0.0      0.0      0.0      0.0 delivery
		  11m35s    61978            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		  11m35s    61978            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		  11m35s    61978            0.0            0.0      0.0      0.0      0.0      0.0 payment
		  11m35s    61978            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		E190324 14:19:33.328096 1 workload/cli/run.go:420  error in delivery: dial tcp 10.142.0.78:26257: connect: connection refused
		  11m36s    91193            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		  11m36s    91193            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		  11m36s    91193            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		  11m36s    91193            0.0            0.0      0.0      0.0      0.0      0.0 payment
		  11m36s    91193            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	test.go:1202: test timed out (6h0m0s)
	schemachange.go:446,schemachange.go:314,cluster.go:1605,errgroup.go:57: pq: server is not accepting clients
	cluster.go:1626,tpcc.go:140,schemachange.go:310,test.go:1214: unexpected node event: 2: dead
	test.go:978,asm_amd64.s:523,panic.go:513,log.go:219,cluster.go:926,context.go:90,cluster.go:916,test.go:1159,asm_amd64.s:522,panic.go:397,test.go:774,test.go:760,cluster.go:1626,tpcc.go:140,schemachange.go:310,test.go:1214: write /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190324-1195714/schemachange/index/tpcc/w=1000/test.log: file already closed

@cockroach-teamcity cockroach-teamcity added this to the 19.1 milestone Mar 25, 2019
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. labels Mar 25, 2019
@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/25398c010b2af75b11fed189680ea6b9645f0cf5

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/index/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1199659&tab=buildLog

The test failed on master:
	cluster.go:1267,tpcc.go:130,cluster.go:1605,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1199659-schemachange-index-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		     0.0      0.0 orderStatus
		   8m56s    39618            0.0            0.0      0.0      0.0      0.0      0.0 payment
		   8m56s    39618            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		E190326 14:17:22.036792 1 workload/cli/run.go:420  error in payment: dial tcp 10.142.0.26:26257: connect: connection refused
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		   8m57s   102718            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		   8m57s   102718            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		   8m57s   102718            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		   8m57s   102718            0.0            0.0      0.0      0.0      0.0      0.0 payment
		   8m57s   102718            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		E190326 14:17:23.036797 1 workload/cli/run.go:420  error in orderStatus: dial tcp 10.142.0.26:26257: connect: connection refused
		: signal: killed
	schemachange.go:446,schemachange.go:314,cluster.go:1605,errgroup.go:57: read tcp 172.17.0.2:38170->35.196.225.213:26257: read: connection reset by peer
	cluster.go:1626,tpcc.go:140,schemachange.go:310,test.go:1214: unexpected node event: 2: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/23f9707873abbd2de91a42055535529d7ff296ce

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/index/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1209900&tab=buildLog

The test failed on release-19.1:
	cluster.go:1293,tpcc.go:130,cluster.go:1631,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1209900-schemachange-index-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		Error: read tcp 10.142.0.11:47180->10.142.0.6:26257: read: connection reset by peer
		Error:  exit status 1
		: exit status 1
	cluster.go:1652,tpcc.go:140,schemachange.go:310,test.go:1223: Goexit() was called

@vivekmenezes
Contributor

14:06:12 schemachange.go:447: addindex: running statement 1...
Error: read tcp 10.142.0.11:47180->10.142.0.6:26257: read: connection reset by peer
Error:  exit status 1
14:08:22 schemachange.go:452: addindex: statement 1: "CREATE UNIQUE INDEX ON tpcc.order (o_entry_d, o_w_id, o_d_id, o_carrier_id, o_id);" took 2m9.693082468s
14:09:22 schemachange.go:447: addindex: running statement 2...
14:12:30 schemachange.go:452: addindex: statement 2: "CREATE INDEX ON tpcc.order (o_carrier_id);" took 3m7.95380517s
14:13:30 schemachange.go:447: addindex: running statement 3...
14:55:06 schemachange.go:452: addindex: statement 3: "CREATE INDEX ON tpcc.customer (c_last, c_first);" took 41m36.21531487s

This test seems to have passed, but I don't understand the error above. The test seems to have moved forward after hitting the error.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/5921cf0dcc76548931cc85500c0fa2186a82142f

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/index/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1212185&tab=buildLog

The test failed on release-19.1:
	cluster.go:1293,tpcc.go:130,cluster.go:1631,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1212185-schemachange-index-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		      0.0 newOrder
		  37m11s    38428            0.0            1.5      0.0      0.0      0.0      0.0 orderStatus
		  37m11s    38428            0.0           13.6      0.0      0.0      0.0      0.0 payment
		  37m11s    38428            0.0            1.5      0.0      0.0      0.0      0.0 stockLevel
		E190401 12:19:04.488603 1 workload/cli/run.go:420  error in stockLevel: dial tcp 10.128.0.24:26257: connect: connection refused
		  37m12s    88928            0.0            1.5      0.0      0.0      0.0      0.0 delivery
		  37m12s    88928            0.0           15.0      0.0      0.0      0.0      0.0 newOrder
		  37m12s    88928            0.0            1.5      0.0      0.0      0.0      0.0 orderStatus
		  37m12s    88928            0.0           13.6      0.0      0.0      0.0      0.0 payment
		  37m12s    88928            0.0            1.5      0.0      0.0      0.0      0.0 stockLevel
		E190401 12:19:05.488620 1 workload/cli/run.go:420  error in payment: dial tcp 10.128.0.24:26257: connect: connection refused
		: signal: killed
	cluster.go:1652,tpcc.go:140,schemachange.go:310,test.go:1223: unexpected node event: 3: dead
	cluster.go:953,context.go:90,cluster.go:942,asm_amd64.s:522,panic.go:397,test.go:774,test.go:760,cluster.go:1652,tpcc.go:140,schemachange.go:310,test.go:1223: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1212185-schemachange-index-tpcc-w-1000 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		3: dead
		4: 3135
		1: 3121
		2: 3676
		Error:  3: dead

@vivekmenezes
Contributor

Node 3 died from an OOM.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/5267932f6fec0405b31328c1ad43711b0bb013e5

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/index/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1220238&tab=buildLog

The test failed on master:
	cluster.go:1329,tpcc.go:130,cluster.go:1667,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1220238-schemachange-index-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		     0.0 newOrder
		    3m9s    40833            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		    3m9s    40833            0.0            0.0      0.0      0.0      0.0      0.0 payment
		    3m9s    40833            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		E190404 14:24:48.798789 1 workload/cli/run.go:420  error in payment: dial tcp 10.142.0.50:26257: connect: connection refused
		   3m10s    92703            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		   3m10s    92703            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		   3m10s    92703            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		   3m10s    92703            0.0            0.0      0.0      0.0      0.0      0.0 payment
		   3m10s    92703            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		E190404 14:24:49.798913 1 workload/cli/run.go:420  error in orderStatus: dial tcp 10.142.0.50:26257: connect: connection refused
		: signal: killed
	test.go:1216: test timed out (6h0m0s)
	schemachange.go:450,schemachange.go:314,cluster.go:1667,errgroup.go:57: pq: server is not accepting clients
	cluster.go:1688,tpcc.go:140,schemachange.go:310,test.go:1228: unexpected node event: 2: dead
	cluster.go:1405,cluster.go:1424,cluster.go:1528,cluster.go:968,asm_amd64.s:522,panic.go:397,test.go:776,test.go:762,cluster.go:1688,tpcc.go:140,schemachange.go:310,test.go:1228: context canceled

@tbg
Member

tbg commented Apr 5, 2019

It looks like a node died around an hour into the test, and then the test never returned until it timed out (at the 6h mark):

E190404 14:24:49.798913 1 workload/cli/run.go:420  error in orderStatus: dial tcp 10.142.0.50:26257: connect: connection refused
19:18:20 test.go:798: test failure: 	test.go:1216: test timed out (6h0m0s)

The timeout wasn't handled properly, which prevented debug info from being collected (the contexts had expired, so all the commands failed), so we don't know why the node died. I'm honestly not sure why the contexts were canceled: from the looks of it, we should've canceled only the one given to the test, not the one used for collecting the debug info. The main problem, though, is that we were destroying the cluster the very moment the timeout hit, so the debugging code never had a chance to grab anything useful from it.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/1a5eabad4511a3371a6b2809d2bfc29e8aff66a6

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/index/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1224702&tab=buildLog

The test failed on master:
	cluster.go:1329,tpcc.go:130,cluster.go:1667,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1224702-schemachange-index-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		    0.0 newOrder
		  10m22s    35840            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		  10m22s    35840            0.0            0.0      0.0      0.0      0.0      0.0 payment
		  10m22s    35840            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		E190406 14:07:29.463604 1 workload/cli/run.go:420  error in newOrder: dial tcp 10.142.0.55:26257: connect: connection refused
		  10m23s    84559            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		  10m23s    84559            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		  10m23s    84559            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		  10m23s    84559            0.0            0.0      0.0      0.0      0.0      0.0 payment
		  10m23s    84559            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		E190406 14:07:30.463701 1 workload/cli/run.go:420  error in orderStatus: dial tcp 10.142.0.55:26257: connect: connection refused
		: signal: killed
	schemachange.go:450,schemachange.go:314,cluster.go:1667,errgroup.go:57: pq: internal error: TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_CLIENT_REJECT): id=7a07798c key=/Table/SystemConfigSpan/Start rw=true pri=0.06451714 stat=ABORTED epo=0 ts=1554559203.450241166,0 orig=1554558927.020530612,0 max=1554558927.520530612,0 wto=true seq=1
	cluster.go:1688,tpcc.go:140,schemachange.go:310,test.go:1228: unexpected node event: 3: dead
	cluster.go:953,context.go:90,cluster.go:942,asm_amd64.s:522,panic.go:397,test.go:776,test.go:762,cluster.go:1688,tpcc.go:140,schemachange.go:310,test.go:1228: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1224702-schemachange-index-tpcc-w-1000 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		3: dead
		2: 3658
		1: 3601
		4: 3540
		Error:  3: dead
	test.go:1216: test timed out (6h0m0s)
	test.go:986,asm_amd64.s:523,panic.go:513,log.go:219,cluster.go:926,context.go:90,cluster.go:916,test.go:1179,asm_amd64.s:522,panic.go:397,test.go:776,test.go:766,cluster.go:953,context.go:90,cluster.go:942,asm_amd64.s:522,panic.go:397,test.go:776,test.go:762,cluster.go:1688,tpcc.go:140,schemachange.go:310,test.go:1228: write /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190406-1224702/schemachange/index/tpcc/w=1000/test.log: file already closed

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/6da68d7fe2c9a29b85e2ec0c7e545a0d6bdc4c5c

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/index/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1226521&tab=buildLog

The test failed on release-19.1:
	schemachange.go:450,schemachange.go:314,cluster.go:1667,errgroup.go:57: read tcp 172.17.0.2:39874->35.243.134.143:26257: read: connection reset by peer
	cluster.go:1329,tpcc.go:169,cluster.go:1667,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1226521-schemachange-index-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		   2.2      0.0      0.0      0.0      0.0 delivery
		1h27m41s    23239            0.0           22.3      0.0      0.0      0.0      0.0 newOrder
		1h27m41s    23239            0.0            2.2      0.0      0.0      0.0      0.0 orderStatus
		1h27m41s    23239            0.0           21.6      0.0      0.0      0.0      0.0 payment
		1h27m41s    23239            0.0            2.2      0.0      0.0      0.0      0.0 stockLevel
		E190407 15:07:57.845702 1 workload/cli/run.go:420  error in newOrder: dial tcp 10.142.0.111:26257: connect: connection refused
		1h27m42s    80332            0.0            2.2      0.0      0.0      0.0      0.0 delivery
		1h27m42s    80332            0.0           22.3      0.0      0.0      0.0      0.0 newOrder
		1h27m42s    80332            0.0            2.2      0.0      0.0      0.0      0.0 orderStatus
		1h27m42s    80332            0.0           21.6      0.0      0.0      0.0      0.0 payment
		1h27m42s    80332            0.0            2.2      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	cluster.go:1688,tpcc.go:179,schemachange.go:310,test.go:1237: Goexit() was called
	cluster.go:953,context.go:90,cluster.go:942,asm_amd64.s:522,panic.go:397,test.go:785,test.go:771,cluster.go:1688,tpcc.go:179,schemachange.go:310,test.go:1237: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1226521-schemachange-index-tpcc-w-1000 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		1: dead
		4: 3562
		2: 3706
		3: dead
		Error:  3: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/58c458efeaa3b38c8c982f23a36381aac1b1004b

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/index/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1226503&tab=buildLog

The test failed on master:
	cluster.go:1329,tpcc.go:169,cluster.go:1667,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1226503-schemachange-index-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		      0.0            1.2      0.0      0.0      0.0      0.0 orderStatus
		  53m39s    53648            0.0            9.8      0.0      0.0      0.0      0.0 payment
		  53m39s    53648            0.0            1.1      0.0      0.0      0.0      0.0 stockLevel
		E190407 15:07:54.804864 1 workload/cli/run.go:420  error in newOrder: dial tcp 10.142.0.193:26257: connect: connection refused
		  53m40s    78679            0.0            1.1      0.0      0.0      0.0      0.0 delivery
		  53m40s    78679            0.0           11.2      0.0      0.0      0.0      0.0 newOrder
		  53m40s    78679            0.0            1.2      0.0      0.0      0.0      0.0 orderStatus
		  53m40s    78679            0.0            9.8      0.0      0.0      0.0      0.0 payment
		  53m40s    78679            0.0            1.1      0.0      0.0      0.0      0.0 stockLevel
		E190407 15:07:55.816081 1 workload/cli/run.go:420  error in stockLevel: ERROR: communication error: rpc error: code = Canceled desc = context canceled (SQLSTATE 08006)
		: signal: killed
	schemachange.go:450,schemachange.go:314,cluster.go:1667,errgroup.go:57: dial tcp 34.73.18.251:26257: connect: connection refused
	cluster.go:1688,tpcc.go:179,schemachange.go:310,test.go:1237: unexpected node event: 4: dead
	cluster.go:953,context.go:90,cluster.go:942,asm_amd64.s:522,panic.go:397,test.go:785,test.go:771,cluster.go:1688,tpcc.go:179,schemachange.go:310,test.go:1237: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1226503-schemachange-index-tpcc-w-1000 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		4: dead
		1: dead
		3: 3373
		2: 3114
		Error:  1: dead

@tbg
Member

tbg commented Apr 8, 2019

New failures here over the weekend that look like overloaded nodes. @vivekmenezes can you triage?

@vivekmenezes
Contributor

This is not looking like a legit failure. Will continue to keep an eye on it for now.

@tbg
Member

tbg commented Apr 8, 2019

Can you help me understand why these failures are not legit? I don't think nodes should be dying during this test.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/682c2f2f466bbf768545ca4687822206a63983ad

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/index/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1231772&tab=buildLog

The test failed on master:
	cluster.go:1329,tpcc.go:158,tpcc.go:160,schemachange.go:310,test.go:1237: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1231772-schemachange-index-tpcc-w-1000:5 -- ./workload fixtures load tpcc --warehouses=1000  {pgurl:1} returned:
		stderr:
		
		stdout:
		I190410 13:10:24.955284 1 ccl/workloadccl/cliccl/fixtures.go:296  starting load of 9 tables
		I190410 13:11:56.251653 28 ccl/workloadccl/fixture.go:601  loaded district (1m31s, 10000 rows, 0 index entries, 1006 KiB)
		I190410 13:12:22.425577 27 ccl/workloadccl/fixture.go:601  loaded warehouse (1m57s, 1000 rows, 0 index entries, 53 KiB)
		I190410 13:12:22.880164 33 ccl/workloadccl/fixture.go:601  loaded item (1m58s, 100000 rows, 0 index entries, 7.8 MiB)
		I190410 13:13:55.470286 32 ccl/workloadccl/fixture.go:601  loaded new_order (3m31s, 9000000 rows, 0 index entries, 126 MiB)
		I190410 13:16:15.596322 31 ccl/workloadccl/fixture.go:601  loaded order (5m51s, 30000000 rows, 60000000 index entries, 1.8 GiB)
		I190410 13:16:49.485941 30 ccl/workloadccl/fixture.go:601  loaded history (6m25s, 30000000 rows, 60000000 index entries, 3.8 GiB)
		Error: restoring fixture: dial tcp 10.142.0.104:26257: connect: connection refused
		Error:  exit status 1
		: exit status 1
	cluster.go:953,context.go:90,cluster.go:942,asm_amd64.s:522,panic.go:397,test.go:785,test.go:771,cluster.go:1329,tpcc.go:158,tpcc.go:160,schemachange.go:310,test.go:1237: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1231772-schemachange-index-tpcc-w-1000 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		3: 3072
		2: 3312
		4: 3239
		1: dead
		Error:  1: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/bf399d2677783dc1eea7f5ede6d4561f95c0ea10

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/index/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1234662&tab=buildLog

The test failed on master:
	cluster.go:1329,tpcc.go:169,cluster.go:1667,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1234662-schemachange-index-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		0.0 delivery
		   3m50s    13634            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		   3m50s    13634            0.0            0.1      0.0      0.0      0.0      0.0 orderStatus
		   3m50s    13634            0.0            0.0      0.0      0.0      0.0      0.0 payment
		   3m50s    13634            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		E190411 15:00:35.587556 1 workload/cli/run.go:428  error in newOrder: ERROR: communication error: rpc error: code = Canceled desc = context canceled (SQLSTATE 08006)
		   3m51s    13635            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		   3m51s    13635            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		   3m51s    13635            0.0            0.1      0.0      0.0      0.0      0.0 orderStatus
		   3m51s    13635            0.0            0.0      0.0      0.0      0.0      0.0 payment
		   3m51s    13635            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	schemachange.go:450,schemachange.go:314,cluster.go:1667,errgroup.go:57: pq: internal error: TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_CLIENT_REJECT): id=89045445 key=/Table/SystemConfigSpan/Start rw=true pri=0.01746658 stat=ABORTED epo=0 ts=1554994856.451676422,0 orig=1554994668.349094437,0 max=1554994668.849094437,0 wto=true seq=1
	cluster.go:1688,tpcc.go:179,schemachange.go:310,test.go:1237: unexpected node event: 3: dead
	cluster.go:953,context.go:90,cluster.go:942,asm_amd64.s:522,panic.go:397,test.go:785,test.go:771,cluster.go:1688,tpcc.go:179,schemachange.go:310,test.go:1237: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1234662-schemachange-index-tpcc-w-1000 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		3: dead
		4: 3130
		1: 3212
		2: 3079
		Error:  3: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/509c5b130fb1ad0042beb74e083817aa68e4fc92

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/index/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1237068&tab=buildLog

The test failed on release-19.1:
	cluster.go:1329,tpcc.go:169,cluster.go:1667,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1237068-schemachange-index-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		0 delivery
		    4m5s    13743            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		    4m5s    13743            0.0            0.1      0.0      0.0      0.0      0.0 orderStatus
		    4m5s    13743            0.0            0.0      0.0      0.0      0.0      0.0 payment
		    4m5s    13743            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		E190412 08:12:56.401529 1 workload/cli/run.go:428  error in stockLevel: ERROR: communication error: rpc error: code = Canceled desc = context canceled (SQLSTATE 08006)
		    4m6s    13744            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		    4m6s    13744            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		    4m6s    13744            0.0            0.1      0.0      0.0      0.0      0.0 orderStatus
		    4m6s    13744            0.0            0.0      0.0      0.0      0.0      0.0 payment
		    4m6s    13744            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	schemachange.go:450,schemachange.go:314,cluster.go:1667,errgroup.go:57: dial tcp 35.225.149.188:26257: connect: connection refused
	cluster.go:1688,tpcc.go:179,schemachange.go:310,test.go:1237: unexpected node event: 4: dead
	cluster.go:953,context.go:90,cluster.go:942,asm_amd64.s:522,panic.go:397,test.go:785,test.go:771,cluster.go:1688,tpcc.go:179,schemachange.go:310,test.go:1237: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1237068-schemachange-index-tpcc-w-1000 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		4: dead
		3: 3287
		1: dead
		2: 3163
		Error:  1: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/0c83360778c511ab79103aefd8f5e3a115990144

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/index/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1237179&tab=buildLog

The test failed on release-19.1:
	schemachange.go:450,schemachange.go:314,cluster.go:1667,errgroup.go:57: read tcp 172.17.0.2:60106->34.73.253.72:26257: read: connection reset by peer
	cluster.go:1329,tpcc.go:169,cluster.go:1667,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1237179-schemachange-index-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		un.go:428  error in newOrder: dial tcp 10.142.0.64:26257: connect: connection refused
		   39m3s     6376            0.0            0.5      0.0      0.0      0.0      0.0 delivery
		   39m3s     6376            0.0            4.9      0.0      0.0      0.0      0.0 newOrder
		   39m3s     6376            0.0            0.6      0.0      0.0      0.0      0.0 orderStatus
		   39m3s     6376            0.0            4.4      0.0      0.0      0.0      0.0 payment
		   39m3s     6376            0.0            0.4      0.0      0.0      0.0      0.0 stockLevel
		   39m4s     6376            0.0            0.5      0.0      0.0      0.0      0.0 delivery
		   39m4s     6376            0.0            4.9      0.0      0.0      0.0      0.0 newOrder
		   39m4s     6376            0.0            0.6      0.0      0.0      0.0      0.0 orderStatus
		   39m4s     6376            0.0            4.4      0.0      0.0      0.0      0.0 payment
		   39m4s     6376            0.0            0.4      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	cluster.go:1688,tpcc.go:179,schemachange.go:310,test.go:1237: Goexit() was called

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/9e6ae3cc37e7691147bb6f5d1a156ebe4c5cf7f9

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/index/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1245443&tab=buildLog

The test failed on master:
	cluster.go:1329,tpcc.go:168,cluster.go:1667,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1245443-schemachange-index-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		   0.0 delivery
		   9m45s    17479            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		   9m45s    17479            0.0            0.2      0.0      0.0      0.0      0.0 orderStatus
		   9m45s    17479            0.0            0.0      0.0      0.0      0.0      0.0 payment
		   9m45s    17479            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		   9m46s    17483            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		   9m46s    17483            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		   9m46s    17483            0.0            0.2      0.0      0.0      0.0      0.0 orderStatus
		   9m46s    17483            0.0            0.0      0.0      0.0      0.0      0.0 payment
		   9m46s    17483            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		E190417 20:05:59.029999 1 workload/cli/run.go:428  error in newOrder: ERROR: operation "intent_resolver_ir_batcher.sendBatch" timed out after 30s (SQLSTATE XXUUU)
		: signal: killed
	schemachange.go:450,schemachange.go:314,cluster.go:1667,errgroup.go:57: pq: internal error: TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_CLIENT_REJECT): "unnamed" id=7e76bc9f key=/Table/SystemConfigSpan/Start rw=true pri=0.00438429 stat=ABORTED epo=0 ts=1555531576.452389234,1 orig=1555531260.405988184,0 max=1555531260.905988184,0 wto=false seq=1
	cluster.go:1688,tpcc.go:178,schemachange.go:310,test.go:1237: unexpected node event: 4: dead
	cluster.go:953,context.go:89,cluster.go:942,asm_amd64.s:522,panic.go:397,test.go:785,test.go:771,cluster.go:1688,tpcc.go:178,schemachange.go:310,test.go:1237: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1245443-schemachange-index-tpcc-w-1000 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		4: dead
		2: 3217
		1: 3238
		3: 3416
		Error:  4: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/c65b71a27e4d0941bf9427b5dec1ff7f096bba7b

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/index/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1245461&tab=buildLog

The test failed on release-19.1:
	cluster.go:1329,tpcc.go:168,cluster.go:1667,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1245461-schemachange-index-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		d
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		 1h3m33s    12257            0.0            2.2      0.0      0.0      0.0      0.0 delivery
		 1h3m33s    12257            0.0           22.0      0.0      0.0      0.0      0.0 newOrder
		 1h3m33s    12257            0.0            2.2      0.0      0.0      0.0      0.0 orderStatus
		 1h3m33s    12257            7.0           20.9  34359.7 103079.2 103079.2 103079.2 payment
		 1h3m33s    12257            0.0            2.1      0.0      0.0      0.0      0.0 stockLevel
		 1h3m34s    12257            0.0            2.2      0.0      0.0      0.0      0.0 delivery
		 1h3m34s    12257            1.0           22.0  73014.4  73014.4  73014.4  73014.4 newOrder
		 1h3m34s    12257            0.0            2.2      0.0      0.0      0.0      0.0 orderStatus
		 1h3m34s    12257            0.0           20.9      0.0      0.0      0.0      0.0 payment
		 1h3m34s    12257            0.0            2.1      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	cluster.go:1688,tpcc.go:178,schemachange.go:310,test.go:1237: unexpected node event: 3: dead
	cluster.go:953,context.go:89,cluster.go:942,asm_amd64.s:522,panic.go:397,test.go:785,test.go:771,cluster.go:1688,tpcc.go:178,schemachange.go:310,test.go:1237: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1245461-schemachange-index-tpcc-w-1000 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		3: dead
		1: 3197
		4: 3169
		2: 3410
		Error:  3: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/83de585d331b05a4aa02a65b353bed6bf829b696

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/index/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1247383&tab=buildLog

The test failed on master:
	cluster.go:1329,tpcc.go:168,cluster.go:1667,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1247383-schemachange-index-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		d
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		   8m41s    16332            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		   8m41s    16332            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		   8m41s    16332            0.0            0.3      0.0      0.0      0.0      0.0 orderStatus
		   8m41s    16332            0.0            0.0      0.0      0.0      0.0      0.0 payment
		   8m41s    16332            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		   8m42s    16332            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		   8m42s    16332            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		   8m42s    16332            0.0            0.3      0.0      0.0      0.0      0.0 orderStatus
		   8m42s    16332            0.0            0.0      0.0      0.0      0.0      0.0 payment
		   8m42s    16332            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	test.go:1225: test timed out (6h0m0s)
	schemachange.go:450,schemachange.go:314,cluster.go:1667,errgroup.go:57: pq: server is not accepting clients
	cluster.go:1688,tpcc.go:178,schemachange.go:310,test.go:1237: unexpected node event: 4: dead
	cluster.go:1405,cluster.go:1424,cluster.go:1528,cluster.go:968,asm_amd64.s:522,panic.go:397,test.go:785,test.go:771,cluster.go:1688,tpcc.go:178,schemachange.go:310,test.go:1237: context canceled

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/4b3a1216e3a387aad900e70fde65b97b0fa17a8c

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/index/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1251417&tab=buildLog

The test failed on master:
	schemachange.go:450,schemachange.go:314,cluster.go:1667,errgroup.go:57: read tcp 172.17.0.2:34596->35.237.7.219:26257: read: connection reset by peer
	cluster.go:1329,tpcc.go:168,cluster.go:1667,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1251417-schemachange-index-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		d
		  25m28s    17890            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		  25m28s    17890            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		  25m28s    17890            0.0            0.1      0.0      0.0      0.0      0.0 orderStatus
		  25m28s    17890            0.0            0.0      0.0      0.0      0.0      0.0 payment
		  25m28s    17890            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  25m29s    17890            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		  25m29s    17890            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		  25m29s    17890            0.0            0.1      0.0      0.0      0.0      0.0 orderStatus
		  25m29s    17890            0.0            0.0      0.0      0.0      0.0      0.0 payment
		  25m29s    17890            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	cluster.go:1688,tpcc.go:178,schemachange.go:310,test.go:1237: Goexit() was called

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/99306ec3e9fcbba01c05431cbf496e8b5b8954b4

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/index/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1260033&tab=buildLog

The test failed on master:
	cluster.go:1423,tpcc.go:168,cluster.go:1761,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1260033-schemachange-index-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		run.go:428  error in payment: dial tcp 10.142.0.60:26257: connect: connection refused
		1h43m14s    13142            0.0            2.5      0.0      0.0      0.0      0.0 delivery
		1h43m14s    13142            1.0           24.9 103079.2 103079.2 103079.2 103079.2 newOrder
		1h43m14s    13142            0.0            2.5      0.0      0.0      0.0      0.0 orderStatus
		1h43m14s    13142            1.0           24.5  45097.2  45097.2  45097.2  45097.2 payment
		1h43m14s    13142            0.0            2.5      0.0      0.0      0.0      0.0 stockLevel
		1h43m15s    13142            0.0            2.5      0.0      0.0      0.0      0.0 delivery
		1h43m15s    13142            3.0           24.9 103079.2 103079.2 103079.2 103079.2 newOrder
		1h43m15s    13142            1.0            2.5   7516.2   7516.2   7516.2   7516.2 orderStatus
		1h43m15s    13142            1.0           24.5  90194.3  90194.3  90194.3  90194.3 payment
		1h43m15s    13142            0.0            2.5      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	cluster.go:1782,tpcc.go:178,schemachange.go:310,test.go:1245: unexpected node event: 4: dead
	cluster.go:1016,context.go:89,cluster.go:1005,asm_amd64.s:522,panic.go:397,test.go:790,test.go:776,cluster.go:1782,tpcc.go:178,schemachange.go:310,test.go:1245: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1260033-schemachange-index-tpcc-w-1000 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		4: dead
		2: 3573
		3: 3408
		1: 4461
		Error:  4: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/7b2651400b2003d0a381cba9dbfc0b7bc0dfee00

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/index/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1293898&tab=buildLog

The test failed on branch=master, cloud=gce:
	cluster.go:1474,tpcc.go:157,tpcc.go:159,schemachange.go:305,test.go:1251: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1293898-schemachange-index-tpcc-w-1000:5 -- ./workload fixtures load tpcc --warehouses=1000  {pgurl:1} returned:
		stderr:
		
		stdout:
		t open -> drain
		debug2: channel 0: obuf empty
		debug2: channel 0: close_write
		debug2: channel 0: output drain -> closed
		debug3: receive packet: type 98
		debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
		debug3: receive packet: type 97
		debug2: channel 0: rcvd close
		debug3: channel 0: will not send data after close
		debug2: channel 0: almost dead
		debug2: channel 0: gc: notify user
		debug2: channel 0: gc: user detached
		debug2: channel 0: send close
		debug3: send packet: type 97
		debug2: channel 0: is dead
		debug2: channel 0: garbage collecting
		debug1: channel 0: free: client-session, nchannels 1
		debug3: channel 0: status: The following connections are open:
		  #0 client-session (t4 r0 i3/0 o3/0 fd -1/-1 cc -1)
		
		debug3: send packet: type 1
		debug1: fd 0 clearing O_NONBLOCK
		debug1: fd 1 clearing O_NONBLOCK
		debug3: fd 2 is not O_NONBLOCK
		Transferred: sent 5360, received 5396 bytes, in 3943.4 seconds
		Bytes per second: sent 1.4, received 1.4
		debug1: Exit status 1
		: exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/923a3b2a6f4a6492883141092280d1041de1381a

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/index/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1295056&tab=buildLog

The test failed on branch=master, cloud=gce:
	cluster.go:1474,tpcc.go:157,tpcc.go:159,schemachange.go:305,test.go:1251: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1295056-schemachange-index-tpcc-w-1000:5 -- ./workload fixtures load tpcc --warehouses=1000  {pgurl:1} returned:
		stderr:
		
		stdout:
		t-status reply 0
		debug3: receive packet: type 96
		debug2: channel 0: rcvd eof
		debug2: channel 0: output open -> drain
		debug2: channel 0: obuf empty
		debug2: channel 0: close_write
		debug2: channel 0: output drain -> closed
		debug3: receive packet: type 97
		debug2: channel 0: rcvd close
		debug3: channel 0: will not send data after close
		debug2: channel 0: almost dead
		debug2: channel 0: gc: notify user
		debug2: channel 0: gc: user detached
		debug2: channel 0: send close
		debug3: send packet: type 97
		debug2: channel 0: is dead
		debug2: channel 0: garbage collecting
		debug1: channel 0: free: client-session, nchannels 1
		debug3: channel 0: status: The following connections are open:
		  #0 client-session (t4 r0 i3/0 o3/0 fd -1/-1 cc -1)
		
		debug3: send packet: type 1
		debug1: fd 0 clearing O_NONBLOCK
		debug1: fd 1 clearing O_NONBLOCK
		debug3: fd 2 is not O_NONBLOCK
		Transferred: sent 5252, received 5360 bytes, in 3624.3 seconds
		Bytes per second: sent 1.4, received 1.5
		debug1: Exit status 1
		: exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/cab299a0ef983f8b4ffe5d724e44587d9665d3a3

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/index/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1295811&tab=buildLog

The test failed on branch=master, cloud=gce:
	cluster.go:1474,tpcc.go:157,tpcc.go:159,schemachange.go:305,test.go:1251: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1295811-schemachange-index-tpcc-w-1000:5 -- ./workload fixtures load tpcc --warehouses=1000  {pgurl:1} returned:
		stderr:
		
		stdout:
		t open -> drain
		debug2: channel 0: obuf empty
		debug2: channel 0: close_write
		debug2: channel 0: output drain -> closed
		debug3: receive packet: type 98
		debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
		debug3: receive packet: type 97
		debug2: channel 0: rcvd close
		debug3: channel 0: will not send data after close
		debug2: channel 0: almost dead
		debug2: channel 0: gc: notify user
		debug2: channel 0: gc: user detached
		debug2: channel 0: send close
		debug3: send packet: type 97
		debug2: channel 0: is dead
		debug2: channel 0: garbage collecting
		debug1: channel 0: free: client-session, nchannels 1
		debug3: channel 0: status: The following connections are open:
		  #0 client-session (t4 r0 i3/0 o3/0 fd -1/-1 cc -1)
		
		debug3: send packet: type 1
		debug1: fd 0 clearing O_NONBLOCK
		debug1: fd 1 clearing O_NONBLOCK
		debug3: fd 2 is not O_NONBLOCK
		Transferred: sent 5072, received 5292 bytes, in 3486.3 seconds
		Bytes per second: sent 1.5, received 1.5
		debug1: Exit status 1
		: exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/58c567a325056033b326cb9c4ed9ba490e8956da

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/index/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1296592&tab=buildLog

The test failed on branch=master, cloud=gce:
	cluster.go:1474,tpcc.go:157,tpcc.go:159,schemachange.go:305,test.go:1251: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1296592-schemachange-index-tpcc-w-1000:5 -- ./workload fixtures load tpcc --warehouses=1000  {pgurl:1} returned:
		stderr:
		
		stdout:
		t-status reply 0
		debug3: receive packet: type 96
		debug2: channel 0: rcvd eof
		debug2: channel 0: output open -> drain
		debug2: channel 0: obuf empty
		debug2: channel 0: close_write
		debug2: channel 0: output drain -> closed
		debug3: receive packet: type 97
		debug2: channel 0: rcvd close
		debug3: channel 0: will not send data after close
		debug2: channel 0: almost dead
		debug2: channel 0: gc: notify user
		debug2: channel 0: gc: user detached
		debug2: channel 0: send close
		debug3: send packet: type 97
		debug2: channel 0: is dead
		debug2: channel 0: garbage collecting
		debug1: channel 0: free: client-session, nchannels 1
		debug3: channel 0: status: The following connections are open:
		  #0 client-session (t4 r0 i3/0 o3/0 fd -1/-1 cc -1)
		
		debug3: send packet: type 1
		debug1: fd 0 clearing O_NONBLOCK
		debug1: fd 1 clearing O_NONBLOCK
		debug3: fd 2 is not O_NONBLOCK
		Transferred: sent 5216, received 5340 bytes, in 3755.1 seconds
		Bytes per second: sent 1.4, received 1.4
		debug1: Exit status 1
		: exit status 1
		: exit status 1

@cockroach-teamcity

This comment has been minimized.

@cockroach-teamcity

This comment has been minimized.

@nvanbenschoten
Member

Previous two issues addressed by #37701.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/c9301cf71ea69da451fe5e5ba2c3074a4fe53831

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/index/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1303699&tab=buildLog

The test failed on branch=master, cloud=gce:
	cluster.go:1516,tpcc.go:168,cluster.go:1854,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1303699-schemachange-index-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		   12648            4.0           15.4    243.3  55834.6  55834.6  55834.6 stockLevel
		1h32m31s    12648            0.0           15.4      0.0      0.0      0.0      0.0 delivery
		1h32m31s    12648            0.0          153.1      0.0      0.0      0.0      0.0 newOrder
		1h32m31s    12648            0.0           15.4      0.0      0.0      0.0      0.0 orderStatus
		1h32m31s    12648            0.0          153.0      0.0      0.0      0.0      0.0 payment
		1h32m31s    12648            0.0           15.4      0.0      0.0      0.0      0.0 stockLevel
		1h32m32s    12648            0.0           15.4      0.0      0.0      0.0      0.0 delivery
		1h32m32s    12648            0.0          153.1      0.0      0.0      0.0      0.0 newOrder
		1h32m32s    12648            0.0           15.4      0.0      0.0      0.0      0.0 orderStatus
		1h32m32s    12648            0.0          152.9      0.0      0.0      0.0      0.0 payment
		1h32m32s    12648            0.0           15.4      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	cluster.go:1875,tpcc.go:178,schemachange.go:305,test.go:1251: unexpected node event: 2: dead
	cluster.go:1038,context.go:89,cluster.go:1027,asm_amd64.s:522,panic.go:397,test.go:788,test.go:774,cluster.go:1875,tpcc.go:178,schemachange.go:305,test.go:1251: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1303699-schemachange-index-tpcc-w-1000 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		2: dead
		1: 4677
		4: 4516
		3: 4505
		Error:  2: dead

@nvanbenschoten
Member

F190523 18:34:10.101886 859178 kv/txn_coord_sender.go:913 [n2,client=10.142.0.98:42342,user=root] unexpected txn state: "sql txn" id=47c21f4d key=/Table/58/1/17/3/0 rw=true pri=0.06266986 stat=COMMITTED epo=1 ts=1558636408.791007024,0 orig=1558636408.791007024,0 max=1558636409.291007024,0 wto=false seq=34 int=33 ifw=23

Same failure as #37488 (comment).

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/630a6e9cb3771912cd138f9aa3bea1f0ca9fa7c9

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/index/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1306250&tab=buildLog

The test failed on branch=master, cloud=gce:
	schemachange.go:416,schemachange.go:309,cluster.go:1854,errgroup.go:57: dial tcp 34.73.9.131:26257: connect: connection refused
	cluster.go:1516,tpcc.go:168,cluster.go:1854,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1306250-schemachange-index-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		9   6710.9 delivery
		   4m11s    21051           59.0          234.2  30064.8  53687.1  73014.4  81604.4 newOrder
		   4m11s    21051           17.0           24.5    872.4   7247.8   8321.5   8321.5 orderStatus
		   4m11s    21051           18.0          232.6  22548.6  53687.1  57982.1  57982.1 payment
		   4m11s    21051           12.0           24.2   3087.0  13958.6  16643.0  16643.0 stockLevel
		E190524 16:29:53.675449 1 workload/cli/run.go:428  error in newOrder: ERROR: TransactionStatusError: already committed (REASON_TXN_COMMITTED) (SQLSTATE XXUUU)
		   4m12s    21193           16.0           24.2   2013.3   7247.8   9126.8   9126.8 delivery
		   4m12s    21193           69.0          233.6  24696.1  40802.2  81604.4 103079.2 newOrder
		   4m12s    21193           10.0           24.4     46.1   5100.3   5100.3   5100.3 orderStatus
		   4m12s    21193           30.0          231.8  13958.6  38654.7  57982.1  57982.1 payment
		   4m12s    21193           10.0           24.1    906.0  34359.7  34359.7  34359.7 stockLevel
		: signal: killed
	cluster.go:1875,tpcc.go:178,schemachange.go:305,test.go:1251: Goexit() was called
	cluster.go:1038,context.go:89,cluster.go:1027,asm_amd64.s:522,panic.go:397,test.go:788,test.go:774,cluster.go:1875,tpcc.go:178,schemachange.go:305,test.go:1251: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1306250-schemachange-index-tpcc-w-1000 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		1: dead
		4: 4335
		3: 4358
		2: 4661
		Error:  1: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/fc7e48295cd05f94fd2883498d96d91ad538e559

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1308263&tab=buildLog

The test failed on branch=master, cloud=gce:
	cluster.go:1516,tpcc.go:180,schemachange.go:305,test.go:1251: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1308263-schemachange-index-tpcc-w-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190525 18:49:47.943797 1 workload/tpcc/tpcc.go:290  check 3.3.2.1 took 270.084167ms
		Error: check failed: 3.3.2.1: 1 rows returned, expected zero
		Error:  ssh verbose log retained in /root/.roachprod/debug/ssh_104.196.60.26_2019-05-25T18:49:46Z: exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/6bc296955cbbc4313d91b94ee129b73b81ab12f4

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1337184&tab=buildLog

The test failed on branch=release-19.1, cloud=gce:
	schemachange.go:413,schemachange.go:306,cluster.go:1851,errgroup.go:57: dial tcp 34.74.249.54:26257: connect: connection refused
	cluster.go:1513,tpcc.go:169,cluster.go:1851,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1337184-schemachange-index-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		0            0.0      0.0      0.0      0.0      0.0 newOrder
		    2m8s   178124            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		    2m8s   178124            0.0            0.0      0.0      0.0      0.0      0.0 payment
		    2m8s   178124            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		E190614 14:45:04.262541 1 workload/cli/run.go:425  error in payment: dial tcp 10.142.0.232:26257: connect: connection refused
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		    2m9s   316816            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		    2m9s   316816            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		    2m9s   316816            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		    2m9s   316816            0.0            0.0      0.0      0.0      0.0      0.0 payment
		    2m9s   316816            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	cluster.go:1872,tpcc.go:179,schemachange.go:302,test.go:1248: Goexit() was called
	cluster.go:1035,context.go:87,cluster.go:1024,asm_amd64.s:522,panic.go:397,test.go:785,test.go:771,cluster.go:1872,tpcc.go:179,schemachange.go:302,test.go:1248: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1337184-schemachange-index-tpcc-w-1000 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		4: 4049
		1: dead
		2: 3715
		3: 3735
		Error:  1: dead
	test.go:1234: test timed out (6h0m0s)

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/5f358ed804af05f8c4b404efc4d8a282d8e0916c

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1360435&tab=buildLog

The test failed on branch=master, cloud=gce:
	cluster.go:1511,tpcc.go:167,cluster.go:1849,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1360435-schemachange-index-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		1h10m33s    16840            9.0           21.0   7516.2  34359.7  34359.7  34359.7 delivery
		1h10m33s    16840           87.8          209.5  15569.3  60129.5  81604.4 103079.2 newOrder
		1h10m33s    16840            8.0           21.0     44.0  20401.1  20401.1  20401.1 orderStatus
		1h10m33s    16840           23.9          208.9  11811.2  45097.2  45097.2  45097.2 payment
		1h10m33s    16840           13.0           21.0   9126.8  21474.8  31138.5  31138.5 stockLevel
		E190626 16:46:33.774171 1 workload/cli/run.go:427  error in payment: ERROR: result is ambiguous (error=failed to connect to n3 at teamcity-1360435-schemachange-index-tpcc-w-1000-0003:26257: initial connection heartbeat failed: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.142.0.59:26257: connect: connection refused" [exhausted]) (SQLSTATE 40003)
		: signal: killed
	cluster.go:1870,tpcc.go:177,schemachange.go:300,test.go:1249: unexpected node event: 3: dead
	cluster.go:1033,context.go:122,cluster.go:1022,asm_amd64.s:522,panic.go:397,test.go:783,test.go:769,cluster.go:1870,tpcc.go:177,schemachange.go:300,test.go:1249: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1360435-schemachange-index-tpcc-w-1000 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		3: dead
		4: 4423
		1: 4506
		2: 4317
		Error:  3: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/5f358ed804af05f8c4b404efc4d8a282d8e0916c

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1361643&tab=buildLog

The test failed on branch=master, cloud=gce:
	cluster.go:1511,tpcc.go:167,cluster.go:1849,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1361643-schemachange-index-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		    21.5      0.0      0.0      0.0      0.0 delivery
		  32m50s    10080            0.0          214.3      0.0      0.0      0.0      0.0 newOrder
		  32m50s    10080            0.0           21.5      0.0      0.0      0.0      0.0 orderStatus
		  32m50s    10080            0.0          213.0      0.0      0.0      0.0      0.0 payment
		  32m50s    10080            0.0           21.5      0.0      0.0      0.0      0.0 stockLevel
		E190627 04:10:04.317804 1 workload/cli/run.go:427  error in payment: dial tcp 10.142.0.17:26257: connect: connection refused
		  32m51s    11115            0.0           21.5      0.0      0.0      0.0      0.0 delivery
		  32m51s    11115            0.0          214.2      0.0      0.0      0.0      0.0 newOrder
		  32m51s    11115            0.0           21.5      0.0      0.0      0.0      0.0 orderStatus
		  32m51s    11115            0.0          212.9      0.0      0.0      0.0      0.0 payment
		  32m51s    11115            0.0           21.5      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	cluster.go:1870,tpcc.go:177,schemachange.go:300,test.go:1249: unexpected node event: 3: dead
	cluster.go:1033,context.go:122,cluster.go:1022,asm_amd64.s:522,panic.go:397,test.go:783,test.go:769,cluster.go:1870,tpcc.go:177,schemachange.go:300,test.go:1249: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1361643-schemachange-index-tpcc-w-1000 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		3: dead
		4: 4353
		2: 4364
		1: 4828
		Error:  3: dead

@nvanbenschoten
Member

F190627 04:09:55.354078 26377651 kv/txn_interceptor_heartbeater.go:360  [n3,txn-hb=00000000] txn committed or aborted but heartbeat loop hasn't been signaled to stop. txn: "sql txn" id=932560c7 key=/Table/59/1/822/9/0 rw=true pri=0.02085096 stat=ABORTED epo=2 ts=1561608563.558083010,2 orig=1561608563.558083010,2 max=1561608476.430435857,0 wto=false seq=1
goroutine 26377651 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xc000450001, 0xc000450000, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:1016 +0xb1
github.com/cockroachdb/cockroach/pkg/util/log.(*loggingT).outputLogEntry(0x61b53e0, 0xc000000004, 0x5a0de0f, 0x21, 0x168, 0xc01d72fb00, 0x11c)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:874 +0x92b
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x3e2a580, 0xc00ea1bec0, 0x4, 0x2, 0x36e091e, 0x51, 0xc0b23e9db8, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:66 +0x2cc
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x3e2a580, 0xc00ea1bec0, 0x1, 0x4, 0x36e091e, 0x51, 0xc0b23e9db8, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:69 +0x8c
github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:180
github.com/cockroachdb/cockroach/pkg/kv.(*txnHeartbeater).heartbeat(0xc0f5aa2638, 0x3e2a580, 0xc00ea1bec0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_heartbeater.go:360 +0x15b
github.com/cockroachdb/cockroach/pkg/kv.(*txnHeartbeater).heartbeatLoop(0xc0f5aa2638, 0x3e2a580, 0xc00ea1bec0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_heartbeater.go:324 +0x1f8
github.com/cockroachdb/cockroach/pkg/kv.(*txnHeartbeater).startHeartbeatLoopLocked.func1(0x3e2a580, 0xc00ea1bec0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn_interceptor_heartbeater.go:283 +0x3e
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask.func1(0xc0004cbdd0, 0x3e2a580, 0xc00ea1bec0, 0xc0d6297230, 0x29, 0x3e794a0, 0xc0002e0ee0, 0xc04d5660e0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:321 +0xe6
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:316 +0x131

@andreimatei it looks like this old friend is back. Interestingly, this test failed here and in #36024 (comment) on the same night after not showing up for a long time. I wonder if something changed recently to make this possible again.
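
For readers unfamiliar with the assertion in the fatal above: it fires when the heartbeat loop sees a finalized transaction without having been told to stop. The following is only a minimal sketch of that invariant. The `heartbeater` type, its fields, and `checkInvariant` are hypothetical names for illustration, not the code in `txn_interceptor_heartbeater.go`.

```go
package main

import (
	"errors"
	"fmt"
)

// status is a simplified stand-in for a transaction's status.
type status int

const (
	pending status = iota
	committed
	aborted
)

// heartbeater is a hypothetical, stripped-down model of the state a
// heartbeat loop might track. It is not the real txnHeartbeater.
type heartbeater struct {
	txnStatus    status
	stopSignaled bool
}

var errInvariant = errors.New(
	"txn committed or aborted but heartbeat loop hasn't been signaled to stop")

// checkInvariant reports an error when the txn is finalized but the loop
// has not been signaled to stop, mirroring the condition in the log line.
func (h *heartbeater) checkInvariant() error {
	finalized := h.txnStatus == committed || h.txnStatus == aborted
	if finalized && !h.stopSignaled {
		return errInvariant
	}
	return nil
}

func main() {
	h := &heartbeater{txnStatus: aborted}
	fmt.Println(h.checkInvariant()) // invariant violated
	h.stopSignaled = true
	fmt.Println(h.checkInvariant()) // invariant holds
}
```

In this model, an ABORTED proto reaching the loop through a path that does not also signal the stop is exactly what trips the check.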

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/90841a6559df9d9a4724e1d30490951bbdb811b4

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1364443&tab=buildLog

The test failed on branch=provisional_201906271846_v19.2.0-alpha.20190701, cloud=gce:
	cluster.go:1511,tpcc.go:167,cluster.go:1849,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1364443-schemachange-index-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		l
		   5m12s    12671            0.0           21.9      0.0      0.0      0.0      0.0 delivery
		   5m12s    12671            0.0          211.7      0.0      0.0      0.0      0.0 newOrder
		   5m12s    12671            0.0           22.0      0.0      0.0      0.0      0.0 orderStatus
		   5m12s    12671            0.0          203.5      0.0      0.0      0.0      0.0 payment
		   5m12s    12671            0.0           21.6      0.0      0.0      0.0      0.0 stockLevel
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		   5m13s    12671            0.0           21.8      0.0      0.0      0.0      0.0 delivery
		   5m13s    12671            0.0          211.0      0.0      0.0      0.0      0.0 newOrder
		   5m13s    12671            0.0           22.0      0.0      0.0      0.0      0.0 orderStatus
		   5m13s    12671            0.0          202.8      0.0      0.0      0.0      0.0 payment
		   5m13s    12671            0.0           21.5      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	cluster.go:1870,tpcc.go:177,schemachange.go:300,test.go:1249: unexpected node event: 3: dead
	cluster.go:1033,context.go:122,cluster.go:1022,panic.go:406,test.go:783,test.go:769,cluster.go:1870,tpcc.go:177,schemachange.go:300,test.go:1249: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1364443-schemachange-index-tpcc-w-1000 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		3: dead
		2: 3701
		4: 3735
		1: 3760
		Error:  3: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/537767ac9daa52b0026bb957d7010e3b88b61071

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1364821&tab=buildLog

The test failed on branch=master, cloud=gce:
	test.go:1235: test timed out (6h0m0s)
	cluster.go:1511,tpcc.go:156,tpcc.go:158,schemachange.go:300,test.go:1249: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1364821-schemachange-index-tpcc-w-1000:5 -- ./workload fixtures load tpcc --warehouses=1000  {pgurl:1} returned:
		stderr:
		
		stdout:
		I190628 19:08:57.395870 1 ccl/workloadccl/cliccl/fixtures.go:293  starting load of 9 tables
		I190628 19:08:57.939740 95 ccl/workloadccl/fixture.go:476  loaded 1006 KiB table district in 543.614738ms (10000 rows, 0 index entries, 1.8 MiB)
		I190628 19:09:16.245523 94 ccl/workloadccl/fixture.go:476  loaded 53 KiB table warehouse in 18.848639993s (1000 rows, 0 index entries, 2.8 KiB)
		I190628 19:13:23.794908 99 ccl/workloadccl/fixture.go:476  loaded 126 MiB table new_order in 4m26.398225865s (9000000 rows, 0 index entries, 483 KiB)
		I190628 19:14:12.323332 100 ccl/workloadccl/fixture.go:476  loaded 7.8 MiB table item in 5m14.926630688s (100000 rows, 0 index entries, 25 KiB)
		I190628 19:16:41.378326 98 ccl/workloadccl/fixture.go:476  loaded 1.3 GiB table order in 7m43.98201454s (30000000 rows, 30000000 index entries, 2.9 MiB)
		: signal: killed

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/86154ae6ae36e286883d8a6c9a4111966198201d

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1367379&tab=buildLog

The test failed on branch=master, cloud=gce:
	cluster.go:1511,tpcc.go:167,cluster.go:1849,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1367379-schemachange-index-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		 0.0 delivery
		 1h8m55s     9382            0.0          151.8      0.0      0.0      0.0      0.0 newOrder
		 1h8m55s     9382            0.0           15.3      0.0      0.0      0.0      0.0 orderStatus
		 1h8m55s     9382            0.0          151.8      0.0      0.0      0.0      0.0 payment
		 1h8m55s     9382            0.0           15.2      0.0      0.0      0.0      0.0 stockLevel
		E190630 22:00:37.551762 1 workload/cli/run.go:427  error in payment: ERROR: result is ambiguous (error=unable to dial n1: breaker open [exhausted]) (SQLSTATE 40003)
		 1h8m56s     9466            0.0           15.3      0.0      0.0      0.0      0.0 delivery
		 1h8m56s     9466            1.0          151.7 103079.2 103079.2 103079.2 103079.2 newOrder
		 1h8m56s     9466            1.0           15.3      8.9      8.9      8.9      8.9 orderStatus
		 1h8m56s     9466            7.0          151.8  36507.2  94489.3  94489.3  94489.3 payment
		 1h8m56s     9466            5.0           15.2  34359.7 103079.2 103079.2 103079.2 stockLevel
		: signal: killed
	cluster.go:1870,tpcc.go:177,schemachange.go:300,test.go:1249: unexpected node event: 1: dead
	cluster.go:1033,context.go:122,cluster.go:1022,panic.go:406,test.go:783,test.go:769,cluster.go:1870,tpcc.go:177,schemachange.go:300,test.go:1249: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1367379-schemachange-index-tpcc-w-1000 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		1: dead
		3: 4577
		2: 4366
		4: 4514
		Error:  1: dead

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jun 30, 2019
Fixes cockroachdb#36024.
Fixes cockroachdb#36094.

8b5bafb ensured that all transaction state was propagated by DistSender on
errors. In doing so, it touched on the fact that DistSender drops all but the
first error that it sees. It ensured that even though this was the case, the
error metadata from these dropped errors would still be propagated (see
`pErr.UpdateTxn(resp.pErr.GetTxn())`).

This had the unintended consequence that a non-aborting transaction retry
error could be updated with an ABORTED transaction proto. This caused
confusion in the TxnCoordSender, triggering panics like the ones we see in
cockroachdb#36024 and cockroachdb#36094.

This change fixes this by being smarter about which errors get dropped when
concurrent partial batches each hit an error in DistSender. It does this by
prioritizing the most severe errors and merging transaction state into those.
In a lot of ways, this is the DistSender equivalent of 574e805, which is why
they now share code.

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jul 1, 2019
craig bot pushed a commit that referenced this issue Jul 1, 2019
38579: kv: prioritize severe errors when merging partial batches in DistSender r=andreimatei a=nvanbenschoten

Fixes #36024.
Fixes #36094.

8b5bafb ensured that all transaction state was propagated by `DistSender` on errors. In doing so, it touched on the fact that `DistSender` drops all but the first error that it sees. It ensured that even though this was the case, the error metadata from these dropped errors would still be propagated (see `pErr.UpdateTxn(resp.pErr.GetTxn())`).

This had the unintended consequence that a non-aborting transaction retry error could be updated with an ABORTED transaction proto. This caused confusion in the `TxnCoordSender`, triggering panics like the ones we see in #36024 and #36094.

This change fixes this by being smarter about which errors get dropped when concurrent partial batches each hit an error in `DistSender`. It does this by prioritizing the most severe errors and merging transaction state into those. In a lot of ways, this is the `DistSender` equivalent of 574e805, which is why they now share code.
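
The idea of keeping the most severe error and folding the other error's transaction state into it can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from the PR: the `Txn` and `Error` types, the numeric `Priority` field, and the `mergeTxn` and `mergeErrors` helpers are all hypothetical.

```go
package main

import "fmt"

// TxnStatus is a simplified stand-in for a transaction proto's status.
type TxnStatus int

const (
	Pending TxnStatus = iota
	Aborted
)

// Txn is a hypothetical, minimal transaction record.
type Txn struct {
	ID     string
	Epoch  int
	Status TxnStatus
}

// Error is a hypothetical error that carries a severity and txn state.
// Assumption: a higher Priority means a more severe error, and errors
// that abort the txn rank above plain retry errors.
type Error struct {
	Msg      string
	Priority int
	Txn      Txn
}

// mergeTxn forwards monotonic state (here only the epoch) from other
// into t. In this sketch the status is deliberately left with the
// winning error, so a retry error is never relabeled as ABORTED.
func mergeTxn(t *Txn, other Txn) {
	if other.Epoch > t.Epoch {
		t.Epoch = other.Epoch
	}
}

// mergeErrors keeps the more severe of two errors and merges the
// other error's txn state into it. Ties keep the first error.
func mergeErrors(a, b *Error) *Error {
	if a == nil {
		return b
	}
	if b == nil {
		return a
	}
	winner, loser := a, b
	if b.Priority > a.Priority {
		winner, loser = b, a
	}
	mergeTxn(&winner.Txn, loser.Txn)
	return winner
}

func main() {
	retry := &Error{Msg: "retry", Priority: 1, Txn: Txn{ID: "t1", Epoch: 2}}
	abort := &Error{Msg: "abort", Priority: 2, Txn: Txn{ID: "t1", Epoch: 1, Status: Aborted}}
	// The retry error arrives first, but the abort error is kept.
	got := mergeErrors(retry, abort)
	fmt.Printf("%s epoch=%d aborted=%t\n", got.Msg, got.Txn.Epoch, got.Txn.Status == Aborted)
}
```

The contrast with "first error wins" is that arrival order no longer decides which error surfaces, so the error type and the transaction status it carries stay consistent.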

Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig craig bot closed this as completed in #38579 Jul 1, 2019