roachtest: scrub/index-only/tpcc/w=1000 failed #35985

Closed
cockroach-teamcity opened this issue Mar 20, 2019 · 19 comments · Fixed by #37046

@cockroach-teamcity
Member

SHA: https://github.com/cockroachdb/cockroach/commits/3a7ea2d8c9d4a3e0d97f8f106fcf95b3f03765ec

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1187480&tab=buildLog

The test failed on master:
	scrub.go:83,cluster.go:1605,errgroup.go:57: pq: communication error: rpc error: code = Canceled desc = context canceled
	cluster.go:1267,tpcc.go:130,cluster.go:1605,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1187480-scrub-index-only-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		l
		 1h6m24s       40            0.0            2.7      0.0      0.0      0.0      0.0 delivery
		 1h6m24s       40            0.0           27.1      0.0      0.0      0.0      0.0 newOrder
		 1h6m24s       40            0.0            2.7      0.0      0.0      0.0      0.0 orderStatus
		 1h6m24s       40            0.0           25.8      0.0      0.0      0.0      0.0 payment
		 1h6m24s       40            0.0            2.7      0.0      0.0      0.0      0.0 stockLevel
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		 1h6m25s       40            0.0            2.7      0.0      0.0      0.0      0.0 delivery
		 1h6m25s       40            1.0           27.1 103079.2 103079.2 103079.2 103079.2 newOrder
		 1h6m25s       40            0.0            2.7      0.0      0.0      0.0      0.0 orderStatus
		 1h6m25s       40            0.0           25.8      0.0      0.0      0.0      0.0 payment
		 1h6m25s       40            1.0            2.7    604.0    604.0    604.0    604.0 stockLevel
		: signal: killed
	cluster.go:1626,tpcc.go:140,scrub.go:58,test.go:1214: Goexit() was called
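
The workload rows above follow the column order given by the `_elapsed___errors__ops/sec(inst)...` header line. Below is a minimal Python sketch, assuming whitespace-separated columns in exactly that order, that parses one row so stalled intervals (instantaneous throughput of 0.0, or a very large p50) are easier to spot. The names `TpccRow` and `parse_tpcc_row` are hypothetical and not part of the workload tool.

```python
from typing import NamedTuple


class TpccRow(NamedTuple):
    elapsed: str
    errors: int
    ops_inst: float
    ops_cum: float
    p50_ms: float
    p95_ms: float
    p99_ms: float
    pmax_ms: float
    op: str


def parse_tpcc_row(line: str) -> TpccRow:
    """Parse one histogram row; raise ValueError for lines with another shape."""
    parts = line.split()
    if len(parts) != 9:
        raise ValueError(f"expected 9 columns, got {len(parts)}")
    elapsed, errors, *nums, op = parts
    return TpccRow(elapsed, int(errors), *map(float, nums), op)


# Row copied from the log above: one newOrder finished in this second,
# with a latency of roughly 103 seconds.
row = parse_tpcc_row(
    " 1h6m25s       40            1.0           27.1 "
    "103079.2 103079.2 103079.2 103079.2 newOrder"
)
print(row.op, row.ops_inst, row.p50_ms)
```

Header lines and the interleaved `E190321 ...` error lines do not have nine numeric-compatible columns, so a caller would typically catch `ValueError` and skip them.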

cockroach-teamcity added this to the 19.1 milestone Mar 20, 2019
cockroach-teamcity added the C-test-failure (Broken test, automatically or manually discovered), O-roachtest, and O-robot (Originated from a bot) labels Mar 20, 2019
thoszhang self-assigned this Mar 21, 2019
knz self-assigned this Mar 21, 2019
@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/dfa23c01e4ea39b19ca8b2e5c8a4e7cf9b9445f4

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1189954&tab=buildLog

The test failed on master:
	cluster.go:1267,tpcc.go:130,cluster.go:1605,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1189954-scrub-index-only-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		      0.0      0.0 orderStatus
		   4m24s    62311            0.0            0.0      0.0      0.0      0.0      0.0 payment
		   4m24s    62311            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		E190321 15:13:11.579140 1 workload/cli/run.go:420  error in payment: dial tcp 10.142.0.181:26257: connect: connection refused
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		   4m25s   120973            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		   4m25s   120973            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		   4m25s   120973            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		   4m25s   120973            0.0            0.0      0.0      0.0      0.0      0.0 payment
		   4m25s   120973            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		E190321 15:13:12.579142 1 workload/cli/run.go:420  error in newOrder: dial tcp 10.142.0.181:26257: connect: connection refused
		: signal: killed
	cluster.go:1626,tpcc.go:140,scrub.go:58,test.go:1214: unexpected node event: 3: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/7bc9ea5fbe0c0082fdcfd408245a79c62b00edd4

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1197065&tab=buildLog

The test failed on master:
	cluster.go:1267,tpcc.go:130,cluster.go:1605,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1197065-scrub-index-only-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		     0.0      0.0 orderStatus
		   6m44s    75402            0.0            0.0      0.0      0.0      0.0      0.0 payment
		   6m44s    75402            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		E190325 14:57:40.401105 1 workload/cli/run.go:420  error in newOrder: dial tcp 10.142.0.127:26257: connect: connection refused
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		   6m45s   132246            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		   6m45s   132246            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		   6m45s   132246            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		   6m45s   132246            0.0            0.0      0.0      0.0      0.0      0.0 payment
		   6m45s   132246            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		E190325 14:57:41.401114 1 workload/cli/run.go:420  error in newOrder: dial tcp 10.142.0.127:26257: connect: connection refused
		: signal: killed
	scrub.go:83,cluster.go:1605,errgroup.go:57: dial tcp 35.229.86.89:26257: connect: connection refused
	cluster.go:1626,tpcc.go:140,scrub.go:58,test.go:1214: unexpected node event: 2: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/3aadd20bbf0940ef65f8b2cdcda498401ba5d9c6

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1201905&tab=buildLog

The test failed on release-19.1:
	scrub.go:83,cluster.go:1605,errgroup.go:57: pq: communication error: rpc error: code = Canceled desc = context canceled
	cluster.go:1267,tpcc.go:130,cluster.go:1605,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1201905-scrub-index-only-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		l
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		   5m25s       55            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		   5m25s       55            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		   5m25s       55            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		   5m25s       55            0.0            0.0      0.0      0.0      0.0      0.0 payment
		   5m25s       55            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		   5m26s       55            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		   5m26s       55            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		   5m26s       55            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		   5m26s       55            0.0            0.0      0.0      0.0      0.0      0.0 payment
		   5m26s       55            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	cluster.go:1626,tpcc.go:140,scrub.go:58,test.go:1216: Goexit() was called

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/3aadd20bbf0940ef65f8b2cdcda498401ba5d9c6

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1206925&tab=buildLog

The test failed on release-19.1:
	scrub.go:83,cluster.go:1631,errgroup.go:57: pq: communication error: rpc error: code = Canceled desc = context canceled
	cluster.go:1293,tpcc.go:130,cluster.go:1631,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1206925-scrub-index-only-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		     167            0.0            0.7      0.0      0.0      0.0      0.0 stockLevel
		  37m35s      167            0.0            0.9      0.0      0.0      0.0      0.0 delivery
		  37m35s      167            0.0            7.8      0.0      0.0      0.0      0.0 newOrder
		  37m35s      167            0.0            0.8      0.0      0.0      0.0      0.0 orderStatus
		  37m35s      167            0.0            6.7      0.0      0.0      0.0      0.0 payment
		  37m35s      167            0.0            0.7      0.0      0.0      0.0      0.0 stockLevel
		  37m36s      167            0.0            0.9      0.0      0.0      0.0      0.0 delivery
		  37m36s      167            0.0            7.8      0.0      0.0      0.0      0.0 newOrder
		  37m36s      167            0.0            0.8      0.0      0.0      0.0      0.0 orderStatus
		  37m36s      167            0.0            6.7      0.0      0.0      0.0      0.0 payment
		  37m36s      167            0.0            0.7      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	cluster.go:1652,tpcc.go:140,scrub.go:58,test.go:1223: Goexit() was called
	cluster.go:953,context.go:90,cluster.go:942,asm_amd64.s:522,panic.go:397,test.go:774,test.go:760,cluster.go:1652,tpcc.go:140,scrub.go:58,test.go:1223: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1206925-scrub-index-only-tpcc-w-1000 --oneshot --ignore-empty-nodes: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/23f9707873abbd2de91a42055535529d7ff296ce

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1209900&tab=buildLog

The test failed on release-19.1:
	scrub.go:83,cluster.go:1631,errgroup.go:57: dial tcp 35.243.230.156:26257: connect: connection refused
	cluster.go:1293,tpcc.go:130,cluster.go:1631,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1209900-scrub-index-only-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		.0      0.0 newOrder
		  10m50s   131747            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		  10m50s   131747            0.0            0.0      0.0      0.0      0.0      0.0 payment
		  10m50s   131747            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		E190330 14:58:25.742141 1 workload/cli/run.go:420  error in newOrder: dial tcp 10.142.0.33:26257: connect: connection refused
		  10m51s   199449            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		  10m51s   199449            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		  10m51s   199449            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		  10m51s   199449            0.0            0.0      0.0      0.0      0.0      0.0 payment
		  10m51s   199449            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		E190330 14:58:26.742155 1 workload/cli/run.go:420  error in payment: dial tcp 10.142.0.33:26257: connect: connection refused
		: signal: killed
	cluster.go:1652,tpcc.go:140,scrub.go:58,test.go:1223: Goexit() was called
	cluster.go:953,context.go:90,cluster.go:942,asm_amd64.s:522,panic.go:397,test.go:774,test.go:760,cluster.go:1652,tpcc.go:140,scrub.go:58,test.go:1223: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1209900-scrub-index-only-tpcc-w-1000 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		1: dead
		2: 3995
		3: 3849
		4: 3023
		Error:  1: dead
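
The `roachprod monitor --oneshot` output above lists one `node: status` pair per line. Here is a small Python sketch, assuming only that line shape, for pulling out the dead nodes. The function names are hypothetical, and treating the numeric statuses as process IDs is an assumption rather than something this thread states.

```python
from typing import Dict, List


def parse_monitor_output(text: str) -> Dict[int, str]:
    """Map node number -> status string ('dead', 'skipped', or a number)."""
    nodes: Dict[int, str] = {}
    for line in text.splitlines():
        node, sep, status = line.strip().partition(":")
        # Lines such as "Error:  1: dead" do not start with a node number
        # and are skipped.
        if sep and node.isdigit():
            nodes[int(node)] = status.strip()
    return nodes


def dead_nodes(text: str) -> List[int]:
    """Return the sorted node numbers whose status is 'dead'."""
    return sorted(n for n, s in parse_monitor_output(text).items() if s == "dead")


# Sample copied from the report above.
sample = """5: skipped
1: dead
2: 3995
3: 3849
4: 3023
Error:  1: dead
"""
print(dead_nodes(sample))
```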

@tbg
Member

tbg commented Apr 1, 2019

F190330 14:58:19.643070 286537 server/server_engine_health.go:56  [n1] disk stall detected: unable to write to <no-attributes>=/mnt/data1/cockroach within 2m0s 

** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      1/0   16.04 MB   0.5      0.0     0.0      0.0       0.1      0.1       0.0   0.0      0.0     20.4         5         7    0.769       0      0
  L3     17/0   60.12 MB   0.9      0.3     0.1      0.2       0.3      0.1       0.3   2.9     12.1     12.3        22        22    0.979     13M    20K
  L4     56/0   459.25 MB   1.0      0.3     0.0      0.3       0.3      0.0       2.1 106.6     33.6     34.6        10        83    0.122   8225K      0
  L5    551/0    4.53 GB   1.0      3.6     0.0      3.6       3.9      0.3       5.4 187.0     18.4     19.8       200       221    0.904    215M    17K
  L6   2917/0   45.34 GB   0.0     57.7     1.2     56.5      57.5      1.0      44.4  48.4     70.2     69.9       842      1716    0.491    506M    16K
 Sum   3542/0   50.40 GB   0.0     61.9     1.3     60.6      62.0      1.5      52.2   1.2     58.7     58.9      1079      2049    0.527    743M    55K
 Int      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0         0         0    0.000       0      0
Uptime(secs): 4202.2 total, 591.2 interval
Flush(GB): cumulative 0.107, interval 0.000
AddFile(GB): cumulative 50.232, interval 0.000
AddFile(Total Files): cumulative 1986, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 774076707, interval 0
Cumulative compaction: 62.04 GB write, 15.12 MB/s write, 61.87 GB read, 15.08 MB/s read, 1078.8 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
goroutine 286537 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xc000057b01, 0xc000057b60, 0x5364900, 0x1e)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:1020 +0xd4
github.com/cockroachdb/cockroach/pkg/util/log.(*loggingT).outputLogEntry(0x5b01e80, 0xc000000004, 0x53649ae, 0x1e, 0x38, 0xc00ce4b200, 0x8d4)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:878 +0x93d
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x3a022e0, 0xc0008d9bf0, 0x4, 0x2, 0x0, 0x0, 0xc02ff0bf20, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:85 +0x2d5
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x3a022e0, 0xc0008d9bf0, 0x1, 0x4, 0x0, 0x0, 0xc02ff0bf20, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:71 +0x8c
github.com/cockroachdb/cockroach/pkg/util/log.Shout(0x3a022e0, 0xc0008d9bf0, 0x4, 0xc02ff0bf20, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:89 +0xa5
github.com/cockroachdb/cockroach/pkg/server.guaranteedExitFatal(0x3a022e0, 0xc0008d9bf0, 0x3323574, 0x37, 0xc0d046c7a0, 0x3, 0x3)
	/go/src/github.com/cockroachdb/cockroach/pkg/server/server_engine_health.go:56 +0xe9
github.com/cockroachdb/cockroach/pkg/server.assertEngineHealth.func1.1()
	/go/src/github.com/cockroachdb/cockroach/pkg/server/server_engine_health.go:68 +0x149
created by time.goFunc
	/usr/local/go/src/time/sleep.go:172 +0x44
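
The fatal error above is raised from a function started by `time.goFunc`, which is how Go runs timer callbacks, so the check appears to be a timer-based watchdog around an engine write. The following Python sketch illustrates that general pattern only; it is not CockroachDB's implementation. `run_with_stall_watchdog` and its parameters are hypothetical, and the real server exits the process, whereas this sketch just invokes a callback.

```python
import threading
import time
from typing import Callable


def run_with_stall_watchdog(
    write: Callable[[], None],
    max_duration: float,
    on_stall: Callable[[], None],
) -> None:
    """Run write(); call on_stall() from a timer thread if it exceeds max_duration seconds."""
    timer = threading.Timer(max_duration, on_stall)
    timer.start()
    try:
        write()
    finally:
        timer.cancel()  # has no effect if the timer already fired


stalled = threading.Event()

# A fast "write" finishes well inside the limit, so the timer is cancelled.
run_with_stall_watchdog(lambda: time.sleep(0.01), 1.0, stalled.set)
fast_ok = not stalled.is_set()

# A slow "write" outlives the limit, so the watchdog fires while it is blocked.
run_with_stall_watchdog(lambda: time.sleep(0.3), 0.05, stalled.set)
print(fast_ok, stalled.is_set())
```

In the log above the limit was 2m0s; the sub-second values here only keep the example quick to run.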

@tbg
Member

tbg commented Apr 1, 2019

cc @ajkr

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/b88a6ce86bfe507e14e1e80fdaefd219b5f0b046

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1222890&tab=buildLog

The test failed on release-19.1:
	scrub.go:83,cluster.go:1667,errgroup.go:57: read tcp 172.17.0.2:38340->35.227.66.202:26257: read: connection reset by peer
	cluster.go:1329,tpcc.go:130,cluster.go:1667,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1222890-scrub-index-only-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		     0.0 newOrder
		   55m5s    52821            0.0            2.0      0.0      0.0      0.0      0.0 orderStatus
		   55m5s    52821            0.0           18.3      0.0      0.0      0.0      0.0 payment
		   55m5s    52821            0.0            1.9      0.0      0.0      0.0      0.0 stockLevel
		E190405 15:21:31.719143 1 workload/cli/run.go:420  error in newOrder: dial tcp 10.142.0.106:26257: connect: connection refused
		   55m6s   103640            0.0            2.0      0.0      0.0      0.0      0.0 delivery
		   55m6s   103640            0.0           19.6      0.0      0.0      0.0      0.0 newOrder
		   55m6s   103640            0.0            2.0      0.0      0.0      0.0      0.0 orderStatus
		   55m6s   103640            0.0           18.3      0.0      0.0      0.0      0.0 payment
		   55m6s   103640            0.0            1.9      0.0      0.0      0.0      0.0 stockLevel
		E190405 15:21:32.719177 1 workload/cli/run.go:420  error in newOrder: dial tcp 10.142.0.106:26257: connect: connection refused
		: signal: killed
	cluster.go:1688,tpcc.go:140,scrub.go:58,test.go:1228: Goexit() was called
	cluster.go:953,context.go:90,cluster.go:942,asm_amd64.s:522,panic.go:397,test.go:776,test.go:762,cluster.go:1688,tpcc.go:140,scrub.go:58,test.go:1228: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1222890-scrub-index-only-tpcc-w-1000 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		1: dead
		4: 3741
		3: 3938
		2: 3628
		Error:  1: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/682c2f2f466bbf768545ca4687822206a63983ad

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1231772&tab=buildLog

The test failed on master:
	cluster.go:1329,tpcc.go:169,cluster.go:1667,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1231772-scrub-index-only-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		n.go:428  error in newOrder: dial tcp 10.142.0.195:26257: connect: connection refused
		  10m11s    13633            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		  10m11s    13633            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		  10m11s    13633            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		  10m11s    13633            0.0            0.0      0.0      0.0      0.0      0.0 payment
		  10m11s    13633            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		  10m12s    13633            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		  10m12s    13633            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		  10m12s    13633            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		  10m12s    13633            0.0            0.0      0.0      0.0      0.0      0.0 payment
		  10m12s    13633            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	scrub.go:83,cluster.go:1667,errgroup.go:57: dial tcp 35.231.14.158:26257: connect: connection refused
	cluster.go:1688,tpcc.go:179,scrub.go:58,test.go:1237: unexpected node event: 2: dead
	cluster.go:953,context.go:90,cluster.go:942,asm_amd64.s:522,panic.go:397,test.go:785,test.go:771,cluster.go:1688,tpcc.go:179,scrub.go:58,test.go:1237: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1231772-scrub-index-only-tpcc-w-1000 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		1: dead
		3: 3115
		4: 3140
		2: dead
		Error:  2: dead

@ajkr
Contributor

ajkr commented Apr 10, 2019

ACK, I will repro this and see if the changes in #34897 can help here as well.

While those W-Amp numbers look bad, they may also be wrong: the compaction stats show "Cumulative compaction: 62.04 GB write", so the bottom level, which is 45.34 GB, shouldn't be able to have a write-amp of 48.4, I think...
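
The sanity check in this comment can be worked through with the figures from the compaction stats quoted earlier in the thread. This is a rough sketch: reading W-Amp as "bytes written per byte resident in the level" is one interpretation, and the closing ratio is only an observation about the printed columns, not something the thread confirms.

```python
# Figures copied from the compaction stats quoted earlier in this thread.
total_compaction_write_gb = 62.04  # "Cumulative compaction: 62.04 GB write"
l6_size_gb = 45.34                 # L6 "Size" column
l6_reported_w_amp = 48.4           # L6 "W-Amp" column
l6_write_gb = 57.5                 # L6 "Write(GB)" column
l6_rn_gb = 1.2                     # L6 "Rn(GB)" column

# If W-Amp were relative to the level's size, L6 alone would need this many
# GB of writes, far more than the cumulative total for all levels.
implied_write_gb = l6_size_gb * l6_reported_w_amp
print(f"implied L6 writes: {implied_write_gb:.0f} GB "
      f"vs {total_compaction_write_gb} GB reported in total")

# Observation (an assumption, not confirmed here): the reported value is close
# to Write(GB) / Rn(GB), which would make it a per-compaction ratio instead.
ratio = l6_write_gb / l6_rn_gb
print(f"Write(GB) / Rn(GB) for L6: {ratio:.1f}")
```

The implied figure comes out at roughly 2,194 GB, which supports the suspicion that the column does not mean what a size-relative reading would suggest.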

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/134478e4dde16919eb86efb81fb22d8cce8a9701

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1234680&tab=buildLog

The test failed on release-19.1:
	cluster.go:1329,tpcc.go:169,cluster.go:1667,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1234680-scrub-index-only-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		run.go:428  error in payment: dial tcp 10.142.0.34:26257: connect: connection refused
		   8m42s    13914            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		   8m42s    13914            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		   8m42s    13914            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		   8m42s    13914            0.0            0.0      0.0      0.0      0.0      0.0 payment
		   8m42s    13914            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		   8m43s    13914            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		   8m43s    13914            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		   8m43s    13914            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		   8m43s    13914            0.0            0.0      0.0      0.0      0.0      0.0 payment
		   8m43s    13914            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	test.go:1225: test timed out (11h49m6.648238536s)
	scrub.go:83,cluster.go:1667,errgroup.go:57: pq: server is not accepting clients
	cluster.go:1688,tpcc.go:179,scrub.go:58,test.go:1237: unexpected node event: 3: dead
	cluster.go:1405,cluster.go:1424,cluster.go:1528,cluster.go:968,asm_amd64.s:522,panic.go:397,test.go:785,test.go:771,cluster.go:1688,tpcc.go:179,scrub.go:58,test.go:1237: context canceled

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/bf399d2677783dc1eea7f5ede6d4561f95c0ea10

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1234662&tab=buildLog

The test failed on master:
	cluster.go:1329,tpcc.go:169,cluster.go:1667,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1234662-scrub-index-only-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		n.go:428  error in newOrder: dial tcp 10.142.0.104:26257: connect: connection refused
		  24m15s     8978            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		  24m15s     8978            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		  24m15s     8978            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		  24m15s     8978            0.0            0.0      0.0      0.0      0.0      0.0 payment
		  24m15s     8978            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		  24m16s     8978            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		  24m16s     8978            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		  24m16s     8978            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		  24m16s     8978            0.0            0.0      0.0      0.0      0.0      0.0 payment
		  24m16s     8978            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	scrub.go:83,cluster.go:1667,errgroup.go:57: pq: server is not accepting clients
	cluster.go:1688,tpcc.go:179,scrub.go:58,test.go:1237: unexpected node event: 3: dead
	cluster.go:1405,cluster.go:1424,cluster.go:1528,cluster.go:968,asm_amd64.s:522,panic.go:397,test.go:785,test.go:771,cluster.go:1688,tpcc.go:179,scrub.go:58,test.go:1237: context canceled

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/509c5b130fb1ad0042beb74e083817aa68e4fc92

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1237068&tab=buildLog

The test failed on release-19.1:
	cluster.go:1329,tpcc.go:169,cluster.go:1667,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1237068-scrub-index-only-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		d
		1h52m40s    11005            0.0            3.1      0.0      0.0      0.0      0.0 delivery
		1h52m40s    11005            0.0           30.6      0.0      0.0      0.0      0.0 newOrder
		1h52m40s    11005            0.0            3.1      0.0      0.0      0.0      0.0 orderStatus
		1h52m40s    11005            0.0           30.0      0.0      0.0      0.0      0.0 payment
		1h52m40s    11005            0.0            3.0      0.0      0.0      0.0      0.0 stockLevel
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		1h52m41s    11005            0.0            3.1      0.0      0.0      0.0      0.0 delivery
		1h52m41s    11005            0.0           30.6      0.0      0.0      0.0      0.0 newOrder
		1h52m41s    11005            0.0            3.1      0.0      0.0      0.0      0.0 orderStatus
		1h52m41s    11005            0.0           30.0      0.0      0.0      0.0      0.0 payment
		1h52m41s    11005            0.0            3.0      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	cluster.go:1688,tpcc.go:179,scrub.go:58,test.go:1237: unexpected node event: 2: dead
	cluster.go:953,context.go:90,cluster.go:942,asm_amd64.s:522,panic.go:397,test.go:785,test.go:771,cluster.go:1688,tpcc.go:179,scrub.go:58,test.go:1237: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1237068-scrub-index-only-tpcc-w-1000 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		2: dead
		1: 3052
		3: 3258
		4: 3115
		Error:  2: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/509c5b130fb1ad0042beb74e083817aa68e4fc92

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1237002&tab=buildLog

The test failed on release-19.1:
	cluster.go:1329,tpcc.go:169,cluster.go:1667,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1237002-scrub-index-only-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		run.go:428  error in newOrder: dial tcp 10.128.0.9:26257: connect: connection refused
		   6m43s    14049            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		   6m43s    14049            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		   6m43s    14049            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		   6m43s    14049            0.0            0.0      0.0      0.0      0.0      0.0 payment
		   6m43s    14049            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		   6m44s    14049            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		   6m44s    14049            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		   6m44s    14049            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		   6m44s    14049            0.0            0.0      0.0      0.0      0.0      0.0 payment
		   6m44s    14049            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	cluster.go:1688,tpcc.go:179,scrub.go:58,test.go:1237: unexpected node event: 3: dead
	cluster.go:953,context.go:90,cluster.go:942,asm_amd64.s:522,panic.go:397,test.go:785,test.go:771,cluster.go:1688,tpcc.go:179,scrub.go:58,test.go:1237: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1237002-scrub-index-only-tpcc-w-1000 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		3: dead
		4: 3090
		2: 3102
		1: 3338
		Error:  3: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/9938cb1a2cca4c0350244f76845f0c61391d44a7

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1249130&tab=buildLog

The test failed on release-19.1:
	cluster.go:1329,tpcc.go:168,cluster.go:1667,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1249130-scrub-index-only-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		d
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		   9m21s    13653            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		   9m21s    13653            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		   9m21s    13653            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		   9m21s    13653            0.0            0.0      0.0      0.0      0.0      0.0 payment
		   9m21s    13653            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		   9m22s    13653            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		   9m22s    13653            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		   9m22s    13653            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		   9m22s    13653            0.0            0.0      0.0      0.0      0.0      0.0 payment
		   9m22s    13653            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	scrub.go:83,cluster.go:1667,errgroup.go:57: pq: server is not accepting clients
	cluster.go:1688,tpcc.go:178,scrub.go:58,test.go:1237: unexpected node event: 2: dead
	cluster.go:1405,cluster.go:1424,cluster.go:1528,cluster.go:968,asm_amd64.s:522,panic.go:397,test.go:785,test.go:771,cluster.go:1688,tpcc.go:178,scrub.go:58,test.go:1237: context canceled

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/dd7c697e986fc528da7b12c6c10dcce7f64a486c

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1252804&tab=buildLog

The test failed on release-19.1:
	cluster.go:1329,tpcc.go:168,cluster.go:1667,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1252804-scrub-index-only-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		un.go:428  error in payment: dial tcp 10.142.0.115:26257: connect: connection refused
		   5m19s    12971            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		   5m19s    12971            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		   5m19s    12971            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		   5m19s    12971            0.0            0.0      0.0      0.0      0.0      0.0 payment
		   5m19s    12971            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		   5m20s    12971            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		   5m20s    12971            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		   5m20s    12971            0.0            0.0      0.0      0.0      0.0      0.0 orderStatus
		   5m20s    12971            0.0            0.0      0.0      0.0      0.0      0.0 payment
		   5m20s    12971            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	cluster.go:1688,tpcc.go:178,scrub.go:58,test.go:1237: unexpected node event: 3: dead
	cluster.go:953,context.go:89,cluster.go:942,asm_amd64.s:522,panic.go:397,test.go:785,test.go:771,cluster.go:1688,tpcc.go:178,scrub.go:58,test.go:1237: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1252804-scrub-index-only-tpcc-w-1000 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		3: dead
		4: 4112
		2: 4005
		1: 4041
		Error:  3: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/46f8608c4fe2d94b771beb37bcee19136040fd74

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/index-only/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1253450&tab=buildLog

The test failed on master:
	scrub.go:83,cluster.go:1667,errgroup.go:57: read tcp 172.17.0.2:46506->35.243.153.113:26257: read: connection reset by peer
	cluster.go:1329,tpcc.go:168,cluster.go:1667,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1253450-scrub-index-only-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		d
		    7m4s    14535            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		    7m4s    14535            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		    7m4s    14535            0.0            0.4      0.0      0.0      0.0      0.0 orderStatus
		    7m4s    14535            0.0            0.0      0.0      0.0      0.0      0.0 payment
		    7m4s    14535            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		    7m5s    14535            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		    7m5s    14535            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		    7m5s    14535            0.0            0.4      0.0      0.0      0.0      0.0 orderStatus
		    7m5s    14535            0.0            0.0      0.0      0.0      0.0      0.0 payment
		    7m5s    14535            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	cluster.go:1688,tpcc.go:178,scrub.go:58,test.go:1237: Goexit() was called
	cluster.go:953,context.go:89,cluster.go:942,asm_amd64.s:522,panic.go:397,test.go:785,test.go:771,cluster.go:1688,tpcc.go:178,scrub.go:58,test.go:1237: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1253450-scrub-index-only-tpcc-w-1000 --oneshot --ignore-empty-nodes: exit status 1 5: skipped
		1: dead
		3: 3881
		2: 3445
		4: 3141
		Error:  1: dead

@tbg
Member

tbg commented Apr 23, 2019

@bdarnell what are we going to do with this test? The last log is all fundamental overload badness: slow latches, slow and failed heartbeats, and eventually an OOM. If there's anything pathological in here, chances are good that this is the hard way to find it.

@bdarnell
Contributor

The scrub test should be changed to use a smaller data set (I'll do that today). Separately, we should have an overloaded tpcc config to uncover issues like these, but not until we can make it non-flaky.

bdarnell added a commit to bdarnell/cockroach that referenced this issue Apr 23, 2019
The scrub roachtest was previously running tpcc-1000 on a cluster of
12 total vcpus, which is not enough (it needs ~double that). This
exposed a lot of interesting issues like cockroachdb#35986, but it's only
incidental to the main purpose of this test (and it's also flaky due
to uninteresting problems associated with overloading).

Switch the test to tpcc-100 so it can be stable; we'll reintroduce a
test dedicated to overload conditions in the future (when we can make
it stable).

Fixes cockroachdb#35985
Fixes cockroachdb#37017

Release note: None
craig bot pushed a commit that referenced this issue Apr 23, 2019
37046: roachtest: Shrink scrub workloads r=lucy-zhang a=bdarnell

Co-authored-by: Ben Darnell
@craig craig bot closed this as completed in #37046 Apr 23, 2019