Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: cdc/crdb-chaos/rangefeed=true failed #34600

Closed
cockroach-teamcity opened this issue Feb 5, 2019 · 33 comments · Fixed by #35777
Closed

roachtest: cdc/crdb-chaos/rangefeed=true failed #34600

cockroach-teamcity opened this issue Feb 5, 2019 · 33 comments · Fixed by #35777
Assignees
Labels
C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot.
Milestone

Comments

@cockroach-teamcity
Copy link
Member

SHA: https://github.com/cockroachdb/cockroach/commits/82026117d83262e87873aad52b8eca2dd0bea335

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1126329&tab=buildLog

The test failed on provisional_201902052032_v2.2.0-alpha.20190211:
	test.go:748,cdc.go:744,cdc.go:215,cdc.go:433: max latency was more than allowed: 21m15.099334977s vs 10m0s

@cockroach-teamcity cockroach-teamcity added this to the 2.2 milestone Feb 5, 2019
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. labels Feb 5, 2019
@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/40e403544d60b1a44b8b1ed961a817c77d67aa31

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1131411&tab=buildLog

The test failed on master:
	cluster.go:1226,cdc.go:610,cdc.go:122,cluster.go:1564,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1131411-cdc-crdb-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		   1.9      0.0      0.0      0.0      0.0 delivery
		  17m11s     3929            0.0           19.2      0.0      0.0      0.0      0.0 newOrder
		  17m11s     3929            0.0            1.9      0.0      0.0      0.0      0.0 orderStatus
		  17m11s     3929            0.0           18.9      0.0      0.0      0.0      0.0 payment
		  17m11s     3929            0.0            1.9      0.0      0.0      0.0      0.0 stockLevel
		E190209 07:30:05.445925 1 workload/cli/run.go:402  error in newOrder: dial tcp 10.142.0.109:26257: connect: connection refused
		  17m12s     3946            0.0            1.9      0.0      0.0      0.0      0.0 delivery
		  17m12s     3946            0.0           19.2      0.0      0.0      0.0      0.0 newOrder
		  17m12s     3946            0.0            1.9      0.0      0.0      0.0      0.0 orderStatus
		  17m12s     3946            0.0           18.9      0.0      0.0      0.0      0.0 payment
		  17m12s     3946            0.0            1.9      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	cluster.go:1585,cdc.go:210,cdc.go:430,test.go:1206: unexpected node event: 2: dead

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/b80d241d5693d6ba2bde01ae9167b49b04b47226

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1133193&tab=buildLog

The test failed on master:
	cluster.go:1226,cdc.go:600,cdc.go:117,cdc.go:432,test.go:1206: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1133193-cdc-crdb-chaos-rangefeed-true:4 -- ./workload fixtures load tpcc --warehouses=100 --checks=false {pgurl:2} returned:
		stderr:
		
		stdout:
		Error:  exit status 255
		: exit status 1

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/10f8010fa5778e740c057905e2d7664b5fd5d647

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1135549&tab=buildLog

The test failed on master:
	cluster.go:1226,cdc.go:612,cdc.go:124,cluster.go:1564,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1135549-cdc-crdb-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		    1.8      0.0      0.0      0.0      0.0 delivery
		  22m35s     5044            0.0           18.6      0.0      0.0      0.0      0.0 newOrder
		  22m35s     5044            0.0            1.8      0.0      0.0      0.0      0.0 orderStatus
		  22m35s     5044            0.0           18.0      0.0      0.0      0.0      0.0 payment
		  22m35s     5044            0.0            1.8      0.0      0.0      0.0      0.0 stockLevel
		E190213 07:38:12.898689 1 workload/cli/run.go:402  error in payment: dial tcp 10.142.0.187:26257: connect: connection refused
		  22m36s     5057            0.0            1.8      0.0      0.0      0.0      0.0 delivery
		  22m36s     5057            0.0           18.6      0.0      0.0      0.0      0.0 newOrder
		  22m36s     5057            0.0            1.8      0.0      0.0      0.0      0.0 orderStatus
		  22m36s     5057            0.0           18.0      0.0      0.0      0.0      0.0 payment
		  22m36s     5057            0.0            1.8      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	cluster.go:1585,cdc.go:212,cdc.go:432,test.go:1212: unexpected node event: 2: dead

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/acba091f04f3d8ecabf51009bf394951fbd3643c

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1137872&tab=buildLog

The test failed on master:
	cluster.go:1226,cdc.go:612,cdc.go:124,cluster.go:1564,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1137872-cdc-crdb-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		    2855            4.0            1.8     12.1     35.7     35.7     35.7 stockLevel
		  11m51s     2855            2.0            1.9     41.9     44.0     44.0     44.0 delivery
		  11m51s     2855           10.0           19.1     28.3     35.7     35.7     35.7 newOrder
		  11m51s     2855            0.0            1.9      0.0      0.0      0.0      0.0 orderStatus
		  11m51s     2855           24.0           18.5     15.2     23.1     24.1     24.1 payment
		  11m51s     2855            2.0            1.8     11.0     52.4     52.4     52.4 stockLevel
		  11m52s     2855            4.0            1.9     44.0     48.2     48.2     48.2 delivery
		  11m52s     2855            9.0           19.0     32.5     39.8     39.8     39.8 newOrder
		  11m52s     2855            2.0            1.9      5.0      6.0      6.0      6.0 orderStatus
		  11m52s     2855           20.0           18.5     14.2     21.0     26.2     26.2 payment
		  11m52s     2855            3.0            1.8     11.5     27.3     27.3     27.3 stockLevel
		: signal: killed
	cluster.go:1585,cdc.go:212,cdc.go:432,test.go:1212: unexpected status: failed

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/d7c56dcb87c8f187e50303c8e32a64836c42515c

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1139797&tab=buildLog

The test failed on master:
	cluster.go:1226,cdc.go:600,cdc.go:117,cdc.go:432,test.go:1212: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1139797-cdc-crdb-chaos-rangefeed-true:4 -- ./workload fixtures load tpcc --warehouses=100 --checks=false {pgurl:3} returned:
		stderr:
		
		stdout:
		Error:  exit status 255
		: exit status 1

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/4fc4d63ddddeb507564bfe53c6cd79549ff2fd27

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1141650&tab=buildLog

The test failed on master:
	cluster.go:1226,cdc.go:600,cdc.go:117,cdc.go:432,test.go:1212: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1141650-cdc-crdb-chaos-rangefeed-true:4 -- ./workload fixtures load tpcc --warehouses=100 --checks=false {pgurl:1} returned:
		stderr:
		
		stdout:
		I190216 07:01:38.027860 89 ccl/workloadccl/fixture.go:506  loaded warehouse (0s, 100 rows, 0 index entries, 5.1 KiB)
		I190216 07:01:50.550349 90 ccl/workloadccl/fixture.go:506  loaded district (13s, 1000 rows, 0 index entries, 99 KiB)
		Error:  exit status 255
		: exit status 1

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/8e9a1e310e3e8e37f091b7ca8bd204084ad9e2e5

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1142461&tab=buildLog

The test failed on master:
	cluster.go:1226,cdc.go:600,cdc.go:117,cdc.go:432,test.go:1212: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1142461-cdc-crdb-chaos-rangefeed-true:4 -- ./workload fixtures load tpcc --warehouses=100 --checks=false {pgurl:1} returned:
		stderr:
		
		stdout:
		Error:  exit status 255
		: exit status 1

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/bd80a74f882a583d6bb2a04dfdb57b49254bc7ba

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1143393&tab=buildLog

The test failed on master:
	cluster.go:1226,cdc.go:600,cdc.go:117,cdc.go:432,test.go:1212: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1143393-cdc-crdb-chaos-rangefeed-true:4 -- ./workload fixtures load tpcc --warehouses=100 --checks=false {pgurl:1} returned:
		stderr:
		
		stdout:
		Error:  exit status 255
		: exit status 1

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/2e16b7357d5a15d87cd284d5f2c12e424ed29ca1

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1146277&tab=buildLog

The test failed on master:
	cluster.go:1226,cdc.go:612,cdc.go:124,cluster.go:1564,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1146277-cdc-crdb-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		    1.7      0.0      0.0      0.0      0.0 delivery
		   6m23s     1074            0.0           18.0      0.0      0.0      0.0      0.0 newOrder
		   6m23s     1074            1.0            1.8      3.7      3.7      3.7      3.7 orderStatus
		   6m23s     1074            0.0           17.9      0.0      0.0      0.0      0.0 payment
		   6m23s     1074            0.0            1.8      0.0      0.0      0.0      0.0 stockLevel
		E190221 07:09:31.763372 1 workload/cli/run.go:402  error in payment: dial tcp 10.142.0.134:26257: connect: connection refused
		   6m24s     1090            0.0            1.7      0.0      0.0      0.0      0.0 delivery
		   6m24s     1090            0.0           17.9      0.0      0.0      0.0      0.0 newOrder
		   6m24s     1090            0.0            1.8      0.0      0.0      0.0      0.0 orderStatus
		   6m24s     1090            0.0           17.9      0.0      0.0      0.0      0.0 payment
		   6m24s     1090            0.0            1.8      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	cluster.go:1585,cdc.go:212,cdc.go:432,test.go:1212: unexpected node event: 2: dead

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/763e90b043b5d732856d3ecf1d7b0d6aa33e3b26

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1147715&tab=buildLog

The test failed on provisional_201902192209_v19.1.0-beta.20190225:
	cdc.go:743,cdc.go:214,cdc.go:432,test.go:1211: max latency was more than allowed: 21m29.204785754s vs 10m0s

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/46d1ab4c06edfd37d875114972c55b44105acd83

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1147903&tab=buildLog

The test failed on master:
	cluster.go:1226,cdc.go:612,cdc.go:124,cluster.go:1564,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1147903-cdc-crdb-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		    1518            0.0            1.7      0.0      0.0      0.0      0.0 stockLevel
		   7m11s     1518            2.0            1.7     39.8   6174.0   6174.0   6174.0 delivery
		   7m11s     1518           77.0           17.2   2013.3   5368.7   6174.0   7784.6 newOrder
		   7m11s     1518            1.0            1.7      7.1      7.1      7.1      7.1 orderStatus
		   7m11s     1518           25.0           16.8     19.9   7247.8   7784.6   7784.6 payment
		   7m11s     1518            0.0            1.7      0.0      0.0      0.0      0.0 stockLevel
		   7m12s     1518            1.0            1.7     48.2     48.2     48.2     48.2 delivery
		   7m12s     1518           15.0           17.2     35.7     41.9     44.0     44.0 newOrder
		   7m12s     1518            0.0            1.7      0.0      0.0      0.0      0.0 orderStatus
		   7m12s     1518           20.0           16.8     16.3     19.9     19.9     19.9 payment
		   7m12s     1518            2.0            1.7     11.5     17.8     17.8     17.8 stockLevel
		: signal: killed
	cluster.go:1585,cdc.go:212,cdc.go:432,test.go:1211: unexpected status: failed

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/d0a93d286ee42ceb94f986b6a06a1afd52cf914e

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1149020&tab=buildLog

The test failed on master:
	cdc.go:743,cdc.go:214,cdc.go:432,test.go:1211: max latency was more than allowed: 11m14.784317807s vs 10m0s

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/eaad50f0ea64985a0d7da05abb00cc9f321c5fa9

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1149743&tab=buildLog

The test failed on master:
	cdc.go:743,cdc.go:214,cdc.go:432,test.go:1211: max latency was more than allowed: 23m32.353864023s vs 10m0s

@tbg
Copy link
Member

tbg commented Feb 28, 2019

Recent failures are like #35114 (comment).

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/71681f60031c101f17339236e2ba75f7a684d1a1

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1155904&tab=buildLog

The test failed on master:
	cluster.go:1226,cdc.go:612,cdc.go:124,cluster.go:1564,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1155904-cdc-crdb-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		l
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		   2m49s      604            5.0            2.1     54.5     54.5     54.5     54.5 delivery
		   2m49s      604            9.0           20.4     31.5     37.7     37.7     37.7 newOrder
		   2m49s      604            3.0            1.9      7.1      7.6      7.6      7.6 orderStatus
		   2m49s      604           20.0           20.3     14.7     17.8     24.1     24.1 payment
		   2m49s      604            4.0            1.9     12.1     19.9     19.9     19.9 stockLevel
		   2m50s      604            3.0            2.1     46.1     48.2     48.2     48.2 delivery
		   2m50s      604           14.0           20.3     32.5     39.8     44.0     44.0 newOrder
		   2m50s      604            3.0            1.9      6.3      8.4      8.4      8.4 orderStatus
		   2m50s      604           29.0           20.4     15.2     22.0     22.0     22.0 payment
		   2m50s      604            1.0            1.9     12.6     12.6     12.6     12.6 stockLevel
		: signal: killed
	cluster.go:1585,cdc.go:212,cdc.go:432,test.go:1211: unexpected status: failed

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/71681f60031c101f17339236e2ba75f7a684d1a1

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1155867&tab=buildLog

The test failed on master:
	cdc.go:743,cdc.go:214,cdc.go:432,test.go:1211: max latency was more than allowed: 26m6.618677228s vs 10m0s

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/032c4980720abc1bdd71e4428e4111e6e6383297

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1158877&tab=buildLog

The test failed on master:
	cluster.go:1226,cdc.go:600,cdc.go:117,cdc.go:432,test.go:1211: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1158877-cdc-crdb-chaos-rangefeed-true:4 -- ./workload fixtures load tpcc --warehouses=100 --checks=false {pgurl:2} returned:
		stderr:
		
		stdout:
		I190302 07:08:16.418993 1 ccl/workloadccl/cliccl/fixtures.go:292  starting load of 9 tables
		Error:  exit status 255
		: exit status 1

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/de1793532332fb64fca27cafe92d2481d900a5a0

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1160394&tab=buildLog

The test failed on master:
	cluster.go:1226,cdc.go:600,cdc.go:117,cdc.go:432,test.go:1211: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1160394-cdc-crdb-chaos-rangefeed-true:4 -- ./workload fixtures load tpcc --warehouses=100 --checks=false {pgurl:3} returned:
		stderr:
		
		stdout:
		I190304 07:13:33.304671 1 ccl/workloadccl/cliccl/fixtures.go:292  starting load of 9 tables
		Error:  exit status 255
		: exit status 1

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/959dcf7de0f94cfcfa0062387b109adebd1f11da

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1163702&tab=buildLog

The test failed on master:
	cluster.go:1244,cdc.go:612,cdc.go:124,cluster.go:1582,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1163702-cdc-crdb-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		   0.0 newOrder
		  23m47s     5241            0.0            1.6      0.0      0.0      0.0      0.0 orderStatus
		  23m47s     5241            0.0           16.5      0.0      0.0      0.0      0.0 payment
		  23m47s     5241            0.0            1.6      0.0      0.0      0.0      0.0 stockLevel
		E190306 07:43:37.930046 1 workload/cli/run.go:419  error in newOrder: dial tcp 10.142.0.101:26257: connect: connection refused
		  23m48s     5273            0.0            1.6      0.0      0.0      0.0      0.0 delivery
		  23m48s     5273            0.0           16.9      0.0      0.0      0.0      0.0 newOrder
		  23m48s     5273            0.0            1.6      0.0      0.0      0.0      0.0 orderStatus
		  23m48s     5273            0.0           16.5      0.0      0.0      0.0      0.0 payment
		  23m48s     5273            0.0            1.6      0.0      0.0      0.0      0.0 stockLevel
		E190306 07:43:38.995501 1 workload/cli/run.go:419  error in stockLevel: dial tcp 10.142.0.101:26257: connect: connection refused
		: signal: killed
	cluster.go:1603,cdc.go:212,cdc.go:432,test.go:1214: unexpected node event: 1: dead

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/5f7de3c35348e847dc0d4ab1ba8d76574e1706c2

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1169980&tab=buildLog

The test failed on master:
	cdc.go:743,cdc.go:214,cdc.go:432,test.go:1214: max latency was more than allowed: 10m51.688034502s vs 10m0s

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/a119a3a158725c9e3f9b8084d9398601c0e67007

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1170795&tab=buildLog

The test failed on master:
	cluster.go:1244,cdc.go:600,cdc.go:117,cdc.go:432,test.go:1214: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1170795-cdc-crdb-chaos-rangefeed-true:4 -- ./workload fixtures load tpcc --warehouses=100 --checks=false {pgurl:2} returned:
		stderr:
		
		stdout:
		I190311 06:14:39.079797 1 ccl/workloadccl/cliccl/fixtures.go:292  starting load of 9 tables
		I190311 06:14:39.805304 59 ccl/workloadccl/fixture.go:527  loaded district (1s, 1000 rows, 0 index entries, 99 KiB)
		Error:  exit status 255
		: exit status 1

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/9d058d53c8a82fceb2205f1827c26f1bf36c32ba

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1172386&tab=buildLog

The test failed on master:
	cluster.go:1244,cdc.go:601,cdc.go:117,cdc.go:432,test.go:1214: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1172386-cdc-crdb-chaos-rangefeed-true:4 -- ./workload fixtures load tpcc --warehouses=100 --checks=false {pgurl:1} returned:
		stderr:
		
		stdout:
		I190312 06:09:15.027397 1 ccl/workloadccl/cliccl/fixtures.go:292  starting load of 9 tables
		I190312 06:09:28.516380 48 ccl/workloadccl/fixture.go:527  loaded district (13s, 1000 rows, 0 index entries, 99 KiB)
		Error:  exit status 255
		: exit status 1

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/a512e390f7f2f2629f3f811bab5866c46e3e5713

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1174765&tab=buildLog

The test failed on provisional_201903122203_v19.1.0-beta.20190318:
	cdc.go:756,cdc.go:215,cdc.go:433,test.go:1214: max latency was more than allowed: 14m21.284138172s vs 10m0s

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/57e825a7940495b67e0cc7213a5fabc24e12be0e

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1176948&tab=buildLog

The test failed on master:
	cluster.go:1251,cdc.go:625,cdc.go:125,cluster.go:1589,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1176948-cdc-crdb-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		    1.7      0.0      0.0      0.0      0.0 delivery
		  20m25s     4216            0.0           17.1      0.0      0.0      0.0      0.0 newOrder
		  20m25s     4216            0.0            1.7      0.0      0.0      0.0      0.0 orderStatus
		  20m25s     4216            0.0           16.6      0.0      0.0      0.0      0.0 payment
		  20m25s     4216            0.0            1.6      0.0      0.0      0.0      0.0 stockLevel
		E190314 06:41:31.560174 1 workload/cli/run.go:419  error in payment: dial tcp 10.142.0.115:26257: connect: connection refused
		  20m26s     4233            0.0            1.7      0.0      0.0      0.0      0.0 delivery
		  20m26s     4233            0.0           17.1      0.0      0.0      0.0      0.0 newOrder
		  20m26s     4233            0.0            1.7      0.0      0.0      0.0      0.0 orderStatus
		  20m26s     4233            0.0           16.6      0.0      0.0      0.0      0.0 payment
		  20m26s     4233            0.0            1.6      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	cluster.go:1610,cdc.go:213,cdc.go:433,test.go:1214: unexpected node event: 2: dead

@tbg
Copy link
Member

tbg commented Mar 14, 2019

cc @nvanbenschoten / @danhhz

190314 06:41:18.513779 470 server/status/runtime.go:464  [n2] runtime stats: 5.0 GiB RSS, 1203 goroutines, 184 MiB/344 MiB/675 MiB GO alloc/idle/total, 3.6 GiB/4.4 GiB CGO alloc/total, 49900.9 CGO/sec, 82.3/21.8 %(u/s)time, 0.0 %gc (4x), 7.2 MiB/78 MiB (r/w)net
I190314 06:41:21.153683 322988 storage/store_snapshot.go:818  [n2,raftsnapshot,s2,r357/2:/Table/56/1/7{5/6/-4…-7/6/-2…}] streamed snapshot to (n1,s1):1: kv pairs: 588072, log entries: 74, rate-limit: 8.0 MiB/sec, 5.34s
I190314 06:41:28.516211 470 server/status/runtime.go:464  [n2] runtime stats: 5.0 GiB RSS, 1208 goroutines, 301 MiB/302 MiB/675 MiB GO alloc/idle/total, 3.6 GiB/4.4 GiB CGO alloc/total, 43185.3 CGO/sec, 72.4/23.3 %(u/s)time, 0.0 %gc (2x), 7.5 MiB/35 MiB (r/w)net
I190314 06:41:28.755717 326561 storage/store_snapshot.go:775  [n2,raftsnapshot,s2,r30/2:/Table/53/2{-/37/10/…}] sending Raft snapshot 4da2c536 at applied index 16358
E190314 06:41:28.766573 309986 util/log/crash_reporting.go:202  [n2,s2,r28/2:/Table/53/1/{39/6/-…-79/1/-…},rangefeed] a panic has occurred!
panic: resolved timestamp regression, was 1552545674.481467572,0, recomputed as 1552545640.548937542,0 [recovered]
	panic: resolved timestamp regression, was 1552545674.481467572,0, recomputed as 1552545640.548937542,0

goroutine 309986 [running]:
panic(0x2d7c2e0, 0xc00678ab30)
	/usr/local/go/src/runtime/panic.go:556 +0x2cb fp=0xc0195c9520 sp=0xc0195c9490 pc=0x72a5bb
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc000942000, 0x39db560, 0xc008899dd0)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:185 +0x11f fp=0xc0195c9580 sp=0xc0195c9520 pc=0x108479f
runtime.call32(0x0, 0x33dc130, 0xc00dcf7430, 0x1800000018)
	/usr/local/go/src/runtime/asm_amd64.s:522 +0x3b fp=0xc0195c95b0 sp=0xc0195c9580 pc=0x75919b
panic(0x2d7c2e0, 0xc00678ab30)
	/usr/local/go/src/runtime/panic.go:513 +0x1b9 fp=0xc0195c9640 sp=0xc0195c95b0 pc=0x72a4a9
github.com/cockroachdb/cockroach/pkg/storage/rangefeed.(*resolvedTimestamp).recompute(0xc007307880, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/rangefeed/resolved_timestamp.go:189 +0x236 fp=0xc0195c96c8 sp=0xc0195c9640 pc=0x18bacb6
github.com/cockroachdb/cockroach/pkg/storage/rangefeed.(*resolvedTimestamp).assertNoChange(0xc007307880)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/rangefeed/resolved_timestamp.go:200 +0x57 fp=0xc0195c9760 sp=0xc0195c96c8 pc=0x18bad37
github.com/cockroachdb/cockroach/pkg/storage/rangefeed.(*resolvedTimestamp).ConsumeLogicalOp(0xc007307880, 0x0, 0x0, 0x0, 0xc01ddc4f60, 0x0, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/rangefeed/resolved_timestamp.go:134 +0x89 fp=0xc0195c97b0 sp=0xc0195c9760 pc=0x18ba559
github.com/cockroachdb/cockroach/pkg/storage/rangefeed.(*Processor).consumeLogicalOps(0xc0073077c0, 0x39db4a0, 0xc00bb05d80, 0xc00fb062d0, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/rangefeed/processor.go:535 +0x21b fp=0xc0195c98a8 sp=0xc0195c97b0 pc=0x18b771b
github.com/cockroachdb/cockroach/pkg/storage/rangefeed.(*Processor).consumeEvent(0xc0073077c0, 0x39db4a0, 0xc00bb05d80, 0xc00fb062d0, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/rangefeed/processor.go:487 +0x251 fp=0xc0195c9918 sp=0xc0195c98a8 pc=0x18b74d1
github.com/cockroachdb/cockroach/pkg/storage/rangefeed.(*Processor).Start.func1(0x39db4a0, 0xc00bb05d80)
	/go/src/github.com/cockroachdb/cockroach/pkg/storage/rangefeed/processor.go:245 +0x9bd fp=0xc0195c9f88 sp=0xc0195c9918 pc=0x18bd85d
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc018bf1b80, 0xc000942000, 0xc008899d40)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:200 +0xe1 fp=0xc0195c9fc8 sp=0xc0195c9f88 pc=0x1086921
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc0195c9fd0 sp=0xc0195c9fc8 pc=0x75aeb1
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:193 +0xa8

@danhhz
Copy link
Contributor

danhhz commented Mar 14, 2019

Woah, interesting

@danhhz
Copy link
Contributor

danhhz commented Mar 14, 2019

I just saw this on a run of cdc/tpcc-1000 I was doing with #35745. @ajwerner any chance this could be #35130?

E190314 19:02:30.978867 960980 util/log/crash_reporting.go:202  [n1,s1,r317/1:/Table/60/1/"{<\xad\…-@\x8c\…},rangefeed] a panic has occurred!
panic: resolved timestamp regression, was 1552590117.199945509,0, recomputed as 1552590102.203202946,1 [recovered]
        panic: resolved timestamp regression, was 1552590117.199945509,0, recomputed as 1552590102.203202946,1

goroutine 960980 [running]:
panic(0x2d79ae0, 0xc03b6d54d0)
        /usr/local/go/src/runtime/panic.go:556 +0x2cb fp=0xc01a3b9520 sp=0xc01a3b9490 pc=0x72a5bb
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Recover(0xc0001e06c0, 0x39d8180, 0xc015da5860)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:185 +0x11f fp=0xc01a3b9580 sp=0xc01a3b9520 pc=0x108452f
runtime.call32(0x0, 0x33d92b0, 0xc025c1db60, 0x1800000018)
        /usr/local/go/src/runtime/asm_amd64.s:522 +0x3b fp=0xc01a3b95b0 sp=0xc01a3b9580 pc=0x75919b
panic(0x2d79ae0, 0xc03b6d54d0)
        /usr/local/go/src/runtime/panic.go:513 +0x1b9 fp=0xc01a3b9640 sp=0xc01a3b95b0 pc=0x72a4a9
github.com/cockroachdb/cockroach/pkg/storage/rangefeed.(*resolvedTimestamp).recompute(0xc012b10ac0, 0x2)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/rangefeed/resolved_timestamp.go:189 +0x236 fp=0xc01a3b96c8 sp=0xc01a3b9640 pc=0x18ba476
github.com/cockroachdb/cockroach/pkg/storage/rangefeed.(*resolvedTimestamp).assertNoChange(0xc012b10ac0)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/rangefeed/resolved_timestamp.go:200 +0x57 fp=0xc01a3b9760 sp=0xc01a3b96c8 pc=0x18ba4f7
github.com/cockroachdb/cockroach/pkg/storage/rangefeed.(*resolvedTimestamp).ConsumeLogicalOp(0xc012b10ac0, 0x0, 0x0, 0x0, 0xc035c5dc80, 0x0, 0x0, 0x2)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/rangefeed/resolved_timestamp.go:134 +0x89 fp=0xc01a3b97b0 sp=0xc01a3b9760 pc=0x18b9d19
github.com/cockroachdb/cockroach/pkg/storage/rangefeed.(*Processor).consumeLogicalOps(0xc012b10a00, 0x39d80c0, 0xc0273b2400, 0xc03f2b0900, 0x9, 0x10)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/rangefeed/processor.go:535 +0x21b fp=0xc01a3b98a8 sp=0xc01a3b97b0 pc=0x18b6edb
github.com/cockroachdb/cockroach/pkg/storage/rangefeed.(*Processor).consumeEvent(0xc012b10a00, 0x39d80c0, 0xc0273b2400, 0xc03f2b0900, 0x9, 0x10, 0x0, 0x0, 0x0, 0x0, ...)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/rangefeed/processor.go:487 +0x251 fp=0xc01a3b9918 sp=0xc01a3b98a8 pc=0x18b6c91
github.com/cockroachdb/cockroach/pkg/storage/rangefeed.(*Processor).Start.func1(0x39d80c0, 0xc0273b2400)
        /go/src/github.com/cockroachdb/cockroach/pkg/storage/rangefeed/processor.go:245 +0x9bd fp=0xc01a3b9f88 sp=0xc01a3b9918 pc=0x18bd01d
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1(0xc02e787a30, 0xc0001e06c0, 0xc015da5800)
        /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:200 +0xe1 fp=0xc01a3b9fc8 sp=0xc01a3b9f88 pc=0x10866b1
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc01a3b9fd0 sp=0xc01a3b9fc8 pc=0x75aeb1
created by github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker
        /go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:193 +0xa8

@danhhz
Copy link
Contributor

danhhz commented Mar 14, 2019

Hmmm the last failure claims to have happened at 57e825a which doesn't include that PR.

@danhhz
Copy link
Contributor

danhhz commented Mar 14, 2019

And mine doesn't either. Okay no clue what's going on here, but I've never seen this panic before 24h ago

@ajwerner
Copy link
Contributor

Yeah I just landed the referenced change. I don't know all that much about the resolved timestamp, how does it relate to the closed timestamp? Before that change it was absolutely possible for a closed timestamp to "regress" for a given range in the face of lease transfer but any consumer of closed timestamps can safely use the maximum seen value.

@danhhz
Copy link
Contributor

danhhz commented Mar 14, 2019

Resolved timestamp = closed timestamp + no intents outstanding. RangeFeeds do a scan of all unresolved intents on the range when they start up, and then when they get a "Logical Op" that resolves an intent, they remove it from being tracked. Similarly when they get a logical op that creates an intent or advances the timestamp of an intent, they put it in the min heap or move it. This or a closed timestamp can advance the resolved timestamp, at which time RangeFeed will emit it. I'd start with https://github.com/cockroachdb/cockroach/blob/master/pkg/storage/rangefeed/resolved_timestamp.go which is pretty well factored out.

@nvanbenschoten nvanbenschoten self-assigned this Mar 14, 2019
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Mar 14, 2019
Informs cockroachdb#34600.

It is critical that `unresolvedIntentQueue` properly handle negative
reference counts to avoid leaking references before a rangefeed has
finished its resolved timestamp initialization scan. However, once this
scan is complete, reference counts on transactions should never drop
below zero. Such an occurrence would indicate that an intent was lost
either during the initial scan or somewhere in the logical ops stream.

Based on the stacktraces in cockroachdb#34600, I believe this is what is happening
because we can see that an `MVCCCommitIntentOp` is triggering the
assertion. This commit will make this more explicit and should hopefully
also fire more because it won't rely on the intent with a negative
refcount being the oldest intent tracked to fire.

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Mar 15, 2019
Fixes cockroachdb#34600.

Before this commit, rangefeed made the assumption that an aborted intent
implied an aborted transaction (after the rangefeed was in a steady-state).
It used this assumption to eagerly discard the bookkeeping for a transaction
after its first MVCCAbortIntent operation was seen as an optimization.

This was incorrect. A transaction can "abort" an intent even if it commits.
This is possible if the intent was only written at a previous epoch and not
written in the epoch that ended up finalizing the transaction. This commit
removes this assumption, which I confirmed fixes the resolved timestamp
regression using the new assertion from cockroachdb#35772.

I believe this could have two effects:
- it could trigger the assertion failure from cockroachdb#34600 if the transaction
  experiencing the incorrect assumption was the oldest being tracked at
  the time of the assumption.
- it could cause resolved timestamp stalls and prevent all future forward
  progress of resolved timestamps if the transaction experiencing the incorrect
  assumption was not the oldest being tracked at the time of the assumption.

Release note: None
@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/70e3468e7ed5fb30b10eaaf972acbb0f16099d01

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1178890&tab=buildLog

The test failed on master:
	cluster.go:1267,cdc.go:625,cdc.go:125,cluster.go:1605,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1178890-cdc-crdb-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		    1.0      0.0      0.0      0.0      0.0 delivery
		   7m47s     1630            0.0           10.9      0.0      0.0      0.0      0.0 newOrder
		   7m47s     1630            0.0            1.1      0.0      0.0      0.0      0.0 orderStatus
		   7m47s     1630            0.0           10.4      0.0      0.0      0.0      0.0 payment
		   7m47s     1630            0.0            0.7      0.0      0.0      0.0      0.0 stockLevel
		   7m48s     1651            0.0            1.0      0.0      0.0      0.0      0.0 delivery
		   7m48s     1651            0.0           10.8      0.0      0.0      0.0      0.0 newOrder
		   7m48s     1651            0.0            1.1      0.0      0.0      0.0      0.0 orderStatus
		   7m48s     1651            0.0           10.4      0.0      0.0      0.0      0.0 payment
		   7m48s     1651            0.0            0.7      0.0      0.0      0.0      0.0 stockLevel
		E190315 06:26:06.344892 1 workload/cli/run.go:420  error in payment: dial tcp 10.142.0.127:26257: connect: connection refused
		: signal: killed
	cluster.go:1626,cdc.go:213,cdc.go:433,test.go:1214: unexpected node event: 2: dead

@tbg
Copy link
Member

tbg commented Mar 15, 2019

Here it goes again, but looks like @nvanbenschoten already figured this out: #35777

E190315 06:26:02.753584 22498 util/log/crash_reporting.go:202  [n2,s2,r78/3:/Table/53/1/"\x{b8wz\…-e0i\x…},rangefeed] a panic has occurred!
panic: resolved timestamp regression, was 1552631155.802869173,0, recomputed as 1552631153.637038233,0 [recovered]
	panic: resolved timestamp regression, was 1552631155.802869173,0, recomputed as 1552631153.637038233,0

goroutine 22498 [running]:

craig bot pushed a commit that referenced this issue Mar 15, 2019
35772: rangefeed: improve assertion when txn refcount becomes negative r=nvanbenschoten a=nvanbenschoten

Informs #34600.

It is critical that `unresolvedIntentQueue` properly handle negative
reference counts to avoid leaking references before a rangefeed has
finished its resolved timestamp initialization scan. However, once this
scan is complete, reference counts on transactions should never drop
below zero. Such an occurrence would indicate that an intent was lost
either during the initial scan or somewhere in the logical ops stream.

Based on the stacktraces in #34600, I believe this is what is happening
because we can see that an `MVCCCommitIntentOp` is triggering the
assertion. This commit will make this more explicit and should hopefully
also fire more because it won't rely on the intent with a negative
refcount being the oldest intent tracked to fire.

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Mar 15, 2019
Fixes cockroachdb#34600.

Before this commit, rangefeed made the assumption that an aborted intent
implied an aborted transaction (after the rangefeed was in a steady-state).
It used this assumption to eagerly discard the bookkeeping for a transaction
after its first MVCCAbortIntent operation was seen as an optimization.

This was incorrect. A transaction can "abort" an intent even if it commits.
This is possible if the intent was only written at a previous epoch and not
written in the epoch that ended up finalizing the transaction. This commit
removes this assumption, which I confirmed fixes the resolved timestamp
regression using the new assertion from cockroachdb#35772.

I believe this could have two effects:
- it could trigger the assertion failure from cockroachdb#34600 if the transaction
  experiencing the incorrect assumption was the oldest being tracked at
  the time of the assumption.
- it could cause resolved timestamp stalls and prevent all future forward
  progress of resolved timestamps if the transaction experiencing the incorrect
  assumption was not the oldest being tracked at the time of the assumption.

Release note: None
craig bot pushed a commit that referenced this issue Mar 15, 2019
35777: rangefeed: don't discard ref count for txn after intent abort r=nvanbenschoten a=nvanbenschoten

Fixes #34600.

Before this commit, rangefeed made the assumption that an aborted intent
implied an aborted transaction (after the rangefeed was in a steady-state).
It used this assumption to eagerly discard the bookkeeping for a transaction
after its first MVCCAbortIntent operation was seen as an optimization.

This was incorrect. A transaction can "abort" an intent even if it commits.
This is possible if the intent was only written at a previous epoch and not
written in the epoch that ended up finalizing the transaction. This commit
removes this assumption, which I confirmed fixes the resolved timestamp
regression using the new assertion from #35772.

I believe this could have two effects:
- it could trigger the assertion failure from #34600 if the transaction
  experiencing the incorrect assumption was the oldest being tracked at
  the time of the assumption.
- it could cause resolved timestamp stalls and prevent all future forward
  progress of resolved timestamps if the transaction experiencing the incorrect
  assumption was not the oldest being tracked at the time of the assumption.

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig craig bot closed this as completed in #35777 Mar 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants