roachtest: cdc/initial-scan/rangefeed=true failed #37112

Closed
cockroach-teamcity opened this issue Apr 25, 2019 · 35 comments
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot.
Milestone

Comments

@cockroach-teamcity
Member

SHA: https://github.com/cockroachdb/cockroach/commits/25dd36f0139bf65b80758deeeccf35ee17ebd622

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/initial-scan/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1260051&tab=buildLog

The test failed on release-19.1:
	cdc.go:176,cluster.go:1761,errgroup.go:57: read tcp 172.17.0.2:53168->35.229.112.5:26257: read: connection reset by peer
	cluster.go:1423,cdc.go:738,cdc.go:135,cluster.go:1761,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1260051-cdc-initial-scan-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		: signal: killed
	cluster.go:1782,cdc.go:223,cdc.go:485,test.go:1245: Goexit() was called

@cockroach-teamcity cockroach-teamcity added this to the 19.2 milestone Apr 25, 2019
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. labels Apr 25, 2019
@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/73765b6d168fb999466756b112fd590747a3a8c4

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/initial-scan/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1266059&tab=buildLog

The test failed on branch=release-19.1, cloud=gce:
	cdc.go:176,cluster.go:1814,errgroup.go:57: read tcp 172.17.0.2:39810->34.73.229.13:26257: read: connection reset by peer
	cluster.go:1476,cdc.go:743,cdc.go:135,cluster.go:1814,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1266059-cdc-initial-scan-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		: signal: killed
	cluster.go:1835,cdc.go:223,cdc.go:490,test.go:1253: Goexit() was called

@andreimatei
Contributor

cdc.go:176,cluster.go:1814,errgroup.go:57: read tcp 172.17.0.2:39810->34.73.229.13:26257: read: connection reset by peer
comes from running this statement against the cluster:
SET CLUSTER SETTING kv.closed_timestamp.target_duration='10s'

Probably the cluster had something go wrong with it... Dan, would you mind taking this one?

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/7b2651400b2003d0a381cba9dbfc0b7bc0dfee00

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/initial-scan/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1293898&tab=buildLog

The test failed on branch=master, cloud=gce:
	cluster.go:1474,cdc.go:732,cdc.go:128,cdc.go:490,test.go:1251: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1293898-cdc-initial-scan-rangefeed-true:4 -- ./workload fixtures load tpcc --warehouses=100 --checks=false {pgurl:3} returned:
		stderr:
		
		stdout:
		t-status reply 0
		debug3: receive packet: type 96
		debug2: channel 0: rcvd eof
		debug2: channel 0: output open -> drain
		debug2: channel 0: obuf empty
		debug2: channel 0: close_write
		debug2: channel 0: output drain -> closed
		debug3: receive packet: type 97
		debug2: channel 0: rcvd close
		debug3: channel 0: will not send data after close
		debug2: channel 0: almost dead
		debug2: channel 0: gc: notify user
		debug2: channel 0: gc: user detached
		debug2: channel 0: send close
		debug3: send packet: type 97
		debug2: channel 0: is dead
		debug2: channel 0: garbage collecting
		debug1: channel 0: free: client-session, nchannels 1
		debug3: channel 0: status: The following connections are open:
		  #0 client-session (t4 r0 i3/0 o3/0 fd -1/-1 cc -1)
		
		debug3: send packet: type 1
		debug1: fd 0 clearing O_NONBLOCK
		debug1: fd 1 clearing O_NONBLOCK
		debug3: fd 2 is not O_NONBLOCK
		Transferred: sent 3288, received 4692 bytes, in 346.4 seconds
		Bytes per second: sent 9.5, received 13.5
		debug1: Exit status 1
		: exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/923a3b2a6f4a6492883141092280d1041de1381a

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/initial-scan/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1295056&tab=buildLog

The test failed on branch=master, cloud=gce:
	cluster.go:1474,cdc.go:732,cdc.go:128,cdc.go:490,test.go:1251: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1295056-cdc-initial-scan-rangefeed-true:4 -- ./workload fixtures load tpcc --warehouses=100 --checks=false {pgurl:3} returned:
		stderr:
		
		stdout:
		 open -> drain
		debug2: channel 0: obuf empty
		debug2: channel 0: close_write
		debug2: channel 0: output drain -> closed
		debug3: receive packet: type 98
		debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
		debug3: receive packet: type 97
		debug2: channel 0: rcvd close
		debug3: channel 0: will not send data after close
		debug2: channel 0: almost dead
		debug2: channel 0: gc: notify user
		debug2: channel 0: gc: user detached
		debug2: channel 0: send close
		debug3: send packet: type 97
		debug2: channel 0: is dead
		debug2: channel 0: garbage collecting
		debug1: channel 0: free: client-session, nchannels 1
		debug3: channel 0: status: The following connections are open:
		  #0 client-session (t4 r0 i3/0 o3/0 fd -1/-1 cc -1)
		
		debug3: send packet: type 1
		debug1: fd 0 clearing O_NONBLOCK
		debug1: fd 1 clearing O_NONBLOCK
		debug3: fd 2 is not O_NONBLOCK
		Transferred: sent 3288, received 4676 bytes, in 316.2 seconds
		Bytes per second: sent 10.4, received 14.8
		debug1: Exit status 1
		: exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/cab299a0ef983f8b4ffe5d724e44587d9665d3a3

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/initial-scan/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1295811&tab=buildLog

The test failed on branch=master, cloud=gce:
	cluster.go:1474,cdc.go:732,cdc.go:128,cdc.go:490,test.go:1251: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1295811-cdc-initial-scan-rangefeed-true:4 -- ./workload fixtures load tpcc --warehouses=100 --checks=false {pgurl:2} returned:
		stderr:
		
		stdout:
		 open -> drain
		debug2: channel 0: obuf empty
		debug2: channel 0: close_write
		debug2: channel 0: output drain -> closed
		debug3: receive packet: type 98
		debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
		debug3: receive packet: type 97
		debug2: channel 0: rcvd close
		debug3: channel 0: will not send data after close
		debug2: channel 0: almost dead
		debug2: channel 0: gc: notify user
		debug2: channel 0: gc: user detached
		debug2: channel 0: send close
		debug3: send packet: type 97
		debug2: channel 0: is dead
		debug2: channel 0: garbage collecting
		debug1: channel 0: free: client-session, nchannels 1
		debug3: channel 0: status: The following connections are open:
		  #0 client-session (t4 r0 i3/0 o3/0 fd -1/-1 cc -1)
		
		debug3: send packet: type 1
		debug1: fd 0 clearing O_NONBLOCK
		debug1: fd 1 clearing O_NONBLOCK
		debug3: fd 2 is not O_NONBLOCK
		Transferred: sent 3252, received 4680 bytes, in 307.0 seconds
		Bytes per second: sent 10.6, received 15.2
		debug1: Exit status 1
		: exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/58c567a325056033b326cb9c4ed9ba490e8956da

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/initial-scan/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1296592&tab=buildLog

The test failed on branch=master, cloud=gce:
	cluster.go:1474,cdc.go:732,cdc.go:128,cdc.go:490,test.go:1251: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1296592-cdc-initial-scan-rangefeed-true:4 -- ./workload fixtures load tpcc --warehouses=100 --checks=false {pgurl:3} returned:
		stderr:
		
		stdout:
		-status reply 0
		debug3: receive packet: type 96
		debug2: channel 0: rcvd eof
		debug2: channel 0: output open -> drain
		debug2: channel 0: obuf empty
		debug2: channel 0: close_write
		debug2: channel 0: output drain -> closed
		debug3: receive packet: type 97
		debug2: channel 0: rcvd close
		debug3: channel 0: will not send data after close
		debug2: channel 0: almost dead
		debug2: channel 0: gc: notify user
		debug2: channel 0: gc: user detached
		debug2: channel 0: send close
		debug3: send packet: type 97
		debug2: channel 0: is dead
		debug2: channel 0: garbage collecting
		debug1: channel 0: free: client-session, nchannels 1
		debug3: channel 0: status: The following connections are open:
		  #0 client-session (t4 r0 i3/0 o3/0 fd -1/-1 cc -1)
		
		debug3: send packet: type 1
		debug1: fd 0 clearing O_NONBLOCK
		debug1: fd 1 clearing O_NONBLOCK
		debug3: fd 2 is not O_NONBLOCK
		Transferred: sent 3252, received 4680 bytes, in 314.2 seconds
		Bytes per second: sent 10.4, received 14.9
		debug1: Exit status 1
		: exit status 1
		: exit status 1

@cockroach-teamcity

This comment has been minimized.

@cockroach-teamcity

This comment has been minimized.

@nvanbenschoten
Member

Previous two issues addressed by #37701.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/c280de40c2bcab93c41fe82bef8353a5ecd95ac4

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/initial-scan/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1311970&tab=buildLog

The test failed on branch=master, cloud=gce:
	cdc.go:176,cluster.go:1854,errgroup.go:57: read tcp 172.17.0.2:57810->35.231.190.109:26257: read: connection reset by peer
	cluster.go:1516,cdc.go:747,cdc.go:135,cluster.go:1854,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1311970-cdc-initial-scan-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		: signal: killed
	cluster.go:1875,cdc.go:223,cdc.go:490,test.go:1251: Goexit() was called

@danhhz
Contributor

danhhz commented May 28, 2019

I could be blind, but I don't see an error anywhere for this last failure.

@tbg tbg mentioned this issue Jun 4, 2019
7 tasks
@tbg
Member

tbg commented Jun 4, 2019

[06:29:25] --- FAIL: cdc/initial-scan/rangefeed=true (739.24s)
[06:29:25] 	cdc.go:176,cluster.go:1854,errgroup.go:57: read tcp 172.17.0.2:57810->35.231.190.109:26257: read: connection reset by peer
[06:29:25] 	cluster.go:1516,cdc.go:747,cdc.go:135,cluster.go:1854,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1311970-cdc-initial-scan-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
[06:29:25] 		stderr:
[06:29:25] 		
[06:29:25] 		stdout:
[06:29:25] 		: signal: killed
[06:29:25] 	cluster.go:1875,cdc.go:223,cdc.go:490,test.go:1251: Goexit() was called

I had to fish this from the main build log. It should be in the test file. I'll fix this if @andreimatei isn't already fixing it with #30977.

Now for the actual error: the first one is the one that matters, and it comes from this statement:

if _, err := db.Exec(
	`SET CLUSTER SETTING kv.closed_timestamp.target_duration='10s'`,
); err != nil {
	t.Fatal(err)
}

Unfortunately I don't know what's up there; the node is running. It never sets the cluster setting, though (that would be logged). So this is either a networking issue between the test runner and roachtest, some failure in the connection handling inside of CRDB, or some failure further down the stack, but "connection reset by peer" I think would occur if we just closed the connection on a client, perhaps because some context timeout misfires. @andreimatei you know the v3 code a bit -- is that something we have?

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/5a88de2233e1405c0553f2d5380fd24218fac3d2

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/initial-scan/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1324169&tab=buildLog

The test failed on branch=release-19.1, cloud=gce:
	cdc.go:173,cluster.go:1851,errgroup.go:57: read tcp 172.17.0.2:36394->35.231.106.90:26257: read: connection reset by peer
	cluster.go:1513,cdc.go:744,cdc.go:132,cluster.go:1851,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1324169-cdc-initial-scan-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		: signal: killed
	cluster.go:1872,cdc.go:220,cdc.go:487,test.go:1248: Goexit() was called

@andreimatei
Contributor

but "connection reset by peer" I think would occur if we just closed the connection on a client, perhaps because some context timeout misfires. @andreimatei you know the v3 code a bit -- is that something we have?

No, I don't think that's something we have. I don't think the server ever closes a connection without the client having asked for it.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/5c0b1644f9f9fe65bfb8cf3f7a5af2595bd859a8

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/initial-scan/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1341218&tab=buildLog

The test failed on branch=master, cloud=gce:
	cluster.go:1513,cdc.go:744,cdc.go:132,cluster.go:1851,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1341218-cdc-initial-scan-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		hprod/install/cluster_synced.go:133
		github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).pgurls.func1
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1349
		github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).Parallel.func1.1
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1477
		runtime.goexit
			/usr/local/go/src/runtime/asm_amd64.s:1333
		pgurls
		github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).pgurls.func1
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1350
		github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).Parallel.func1.1
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1477
		runtime.goexit
			/usr/local/go/src/runtime/asm_amd64.s:1333: 
		I190616 06:26:07.637686 9 cluster_synced.go:1559  command failed
		: exit status 1
	cluster.go:1872,cdc.go:220,cdc.go:487,test.go:1248: Goexit() was called

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/0854bf6d9dd30b4893c19a6c0c3a08809c3748c8

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/initial-scan/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1351925&tab=buildLog

The test failed on branch=release-19.1, cloud=gce:
	cdc.go:173,cluster.go:1851,errgroup.go:57: read tcp 172.17.0.2:33404->35.231.31.131:26257: read: connection reset by peer
	cluster.go:1513,cdc.go:744,cdc.go:132,cluster.go:1851,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1351925-cdc-initial-scan-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		: signal: killed
	cluster.go:1872,cdc.go:220,cdc.go:487,test.go:1251: Goexit() was called

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/02056538e63bd33adbc24efee05ff94385c19fc8

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/initial-scan/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1351907&tab=buildLog

The test failed on branch=master, cloud=gce:
	cdc.go:173,cluster.go:1851,errgroup.go:57: read tcp 172.17.0.2:45988->34.74.29.162:26257: read: connection reset by peer
	cluster.go:1513,cdc.go:744,cdc.go:132,cluster.go:1851,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1351907-cdc-initial-scan-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		: signal: killed
	cluster.go:1872,cdc.go:220,cdc.go:487,test.go:1251: Goexit() was called

@cockroach-teamcity

This comment has been minimized.

@jordanlewis
Member

Latest failure is bogus (Ubuntu server flake).

@cockroach-teamcity

This comment has been minimized.

@cockroach-teamcity
Member Author

(roachtest).cdc/initial-scan/rangefeed=true failed on master@bf8d6db2ef9b66d301996cbb8ebbb15b7c978d1a:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191220-1656773/cdc/initial-scan/rangefeed=true/run_1
	cluster.go:1786,cdc.go:791,cdc.go:136,cluster.go:2145,errgroup.go:57: error with attached stack trace:
		    main.execCmd
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:399
		    main.(*cluster).RunL
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:1820
		    main.(*cluster).Run
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:1784
		    main.(*tpccWorkload).run
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cdc.go:791
		    main.cdcBasicTest.func1
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cdc.go:136
		    main.(*monitor).Go.func1
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2145
		    github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup.(*Group).Go.func1
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
		    runtime.goexit
		    	/usr/local/go/src/runtime/asm_amd64.s:1357
		  - error with embedded safe details: %s returned:
		    stderr:
		    %s
		    stdout:
		    %s
		    -- arg 1: <string>
		    -- arg 2: <string>
		    -- arg 3: <string>
		  - /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1656773-1576827948-21-n4cpu16:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		    stderr:
		    
		    stdout:
		        2.6      0.0      0.0      0.0      0.0 delivery
		       18.0s      230            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		       18.0s      230            0.0            2.4      0.0      0.0      0.0      0.0 orderStatus
		       18.0s      230           37.0            2.1  14495.5  15032.4  15032.4  15032.4 payment
		       18.0s      230            0.0            2.3      0.0      0.0      0.0      0.0 stockLevel
		    E191220 08:11:25.207097 1 workload/cli/run.go:444  error in payment: dial tcp 10.128.0.105:26257: connect: connection refused
		       19.0s      379            0.0            2.4      0.0      0.0      0.0      0.0 delivery
		       19.0s      379            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		       19.0s      379            0.0            2.3      0.0      0.0      0.0      0.0 orderStatus
		       19.0s      379            0.0            2.0      0.0      0.0      0.0      0.0 payment
		       19.0s      379            0.0            2.2      0.0      0.0      0.0      0.0 stockLevel:
		  - context canceled
Repro

Artifacts: /cdc/initial-scan/rangefeed=true

make stressrace TESTS=cdc/initial-scan/rangefeed=true PKG=./pkg/roachtest TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).cdc/initial-scan/rangefeed=true failed on provisional_202001090312_v20.1.0-alpha.20200113@617cb39e3ced263cb4c253aa2e3edd172368f7cf:

The test failed on branch=provisional_202001090312_v20.1.0-alpha.20200113, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20200109-1680508/cdc/initial-scan/rangefeed=true/run_1
	cdc.go:912,cdc.go:226,cdc.go:535,test_runner.go:716: initial scan did not complete
Repro

Artifacts: /cdc/initial-scan/rangefeed=true

make stressrace TESTS=cdc/initial-scan/rangefeed=true PKG=./pkg/roachtest TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1


@cockroach-teamcity
Member Author

(roachtest).cdc/initial-scan/rangefeed=true failed on release-19.1@3d6bb038f3afcf805fb48dddcffa342195d3d7e0:

The test failed on branch=release-19.1, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20200109-1680663/cdc/initial-scan/rangefeed=true/run_1
	cdc.go:912,cdc.go:226,cdc.go:535,test_runner.go:716: initial scan did not complete
Repro

Artifacts: /cdc/initial-scan/rangefeed=true

make stressrace TESTS=cdc/initial-scan/rangefeed=true PKG=./pkg/roachtest TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1


@cockroach-teamcity
Member Author

(roachtest).cdc/initial-scan/rangefeed=true failed on release-19.2@94eeef00ebd472e3ab052a6de07b428059351b8d:

The test failed on branch=release-19.2, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20200109-1680681/cdc/initial-scan/rangefeed=true/run_1
	cdc.go:912,cdc.go:226,cdc.go:535,test_runner.go:716: initial scan did not complete
Repro

Artifacts: /cdc/initial-scan/rangefeed=true

make stressrace TESTS=cdc/initial-scan/rangefeed=true PKG=./pkg/roachtest TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1


@cockroach-teamcity
Member Author

(roachtest).cdc/initial-scan/rangefeed=true failed on release-19.2@486a8ac25e0470211f6f7659b26c82950ed013be:

The test failed on branch=release-19.2, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20200110-1683261/cdc/initial-scan/rangefeed=true/run_1
	cdc.go:912,cdc.go:226,cdc.go:535,test_runner.go:716: initial scan did not complete
Repro

Artifacts: /cdc/initial-scan/rangefeed=true

make stressrace TESTS=cdc/initial-scan/rangefeed=true PKG=./pkg/roachtest TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1


@cockroach-teamcity
Member Author

(roachtest).cdc/initial-scan/rangefeed=true failed on master@413fd1e46c2936029458db495efa242acb1b7d52:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20200110-1683243/cdc/initial-scan/rangefeed=true/run_1
	cdc.go:912,cdc.go:226,cdc.go:535,test_runner.go:716: initial scan did not complete
Repro

Artifacts: /cdc/initial-scan/rangefeed=true

make stressrace TESTS=cdc/initial-scan/rangefeed=true PKG=./pkg/roachtest TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1


@cockroach-teamcity
Member Author

(roachtest).cdc/initial-scan/rangefeed=true failed on release-19.2@07057cf68ec19eb0777f611fedaaefde72fe673a:

The test failed on branch=release-19.2, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20200111-1684749/cdc/initial-scan/rangefeed=true/run_1
	cdc.go:912,cdc.go:226,cdc.go:535,test_runner.go:716: initial scan did not complete
Repro

Artifacts: /cdc/initial-scan/rangefeed=true

make stressrace TESTS=cdc/initial-scan/rangefeed=true PKG=./pkg/roachtest TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1


@cockroach-teamcity
Member Author

(roachtest).cdc/initial-scan/rangefeed=true failed on master@9a89b2e038d2fc67e9e155742e3e18316bf08d11:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20200111-1684731/cdc/initial-scan/rangefeed=true/run_1
	cdc.go:912,cdc.go:226,cdc.go:535,test_runner.go:716: initial scan did not complete
Repro

Artifacts: /cdc/initial-scan/rangefeed=true

make stressrace TESTS=cdc/initial-scan/rangefeed=true PKG=./pkg/roachtest TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1


@cockroach-teamcity
Member Author

(roachtest).cdc/initial-scan/rangefeed=true failed on master@07ebd0230cea54110f249420fd215adc1d2d5ecf:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20200113-1685619/cdc/initial-scan/rangefeed=true/run_1
	cdc.go:912,cdc.go:226,cdc.go:535,test_runner.go:716: initial scan did not complete
Repro

Artifacts: /cdc/initial-scan/rangefeed=true

make stressrace TESTS=cdc/initial-scan/rangefeed=true PKG=./pkg/roachtest TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1


@danhhz
Contributor

danhhz commented Jan 13, 2020

These changefeed roachtests all seem to have started failing around the same time. Definitely something to look into here.

@danhhz
Contributor

danhhz commented Jan 13, 2020

I suspect these failures are fallout from the roachprod change that accidentally started running everything in roachtest as geo-distributed.

@cockroach-teamcity
Member Author

(roachtest).cdc/initial-scan/rangefeed=true failed on master@3bb183403b4bc75c25fbcfc69d7d35de76d2b984:

		    -- arg 1: <string>
		    -- arg 2: <string>
		    -- arg 3: <string>
		  - /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1699769-1579420121-21-n4cpu16:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		    stderr:
		    
		    stdout:
		         0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		       12.0s      134            0.0            3.5      0.0      0.0      0.0      0.0 orderStatus
		       12.0s      134            0.0            0.0      0.0      0.0      0.0      0.0 payment
		       12.0s      134            0.0            3.2      0.0      0.0      0.0      0.0 stockLevel
		    E200119 08:14:23.724399 1 workload/cli/run.go:444  error in payment: dial tcp 10.128.0.122:26257: connect: connection refused
		    _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		       13.0s      136            0.0            3.4      0.0      0.0      0.0      0.0 delivery
		       13.0s      136            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		       13.0s      136            0.0            3.2      0.0      0.0      0.0      0.0 orderStatus
		       13.0s      136           18.0            1.4   9663.7   9663.7   9663.7   9663.7 payment
		       13.0s      136            0.0            3.0      0.0      0.0      0.0      0.0 stockLevel
		    E200119 08:14:24.824503 1 workload/cli/run.go:444  error in payment: dial tcp 10.128.0.122:26257: connect: connection refused
		       14.0s      153            0.0            3.1      0.0      0.0      0.0      0.0 delivery
		       14.0s      153            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		       14.0s      153            0.0            3.0      0.0      0.0      0.0      0.0 orderStatus
		       14.0s      153            0.0            1.3      0.0      0.0      0.0      0.0 payment
		       14.0s      153            0.0            2.8      0.0      0.0      0.0      0.0 stockLevel
		    E200119 08:14:25.832502 1 workload/cli/run.go:444  error in payment: dial tcp 10.128.0.122:26257: connect: connection refused
		       15.0s      203            0.0            2.9      0.0      0.0      0.0      0.0 delivery
		       15.0s      203            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		       15.0s      203            0.0            2.8      0.0      0.0      0.0      0.0 orderStatus
		       15.0s      203            0.0            1.2      0.0      0.0      0.0      0.0 payment
		       15.0s      203            0.0            2.6      0.0      0.0      0.0      0.0 stockLevel
		       16.0s      205            0.0            2.7      0.0      0.0      0.0      0.0 delivery
		       16.0s      205            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		       16.0s      205            0.0            2.6      0.0      0.0      0.0      0.0 orderStatus
		       16.0s      205           15.0            2.1  12884.9  12884.9  13421.8  13421.8 payment
		       16.0s      205            0.0            2.4      0.0      0.0      0.0      0.0 stockLevel
		    E200119 08:14:27.624206 1 workload/cli/run.go:444  error in stockLevel: dial tcp 10.128.0.122:26257: connect: connection refused
		    _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		       17.0s      218            1.0            2.6   4026.5   4026.5   4026.5   4026.5 delivery
		       17.0s      218            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		       17.0s      218            1.0            2.5  11811.2  11811.2  11811.2  11811.2 orderStatus
		       17.0s      218          180.7           12.6  13958.6  13958.6  13958.6  14495.5 payment
		       17.0s      218            1.0            2.4   1677.7   1677.7   1677.7   1677.7 stockLevel
		    E200119 08:14:28.625124 1 workload/cli/run.go:444  error in payment: dial tcp 10.128.0.122:26257: connect: connection refused
		       18.0s      239            0.0            2.5      0.0      0.0      0.0      0.0 delivery
		       18.0s      239            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		       18.0s      239            0.0            2.4      0.0      0.0      0.0      0.0 orderStatus
		       18.0s      239           70.0           15.8  15032.4  15032.4  15032.4  15032.4 payment
		       18.0s      239            0.0            2.2      0.0      0.0      0.0      0.0 stockLevel
		    E200119 08:14:29.625839 1 workload/cli/run.go:444  error in stockLevel: dial tcp 10.128.0.122:26257: connect: connection refused:
		  - context canceled

Repro

Artifacts: /cdc/initial-scan/rangefeed=true
roachdash


@tbg added the branch-master label on Jan 22, 2020
@cockroach-teamcity
Member Author

(roachtest).cdc/initial-scan/rangefeed=true failed on master@2739821b911d777fa2a927295d699b559360a802:

		    main.(*cluster).Run
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:1933
		    main.(*tpccWorkload).install
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cdc.go:787
		    main.cdcBasicTest
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cdc.go:129
		    main.registerCDC.func2
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cdc.go:537
		    main.(*testRunner).runTest.func2
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:741
		    runtime.goexit
		    	/usr/local/go/src/runtime/asm_amd64.s:1357
		  - error with embedded safe details: output in %s
		    -- arg 1: <string>
		  - output in run_082822.518_n4_workload_fixtures_load_tpcc:
		  - error with attached stack trace:
		    main.execCmd
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:406
		    main.(*cluster).RunL
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2019
		    main.(*cluster).RunE
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2000
		    main.(*cluster).Run
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:1933
		    main.(*tpccWorkload).install
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cdc.go:787
		    main.cdcBasicTest
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cdc.go:129
		    main.registerCDC.func2
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cdc.go:537
		    main.(*testRunner).runTest.func2
		    	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:741
		    runtime.goexit
		    	/usr/local/go/src/runtime/asm_amd64.s:1357
		  - error with embedded safe details: %s returned:
		    stderr:
		    %s
		    stdout:
		    %s
		    -- arg 1: <string>
		    -- arg 2: <string>
		    -- arg 3: <string>
		  - /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1747374-1581581035-22-n4cpu16:4 -- ./workload fixtures load tpcc --warehouses=100 --checks=false {pgurl:2} returned:
		    stderr:
		    I200213 08:28:24.509368 1 ccl/workloadccl/cliccl/fixtures.go:279  starting restore of 9 tables
		    Error: restoring fixture: sql: expected 6 destination arguments in Scan, not 7
		    Error:  exit status 1
		    
		    stdout::
		  - exit status 1

More

Artifacts: /cdc/initial-scan/rangefeed=true

See this test on roachdash

@irfansharif
Contributor

restoring fixture: sql: expected 6 destination arguments in Scan, not 7

Most recent failure was fixed in #45078.

@nvanbenschoten
Member

It looks like this test hasn't failed for an unexplained reason in two months. Closing.
