
roachtest: version/mixed/nodes=5 failed #37425

Closed
cockroach-teamcity opened this issue May 9, 2019 · 8 comments
Assignees
Labels
C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot.
Milestone

Comments

@cockroach-teamcity
Member

SHA: https://github.com/cockroachdb/cockroach/commits/979b47cb3c6cd55d0d4c142bd97cb569a1813c2a

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=version/mixed/nodes=5 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1281674&tab=buildLog

		+    raw mvcc_key/value: 03ffff00159d0f6fd74cf80b09 d6b653ea03080f12019a1a02ffff22060801100118012802
		+1557418019.162269158,0 /Meta2/Max
		+    ts:2019-05-09 16:06:59.162269158 +0000 UTC
		+    value:"(\x99\x8fr\x03\b\x0e\x12\x01\x99\x1a\x02\xff\xff\"\x06\b\x01\x10\x01\x18\x01(\x02"
		+    raw mvcc_key/value: 03ffff00159d0f6fd6f5a5e609 28998f7203080e1201991a02ffff22060801100118012802
		+1557418019.152208615,0 /Meta2/Max
		+    ts:2019-05-09 16:06:59.152208615 +0000 UTC
		+    value:"{\xe4\x89\xf8\x03\b\r\x12\x01\x98\x1a\x02\xff\xff\"\x06\b\x01\x10\x01\x18\x01(\x02"
		+    raw mvcc_key/value: 03ffff00159d0f6fd65c22e709 7be489f803080d1201981a02ffff22060801100118012802
		+1557418019.145018899,0 /Meta2/Max
		+    ts:2019-05-09 16:06:59.145018899 +0000 UTC
		+    value:"\xc0>\xfc\x84\x03\b\f\x12\x01\x97\x1a\x02\xff\xff\"\x06\b\x01\x10\x01\x18\x01(\x02"
		+    raw mvcc_key/value: 03ffff00159d0f6fd5ee6e1309 c03efc8403080c1201971a02ffff22060801100118012802
		+1557418019.137911896,0 /Meta2/Max
		+    ts:2019-05-09 16:06:59.137911896 +0000 UTC
		+    value:"\x98\xeb-\b\x03\b\v\x12\x01\x96\x1a\x02\xff\xff\"\x06\b\x01\x10\x01\x18\x01(\x02"
		+    raw mvcc_key/value: 03ffff00159d0f6fd581fc5809 98eb2d0803080b1201961a02ffff22060801100118012802
		+1557418019.125652547,0 /Meta2/Max
		+    ts:2019-05-09 16:06:59.125652547 +0000 UTC
		+    value:"f\xc4\xf1\x90\x03\b\n\x12\x01\x95\x1a\x02\xff\xff\"\x06\b\x01\x10\x01\x18\x01(\x02"
		+    raw mvcc_key/value: 03ffff00159d0f6fd4c6ec4309 66c4f19003080a1201951a02ffff22060801100118012802
		+1557418019.115769273,0 /Meta2/Max
		+    ts:2019-05-09 16:06:59.115769273 +0000 UTC
		+    value:"5\xb9\xf7\x1a\x03\b\t\x12\x01\x94\x1a\x02\xff\xff\"\x06\b\x01\x10\x01\x18\x01(\x02"
		+    raw mvcc_key/value: 03ffff00159d0f6fd4301db909 35b9f71a0308091201941a02ffff22060801100118012802
		+1557418019.103144318,0 /Meta2/Max
		+    ts:2019-05-09 16:06:59.103144318 +0000 UTC
		+    value:"A\xeaN\xe1\x03\b\b\x12\x01\x93\x1a\x02\xff\xff\"\x06\b\x01\x10\x01\x18\x01(\x02"
		+    raw mvcc_key/value: 03ffff00159d0f6fd36f797e09 41ea4ee10308081201931a02ffff22060801100118012802
		+1557418019.096804361,0 /Meta2/Max
		+    ts:2019-05-09 16:06:59.096804361 +0000 UTC
		+    value:"-\x03\xd38\x03\b\a\x12\x01\x88\x1a\x02\xff\xff\"\x06\b\x01\x10\x01\x18\x01(\x02"
		+    raw mvcc_key/value: 03ffff00159d0f6fd30ebc0909 2d03d3380308071201881a02ffff22060801100118012802
		+1557418019.089005694,0 /Meta2/Max
		+    ts:2019-05-09 16:06:59.089005694 +0000 UTC
		+    value:"\x85\xba\x88\n\x03\b\x06\x12\x04\x04tse\x1a\x02\xff\xff\"\x06\b\x01\x10\x01\x18\x01(\x02"
		+    raw mvcc_key/value: 03ffff00159d0f6fd297bc7e09 85ba880a0308061204047473651a02ffff22060801100118012802
		+1557418019.074994573,0 /Meta2/Max
		+    ts:2019-05-09 16:06:59.074994573 +0000 UTC
		+    value:"G_\xd1\x01\x03\b\x05\x12\x04\x04tsd\x1a\x02\xff\xff\"\x06\b\x01\x10\x01\x18\x01(\x02"
		+    raw mvcc_key/value: 03ffff00159d0f6fd1c1f18d09 475fd1010308051204047473641a02ffff22060801100118012802
		+1557418019.066234948,0 /Meta2/Max
		+    ts:2019-05-09 16:06:59.066234948 +0000 UTC
		+    value:"\x93Q\xe5m\x03\b\x04\x12\v\x04\x00liveness.\x1a\x02\xff\xff\"\x06\b\x01\x10\x01\x18\x01(\x02"
		+    raw mvcc_key/value: 03ffff00159d0f6fd13c484409 9351e56d030804120b04006c6976656e6573732e1a02ffff22060801100118012802
		+1557418019.055142703,0 /Meta2/Max
		+    ts:2019-05-09 16:06:59.055142703 +0000 UTC
		+    value:"\x86%|G\x03\b\x03\x12\v\x04\x00liveness-\x1a\x02\xff\xff\"\x06\b\x01\x10\x01\x18\x01(\x02"
		+    raw mvcc_key/value: 03ffff00159d0f6fd093072f09 86257c47030803120b04006c6976656e6573732d1a02ffff22060801100118012802
		+1557418019.041255938,0 /Meta2/Max
		+    ts:2019-05-09 16:06:59.041255938 +0000 UTC
		+    value:"\u008b\xeb\x80\x03\b\x02\x12\x01\x04\x1a\x02\xff\xff\"\x06\b\x01\x10\x01\x18\x01(\x02"
		+    raw mvcc_key/value: 03ffff00159d0f6fcfbf220209 c28beb800308021201041a02ffff22060801100118012802
		+1557418019.020520939,0 /Meta2/Max
		+    ts:2019-05-09 16:06:59.020520939 +0000 UTC
		+    value:"\xe1\xe1\x1d9\x03\b\x01\x12\x00\x1a\x02\xff\xff\"\x06\b\x01\x10\x01\x18\x01(\x02"
		+    raw mvcc_key/value: 03ffff00159d0f6fce82bdeb09 e1e11d3903080112001a02ffff22060801100118012802

@cockroach-teamcity cockroach-teamcity added this to the 19.2 milestone May 9, 2019
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. labels May 9, 2019
@tbg
Member

tbg commented May 11, 2019

Uh-oh. What's going on here? It looks like the test first fails for an unknown reason. Then the dead node detection finds that n4 is dead (the log suggests it was simply down when the test failed). Then we run the consistency checker as part of the test harness and find... stuff on r1. Surprisingly, it prints a diff. I thought it wouldn't do that, but it's good that it did.

@tbg tbg self-assigned this May 11, 2019
@tbg
Member

tbg commented May 11, 2019

PS: this is on 19.1, so the predecessor version should be 2.1.

@tbg
Member

tbg commented May 21, 2019

The test workload should have run for 3h according to this formula (with nodes=5: (3*5+2) stages of 10m each, plus a 10m buffer):

stageDuration := 10 * time.Minute
buffer := 10 * time.Minute
if local {
	t.l.Printf("local mode: speeding up test\n")
	stageDuration = 10 * time.Second
	buffer = time.Minute
}
loadDuration := " --duration=" + (time.Duration(3*nodes+2)*stageDuration + buffer).String()

The workload output indicates that it ran for only ~43m. The reason is that this roachprod stop command failed:

16:47:58 cluster.go:275: > /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod stop teamcity-1281674-version-mixed-nodes-5:4
teamcity-1281674-version-mixed-nodes-5: stopping and waiting

It finally fails ~2m later:

0: exit status 255:
I190509 16:49:58.813569 1 cluster_synced.go:1529 command failed

The stop command is preceded by a ./cockroach quit, which returns successfully after ~5 seconds. The test follows it with a stop because that's how it tries to work around ./cockroach quit not terminating the process properly:

if err := c.RunE(ctx, c.Node(node), "./cockroach quit --insecure --port="+port); err != nil {
	return err
}
// NB: we still call Stop to make sure the process is dead when we try
// to restart it (or we'll catch an error from the RocksDB dir being
// locked). This won't happen unless run with --local due to timing.
// However, it serves as a reminder that `./cockroach quit` doesn't yet
// work well enough -- ideally all listeners and engines are closed by
// the time it returns to the client.
//
// TODO(tschottdorf): should return an error. I doubt that we want to
// call these *testing.T-style methods on goroutines.
c.Stop(ctx, c.Node(node))

Looking at the logs for n4, we see that the server did not stop cleanly: the log output continues. However, the network listener seems to have been closed, since we're seeing these gRPC errors (the node trying to connect to itself):

W190509 16:48:41.148205 27 vendor/google.golang.org/grpc/clientconn.go:1293 grpc: addrConn.createTransport failed to connect to {teamcity-1281674-version-mixed-nodes-5-0004:26257 0 }. Err :connection error: desc = "transport: failed to write client preface: io: read/write on closed pipe". Reconnecting...

This also explains why the dead node detection fired after the failure: it uses lsof to find the node's TCP listener, which is presumably gone at this point, so it misses the fact that the process is still around.

The deadlock in ./cockroach quit smells like some variant of #31692.

Ok, on to more interesting things: why was there a diff in the first place? This is a fast consistency check, which isn't supposed to iterate over the contents of the range at all, so how could it produce a diff? It turns out that crdb_internal.check_consistency was setting WithDiff to true, which doesn't make sense for the quick check (for the reason just stated):

b.AddRawRequest(&roachpb.CheckConsistencyRequest{
	RequestHeader: roachpb.RequestHeader{
		Key:    c.from,
		EndKey: c.to,
	},
	Mode:     c.mode,
	WithDiff: true,
})
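The following is a minimal sketch of the alternative: request a diff only when the mode actually iterates over the range's data. The types are hypothetical stand-ins that mirror the shape of the snippet above. The real ones live in CockroachDB's roachpb package and are not reproduced here, and this is not necessarily how the actual fix was written.

```go
package main

import "fmt"

// ChecksumMode is a hypothetical stand-in for the real consistency-check mode.
type ChecksumMode int

const (
	CheckStats ChecksumMode = iota // quick check: compares persisted stats only
	CheckFull                      // full check: hashes all of the range's data
)

// CheckConsistencyRequest mirrors only the fields used in the snippet above.
type CheckConsistencyRequest struct {
	Key, EndKey []byte
	Mode        ChecksumMode
	WithDiff    bool
}

// newCheckConsistencyRequest asks for a diff only in full mode; a stats-only
// check never iterates over the data, so there is nothing to diff.
func newCheckConsistencyRequest(from, to []byte, mode ChecksumMode) CheckConsistencyRequest {
	return CheckConsistencyRequest{
		Key:      from,
		EndKey:   to,
		Mode:     mode,
		WithDiff: mode == CheckFull,
	}
}

func main() {
	for _, mode := range []ChecksumMode{CheckStats, CheckFull} {
		req := newCheckConsistencyRequest([]byte("a"), []byte("z"), mode)
		fmt.Printf("mode=%d WithDiff=%t\n", req.Mode, req.WithDiff)
	}
}
```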

Oooh... I see what's going on here. n5 (the "failing" follower with all of the data) is running v2.1, which doesn't know about the stats-only mode. All it knows is that it's been asked to compute a checksum and create a diff, which it happily does. But it also returns zero stats, because its proto doesn't have that field at all. The leaseholder is running v19.1 and sees zero stats; additionally, its own result contains no diff because it wasn't asked to compute one. Ergo, we see exactly the failure presented here: the leaseholder thinks n5 had zero persisted stats, yet also had lots of data that "we" didn't have.
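The sequence of events can be modeled in a few lines. The types and functions below are hypothetical simplifications. newNode stands in for a v19.1 leaseholder that hashes only the persisted stats in stats-only mode. oldNode stands in for a v2.1 follower that ignores the mode, hashes the full data, and leaves the unknown stats field at its zero value. SHA-512 is used because the checksums in the reports are 64 bytes long, but the real hashing inputs differ.

```go
package main

import (
	"crypto/sha512"
	"fmt"
)

// MVCCStats is a cut-down, hypothetical stand-in for the real stats struct.
type MVCCStats struct {
	LiveBytes, KeyCount int64
}

// checksumResponse models what a replica reports back to the leaseholder.
type checksumResponse struct {
	Checksum       [sha512.Size]byte
	PersistedStats MVCCStats // stays zero if the responder's proto lacks the field
}

// hashData hashes every key/value pair, like a full consistency check.
func hashData(data []string) (sum [sha512.Size]byte) {
	h := sha512.New()
	for _, kv := range data {
		h.Write([]byte(kv))
	}
	copy(sum[:], h.Sum(nil))
	return sum
}

// newNode models v19.1: in stats-only mode it hashes just the persisted stats.
func newNode(stats MVCCStats, data []string, statsOnly bool) checksumResponse {
	if statsOnly {
		return checksumResponse{Checksum: sha512.Sum512([]byte(fmt.Sprintf("%+v", stats))), PersistedStats: stats}
	}
	return checksumResponse{Checksum: hashData(data), PersistedStats: stats}
}

// oldNode models v2.1: it doesn't know the mode, always hashes the full data,
// and never fills in PersistedStats.
func oldNode(data []string) checksumResponse {
	return checksumResponse{Checksum: hashData(data)}
}

func main() {
	stats := MVCCStats{LiveBytes: 100, KeyCount: 2}
	data := []string{"a=1", "b=2"}
	lh := newNode(stats, data, true) // leaseholder requests the quick check
	n5 := oldNode(data)              // follower still on v2.1
	fmt.Println("checksums match:", lh.Checksum == n5.Checksum)
	fmt.Printf("persisted stats: exp %+v, got %+v\n", lh.PersistedStats, n5.PersistedStats)
}
```

In this model the checksums differ and the follower's stats come back as all zeros. That mirrors the "persisted stats: exp {...}, got {...}" lines in the failure reports. With statsOnly set to false, both sides hash the same data and agree.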

I should've thought harder about mixed versions when I refactored this stuff. Luckily, there's a version I can bump to eliminate this problem:

tbg added a commit to tbg/cockroach that referenced this issue May 21, 2019
In cockroachdb#35861, I made changes to the consistency checksum computation that
were not backwards-compatible. When a 19.1 node asks a 2.1 node for a
fast SHA, the 2.1 node would run a full computation and return a
corresponding SHA which wouldn't match with the leaseholder's.

Bump ReplicaChecksumVersion to make sure that we don't attempt to
compare SHAs across these two releases.

Fixes cockroachdb#37425.

Release note (bug fix): Fixed a potential source of (faux) replica
inconsistencies that can be reported while running a mixed v19.1 / v2.1
cluster. This error (in that situation only) is benign and can be
resolved by upgrading to the latest v19.1 patch release. Every time this
error occurs a "checkpoint" is created which will occupy a large amount
of disk space and which needs to be removed manually (see <store
directory>/auxiliary/checkpoints).
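The commit message says that bumping ReplicaChecksumVersion keeps SHAs from being compared across the two releases. The following is a rough model of how such a version gate can work, offered as an assumption about the mechanism rather than a copy of CockroachDB's code. A replica whose compiled-in version differs from the request's declines to compute a checksum, and the leaseholder skips replicas that produced none. The version numbers and helper names are hypothetical.

```go
package main

import "fmt"

// Hypothetical version numbers; the real ReplicaChecksumVersion values are
// not reproduced here, only the fact that the constant was bumped.
const (
	oldChecksumVersion = 1 // assumed: compiled into a v2.1 node
	newChecksumVersion = 2 // assumed: the bumped value on v19.1
)

// replicaResult is what the leaseholder collects from one replica.
type replicaResult struct {
	Replica  string
	Checksum string
	OK       bool // false if the replica declined to compute a checksum
}

// computeChecksum models a replica handling a checksum request stamped with
// the leaseholder's version: on a mismatch it declines to compute.
func computeChecksum(replica string, localVersion, requestVersion int, sha string) replicaResult {
	if localVersion != requestVersion {
		return replicaResult{Replica: replica}
	}
	return replicaResult{Replica: replica, Checksum: sha, OK: true}
}

// inconsistent returns replicas whose checksum differs from the leaseholder's,
// skipping replicas that produced no checksum at all.
func inconsistent(leaseholderSHA string, results []replicaResult) []string {
	var bad []string
	for _, r := range results {
		if r.OK && r.Checksum != leaseholderSHA {
			bad = append(bad, r.Replica)
		}
	}
	return bad
}

func main() {
	req := newChecksumVersion
	results := []replicaResult{
		computeChecksum("n2 (v19.1)", newChecksumVersion, req, "fast-sha"),
		computeChecksum("n5 (v2.1)", oldChecksumVersion, req, "full-sha"),
	}
	fmt.Println("inconsistent replicas:", inconsistent("fast-sha", results))
}
```

The v2.1 replica's differing SHA never reaches the comparison, so no bogus inconsistency (and no checkpoint) is produced.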
tbg added a commit to tbg/cockroach that referenced this issue May 21, 2019
This regression tests cockroachdb#37425, which exposed an incompatibility between
v19.1 and v2.1.

`./bin/roachtest run --local version/mixed/nodes=3` ran successfully
after these changes.

I took the opportunity to address a TODO in FailOnReplicaDivergence.

Release note: None
craig bot pushed a commit that referenced this issue May 21, 2019
37668: storage: fix and test a bogus source of replica divergence errors r=nvanbenschoten a=tbg

An incompatibility in the consistency checks was introduced between v2.1 and v19.1.
See individual commit messages and #37425 for details.

Release note (bug fix): Fixed a potential source of (faux) replica
inconsistencies that can be reported while running a mixed v19.1 / v2.1
cluster. This error (in that situation only) is benign and can be
resolved by upgrading to the latest v19.1 patch release. Every time this
error occurs a "checkpoint" is created which will occupy a large amount
of disk space and which needs to be removed manually (see <store
directory>/auxiliary/checkpoints).

Release note (bug fix): Fixed a case in which `./cockroach quit` would
return success even though the server process was still running in a
severely degraded state.

37701: workloadcccl: fix two regressions in fixtures make/load r=nvanbenschoten a=danhhz

The SQL database for all the tables in the BACKUPs created by `fixtures
make` used to be "csv" (an artifact of the way we made them), but as
of #37343 it's the name of the generator. This seems better so change
`fixtures load` to match.

The same PR also (accidentally) started adding foreign keys in the
BACKUPs, but since there's one table per BACKUP (another artifact of the
way we used to make fixtures), we can't restore the foreign keys. It'd
be nice to switch to one BACKUP with all tables and get the foreign
keys, but the UX of the postLoad hook becomes tricky and I don't have
time right now to sort it all out. So, revert to the previous behavior
(no fks in fixtures) for now.

Release note: None

Co-authored-by: Tobias Schottdorf
Co-authored-by: Daniel Harrison
tbg added a commit to tbg/cockroach that referenced this issue May 22, 2019
In cockroachdb#35861, I made changes to the consistency checksum computation that
were not backwards-compatible. When a 19.1 node asks a 2.1 node for a
fast SHA, the 2.1 node would run a full computation and return a
corresponding SHA which wouldn't match with the leaseholder's.

Bump ReplicaChecksumVersion to make sure that we don't attempt to
compare SHAs across these two releases.

Fixes cockroachdb#37425.

Release note (bug fix): Fixed a potential source of (faux) replica
inconsistencies that can be reported while running a mixed v19.1 / v2.1
cluster. This error (in that situation only) is benign and can be
resolved by upgrading to the latest v19.1 patch release. Every time this
error occurs a "checkpoint" is created which will occupy a large amount
of disk space and which needs to be removed manually (see <store
directory>/auxiliary/checkpoints).
@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/1810a4eaa07b412b2d0899d25bb16a28a2746d48

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=version/mixed/nodes=5 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1300948&tab=buildLog

		+    ts:2019-05-22 17:28:49.809476383 +0000 UTC
		+    value:"\xc0yf3\x03"
		+    raw mvcc_key/value: c78af801868d88880015a11172915a8b1f09 c079663303
		+1558546129.809476383,0 /Table/63/2/99982/0/0
		+    ts:2019-05-22 17:28:49.809476383 +0000 UTC
		+    value:"\xd2\xcc\xc9\xdd\x03"
		+    raw mvcc_key/value: c78af801868e88880015a11172915a8b1f09 d2ccc9dd03
		+1558546129.809476383,0 /Table/63/2/99983/0/0
		+    ts:2019-05-22 17:28:49.809476383 +0000 UTC
		+    value:"jp\xae\xb8\x03"
		+    raw mvcc_key/value: c78af801868f88880015a11172915a8b1f09 6a70aeb803
		+1558546129.809476383,0 /Table/63/2/99984/0/0
		+    ts:2019-05-22 17:28:49.809476383 +0000 UTC
		+    value:"b\n\xe9q\x03"
		+    raw mvcc_key/value: c78af801869088880015a11172915a8b1f09 620ae97103
		+1558546129.809476383,0 /Table/63/2/99985/0/0
		+    ts:2019-05-22 17:28:49.809476383 +0000 UTC
		+    value:"ڶ\x8e\x14\x03"
		+    raw mvcc_key/value: c78af801869188880015a11172915a8b1f09 dab68e1403
		+1558546129.809476383,0 /Table/63/2/99986/0/0
		+    ts:2019-05-22 17:28:49.809476383 +0000 UTC
		+    value:"\xc8\x03!\xfa\x03"
		+    raw mvcc_key/value: c78af801869288880015a11172915a8b1f09 c80321fa03
		+1558546129.809476383,0 /Table/63/2/99987/0/0
		+    ts:2019-05-22 17:28:49.809476383 +0000 UTC
		+    value:"p\xbfF\x9f\x03"
		+    raw mvcc_key/value: c78af801869388880015a11172915a8b1f09 70bf469f03
		+1558546129.809476383,0 /Table/63/2/99988/0/0
		+    ts:2019-05-22 17:28:49.809476383 +0000 UTC
		+    value:"\xedh~&\x03"
		+    raw mvcc_key/value: c78af801869488880015a11172915a8b1f09 ed687e2603
		+1558546129.809476383,0 /Table/63/2/99989/0/0
		+    ts:2019-05-22 17:28:49.809476383 +0000 UTC
		+    value:"U\xd4\x19C\x03"
		+    raw mvcc_key/value: c78af801869588880015a11172915a8b1f09 55d4194303
		+1558546129.809476383,0 /Table/63/2/99990/0/0
		+    ts:2019-05-22 17:28:49.809476383 +0000 UTC
		+    value:"Ga\xb6\xad\x03"
		+    raw mvcc_key/value: c78af801869688880015a11172915a8b1f09 4761b6ad03
		+1558546129.809476383,0 /Table/63/2/99991/0/0
		+    ts:2019-05-22 17:28:49.809476383 +0000 UTC
		+    value:"\xff\xdd\xd1\xc8\x03"
		+    raw mvcc_key/value: c78af801869788880015a11172915a8b1f09 ffddd1c803
		+1558546129.809476383,0 /Table/63/2/99992/0/0
		+    ts:2019-05-22 17:28:49.809476383 +0000 UTC
		+    value:"\xa7\xbe\xc1\x9e\x03"
		+    raw mvcc_key/value: c78af801869888880015a11172915a8b1f09 a7bec19e03
		+1558546129.809476383,0 /Table/63/2/99993/0/0
		+    ts:2019-05-22 17:28:49.809476383 +0000 UTC
		+    value:"\x1f\x02\xa6\xfb\x03"
		+    raw mvcc_key/value: c78af801869988880015a11172915a8b1f09 1f02a6fb03
		+1558546129.809476383,0 /Table/63/2/99994/0/0
		+    ts:2019-05-22 17:28:49.809476383 +0000 UTC
		+    value:"\r\xb7\t\x15\x03"
		+    raw mvcc_key/value: c78af801869a88880015a11172915a8b1f09 0db7091503
		+1558546129.809476383,0 /Table/63/2/99995/0/0
		+    ts:2019-05-22 17:28:49.809476383 +0000 UTC
		+    value:"\xb5\vnp\x03"
		+    raw mvcc_key/value: c78af801869b88880015a11172915a8b1f09 b50b6e7003
		+1558546129.809476383,0 /Table/63/2/99996/0/0
		+    ts:2019-05-22 17:28:49.809476383 +0000 UTC
		+    value:"(\xdcV\xc9\x03"
		+    raw mvcc_key/value: c78af801869c88880015a11172915a8b1f09 28dc56c903
		+1558546129.809476383,0 /Table/63/2/99997/0/0
		+    ts:2019-05-22 17:28:49.809476383 +0000 UTC
		+    value:"\x90`1\xac\x03"
		+    raw mvcc_key/value: c78af801869d88880015a11172915a8b1f09 906031ac03
		+1558546129.809476383,0 /Table/63/2/99998/0/0
		+    ts:2019-05-22 17:28:49.809476383 +0000 UTC
		+    value:"\x82՞B\x03"
		+    raw mvcc_key/value: c78af801869e88880015a11172915a8b1f09 82d59e4203
		+1558546129.809476383,0 /Table/63/2/99999/0/0
		+    ts:2019-05-22 17:28:49.809476383 +0000 UTC
		+    value:":i\xf9'\x03"
		+    raw mvcc_key/value: c78af801869f88880015a11172915a8b1f09 3a69f92703
		+1558546129.809476383,0 /Table/63/2/100000/0/0
		+    ts:2019-05-22 17:28:49.809476383 +0000 UTC
		+    value:"\x92!\x11\xd0\x03"
		+    raw mvcc_key/value: c78af80186a088880015a11172915a8b1f09 922111d003
		
	cluster.go:1112,asm_amd64.s:522,panic.go:397,test.go:788,test.go:774,cluster.go:1869,version.go:230,version.go:243,test.go:1251: read tcp 172.17.0.2:57200->34.73.92.72:26257: read: connection reset by peer

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/699f675c73f8420802f92e46f65e6dce52abc12f

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=version/mixed/nodes=5 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1306268&tab=buildLog

		
		r29 (/Table/58) is inconsistent: RANGE_INCONSISTENT stats: {ContainsEstimates:false LastUpdateNanos:1558716424951934352 IntentAge:0 GCBytesAge:23573669093 LiveBytes:19253733 LiveCount:60000 KeyBytes:2975785 KeyCount:60000 ValBytes:55960736 ValCount:128602 IntentBytes:0 IntentCount:0 SysBytes:797 SysCount:10 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}
		replica (n5,s5):5 is inconsistent: expected checksum 28c6623d326cf5cef9557a15b9c2c0b3eff7abc4f4228fe4792e196089ec4fa153126b7c9215b2e4c48e2769d930189501a5c38d100106a2c6c4046122f3e13d, got 5fafdc694bd3874a8106d8648359e4475294649430e24be2d163f5852cd2f144fb6ae9ceae9e9a00312e7883be1a8191e56d7b8b0716de3236a8f13b1e851edc
		persisted stats: exp {ContainsEstimates:false LastUpdateNanos:1558716424951934352 IntentAge:0 GCBytesAge:23573669093 LiveBytes:19253733 LiveCount:60000 KeyBytes:2975785 KeyCount:60000 ValBytes:55960736 ValCount:128602 IntentBytes:0 IntentCount:0 SysBytes:797 SysCount:10 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}, got {ContainsEstimates:false LastUpdateNanos:0 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SysBytes:0 SysCount:0 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}
		replica (n3,s3):3 is inconsistent: expected checksum 28c6623d326cf5cef9557a15b9c2c0b3eff7abc4f4228fe4792e196089ec4fa153126b7c9215b2e4c48e2769d930189501a5c38d100106a2c6c4046122f3e13d, got 5fafdc694bd3874a8106d8648359e4475294649430e24be2d163f5852cd2f144fb6ae9ceae9e9a00312e7883be1a8191e56d7b8b0716de3236a8f13b1e851edc
		persisted stats: exp {ContainsEstimates:false LastUpdateNanos:1558716424951934352 IntentAge:0 GCBytesAge:23573669093 LiveBytes:19253733 LiveCount:60000 KeyBytes:2975785 KeyCount:60000 ValBytes:55960736 ValCount:128602 IntentBytes:0 IntentCount:0 SysBytes:797 SysCount:10 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}, got {ContainsEstimates:false LastUpdateNanos:0 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SysBytes:0 SysCount:0 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}
		
		r32 (/Table/61) is inconsistent: RANGE_INCONSISTENT stats: {ContainsEstimates:false LastUpdateNanos:1558715247630578679 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SysBytes:472 SysCount:7 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}
		replica (n3,s3):3 is inconsistent: expected checksum bf2e5a720c5fe8ba6d78954b6694add9d9549f666fb861ed261997178ca4368624075c86feac7fa4400c86ff9af05b75a90c2c505629f24a4bbef68c1aaa1628, got 323f268ef631d04bf75adc78161f337fbe0edaf71ac67d86bcbda990b3b8bc34ab8c4f2ddcc38ad8c4b1a6c5c48616558880f88ad0c083bc51fe362415582a4b
		persisted stats: exp {ContainsEstimates:false LastUpdateNanos:1558715247630578679 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SysBytes:472 SysCount:7 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}, got {ContainsEstimates:false LastUpdateNanos:0 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SysBytes:0 SysCount:0 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}
		replica (n2,s2):2 is inconsistent: expected checksum bf2e5a720c5fe8ba6d78954b6694add9d9549f666fb861ed261997178ca4368624075c86feac7fa4400c86ff9af05b75a90c2c505629f24a4bbef68c1aaa1628, got 323f268ef631d04bf75adc78161f337fbe0edaf71ac67d86bcbda990b3b8bc34ab8c4f2ddcc38ad8c4b1a6c5c48616558880f88ad0c083bc51fe362415582a4b
		persisted stats: exp {ContainsEstimates:false LastUpdateNanos:1558715247630578679 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SysBytes:472 SysCount:7 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}, got {ContainsEstimates:false LastUpdateNanos:0 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SysBytes:0 SysCount:0 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}
		

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/db98d5fb943e0a45b3878bdf042838408e9aee40

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=version/mixed/nodes=5 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1308281&tab=buildLog

		
		r29 (/Table/58) is inconsistent: RANGE_INCONSISTENT stats: {ContainsEstimates:false LastUpdateNanos:1558799793808859066 IntentAge:0 GCBytesAge:0 LiveBytes:11507435 LiveCount:195039 KeyBytes:7950614 KeyCount:195039 ValBytes:3556821 ValCount:195039 IntentBytes:0 IntentCount:0 SysBytes:577 SysCount:8 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}
		replica (n5,s5):4 is inconsistent: expected checksum 856b04198fab48c276d63c27e206ec4c98c9022372dd21352938af8002d21d1d91e29236b532b110af41865dde7a10802938d825f6a8ab2693b354051885db1f, got e2f0d45bc5d1369f73551c60249f6866507ae6806eb4952b023c2a40ab1d06fa0f6209eb54fcadf795226d5117972cbe327d8c2693fb0e35ebb209a9b05bc4ff
		persisted stats: exp {ContainsEstimates:false LastUpdateNanos:1558799793808859066 IntentAge:0 GCBytesAge:0 LiveBytes:11507435 LiveCount:195039 KeyBytes:7950614 KeyCount:195039 ValBytes:3556821 ValCount:195039 IntentBytes:0 IntentCount:0 SysBytes:577 SysCount:8 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}, got {ContainsEstimates:false LastUpdateNanos:0 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SysBytes:0 SysCount:0 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}
		replica (n2,s2):2 is inconsistent: expected checksum 856b04198fab48c276d63c27e206ec4c98c9022372dd21352938af8002d21d1d91e29236b532b110af41865dde7a10802938d825f6a8ab2693b354051885db1f, got e2f0d45bc5d1369f73551c60249f6866507ae6806eb4952b023c2a40ab1d06fa0f6209eb54fcadf795226d5117972cbe327d8c2693fb0e35ebb209a9b05bc4ff
		persisted stats: exp {ContainsEstimates:false LastUpdateNanos:1558799793808859066 IntentAge:0 GCBytesAge:0 LiveBytes:11507435 LiveCount:195039 KeyBytes:7950614 KeyCount:195039 ValBytes:3556821 ValCount:195039 IntentBytes:0 IntentCount:0 SysBytes:577 SysCount:8 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}, got {ContainsEstimates:false LastUpdateNanos:0 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SysBytes:0 SysCount:0 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}
		
		r30 (/Table/59) is inconsistent: RANGE_INCONSISTENT stats: {ContainsEstimates:false LastUpdateNanos:1558799793755342267 IntentAge:0 GCBytesAge:645434033 LiveBytes:4564481 LiveCount:129382 KeyBytes:3320787 KeyCount:129382 ValBytes:2328677 ValCount:164392 IntentBytes:0 IntentCount:0 SysBytes:4028 SysCount:12 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}
		replica (n2,s2):2 is inconsistent: expected checksum 9678053f69c1d97221e0b5a98f09a5ece305cf1992d48a8dbc1848754decef154695f8b7510cf97346ebed259f7611b6b416184c9488501a09df3ccc7a5f6a37, got bde8e476ce7050efbf0ef104f27dd21516f75da9a4886db76b1fa5fa130d9ccee6d5b1ffe79556b79561ef347af340301fdd3b3030585b5e845ca3e1d8880a2d
		persisted stats: exp {ContainsEstimates:false LastUpdateNanos:1558799793755342267 IntentAge:0 GCBytesAge:645434033 LiveBytes:4564481 LiveCount:129382 KeyBytes:3320787 KeyCount:129382 ValBytes:2328677 ValCount:164392 IntentBytes:0 IntentCount:0 SysBytes:4028 SysCount:12 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}, got {ContainsEstimates:false LastUpdateNanos:0 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SysBytes:0 SysCount:0 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}
		replica (n4,s4):4 is inconsistent: expected checksum 9678053f69c1d97221e0b5a98f09a5ece305cf1992d48a8dbc1848754decef154695f8b7510cf97346ebed259f7611b6b416184c9488501a09df3ccc7a5f6a37, got bde8e476ce7050efbf0ef104f27dd21516f75da9a4886db76b1fa5fa130d9ccee6d5b1ffe79556b79561ef347af340301fdd3b3030585b5e845ca3e1d8880a2d
		persisted stats: exp {ContainsEstimates:false LastUpdateNanos:1558799793755342267 IntentAge:0 GCBytesAge:645434033 LiveBytes:4564481 LiveCount:129382 KeyBytes:3320787 KeyCount:129382 ValBytes:2328677 ValCount:164392 IntentBytes:0 IntentCount:0 SysBytes:4028 SysCount:12 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}, got {ContainsEstimates:false LastUpdateNanos:0 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SysBytes:0 SysCount:0 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}
		

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/61715f0f96f519d599eec6541bbee7394d63209a

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=version/mixed/nodes=5 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1312952&tab=buildLog

		
		r33 (/Table/62) is inconsistent: RANGE_INCONSISTENT stats: {ContainsEstimates:false LastUpdateNanos:1559152157980092871 IntentAge:0 GCBytesAge:763390360 LiveBytes:225693 LiveCount:8678 KeyBytes:1298958 KeyCount:42518 ValBytes:213345 ValCount:76358 IntentBytes:137 IntentCount:11 SysBytes:726 SysCount:10 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}
		replica (n4,s4):4 is inconsistent: expected checksum bc60c527e3ca86ccbc6a7ecd4f84b018a2f65a4cfa8485faa9afc66b3639ba55315c56b2f932cbf57ba3373d67fdbcbe639d029b249e2558bc8d62470ef4bab1, got 4444d67bc9c691af1b6c25963661d12a84fd21a680c42000be62421ebbce95b0c31a3b7d6a526b1a8351658f254d17dbe93a7bae0acb777f4611374e2a9cb5ce
		persisted stats: exp {ContainsEstimates:false LastUpdateNanos:1559152157980092871 IntentAge:0 GCBytesAge:763390360 LiveBytes:225693 LiveCount:8678 KeyBytes:1298958 KeyCount:42518 ValBytes:213345 ValCount:76358 IntentBytes:137 IntentCount:11 SysBytes:726 SysCount:10 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}, got {ContainsEstimates:false LastUpdateNanos:0 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SysBytes:0 SysCount:0 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}
		replica (n2,s2):2 is inconsistent: expected checksum bc60c527e3ca86ccbc6a7ecd4f84b018a2f65a4cfa8485faa9afc66b3639ba55315c56b2f932cbf57ba3373d67fdbcbe639d029b249e2558bc8d62470ef4bab1, got 4444d67bc9c691af1b6c25963661d12a84fd21a680c42000be62421ebbce95b0c31a3b7d6a526b1a8351658f254d17dbe93a7bae0acb777f4611374e2a9cb5ce
		persisted stats: exp {ContainsEstimates:false LastUpdateNanos:1559152157980092871 IntentAge:0 GCBytesAge:763390360 LiveBytes:225693 LiveCount:8678 KeyBytes:1298958 KeyCount:42518 ValBytes:213345 ValCount:76358 IntentBytes:137 IntentCount:11 SysBytes:726 SysCount:10 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}, got {ContainsEstimates:false LastUpdateNanos:0 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SysBytes:0 SysCount:0 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}
		
		r35 (/Table/64) is inconsistent: RANGE_INCONSISTENT stats: {ContainsEstimates:false LastUpdateNanos:1559152157925190517 IntentAge:0 GCBytesAge:16891920984 LiveBytes:8429986 LiveCount:25278 KeyBytes:1552616 KeyCount:25278 ValBytes:35311334 ValCount:112563 IntentBytes:0 IntentCount:0 SysBytes:1615 SysCount:21 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}
		replica (n4,s4):4 is inconsistent: expected checksum 81d4f81cd697da8e07cf1c374bb8fe957ecf0123c1ce96e247de29f66c9cb481f37c69e95469f9b78efe2f1fa50c4ab36c763181e27fb9f5718e9e8c9e96de10, got 478c69674815b1b014330e586ef941abed7179897672da82d58406ce219bbd63a4b298e64235abd7c2899aae75f3185b5d7018cbf1adc78a9b506b990f1150df
		persisted stats: exp {ContainsEstimates:false LastUpdateNanos:1559152157925190517 IntentAge:0 GCBytesAge:16891920984 LiveBytes:8429986 LiveCount:25278 KeyBytes:1552616 KeyCount:25278 ValBytes:35311334 ValCount:112563 IntentBytes:0 IntentCount:0 SysBytes:1615 SysCount:21 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}, got {ContainsEstimates:false LastUpdateNanos:0 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SysBytes:0 SysCount:0 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}
		replica (n3,s3):3 is inconsistent: expected checksum 81d4f81cd697da8e07cf1c374bb8fe957ecf0123c1ce96e247de29f66c9cb481f37c69e95469f9b78efe2f1fa50c4ab36c763181e27fb9f5718e9e8c9e96de10, got 478c69674815b1b014330e586ef941abed7179897672da82d58406ce219bbd63a4b298e64235abd7c2899aae75f3185b5d7018cbf1adc78a9b506b990f1150df
		persisted stats: exp {ContainsEstimates:false LastUpdateNanos:1559152157925190517 IntentAge:0 GCBytesAge:16891920984 LiveBytes:8429986 LiveCount:25278 KeyBytes:1552616 KeyCount:25278 ValBytes:35311334 ValCount:112563 IntentBytes:0 IntentCount:0 SysBytes:1615 SysCount:21 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}, got {ContainsEstimates:false LastUpdateNanos:0 IntentAge:0 GCBytesAge:0 LiveBytes:0 LiveCount:0 KeyBytes:0 KeyCount:0 ValBytes:0 ValCount:0 IntentBytes:0 IntentCount:0 SysBytes:0 SysCount:0 XXX_NoUnkeyedLiteral:{} XXX_sizecache:0}
		

@tbg
Member

tbg commented May 30, 2019

#37868

@tbg tbg closed this as completed May 30, 2019