-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: version/mixed/nodes=3 failed #37737
Comments
The heap profile is interesting. It shows a majority of memory (about 4GB, but this may be distorted by the MemProfileRate) spent dealing with consistency checks: @tbg you may be interested in this. Do we have any limit to the amount of memory that can be used for consistency checks at a given time? Also, this might already have been addressed by #37722. This SHA is from before that backport. |
SHA: https://github.com/cockroachdb/cockroach/commits/db98d5fb943e0a45b3878bdf042838408e9aee40 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1308281&tab=buildLog
|
The above build is 19.1 did have #37668:
The symptoms look identical, though. Going to take a look soon. |
Ugh. Looks like there's also roachtest jank to fight, I thought I had fixed this? The interesting output is only in the teamcity log. It used to get printed in the |
No, we don't as far as I know. Before #37722 but with the code from #37722 in roachtest (which I assume was active here) we'd run full consistency checks (accidentally) that additionally would request diffs, i.e. would write all of the range data into a buffer that they'd then hold. |
Just ran this again on my laptop. This doesn't "just repro" unfortunately. Feels that this must be something straightforward. Will sleep on it. |
Got a "just repro" on a real cluster. Taking a look. |
In the repro, the first node is 19.1 and the others 2.1 and I'm getting failures exactly from those ranges whose leaseholders are on n1, exactly as you expect from the bug that I fixed. I must be missing something obvious.
|
Oh... I see this now. The checksum version check is basically useless because it's at propose time. n1 will ask itself to evaluate |
SHA: https://github.com/cockroachdb/cockroach/commits/db9c1217a6967fcac2d135cf0f24a4265dc76d77 Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1310296&tab=buildLog
|
The check was sitting at evaluation time, where it is useless. It needs to sit in the below-Raft code that actually computes the checksums. This flew under the radar for quite some time, but was found in: cockroachdb#37737 (comment) thanks to the aggressive consistency checks we run in version/mixed/nodes=3. Release note: None
The check was sitting at evaluation time, where it is useless. It needs to sit in the below-Raft code that actually computes the checksums. This flew under the radar for quite some time, but was found in: cockroachdb#37737 (comment) thanks to the aggressive consistency checks we run in version/mixed/nodes=3. Release note: None
See cockroachdb#37737 (comment). Release note: None
The check was sitting at evaluation time, where it is useless. It needs to sit in the below-Raft code that actually computes the checksums. This flew under the radar for quite some time, but was found in: cockroachdb#37737 (comment) thanks to the aggressive consistency checks we run in version/mixed/nodes=3. Release note: None
See cockroachdb#37737 (comment). Release note: None
SHA: https://github.com/cockroachdb/cockroach/commits/61715f0f96f519d599eec6541bbee7394d63209a Parameters: To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1312952&tab=buildLog
|
37868: storage: fix checksum version check r=ajwerner a=tbg The check was sitting at evaluation time, where it is useless. It needs to sit in the below-Raft code that actually computes the checksums. This flew under the radar for quite some time, but was found in: #37737 (comment) thanks to the aggressive consistency checks we run in version/mixed/nodes=3. Release note: None Co-authored-by: Tobias Schottdorf <[email protected]>
The check was sitting at evaluation time, where it is useless. It needs to sit in the below-Raft code that actually computes the checksums. This flew under the radar for quite some time, but was found in: cockroachdb#37737 (comment) thanks to the aggressive consistency checks we run in version/mixed/nodes=3. Release note: None
SHA: https://github.com/cockroachdb/cockroach/commits/1810a4eaa07b412b2d0899d25bb16a28a2746d48
Parameters:
To repro, try:
Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1300948&tab=buildLog
The text was updated successfully, but these errors were encountered: