-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kv: avoid panic on TestMergeQueue/non-collocated failure #64199
Merged
craig
merged 1 commit into
cockroachdb:master
from
nvanbenschoten:nvanbenschoten/rangeMergeTestFlake
Apr 27, 2021
Merged
kv: avoid panic on TestMergeQueue/non-collocated failure #64199
craig
merged 1 commit into
cockroachdb:master
from
nvanbenschoten:nvanbenschoten/rangeMergeTestFlake
Apr 27, 2021
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Informs cockroachdb#63009. Informs cockroachdb#64056. In cockroachdb#63009/cockroachdb#64056, we saw that this test could flake with a nil pointer panic. I don't know quite what's going on here, but when working on a patch for cockroachdb#62700, I managed to hit this panic reliably by accidentally breaking all range merges. After a bit of debugging, it became clear that we were always hitting a panic in the `reset` stage of `TestMergeQueue/sticky-bit` because the previous subtest, `TestMergeQueue/non-collocated`, was moving the RHS range to a different node, failing to merge the two range, and failing itself. This soft failure was being drowned out by the hard failure in the next subtest. This commit replaces the crash with a failure that looks something like the following when range merges are completely disabled: ``` --- FAIL: TestMergeQueue (0.34s) test_log_scope.go:73: test logs captured to: /var/folders/8k/436yf8s97cl_27vlh270yb8c0000gp/T/logTestMergeQueue627909827 test_log_scope.go:74: use -show-logs to present logs inline --- FAIL: TestMergeQueue/both-empty (0.00s) client_merge_test.go:4183: ranges unexpectedly unmerged expected startKey /Table/Max, but got "\xfa\x00\x00" --- FAIL: TestMergeQueue/lhs-undersize (0.00s) client_merge_test.go:4192: ranges unexpectedly unmerged expected startKey /Table/Max, but got "\xfa\x00\x00" --- FAIL: TestMergeQueue/combined-threshold (0.00s) client_merge_test.go:4214: ranges unexpectedly unmerged expected startKey /Table/Max, but got "\xfa\x00\x00" --- FAIL: TestMergeQueue/non-collocated (0.03s) client_merge_test.go:4236: replica doesn't exist --- FAIL: TestMergeQueue/sticky-bit (0.00s) client_merge_test.go:4243: right-hand side range not found --- FAIL: TestMergeQueue/sticky-bit-expiration (0.00s) client_merge_test.go:4268: right-hand side range not found ``` I expect that under stress on master, we will see the `TestMergeQueue/non-collocated` subtest fail.
irfansharif
approved these changes
Apr 26, 2021
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to hand it over to Aayush.
bors r+ |
bors r+ |
Build failed: |
Unrelated flake. bors r+ |
Build succeeded: |
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this pull request
May 2, 2021
Informs cockroachdb#63009. Informs cockroachdb#64056. In cockroachdb#64199, we found that the flake was likely due to the non-collocated subtest, so this commit un-skips the parent test and skips only this single subtest.
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this pull request
Jul 8, 2021
Informs cockroachdb#63009. Informs cockroachdb#64056. In cockroachdb#64199, we found that the flake was likely due to the non-collocated subtest, so this commit un-skips the parent test and skips only this single subtest.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Informs #63009.
Informs #64056.
In #63009/#64056, we saw that this test could flake with a nil pointer panic.
I don't know quite what's going on here, but when working on a patch for #62700,
I managed to hit this panic reliably by accidentally breaking all range merges.
After a bit of debugging, it became clear that we were always hitting a panic in
the
reset
stage ofTestMergeQueue/sticky-bit
because the previous subtest,TestMergeQueue/non-collocated
, was moving the RHS range to a different node,failing to merge the two range, and failing itself. This soft failure was being
drowned out by the hard failure in the next subtest.
This commit replaces the crash with a failure that looks something like the
following when range merges are completely disabled:
I expect that under stress on master, we will see the
TestMergeQueue/non-collocated
subtest fail.The fact that
TestMergeQueue/non-collocated
is the test failing means that we may want to have @aayushshah15 take over this investigation, since he's made changes in that area recently. What do you two think?