stability: unexpected panic: llrb: inverted range #6027
just happened to three nodes on the gamma cluster:
|
@nvanbenschoten, would you mind taking a peek? |
This has hit me several times, so let me know if you need help reproducing.
|
I think Nathan should have an idea pretty quickly, or at least be able to
-- Tobias |
Trivially reproducible on the gamma cluster.
Or with
From a modified
Could it just be a matter of using the "pretty" value? |
No, the keys are actually inverted ( |
Interesting. I added the following recover and logging:
Which dumps:
And the full log, including dump of all keys/descs. |
I'm pretty sure the offending descriptor is simply an empty zero value; that would make According to the stack trace, this is happening as a result of a RangeKeyMismatchError. I think that a request is being directed to an uninitialized replica, and it makes it as far as If this theory is correct then this patch should make the error go away:
|
Sure, I'll give it a shot. |
So far so good. It usually takes less time than this to crash. Thanks, Ben. |
Cool. I'll work on a test and figure out the right place for this check (I think it should have been caught earlier than |
👍
-- Tobias |
The store now returns NotLeaderError instead of a malformed RangeKeyMismatchError when a request is addressed to an uninitialized replica. The replica that caused the creation of the uninitialized replica is used as our best guess for the current leader. Guard against regressions by checking the validity of the descriptor passed to NewRangeKeyMismatchError. Also add an admittedly complex regression test. Fixes cockroachdb#6027
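The guard described in this commit message can be sketched roughly as follows. This is a simplified illustration, not the actual patch: `RangeDescriptor`, `ErrNotLeader`, `RangeKeyMismatchError`, and `checkRequest` are hypothetical stand-ins defined here, and the assumption is that a zero-value descriptor (nil start and end keys) is what ends up being treated as an invalid range downstream.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// RangeDescriptor is a hypothetical, simplified stand-in for the real
// descriptor type; it only carries the key span [StartKey, EndKey).
type RangeDescriptor struct {
	RangeID  int64
	StartKey []byte
	EndKey   []byte
}

// IsInitialized reports whether the descriptor describes a non-empty,
// correctly ordered span. A zero-value descriptor fails this check
// because both keys are nil and therefore compare equal.
func (d RangeDescriptor) IsInitialized() bool {
	return bytes.Compare(d.StartKey, d.EndKey) < 0
}

// ErrNotLeader is a hypothetical stand-in for NotLeaderError.
var ErrNotLeader = errors.New("not leader: replica is uninitialized")

// RangeKeyMismatchError is a hypothetical stand-in for the real error,
// which carries the descriptor of the range that rejected the request.
type RangeKeyMismatchError struct {
	Desc RangeDescriptor
}

func (e *RangeKeyMismatchError) Error() string {
	return fmt.Sprintf("key range mismatch: r%d [%q, %q)",
		e.Desc.RangeID, e.Desc.StartKey, e.Desc.EndKey)
}

// checkRequest sketches the ordering of the two checks: an uninitialized
// replica is rejected first, so an empty or inverted descriptor is never
// embedded in a RangeKeyMismatchError.
func checkRequest(key []byte, desc RangeDescriptor) error {
	if !desc.IsInitialized() {
		return ErrNotLeader
	}
	if bytes.Compare(key, desc.StartKey) < 0 || bytes.Compare(key, desc.EndKey) >= 0 {
		return &RangeKeyMismatchError{Desc: desc}
	}
	return nil
}

func main() {
	valid := RangeDescriptor{RangeID: 1, StartKey: []byte("a"), EndKey: []byte("m")}

	fmt.Println(checkRequest([]byte("b"), RangeDescriptor{})) // uninitialized replica
	fmt.Println(checkRequest([]byte("z"), valid))             // key outside the range
	fmt.Println(checkRequest([]byte("b"), valid))             // key inside the range
}
```

Checking the descriptor before building the mismatch error means the caller only ever sees a descriptor with `StartKey < EndKey`, which is the property the regression guard in the commit is meant to enforce.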
Build sha: 08d6640
The beta cluster has been having issues, so some of them may be the direct cause of this (e.g. #5998, #6000, #6020).
I just restarted node 4 (ec2-52-91-234-29.compute-1.amazonaws.com) with a recent sha (08d6640). About 15s later, node 3 (ec2-54-209-150-36.compute-1.amazonaws.com) crashed.
Node 4 shows the following:
Node 3 died with:
Node 3 log:
node3.log.parse.txt.gz
Node 4 log:
node4.log.parse.txt
I'll see if I can add debugging here and try again.