storage: Don't swallow ErrEpochAlreadyIncremented #36942
Conversation
While looking at this issue I was surprised by how subtle the guarantee is that our status.Expiration becomes the base of the liveness CPut. Can you think of any assertions we could add to make sure this holds? At the very least, leave a comment on requestLeaseAsync to highlight the role of the status (I don't think we use much of it, so it could be tempting to refactor things there, introducing bugs by accident).
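A minimal, self-contained sketch of the kind of assertion in question, using illustrative stand-in types rather than CockroachDB's actual API:

```go
package main

import (
	"fmt"
	"log"
)

// Hypothetical, simplified stand-ins for the lease status and the liveness
// CPut discussed above; these are not CockroachDB's real types.
type hlcTimestamp struct{ WallTime int64 }

type leaseStatus struct{ Expiration hlcTimestamp }

type livenessCPut struct{ ExpectedExpiration hlcTimestamp }

// assertCPutBase sketches the assertion being asked for: the expected value
// of the liveness CPut must be exactly the Expiration recorded in the lease
// status that the caller based its decision on.
func assertCPutBase(status leaseStatus, cput livenessCPut) {
	if cput.ExpectedExpiration != status.Expiration {
		log.Fatalf("liveness CPut base %+v does not match lease status expiration %+v",
			cput.ExpectedExpiration, status.Expiration)
	}
}

func main() {
	st := leaseStatus{Expiration: hlcTimestamp{WallTime: 100}}
	cp := livenessCPut{ExpectedExpiration: st.Expiration}
	assertCPutBase(st, cp)
	fmt.Println("assertion holds: the CPut is based on status.Expiration")
}
```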
Reviewed 3 of 3 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell)
pkg/storage/replica_range_lease.go, line 311 at r1 (raw file):
The explanation of why we can't just go ahead is awkwardly broken up over these two paragraphs. Consider editing from this line on (and giving another pass for clarity to my text):
without having its epoch incremented, i.e. that it was in fact live at status.Timestamp. It would be incorrect to simply proceed to sending our lease request, since our lease.Start may precede the start timestamp of the predecessor lease (i.e. likely the one put into place by whoever beat us to the epoch increment), and so under this lease this node's timestamp cache would not necessarily reflect all reads served by the other leaseholder.
...
ErrEpochAlreadyIncremented is not an unusual error to observe, so we don't log it.
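A simplified illustration of the hazard described in the suggested text, under assumed toy types (not CockroachDB's actual lease code):

```go
package main

import "fmt"

// Hypothetical, simplified lease with a single numeric start timestamp.
type lease struct {
	Holder string
	Start  int64 // simplified timestamp
}

// safeToReplace sketches the rule implied above: installing our lease is only
// safe if its Start does not precede the start of the lease it replaces;
// otherwise our timestamp cache may be missing reads served under the
// predecessor lease.
func safeToReplace(prev, ours lease) bool {
	return ours.Start >= prev.Start
}

func main() {
	// We picked Start=10, but whoever beat us to the epoch increment put a
	// lease with Start=11 into place and may already have served reads.
	prev := lease{Holder: "n2", Start: 11}
	ours := lease{Holder: "n1", Start: 10}
	fmt.Println("safe to proceed:", safeToReplace(prev, ours)) // false
}
```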
CI failed because of a known flake (now skipped): #36948
IncrementEpoch was failing to distinguish between the request that caused the increment and another caller making the same increment. This is incorrect since a successful increment tells you *when* the node was confirmed dead, while relying on another node's increment leaves this uncertain.

In rare cases involving a badly overloaded cluster, this could result in inconsistencies (non-serializable transactions) due to incorrect timestamp cache management.

Fixes cockroachdb#35986

Release note (bug fix): Fix a rare inconsistency that could occur on badly overloaded clusters.

Force-pushed from 473f106 to 8de884a
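A minimal Go sketch of the distinction the commit message describes, with hypothetical, simplified types (not CockroachDB's actual NodeLiveness code):

```go
package main

import (
	"errors"
	"fmt"
)

// errEpochAlreadyIncremented stands in for the storage package's
// ErrEpochAlreadyIncremented in this toy example.
var errEpochAlreadyIncremented = errors.New("epoch already incremented")

type liveness struct{ Epoch int64 }

// incrementEpoch attempts a conditional bump of the epoch. If someone else
// already bumped it, that caller (not us) learned when the node was confirmed
// dead, so the error must be surfaced rather than reported as success.
func incrementEpoch(rec *liveness, expected int64) error {
	if rec.Epoch != expected {
		return errEpochAlreadyIncremented // before the fix, this was swallowed
	}
	rec.Epoch++
	return nil
}

func main() {
	rec := &liveness{Epoch: 3}
	_ = incrementEpoch(rec, 3) // first caller wins the race: epoch is now 4
	if err := incrementEpoch(rec, 3); err != nil {
		// A second caller racing on the same increment must not behave as if
		// it performed the increment itself.
		fmt.Println("lost the race:", err)
	}
}
```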
I've updated the new comment, using part of your new text (minus the parenthetical about the node that completed the IncrementEpoch having a lease: if that node managed to get a lease, our lease request would fail because PrevLease wouldn't match. The problematic case is when the node whose epoch we're trying to increment still has the lease, and the node that wins the race to perform the IncrementEpoch loses the race to RequestLease). I'm not sure what specifically you're looking for in a requestLeaseAsync comment, but later I'm going to do another pass over this file while it's fresh in my mind to document everything I can.
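A rough sketch of why the dropped parenthetical case is already covered: a lease request carries the lease it believes it is replacing (PrevLease), and evaluation rejects the request if the range's actual lease has changed. Simplified, assumed types; not CockroachDB's actual request evaluation.

```go
package main

import "fmt"

// Hypothetical, simplified lease identified by holder and sequence number.
type lease struct {
	Holder   string
	Sequence int64
}

// evaluateRequestLease rejects a lease request whose PrevLease no longer
// matches the range's actual lease.
func evaluateRequestLease(actual, prevInRequest, proposed lease) error {
	if actual != prevInRequest {
		return fmt.Errorf("lease changed (now %+v); request based on %+v is rejected",
			actual, prevInRequest)
	}
	// The real code would install the proposed lease here.
	_ = proposed
	return nil
}

func main() {
	// The node that won the epoch increment also acquired the lease, so the
	// range's actual lease no longer matches what we observed.
	observed := lease{Holder: "n3", Sequence: 7}
	actual := lease{Holder: "n2", Sequence: 8}
	fmt.Println(evaluateRequestLease(actual, observed, lease{Holder: "n1", Sequence: 9}))
}
```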
bors r=tbg
36942: storage: Don't swallow ErrEpochAlreadyIncremented r=tbg a=bdarnell

IncrementEpoch was failing to distinguish between the request that caused the increment and another caller making the same increment. This is incorrect since a successful increment tells you *when* the node was confirmed dead, while relying on another node's increment leaves this uncertain.

In rare cases involving a badly overloaded cluster, this could result in inconsistencies (non-serializable transactions) due to incorrect timestamp cache management.

Fixes #35986

Release note (bug fix): Fix a rare inconsistency that could occur on badly overloaded clusters.

Co-authored-by: Ben Darnell <[email protected]>
Build succeeded
Comment looks really good now, thanks!