-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kv: promote expiration-based lease to epoch without sequence number change #117840
kv: promote expiration-based lease to epoch without sequence number change #117840
Conversation
Test changes to just the leaseholder or just the start time, instead of both at the same time. This ensures that either lead to a sequence number change. Epic: None Release note: None
This commit is a small refactor so that we can eliminate the second call to Lease.Equivalent in RequestLease. There is already one in evalNewLease which is shared between RequestLease and TransferLease. Epic: None Release note: None
…hange Fixes cockroachdb#117630. Fixes cockroachdb#90656. Fixes cockroachdb#98553. Informs cockroachdb#61986. Informs cockroachdb#115191. This commit updates the post-lease transfer promotion of expiration-based leases to epoch-based leases to not change the sequence number of the lease. This avoids invalidating all requests proposed under the original expiration-based lease, which can lead to `RETRY_ASYNC_WRITE_FAILURE` errors. The change accomplishes this by updating the `Lease.Equivalent` method to consider an expiration-based lease to be equivalent to an epoch-based lease that is held by the same replica and has the same start time. Doing so requires some care, because lease equivalency is checked below Raft and needs to remain deterministic across binary versions. This change requires a cluster version check, so it cannot be backported. Release note (bug fix): Improved an interaction during range lease transfers which could previously cause `RETRY_ASYNC_WRITE_FAILURE` errors to be returned to clients.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change requires a cluster version check, so it cannot be backported.
That's too bad. We should document for support purposes that these can also be avoided by setting kv.transfer_expiration_leases_first.enabled = false
, with caveats about the unavailability risk.
Reviewed 1 of 1 files at r1, 3 of 3 files at r2, 10 of 10 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/kvserverpb/proposer_kv.proto
line 140 at r3 (raw file):
// - in v24.2, we can set this field to true unconditionally for lease request // proposals. It must still be consulted during lease application, because // the raft proposal may have been performed by an older node.
It's annoying that we need 4 releases to migrate this, but it seems prudent. I was thinking of ways to get it down to 3, but it's riskier and there's no real rush I guess.
pkg/roachpb/data.go
line 1885 at r3 (raw file):
// NB: Lease.Equivalent is NOT symmetric. For expiration-based // leases, a lease is equivalent to another with an equal or // later expiration, but not an earlier expiration.
nit: should we spell out the expiration->epoch case here, which is asymmetric as well?
pkg/roachpb/data.go
line 1939 at r3 (raw file):
// For expiration-based leases, extensions are considered equivalent. // This is one case where Equivalent is not commutative and, as such,
nit: ditto this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change requires a cluster version check, so it cannot be backported.
That's too bad. We should document for support purposes that these can also be avoided by setting
kv.transfer_expiration_leases_first.enabled = false
, with caveats about the unavailability risk.
We put #116092 in place in earlier releases to eliminate the risk of these RETRY_ASYNC_WRITE_FAILURE
retries for COPY. Beyond that specific use case, I'm not sure how much more prescriptive we can be towards support. kv.transfer_expiration_leases_first.enabled
is an important protection and we don't want users switching that to false indefinitely, so I don't think we want to document that in a runbook as the "solution" to RETRY_ASYNC_WRITE_FAILURE
errors.
Still, I think there are cases where this setting can temporarily help and it's better to keep support in the loop, so I wrote up https://cockroachlabs.atlassian.net/wiki/spaces/CKB/pages/3332505662/RETRY+ASYNC+WRITE+FAILURE+during+lease+transfers about this.
TFTR!
bors r=erikgrinaker
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/roachpb/data.go
line 1885 at r3 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
nit: should we spell out the expiration->epoch case here, which is asymmetric as well?
Done.
pkg/roachpb/data.go
line 1939 at r3 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
nit: ditto this comment.
Done.
Build succeeded: |
I had missed a `git push` immediately before merging. This updates two comments. Epic: None Release note: None
117969: batcheval: add `PushTxnResponse.AmbiguousAbort` r=erikgrinaker a=erikgrinaker Extracted from #117612. --- This indicates to the caller that the `ABORTED` status of the pushed transaction is ambiguous, and the transaction may in fact have been committed and GCed already. This information is also plumbed through the `IntentResolver` txn push APIs. Touches #104309. Epic: none Release note: None 117992: roachpb: address review comments from #117840 r=nvanbenschoten a=nvanbenschoten I had missed a `git push` immediately before merging #117840. This updates two comments. Epic: None Release note: None Co-authored-by: Erik Grinaker <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]>
Fixes #117630.
Fixes #90656.
Fixes #98553.
Informs #61986.
Informs #115191.
This commit updates the post-lease transfer promotion of expiration-based leases to epoch-based leases to not change the sequence number of the lease. This avoids invalidating all requests proposed under the original expiration-based lease, which can lead to
RETRY_ASYNC_WRITE_FAILURE
errors.The change accomplishes this by updating the
Lease.Equivalent
method to consider an expiration-based lease to be equivalent to an epoch-based lease that is held by the same replica and has the same start time. Doing so requires some care, because lease equivalency is checked below Raft and needs to remain deterministic across binary versions.This change requires a cluster version check, so it cannot be backported.
Release note (bug fix): Improved an interaction during range lease transfers which could previously cause
RETRY_ASYNC_WRITE_FAILURE
errors to be returned to clients.