-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kv: post-lease transfer promotion from expiration-based lease to epoch-based lease causes failed writes #117630
Labels
A-kv
Anything in KV that doesn't belong in a more specific category.
A-read-committed
Related to the introduction of Read Committed
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
O-support
Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs
P-2
Issues/test failures with a fix SLA of 3 months
T-kv
KV Team
Comments
nvanbenschoten
added
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
A-kv
Anything in KV that doesn't belong in a more specific category.
T-kv
KV Team
labels
Jan 10, 2024
3 tasks
nvanbenschoten
added
O-support
Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs
P-2
Issues/test failures with a fix SLA of 3 months
A-read-committed
Related to the introduction of Read Committed
labels
Jan 16, 2024
This is related to #61986, because without this change, read summaries shipped during the lease transfer will be invalidated by the lease sequence increment during the expiration-based lease to epoch-based lease promotion. |
craig bot
pushed a commit
that referenced
this issue
Jan 19, 2024
117840: kv: promote expiration-based lease to epoch without sequence number change r=erikgrinaker a=nvanbenschoten Fixes #117630. Fixes #90656. Fixes #98553. Informs #61986. Informs #115191. This commit updates the post-lease transfer promotion of expiration-based leases to epoch-based leases to not change the sequence number of the lease. This avoids invalidating all requests proposed under the original expiration-based lease, which can lead to `RETRY_ASYNC_WRITE_FAILURE` errors. The change accomplishes this by updating the `Lease.Equivalent` method to consider an expiration-based lease to be equivalent to an epoch-based lease that is held by the same replica and has the same start time. Doing so requires some care, because lease equivalency is checked below Raft and needs to remain deterministic across binary versions. This change requires a cluster version check, so it cannot be backported. Release note (bug fix): Improved an interaction during range lease transfers which could previously cause `RETRY_ASYNC_WRITE_FAILURE` errors to be returned to clients. 117899: backupccl: skip `TestBackupRestoreAppend` under `deadlock` r=rail a=rickystewart These tests are likely to time out. Epic: CRDB-8308 Release note: None 117940: backupccl,sql: skip a couple more tests under duress r=rail a=rickystewart These tests are all timing out. For the failures that seem suspect in some way, I have filed GitHub issues. Epic: CRDB-8308 Release note: None 117950: copy: skip TestCopyFromRetries for now r=yuzefovich a=yuzefovich We recently expanded this test and it became flaky. Skip it until we stabilize it. Informs: #117912. Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Ricky Stewart <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]>
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Jan 19, 2024
Closes cockroachdb#115191. Depends on cockroachdb#61986. This commit switches the two nightly Read Committed variants of the TPC-C roachtest to run without transaction retry loops, using the `--txn-retries` flag introduced in cockroachdb#117096. With cockroachdb#117630 and cockroachdb#61986 resolved (the latter of which is still in review and under development), these tests both pass. Release note: None
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Jan 24, 2024
Closes cockroachdb#115191. Depends on cockroachdb#61986. This commit switches the two nightly Read Committed variants of the TPC-C roachtest to run without transaction retry loops, using the `--txn-retries` flag introduced in cockroachdb#117096. With cockroachdb#117630 and cockroachdb#61986 resolved (the latter of which is still in review and under development), these tests both pass. Release note: None
craig bot
pushed a commit
that referenced
this issue
Feb 2, 2024
118000: roachtest: run Read Committed variants of TPC-C without txn retry loops r=nvanbenschoten a=nvanbenschoten Closes #115191. Depends on #61986. This commit switches the two nightly Read Committed variants of the TPC-C roachtest to run without transaction retry loops, using the `--txn-retries` flag introduced in #117096. With #117630 and #61986 resolved (the latter of which is still in review and under development), these tests both pass. Release note: None 118600: gceworker bootstrap: fix checksum for go download r=rickystewart a=jlinder On the last go version upgrade to cockroach, the wrong checksum was entered in the gceworker bootstrap script for the downloaded tar file. This fixes it to be the correct checksum. Epic: none Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: James H. Linder <[email protected]>
wenyihu6
pushed a commit
to wenyihu6/cockroach
that referenced
this issue
Feb 21, 2024
Closes cockroachdb#115191. Depends on cockroachdb#61986. This commit switches the two nightly Read Committed variants of the TPC-C roachtest to run without transaction retry loops, using the `--txn-retries` flag introduced in cockroachdb#117096. With cockroachdb#117630 and cockroachdb#61986 resolved (the latter of which is still in review and under development), these tests both pass. Release note: None
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-kv
Anything in KV that doesn't belong in a more specific category.
A-read-committed
Related to the introduction of Read Committed
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
O-support
Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs
P-2
Issues/test failures with a fix SLA of 3 months
T-kv
KV Team
In fc2d180, we adjusted the lease transfer protocol to pass through an expiration-based lease before being upgraded to an expiration-based lease. This works, but it means that the lease sequence advances twice during a lease transfer. As a result, writes proposed under the expiration-based lease can be re-ordered with the lease promotion and will fail below raft.
This can lead to
RETRY_ASYNC_WRITE_FAILURE
errors and may be the underlying cause of #90656 and #98553.We should fix this, ideally by promoting from an expiration-based lease to an epoch-based lease without advancing the lease sequence number.
Jira issue: CRDB-35282
The text was updated successfully, but these errors were encountered: