-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kv: use tscache summary to eliminate txn retries due to lease transfers and range merges #61986
Labels
A-kv-distribution
Relating to rebalancing and leasing.
A-kv-transactions
Relating to MVCC and the transactional model.
A-read-committed
Related to the introduction of Read Committed
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
O-support
Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs
P-3
Issues/test failures with no fix SLA
T-kv
KV Team
Comments
nvanbenschoten
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
A-kv-distribution
Relating to rebalancing and leasing.
A-kv-transactions
Relating to MVCC and the transactional model.
labels
Mar 14, 2021
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Mar 26, 2021
Fixes cockroachdb#60929. Relates to cockroachdb#61986. Relates to cockroachdb#61989. This commit fixes a closed timestamp violation that could allow a value/intent write at a timestamp below a range's closed timestamp. This could allow for serializability violations if it allowed a follower read to miss a write and could lead to a panic in the rangefeed processor if a rangefeed was watching at the right time, as we saw in cockroachdb#60929. In cockroachdb#60929, we found that this bug was caused by a range merge and a lease transfer racing in such a way that the closed timestamp could later be violated by a write to the subsumed portion of the joint range. The root cause of this was an opportunistic optimization made in 7037b54 to consider a range's lease start time as an input to its closed timestamp computation. This optimization did not account for the possibility of serving writes to a newly subsumed keyspace below a range's lease start time if that keyspace was merged into a range under its current lease and with a freeze time below the current lease start time. This bug is fixed by removing the optimization, which was on its way out to allow for cockroachdb#61986 anyway. Note that removing this optimization does not break `TestClosedTimestampCanServeThroughoutLeaseTransfer`, because the v2 closed timestamp system does not allow for closed timestamp regressions, even across leaseholders. This was one of the many benefits of the new system.
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Mar 30, 2021
Fixes cockroachdb#60929. Relates to cockroachdb#61986. Relates to cockroachdb#61989. This commit fixes a closed timestamp violation that could allow a value/intent write at a timestamp below a range's closed timestamp. This could allow for serializability violations if it allowed a follower read to miss a write and could lead to a panic in the rangefeed processor if a rangefeed was watching at the right time, as we saw in cockroachdb#60929. In cockroachdb#60929, we found that this bug was caused by a range merge and a lease transfer racing in such a way that the closed timestamp could later be violated by a write to the subsumed portion of the joint range. The root cause of this was an opportunistic optimization made in 7037b54 to consider a range's lease start time as an input to its closed timestamp computation. This optimization did not account for the possibility of serving writes to a newly subsumed keyspace below a range's lease start time if that keyspace was merged into a range under its current lease and with a freeze time below the current lease start time. This bug is fixed by removing the optimization, which was on its way out to allow for cockroachdb#61986 anyway. Note that removing this optimization does not break `TestClosedTimestampCanServeThroughoutLeaseTransfer`, because the v2 closed timestamp system does not allow for closed timestamp regressions, even across leaseholders. This was one of the many benefits of the new system.
craig bot
pushed a commit
that referenced
this issue
Mar 30, 2021
62570: kv: don't consider lease start time as closed timestamp r=nvanbenschoten a=nvanbenschoten Fixes #60929. Relates to #61986. Relates to #61989. This commit fixes a closed timestamp violation that could allow a value/intent write at a timestamp below a range's closed timestamp. This could allow for serializability violations if it allowed a follower read to miss a write and could lead to a panic in the rangefeed processor if a rangefeed was watching at the right time, as we saw in #60929. In #60929, we found that this bug was caused by a range merge and a lease transfer racing in such a way that the closed timestamp could later be violated by a write to the subsumed portion of the joint range. The root cause of this was an opportunistic optimization made in 7037b54 to consider a range's lease start time as an input to its closed timestamp computation. This optimization did not account for the possibility of serving writes to a newly subsumed keyspace below a range's lease start time if that keyspace was merged into a range under its current lease and with a freeze time below the current lease start time. This bug is fixed by removing the optimization, which was on its way out to allow for #61986 anyway. Note that removing this optimization does not break `TestClosedTimestampCanServeThroughoutLeaseTransfer`, because the v2 closed timestamp system does not allow for closed timestamp regressions, even across leaseholders. This was one of the many benefits of the new system. Co-authored-by: Nathan VanBenschoten <[email protected]>
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Mar 30, 2021
Fixes cockroachdb#60929. Relates to cockroachdb#61986. Relates to cockroachdb#61989. This commit fixes a closed timestamp violation that could allow a value/intent write at a timestamp below a range's closed timestamp. This could allow for serializability violations if it allowed a follower read to miss a write and could lead to a panic in the rangefeed processor if a rangefeed was watching at the right time, as we saw in cockroachdb#60929. In cockroachdb#60929, we found that this bug was caused by a range merge and a lease transfer racing in such a way that the closed timestamp could later be violated by a write to the subsumed portion of the joint range. The root cause of this was an opportunistic optimization made in 7037b54 to consider a range's lease start time as an input to its closed timestamp computation. This optimization did not account for the possibility of serving writes to a newly subsumed keyspace below a range's lease start time if that keyspace was merged into a range under its current lease and with a freeze time below the current lease start time. This bug is fixed by removing the optimization, which was on its way out to allow for cockroachdb#61986 anyway. Note that removing this optimization does not break `TestClosedTimestampCanServeThroughoutLeaseTransfer`, because the v2 closed timestamp system does not allow for closed timestamp regressions, even across leaseholders. This was one of the many benefits of the new system.
This will help with |
Yes, this should eliminate the majority of those errors after cooperative lease transfers. |
nvanbenschoten
added
the
A-read-committed
Related to the introduction of Read Committed
label
Jun 5, 2023
andrewbaptist
added
the
O-support
Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs
label
Dec 18, 2023
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Dec 20, 2023
Informs cockroachdb#61986. Now that the use of ReadSummaries has been fully phased in, we don't need this logic. However, ReadSummaries are still low resolution, so we'll bump the incoming leaseholder's timestamp cache to the maximum timestamp in the outgoing leaseholder's timestamp cache. Release note: None
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Jan 25, 2024
Informs cockroachdb#61986. This commit adds a new `Serialize` function to `tscache.Cache` implementations. This serialization uses the `readsummary/rspb.Segment` representation added in 0950a1e. Serialization of the production `sklImpl` uses the Segment merging logic added in e4fc6f1 in order to merge together a partial serializations of each individual `sklPage` in the data structure. Timestamp cache serialization will be used to address cockroachdb#61986. Release note: None
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Jan 29, 2024
… range merges Fixes cockroachdb#61986. Fixes cockroachdb#117486. Unblocks cockroachdb#118000. This commit uses the new `tscache.Cache.Serialize` method introduced in cockroachdb#118299 to ship high-resolution summaries of the timestamp cache during lease transfers and ranges merges. In doing so, it eliminates the loss of precision that occurs in an incoming leaseholder's timestamp cache when it receives a lease transfer or range merge. This loss of precision was a source of transaction retries for three reasons: 1. txn tombstone marker keys would have their timestamp advanced, leading to TransactionAbortedError with the `ABORT_REASON_NEW_LEASE_PREVENTS_TXN` reason. 2. txn push marker keys would have their timestamp advanced, leading to transactions having their commit timestamp pushed, which could lead to TransactionRetryError with the `RETRY_SERIALIZABLE` reason. 3. global keys would have their timestamp advanced as if they had been read, also leading to transactions having their commit timestamp pushed if they wrote to those keys, which could also lead to TransactionRetryError with the `RETRY_SERIALIZABLE` reason. The first issue here is the most severe, both because it can not be refreshed away and because it affects transactions of all isolation levels. This commit introduces two new cluster settings to control the maximum size of these timestamp cache read summaries: - `kv.lease_transfer_read_summary.local_budget` - `kv.lease_transfer_read_summary.global_budget` It configures the local keyspace budget to 4MB and the global keyspace budget to 0B. This default configuration should be sufficient to eliminate the first two sources of retries described above. The third has not been observed as a serious issue in practice, so we configure the global budget to 0 so that we can hit a serialization fast-path. Release note (ops change): Transaction replay protection state is now passed between the outgoing and incoming leaseholder for a range during a lease transfer. This avoids cases where lease transfers can cause transactions to throw TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN) errors.
craig bot
pushed a commit
that referenced
this issue
Jan 31, 2024
118299: kv/tscache: implement timestamp cache serialization r=nvanbenschoten a=nvanbenschoten Informs #61986. This commit adds a new `Serialize` function to `tscache.Cache` implementations. This serialization uses the `readsummary/rspb.Segment` representation added in 0950a1e. Serialization of the production `sklImpl` uses the Segment merging logic added in e4fc6f1 in order to merge together a partial serializations of each individual `sklPage` in the data structure. Timestamp cache serialization will be used to address #61986. Release note: None 118543: catalog/lease: detect if synchronous lease releases are successful r=fqazi a=fqazi Previously, for unit testing, we added support for synchronously releasing leases. If the context was cancelled when releasing a lease synchronously, it was possible for the lease to be erased from memory and not from storage. As a result, reacquisition could hit an error when session-based leasing is enabled. To address this, this patch re-orders operations so that we clear storage first for synchronous lease release, followed by the in-memory copy. Fixes: #118522, fixes #118523, fixes #118521, fixes #118550 Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Faizan Qazi <[email protected]>
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Jan 31, 2024
… range merges Fixes cockroachdb#61986. Fixes cockroachdb#117486. Unblocks cockroachdb#118000. This commit uses the new `tscache.Cache.Serialize` method introduced in cockroachdb#118299 to ship high-resolution summaries of the timestamp cache during lease transfers and ranges merges. In doing so, it eliminates the loss of precision that occurs in an incoming leaseholder's timestamp cache when it receives a lease transfer or range merge. This loss of precision was a source of transaction retries for three reasons: 1. txn tombstone marker keys would have their timestamp advanced, leading to TransactionAbortedError with the `ABORT_REASON_NEW_LEASE_PREVENTS_TXN` reason. 2. txn push marker keys would have their timestamp advanced, leading to transactions having their commit timestamp pushed, which could lead to TransactionRetryError with the `RETRY_SERIALIZABLE` reason. 3. global keys would have their timestamp advanced as if they had been read, also leading to transactions having their commit timestamp pushed if they wrote to those keys, which could also lead to TransactionRetryError with the `RETRY_SERIALIZABLE` reason. The first issue here is the most severe, both because it can not be refreshed away and because it affects transactions of all isolation levels. This commit introduces two new cluster settings to control the maximum size of these timestamp cache read summaries: - `kv.lease_transfer_read_summary.local_budget` - `kv.lease_transfer_read_summary.global_budget` It configures the local keyspace budget to 4MB and the global keyspace budget to 0B. This default configuration should be sufficient to eliminate the first two sources of retries described above. The third has not been observed as a serious issue in practice, so we configure the global budget to 0 so that we can hit a serialization fast-path. Release note (ops change): Transaction replay protection state is now passed between the outgoing and incoming leaseholder for a range during a lease transfer. This avoids cases where lease transfers can cause transactions to throw TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN) errors.
craig bot
pushed a commit
that referenced
this issue
Feb 1, 2024
115746: kv: log on excessive latch hold duration r=lyang24 a=lyang24 This commit aims to help observability by logging request holding latches over threshold. long_latch_hold_duration is a new cluster setting that is introduced to set the latch holding time threshold, latches held over the threshold will be logged at most every second. To achieve logging spanlatch.manager now contains a pointer to cluster setting. Fixes: #114609 Release note: None 118300: kv: ship high-resolution tscache summaries during lease transfers and range merges r=nvanbenschoten a=nvanbenschoten Fixes #61986. Fixes #117486. Unblocks #118000. This commit uses the new `tscache.Cache.Serialize` method introduced in #118299 to ship high-resolution summaries of the timestamp cache during lease transfers and ranges merges. In doing so, it eliminates the loss of precision that occurs in an incoming leaseholder's timestamp cache when it receives a lease transfer or range merge. This loss of precision was a source of transaction retries for three reasons: 1. txn tombstone marker keys would have their timestamp advanced, leading to TransactionAbortedError with the `ABORT_REASON_NEW_LEASE_PREVENTS_TXN` reason. 2. txn push marker keys would have their timestamp advanced, leading to transactions having their commit timestamp pushed, which could lead to TransactionRetryError with the `RETRY_SERIALIZABLE` reason. 3. global keys would have their timestamp advanced as if they had been read, also leading to transactions having their commit timestamp pushed if they wrote to those keys, which could also lead to TransactionRetryError with the `RETRY_SERIALIZABLE` reason. The first issue here is the most severe, both because it can not be refreshed away and because it affects transactions of all isolation levels. This commit introduces two new cluster settings to control the maximum size of these timestamp cache read summaries: - `kv.lease_transfer_read_summary.local_budget` - `kv.lease_transfer_read_summary.global_budget` It configures the local keyspace budget to 4MB and the global keyspace budget to 0B. This default configuration should be sufficient to eliminate the first two sources of retries described above. The third has not been observed as a serious issue in practice, so we configure the global budget to 0 so that we can hit a serialization fast-path. Release note (ops change): Transaction replay protection state is now passed between the outgoing and incoming leaseholder for a range during a lease transfer. This avoids cases where lease transfers can cause transactions to throw TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN) errors. Co-authored-by: lyang24 <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Feb 2, 2024
118000: roachtest: run Read Committed variants of TPC-C without txn retry loops r=nvanbenschoten a=nvanbenschoten Closes #115191. Depends on #61986. This commit switches the two nightly Read Committed variants of the TPC-C roachtest to run without transaction retry loops, using the `--txn-retries` flag introduced in #117096. With #117630 and #61986 resolved (the latter of which is still in review and under development), these tests both pass. Release note: None 118600: gceworker bootstrap: fix checksum for go download r=rickystewart a=jlinder On the last go version upgrade to cockroach, the wrong checksum was entered in the gceworker bootstrap script for the downloaded tar file. This fixes it to be the correct checksum. Epic: none Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: James H. Linder <[email protected]>
cockroach-dev-inf
pushed a commit
that referenced
this issue
Feb 15, 2024
Informs #115191. This commit adds a `--txn-retries` flag to `tpcc`, allowing users of the workload to disable transaction retry loops for 40001 errors. For Serializable transactions (controllable with `--isolation-level`), this quickly leads to errors being thrown by the workload. For Read Committed transaction, the workload eventually hits an error due to a lease transfer. Combined with a prototype fix for #61986, Read Committed transactions survive for at least a few minutes without error. Release note: None
wenyihu6
pushed a commit
to wenyihu6/cockroach
that referenced
this issue
Feb 21, 2024
Informs cockroachdb#61986. This commit adds a new `Serialize` function to `tscache.Cache` implementations. This serialization uses the `readsummary/rspb.Segment` representation added in 0950a1e. Serialization of the production `sklImpl` uses the Segment merging logic added in e4fc6f1 in order to merge together a partial serializations of each individual `sklPage` in the data structure. Timestamp cache serialization will be used to address cockroachdb#61986. Release note: None
wenyihu6
pushed a commit
to wenyihu6/cockroach
that referenced
this issue
Feb 21, 2024
… range merges Fixes cockroachdb#61986. Fixes cockroachdb#117486. Unblocks cockroachdb#118000. This commit uses the new `tscache.Cache.Serialize` method introduced in cockroachdb#118299 to ship high-resolution summaries of the timestamp cache during lease transfers and ranges merges. In doing so, it eliminates the loss of precision that occurs in an incoming leaseholder's timestamp cache when it receives a lease transfer or range merge. This loss of precision was a source of transaction retries for three reasons: 1. txn tombstone marker keys would have their timestamp advanced, leading to TransactionAbortedError with the `ABORT_REASON_NEW_LEASE_PREVENTS_TXN` reason. 2. txn push marker keys would have their timestamp advanced, leading to transactions having their commit timestamp pushed, which could lead to TransactionRetryError with the `RETRY_SERIALIZABLE` reason. 3. global keys would have their timestamp advanced as if they had been read, also leading to transactions having their commit timestamp pushed if they wrote to those keys, which could also lead to TransactionRetryError with the `RETRY_SERIALIZABLE` reason. The first issue here is the most severe, both because it can not be refreshed away and because it affects transactions of all isolation levels. This commit introduces two new cluster settings to control the maximum size of these timestamp cache read summaries: - `kv.lease_transfer_read_summary.local_budget` - `kv.lease_transfer_read_summary.global_budget` It configures the local keyspace budget to 4MB and the global keyspace budget to 0B. This default configuration should be sufficient to eliminate the first two sources of retries described above. The third has not been observed as a serious issue in practice, so we configure the global budget to 0 so that we can hit a serialization fast-path. Release note (ops change): Transaction replay protection state is now passed between the outgoing and incoming leaseholder for a range during a lease transfer. This avoids cases where lease transfers can cause transactions to throw TransactionAbortedError(ABORT_REASON_NEW_LEASE_PREVENTS_TXN) errors.
wenyihu6
pushed a commit
to wenyihu6/cockroach
that referenced
this issue
Feb 21, 2024
Closes cockroachdb#115191. Depends on cockroachdb#61986. This commit switches the two nightly Read Committed variants of the TPC-C roachtest to run without transaction retry loops, using the `--txn-retries` flag introduced in cockroachdb#117096. With cockroachdb#117630 and cockroachdb#61986 resolved (the latter of which is still in review and under development), these tests both pass. Release note: None
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-kv-distribution
Relating to rebalancing and leasing.
A-kv-transactions
Relating to MVCC and the transactional model.
A-read-committed
Related to the introduction of Read Committed
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
O-support
Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs
P-3
Issues/test failures with no fix SLA
T-kv
KV Team
Background / Motivation
In #60521, we began shipping a summary of an outgoing leaseholder's timestamp cache during lease transfers and range merges. The incoming / post-merge leaseholder then applies this summary after assuming control of the keyspace to ensure that no future writes are allowed to invalidate prior reads.
The structure, represented by a new
ReadSummary
proto, includes information likeread on key "a" were served up to time 10
. Because the timestamp cache is used to prevent transaction replays (seeCanCreateTxnRecord
), it also includes information liketxn 1234 already committed
.This replaced an existing mechanism where the incoming leaseholder would conservatively bump its timestamp cache all the way up to the lease start time / merge freeze time across its entire keyspace. The higher fidelity summary allows the new leaseholder to avoid bumping its timestamp cache as aggressively. This can limit the impact of lease transfers on foreground traffic - reducing false read-write contention which causes transaction retries (e.g. see
TestStoreLeaseTransferTimestampCacheRead
) and reducing false positives for transaction replay detection which causes transaction aborts (e.g. seeTestStoreLeaseTransferTimestampCacheTxnRecord
).However, currently, the
ReadSummary
structure still only maintains a low-resolution snapshot of the outgoing leaseholder's timestamp cache. So while #60521 introduced the mechanism to ship a timestamp cache summary (and fixed two bugs in the process), it didn't actually begin taking full advantage of this new mechanism.Proposed Change
Now that the ReadSummary mechanism is in place, we can begin using it to limit transaction retries due to lease transfers and range merges.
First, we'll want to clean up the compatibility code added in #60521, now that we can be sure that all nodes in any cluster that touch our changes here will be aware of the
ReadSummary
and will be using the per-range closed timestamp system. This will allow us to remove code like this and this.Once we do that, we'll want to augment the
ReadSummary
structure with the ability to carry high-resolution information, subject to some memory limits. We'll updateReplica.GetCurrentReadSummary
to collect this information from the leaseholder's timestamp cache. The details of this structure are TBD, as are the details of the memory limits. We'll likely want to prioritize the local keyspace segment over the global keyspace segment, as txn aborts are more disruptive than txn pushes, because they cannot be refreshed away.Once we can represent and capture higher-resolution information in a
ReadSummary
, we'll want to introduce some form of compression of these summaries. This will allow us to achieve the desired scheme of shipping a high-res, high-mem summary on log entries but only persisting a low-res, low-mem summary indefinitely in a range's keyspace:cockroach/pkg/kv/kvserver/kvserverpb/proposer_kv.proto
Lines 177 to 187 in e09b93f
With these changes in place, we should see a stark drop in the disruption of lease transfers and range merges to foreground traffic.
Jira issue: CRDB-2701
gz#15057
gz#19312
Epic CRDB-34172
The text was updated successfully, but these errors were encountered: