kv: release latches before evaluation for read-only requests #66485
Comments
(copying the related comment from https://github.com/cockroachlabs/support/issues/1013#issuecomment-858898819) However, for potentially long-running reads that don't want any intents, and that are not transactional and don't care about uncertainty (and so don't need intent interleaving during evaluation), we can have a different path that does scan the lock table up front, since that scan will only happen once. I believe this applies to ExportRequest.
This is relevant for read evaluation cases where we want to release latches before evaluation. The new Reader.PinEngineStateForIterators method would be called before releasing latches. This pinning does not apply to iterators with timestamp hints, similar to how ConsistentIterators does not apply to them. So this does not help ExportRequest evaluation for incremental backups. To address this we would want a guarantee from the caller that the timestamp hints will not change for the lifetime of the Reader that is making a promise of ConsistentIterators. Informs cockroachdb#55461, cockroachdb#66485 Release note: None
I'd like to understand this. You're saying that a read that scans the lock table (scan 1), finds 5 locks which it resolves, and then proceeds to scan the regular key space (scan 2) will need to also scan the locks in parallel with the regular key scan? Because the resolving of the locks will not actually wait for the replication of the cleanup of the respective intents before letting the read do the regular key scan?
Correct. This is mainly to avoid overcomplicating the code, since evaluation needs to see one's own intents and perform uncertainty checking, which the lock-conflict check does not need to do.
One way to make this work efficiently is to do a scan and keep in-memory state for all the intents that could be relevant: for lock conflicts, read-your-own-writes, and uncertainty checking. We would discard the in-memory state if it became too large, so this would just be the fast path. Then use a subset of that state for the lockTable, and all of it for the request evaluation.
My concern was that we don't want frequent failures on this fast path. I think we can use MVCCStats.IntentCount as a guide. And we're already writing specialized iterators for intent interleaving (e.g. the recent PR for faster ranged intent resolution), so this fits in nicely with that direction.
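A minimal Go sketch of the fast path discussed in the two comments above: a bounded in-memory buffer of intents that is abandoned once it grows past a budget, at which point evaluation falls back to the usual interleaved scan. All types and names here (`intent`, `intentBuffer`) are hypothetical illustrations, not CockroachDB internals; in practice the budget could be seeded from something like `MVCCStats.IntentCount`, as suggested above.

```go
package main

import "fmt"

// intent is a simplified stand-in for an MVCC intent record.
type intent struct {
	key   string
	txnID string
	ts    int64 // the provisional value's timestamp
}

// intentBuffer accumulates intents up to a fixed budget, modeling the
// "discard the in-memory state if it became too large" fast path.
type intentBuffer struct {
	max     int
	intents []intent
}

// add records an intent; it returns false once the budget is exceeded,
// signaling that the fast path must be abandoned.
func (b *intentBuffer) add(in intent) bool {
	if len(b.intents) >= b.max {
		return false
	}
	b.intents = append(b.intents, in)
	return true
}

// conflicting returns the buffered intents not owned by the reader's
// transaction: the subset that would be handed to the lockTable. The
// full buffer remains available to evaluation for read-your-own-writes
// and uncertainty checking.
func (b *intentBuffer) conflicting(readerTxnID string) []intent {
	var out []intent
	for _, in := range b.intents {
		if in.txnID != readerTxnID {
			out = append(out, in)
		}
	}
	return out
}

func main() {
	buf := intentBuffer{max: 4}
	buf.add(intent{key: "a", txnID: "txn1", ts: 10})
	buf.add(intent{key: "b", txnID: "txn2", ts: 12})
	fmt.Println(buf.conflicting("txn1")) // only txn2's intent conflicts
}
```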
66845: storage: add Reader method to pin iterators at current engine state r=sumeerbhola a=sumeerbhola 66885: sql: add ReType to resolveAndRequireType to fix vector engine panic r=cucaroach a=cucaroach Fixes #66708 The vector engine needs exact type coercion, specifically when piping computed column values into downstream operators. Without this fix the computed column was left as an int64 instead of being cast back to the required int16 type. Exposed by sqlsmith; kudos to @michae2 for the reduce help. Release note: None Co-authored-by: sumeerbhola <[email protected]> Co-authored-by: Tommy Reilly <[email protected]>
This was originally introduced to work around limitations in the `storage.Reader` interface, where only a `RocksDB` instance could be passed to `engine.ExportToSst`. Since then, a lot has changed, and this is no longer needed. Removing this is important, as it appears to undermine cockroachdb#55461 and make cockroachdb#66485 difficult.
This change removes the Engine method from EvalContext. Removing this is important, as its existence appears to undermine cockroachdb#55461 and make cockroachdb#66485 difficult. The first place where this was used was in EndTxn's evaluation function. I don't see any reason for this. In fact, we had a TODO to fix this, which we could have addressed years ago. The second place where this was used was in RecomputeStats's evaluation function. There, it was used for two reasons. First, it was used because `storage.Batch` used to not provide a consistent view of data. They now do. It was also used to evade spanset assertions, which this commit addresses in a better way.
66917: kvcoord: assert sanity when tracking in-flight write r=andreimatei a=andreimatei Release note: None 67093: kv: remove spanset.GetDBEngine r=nvanbenschoten a=nvanbenschoten Co-authored-by: Andrei Matei <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]>
67121: kv: remove EvalContext.Engine r=nvanbenschoten a=nvanbenschoten 67146: c-deps: remove protobuf r=dt a=dt Release note: none. Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: David Taylor <[email protected]>
storage: use provisional version value to determine uncertainty of intents

Related to cockroachdb#80706. Related to cockroachdb#66485.

This commit makes a slight modification to `pebbleMVCCScanner` that changes how it determines whether an intent is uncertain. Instead of consulting the version timestamp in the `MVCCMetadata` struct and comparing that against the scan's uncertainty interval, the `pebbleMVCCScanner` now scans through the uncertainty interval and uses the intent's provisional value's timestamp to determine uncertainty. The `pebbleMVCCScanner` was already performing this scan to compute uncertainty for other committed values in its uncertainty interval when it found that the intent was not uncertain. After this change, it also relies on the scan to compute uncertainty for the intent itself. This is safe, because the calling code knows that the intent has a higher timestamp than the scan, so there is no risk that the scan adds the provisional value to its result set.

This change is important for two reasons:
1. It avoids the need to store the `local_timestamp` field (introduced in cockroachdb#80706) in the `MVCCMetadata` struct.
2. It moves us closer to using `MVCCMetadata` values (ts=0, essentially locks protecting provisional values) to determine read-write conflicts, while using versioned provisional values to determine uncertainty. Doing so allows us to decompose a KV scan into a separate lock table scan that detects read-write conflicts and an MVCC scan that accumulates a result set while checking for uncertainty. This will be important for cockroachdb#66485.
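A toy Go illustration of the post-change behavior described in this commit message. The names (`version`, `uncertaintyInterval`) are hypothetical stand-ins, not CockroachDB's actual API; the point is only that uncertainty is decided by scanning the versioned values (including the intent's provisional value) that fall within the interval, rather than by trusting a timestamp stored in the intent's metadata.

```go
package main

import "fmt"

// uncertaintyInterval models a reader's uncertainty window
// (readTS, globalLimit].
type uncertaintyInterval struct {
	readTS      int64
	globalLimit int64
}

func (u uncertaintyInterval) contains(ts int64) bool {
	return ts > u.readTS && ts <= u.globalLimit
}

// version is a simplified versioned value; provisional marks an
// intent's provisional value.
type version struct {
	ts          int64
	provisional bool
}

// uncertainFromScan decides uncertainty the post-change way: scan the
// versions and report uncertainty if any of them (provisional or
// committed) falls inside the interval. No metadata timestamp is
// consulted.
func uncertainFromScan(versions []version, u uncertaintyInterval) bool {
	for _, v := range versions {
		if u.contains(v.ts) {
			return true
		}
	}
	return false
}

func main() {
	u := uncertaintyInterval{readTS: 10, globalLimit: 20}
	// A provisional value at ts=15 sits inside the interval: uncertain.
	fmt.Println(uncertainFromScan([]version{{ts: 15, provisional: true}, {ts: 5}}, u))
	// All versions outside the interval: not uncertain.
	fmt.Println(uncertainFromScan([]version{{ts: 25, provisional: true}, {ts: 5}}, u))
}
```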
81125: storage: use provisional version value to determine uncertainty of intents r=sumeerbhola a=nvanbenschoten Co-authored-by: Nathan VanBenschoten <[email protected]>
kvserver: allow certain read-only requests to drop latches before evaluation

This commit introduces a change to the way certain types of read-only requests are evaluated. Traditionally, read-only requests have held their latches throughout their execution. This commit allows certain qualifying reads to release their latches earlier.

At a high level, reads may attempt to resolve all conflicts upfront by performing a sort of "validation" phase before they perform their MVCC scan. This validation phase scans the lock table keyspace to find any conflicting intents that may need to be resolved before the actual evaluation of the request over the MVCC keyspace. If no conflicting intents are found, then (since cockroachdb#76312) the request is guaranteed to be fully isolated against all other concurrent requests, and can be allowed to release its latches at this point. This allows the actual evaluation of the read (over the MVCC part of the keyspace) to proceed without latches being held, which is the main motivation of this work. This validation phase can be thought of as an extension of the validation that the concurrency manager already performs when requests are sequenced through it, which tries to detect any conflicting intents that have already been pulled into the in-memory lock table.

Additionally, for certain types of requests that can drop their latches early and do not need to access the `IntentHistory` for any of their parent txn's intents, this commit makes their MVCC scan cheaper by eliminating the need for an `intentInterleavingIterator`. This is enabled by the observation that, once the validation phase is complete, the only remaining intents in the read's declared span must belong to the reader's transaction. So if the reader doesn't need to read an intent that isn't the latest intent on a key, then it doesn't need access to the key's `IntentHistory` (which lives in the lock-table keyspace), and doesn't need to use an `intentInterleavingIterator`.

Release note (performance improvement): certain types of reads will now have a far smaller contention footprint with conflicting concurrent writers.

Resolves cockroachdb#66485

Release justification: high-benefit change to existing functionality, part of the 22.2 roadmap.
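A rough Go sketch of the control flow this commit message describes: validate against the lock table first, and only drop latches before the MVCC scan when no conflicting intents are found. Every name here is a hypothetical stand-in for illustration, not CockroachDB's actual API (the one real method referenced in a comment is `Reader.PinEngineStateForIterators`, from the commit quoted earlier in this thread).

```go
package main

import "fmt"

// lockTableScan stands in for the upfront "validation" scan over the
// lock table keyspace; it returns the conflicting intents it found.
func lockTableScan(span, readerTxnID string) []string {
	return nil // no conflicting intents in this toy example
}

// evalReadOnly shows the ordering of steps: if validation finds no
// conflicts, pin the engine state, bump the timestamp cache, and
// release latches before the MVCC scan runs. Otherwise, resolve the
// conflicts and hold latches through evaluation as before.
func evalReadOnly(span, readerTxnID string) {
	if conflicts := lockTableScan(span, readerTxnID); len(conflicts) > 0 {
		fmt.Println("conflicts found; resolve them and keep latches held")
		return
	}
	fmt.Println("pin engine state for iterators") // cf. Reader.PinEngineStateForIterators
	fmt.Println("bump timestamp cache over", span)
	fmt.Println("release latches")
	fmt.Println("run MVCC scan without latches, and without an intentInterleavingIterator")
}

func main() {
	evalReadOnly("[a, c)", "txn1")
}
```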
85993: kvserver: allow certain read-only requests to drop latches before evaluation r=aayushshah15 a=aayushshah15 Co-authored-by: Aayush Shah <[email protected]>
Related to, but separate from #25579.
Related to #41720 (comment).
Original discussion in https://forum.cockroachlabs.com/t/why-do-we-keep-read-commands-in-the-command-queue/360.
Currently, read-only requests hold latches across evaluation. These are read-only latches, and so they do not conflict with writes at higher timestamps than the read. However, they do still block writes at lower timestamps than the read (at least until we address #25579), and they do still block non-MVCC writes like range splits. This is a problem, as the more we allow requests to block on one another, the more we open ourselves up to unexpected tail latency under rare sequences of events like we saw in #66338, where an `Export` (MVCC read) blocked a split (non-MVCC write), which blocked a large set of writes (MVCC writes).

We should re-evaluate this decision and determine whether we can drop latches earlier for reads, before they spend any time reading data. Doing so would rely on #55461. The high-level idea is that we could grab an LSM snapshot and bump the timestamp cache, and then drop the read's latches (and `readOnlyCmdMu`) before proceeding to evaluation.

For context, there are two reasons why we didn't do this in the past, and both have to do with wanting to wait to bump the timestamp cache until after evaluation:

1. it risks starving writers: bumping the timestamp cache forces conflicting writers to higher timestamps, which historically meant transaction restarts;
2. limited scans don't know the full width of their scan until evaluation, so bumping the timestamp cache upfront would have to cover the read's entire declared span.
These concerns are still both valid to some degree, but things have also changed over the past few years.
Regarding the starvation of writers, this is no longer as severe a problem as it once was, because transactions can now refresh their reads if their write timestamp is bumped. While contention can still prevent a read refresh from succeeding and force a transaction to perform a full restart, this is no longer a given like it once was. Additionally, with the `lockTable`, we will soon have the ability to check for conflicting locks over a span of keys without performing a full scan of the data. This opens up the opportunity to scan the `lockTable` upfront, while holding latches, and then only proceed to bump the timestamp cache if no conflicts are found.

The second reason still seems legitimate. We don't want to bump the timestamp cache during a scan if we don't know the full width of the scan ahead of time. Doing so would often lead to limited scans (`SELECT * FROM t LIMIT 1`) bumping the timestamp cache across the entire range, which would trip up otherwise uncontended writes. We have recently landed changes to optimistically latch and lock for limited scans that don't know until evaluation how far they will need to scan. This feels related, though I don't have a clear picture of how to take advantage of optimistic evaluation to avoid holding latches across evaluation.

However, one thing that has changed regarding limited scans is that we now have a concept of a closed timestamp, which prevents writes below some fixed interval that lags present time. If a limited scan runs below this timestamp, as `ExportRequests` often do, there is no benefit to bumping the timestamp cache after evaluation.

So with all of these changes, it seems appropriate to reconsider the synchronization of read-only requests.
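A small illustration of the closed-timestamp point above: a read running entirely at or below the closed timestamp cannot be invalidated by later writes, since writes below the closed timestamp are already rejected, so bumping the timestamp cache for it buys nothing. Timestamps are plain integers here and the helper name is invented for this sketch.

```go
package main

import "fmt"

// needsTsCacheBump reports whether a read's timestamp-cache update can
// matter: writes at or below the closed timestamp are already rejected,
// so a read at or below it gains nothing from bumping the cache.
func needsTsCacheBump(readTS, closedTS int64) bool {
	return readTS > closedTS
}

func main() {
	const closedTS = 100
	fmt.Println(needsTsCacheBump(90, closedTS))  // false: read below closed timestamp
	fmt.Println(needsTsCacheBump(110, closedTS)) // true: cache bump still useful
}
```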
/cc @andreimatei
Jira issue: CRDB-8063