kv: release latches before evaluation for read-only requests #66485

Closed
nvanbenschoten opened this issue Jun 15, 2021 · 5 comments · Fixed by #85993
Labels
A-kv-transactions Relating to MVCC and the transactional model. C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-kv KV Team

Comments

@nvanbenschoten
Member

nvanbenschoten commented Jun 15, 2021

Related to, but separate from #25579.
Related to #41720 (comment).
Original discussion in https://forum.cockroachlabs.com/t/why-do-we-keep-read-commands-in-the-command-queue/360.

Currently, read-only requests hold latches across evaluation. These are read-only latches, and so they do not conflict with writes at higher timestamps than the read. However, they do still block writes at lower timestamps than the read (at least until we address #25579) and do still block non-MVCC writes like range splits. This is a problem: the more we allow requests to block on one another, the more we open ourselves up to unexpected tail latency under rare sequences of events like the one in #66338, where an Export (MVCC read) blocked a split (non-MVCC write), which in turn blocked a large set of writes (MVCC writes).

We should re-evaluate this decision and determine whether we can drop latches earlier for reads, before they spend any time reading data. Doing so would rely on #55461. The high-level idea is that we could grab an LSM snapshot and bump the timestamp cache, and then drop the read's latches (and readOnlyCmdMu) before proceeding to evaluation.
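
As a minimal Go sketch of that ordering (every name here is an illustrative stand-in, not CockroachDB's actual interfaces):

```go
package sketch

// Illustrative stand-ins for the real types.
type Span struct{ Key, EndKey string }
type Timestamp int64
type KeyValue struct{ Key, Value string }

type Snapshot interface {
	Scan(span Span, ts Timestamp) ([]KeyValue, error)
	Close()
}

type replica interface {
	acquireLatches(Span) (guard int)
	releaseLatches(guard int)
	bumpTimestampCache(Span, Timestamp)
	engineSnapshot() Snapshot
}

// evalReadOnly sketches the proposed ordering: while latches are held,
// pin an engine (LSM) snapshot and record the read in the timestamp
// cache; then release latches and run the potentially slow MVCC scan
// against the pinned snapshot.
func evalReadOnly(r replica, span Span, ts Timestamp) ([]KeyValue, error) {
	g := r.acquireLatches(span)

	// Both steps are cheap and happen under latches, so the read stays
	// correctly ordered relative to concurrent writers.
	snap := r.engineSnapshot()
	r.bumpTimestampCache(span, ts)

	// Latches (and readOnlyCmdMu) are no longer needed: the snapshot
	// fixes what the read observes, and the timestamp cache protects it
	// from writes below its timestamp.
	r.releaseLatches(g)

	defer snap.Close()
	return snap.Scan(span, ts)
}
```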

For context, there are two reasons why we didn't do this in the past, and both have to do with wanting to wait to bump the timestamp cache until after evaluation:

  1. we don't want to bump the timestamp cache immediately, in case the read runs into intents during evaluation and is forced to retry after pushing/waiting. Bumping the timestamp cache early could cause a form of starvation for writers who lay down an intent and then want to re-write it in a subsequent retry. If readers always bumped the timestamp cache before finding the conflicting intent, they could prevent the writer from ever completing.
  2. we don't want to bump the timestamp cache for limited scans if we don't know the full extent of what they will (or won't) read. Doing so could cause false contention.

These concerns are still both valid to some degree, but things have also changed over the past few years.

Regarding the starvation of writers, this is no longer as severe a problem as it once was. This is because transactions can now refresh their reads if their write timestamp is bumped. While contention can still prevent a read refresh from succeeding and force a transaction to perform a full restart, this is no longer a given like it once was. Additionally, with the lockTable, we will soon have the ability to check for conflicting locks over a span of keys without performing a full scan of the data. This opens up the opportunity to scan the lockTable upfront, while holding latches, and then only proceed to bump the timestamp cache if no conflicts are found.
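
A minimal sketch of that last idea, assuming a hypothetical in-memory lock-table API (not the real `concurrency` package interfaces):

```go
package sketch

// Illustrative stand-ins; not the real lockTable or replica APIs.
type Span struct{ Key, EndKey string }
type Timestamp int64
type Lock struct{ Key string }

type lockTable interface {
	// ConflictingLocks reports locks in span that conflict with a read
	// at ts, consulting only the in-memory lock table (no engine scan).
	ConflictingLocks(span Span, ts Timestamp) []Lock
}

type tsCache interface{ Add(Span, Timestamp) }
type latches interface{ Release() }

// tryEarlyRelease sketches the idea above: while still holding latches,
// look for conflicting locks; only if none are found, bump the timestamp
// cache and drop latches. On conflict we return false and the caller
// keeps its latches and pushes/waits as today, so a writer that laid
// down an intent is never starved by a premature timestamp-cache bump.
func tryEarlyRelease(lt lockTable, tc tsCache, lg latches, span Span, ts Timestamp) bool {
	if len(lt.ConflictingLocks(span, ts)) > 0 {
		return false
	}
	tc.Add(span, ts)
	lg.Release()
	return true
}
```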

The second reason still seems legitimate. We don't want to bump the timestamp cache during a scan if we don't know the full width of the scan ahead of time. Doing so would often lead to limited scans (`SELECT * FROM t LIMIT 1`) bumping the timestamp cache across the entire range, which would trip up otherwise uncontended writes. We have recently landed changes to optimistically latch and lock for limited scans that don't know until evaluation how far they will need to scan. This feels related, though I don't have a clear picture of how to take advantage of optimistic evaluation to avoid holding latches across evaluation.

However, one thing that has changed regarding limited scans is that we now have a concept of a closed timestamp, which prevents writes below some fixed interval that lags present time. If a limited scan is run below this timestamp, as ExportRequests often are, there is no benefit to bumping the timestamp cache after evaluation.
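
In sketch form, with a plain integer standing in for the real `hlc.Timestamp`:

```go
package sketch

// Plain integer standing in for hlc.Timestamp; illustrative only.
type Timestamp int64

// canSkipTSCacheBump captures the observation above: if a limited scan
// (e.g. an ExportRequest) reads at or below the range's closed
// timestamp, no future write can commit under it anyway, so deferring
// the timestamp-cache bump until after evaluation buys nothing and the
// scan's latches could be dropped up front.
func canSkipTSCacheBump(readTS, closedTS Timestamp) bool {
	return readTS <= closedTS
}
```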

So with all of these changes, it seems appropriate that we reconsider the synchronization of read-only requests.

/cc @andreimatei

Jira issue: CRDB-8063

@nvanbenschoten nvanbenschoten added C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-kv-transactions Relating to MVCC and the transactional model. labels Jun 15, 2021
@jlinder jlinder added the T-kv KV Team label Jun 16, 2021
@sumeerbhola
Collaborator

> Additionally, with the lockTable, we will soon have the ability to check for conflicting locks over a span of keys without performing a full scan of the data. This opens up the opportunity to scan the lockTable upfront, while holding latches, and then only proceed to bump the timestamp cache if no conflicts are found.

(copying the related comment from https://github.com/cockroachlabs/support/issues/1013#issuecomment-858898819)
The current thinking is not to do this in the general case. There is a cost to scanning the lock table key space in the engine twice (the evaluation needs intent interleaving too). So we will only scan the in-memory lockTable data structure, which is an incomplete picture of the intents.

However, for potentially long-running reads that don't want to see any intents and are neither transactional nor concerned with uncertainty (and so don't need intent interleaving during evaluation), we can have a different path that does scan the lock table up front, since it will only happen once. I believe this applies to ExportRequest.

sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Jun 24, 2021
This is relevant for read evaluation cases where we want to release
latches before evaluation. The new Reader.PinEngineStateForIterators
method would be called before releasing latches.

This pinning does not apply to iterators with timestamp hints,
similar to how ConsistentIterators does not apply to them. So
this does not help ExportRequest evaluation for incremental backups.
To address this we would want a guarantee from the caller that the
timestamp hints will not change for the lifetime of the Reader that
is making a promise of ConsistentIterators.

Informs cockroachdb#55461,cockroachdb#66485

Release note: None
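
For illustration, a minimal sketch of how `PinEngineStateForIterators` might slot into this read path; only the method name comes from the commit above, and the rest of the scaffolding is hypothetical:

```go
package sketch

// Only PinEngineStateForIterators comes from the commit above; the
// surrounding scaffolding is hypothetical.
type Reader interface {
	PinEngineStateForIterators() error
	Close()
}
type latchGuard interface{ Release() }

// withPinnedReader pins engine state while latches are held, releases
// the latches, and then hands the pinned reader to the evaluation
// function: iterators created during evaluation observe state that was
// valid under latches, even though evaluation runs without them.
func withPinnedReader(r Reader, lg latchGuard, eval func(Reader) error) error {
	if err := r.PinEngineStateForIterators(); err != nil {
		lg.Release()
		return err
	}
	lg.Release() // drop latches before the expensive evaluation
	defer r.Close()
	return eval(r)
}
```
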
@andreimatei
Contributor

> The current thinking is not to do this in the general case. There is a cost to scanning the lock table key space in the engine twice (the evaluation needs intent interleaving too). So we will only scan the in-memory lockTable data structure, which is an incomplete picture of the intents.

I'd like to understand this. You're saying that a read that scans the lock table (scan 1), finds 5 locks which it resolves, and then proceeds to scan the regular key space (scan 2) will need to also scan the locks in parallel with the regular key scan? Because the resolving of the locks will not actually wait for the replication of the cleanup of the respective intents before letting the read do the regular key scan?
The way I imagined we'd handle this is by the reader keeping track of the locks that it resolved as part of scan 1, and interpolating them into scan 2 without actually needing to scan the locks again. No?

@sumeerbhola
Collaborator

> You're saying that a read that scans the lock table (scan 1), finds 5 locks which it resolves, and then proceeds to scan the regular key space (scan 2) will need to also scan the locks in parallel with the regular key scan?

Correct. This is mainly to avoid overcomplicating the code, since the evaluation needs to see one's own intents and do uncertainty checking, which the lock checking does not need to do.

> The way I imagined we'd handle this is by the reader keeping track of the locks that it resolved as part of scan 1, and interpolating them into scan 2 without actually needing to scan the locks again. No?

One way to make this work efficiently is to do a scan and keep in-memory state for all the intents that can be relevant for lock conflicts, read-your-own-writes, and uncertainty checking. We would discard the in-memory state if it became too large, so this would be just the fast path. Then use a subset for the lockTable, and all of it for the request evaluation.
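
A minimal sketch of that fast path, assuming a callback-style lock-table scan and an arbitrary size cap (both hypothetical):

```go
package sketch

// Hypothetical names throughout; this only illustrates the fast path
// described above.
type Intent struct {
	Key   string
	TxnID string
}

const maxBufferedIntents = 1000 // fast-path cap on in-memory state

// collectIntents runs a single scan over the lock-table keyspace and
// buffers every intent that could matter (lock conflicts,
// read-your-own-writes, uncertainty checking). If the buffer would grow
// past the cap, the fast path is abandoned and the caller falls back to
// intent interleaving during evaluation.
func collectIntents(scanLockTable func(yield func(Intent) bool)) (buf []Intent, ok bool) {
	ok = true
	scanLockTable(func(in Intent) bool {
		if len(buf) >= maxBufferedIntents {
			ok = false
			buf = nil
			return false // stop the scan; too much state
		}
		buf = append(buf, in)
		return true
	})
	return buf, ok
}
```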

@sumeerbhola
Collaborator

> One way to make this work efficiently is to do a scan and keep in-memory state for all the intents that can be relevant for lock conflicts, read-your-own-writes, and uncertainty checking. We would discard the in-memory state if it became too large, so this would be just the fast path. Then use a subset for the lockTable, and all of it for the request evaluation.

My concern was that we don't want frequent failures on this fast path. I think we can use `MVCCStats.IntentCount` as a guide. And we're already writing specialized iterators for intent interleaving (e.g. the recent PR for faster ranged intent resolution), so this fits in nicely with that direction.

sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Jun 25, 2021
craig bot pushed a commit that referenced this issue Jun 28, 2021
66845: storage: add Reader method to pin iterators at current engine state r=sumeerbhola a=sumeerbhola


66885: sql: add ReType to resolveAndRequireType to fix vector engine panic r=cucaroach a=cucaroach

Fixes #66708

The vector engine needs exact type coercion, specifically when piping
computed column values into downstream operators.  Without this fix
the computed column was left as an int64 instead of cast back to the
required int16 type.

Exposed by sqlsmith, kudos to @michae2 for the reduce help

Release note: None

Co-authored-by: sumeerbhola <[email protected]>
Co-authored-by: Tommy Reilly <[email protected]>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jun 30, 2021
This was originally introduced to work around limitations in the
`storage.Reader` interface, where only a `RocksDB` instance could
be passed to `engine.ExportToSst`. Since then, a lot has changed,
and this is no longer needed.

Removing this is important, as it appears to undermine cockroachdb#55461 and
make cockroachdb#66485 difficult.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jul 1, 2021
This change removes the Engine method from EvalContext. Removing this is
important, as its existence appears to undermine cockroachdb#55461 and make cockroachdb#66485
difficult.

The first place where this was used was in EndTxn's evaluation function.
I don't see any reason for this. In fact, we had a TODO to fix this, which
we could have addressed years ago.

The second place where this was used was in RecomputeStats's evaluation
function. There, it was used for two reasons. First, it was used because
`storage.Batch` used to not provide a consistent view of data. They now
do. It was also used to evade spanset assertions, which this commit
addresses in a better way.
craig bot pushed a commit that referenced this issue Jul 1, 2021
66917: kvcoord: assert sanity when tracking in-flight write r=andreimatei a=andreimatei

Release note: None

67093: kv: remove spanset.GetDBEngine r=nvanbenschoten a=nvanbenschoten


Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
craig bot pushed a commit that referenced this issue Jul 6, 2021
67121: kv: remove EvalContext.Engine r=nvanbenschoten a=nvanbenschoten


67146: c-deps: remove protobuf r=dt a=dt

Release note: none.

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: David Taylor <[email protected]>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue May 7, 2022
…tents

Related to cockroachdb#80706.
Related to cockroachdb#66485.

This commit makes a slight modification to `pebbleMVCCScanner` that changes how
it determines whether an intent is uncertain or not. Instead of consulting the
version timestamp in the `MVCCMetadata` struct and comparing that against the
scan's uncertainty interval, the `pebbleMVCCScanner` now scans through the
uncertainty interval and uses the intent's provisional value's timestamp to
determine uncertainty.

The `pebbleMVCCScanner` was actually already doing this scan to compute
uncertainty for other committed values in its uncertainty interval if it found
that the intent was not uncertain. However, after this change, it also relies on
the scan to compute uncertainty for the intent itself. This is safe, because the
calling code knows that the intent has a higher timestamp than the scan, so
there is no risk that the scan adds the provisional value to its result set.

This change is important for two reasons:
1. it avoids the need to store the `local_timestamp` field (introduced in cockroachdb#80706)
   in the `MVCCMetadata` struct.
2. it moves us closer in the direction of using `MVCCMetadata` values (ts=0,
   essentially locks protecting provisional values) to determine read-write
   conflicts but then using versioned provisional values to determine uncertainty.
   Doing so allows us to decompose a KV scan into a separate lock table scan to
   detect read-write conflicts and a MVCC scan to accumulate a result set while
   checking for uncertainty. This will be important for cockroachdb#66485.
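
A hedged sketch of the uncertainty check this describes: a value (including an intent's provisional value) is uncertain if its timestamp falls inside the scan's uncertainty interval, here simplified to plain integer timestamps:

```go
package sketch

// Plain integers standing in for hlc.Timestamp; illustrative only, and
// ignoring refinements like observed/local timestamps.
type Timestamp int64

type version struct {
	ts Timestamp
}

// isUncertain reports whether any version of a key, including an
// intent's provisional value, lies in the half-open uncertainty
// interval (readTS, uncertaintyLimit]. The scan derives this from the
// versioned values themselves rather than from MVCCMetadata.
func isUncertain(versions []version, readTS, uncertaintyLimit Timestamp) bool {
	for _, v := range versions {
		if v.ts > readTS && v.ts <= uncertaintyLimit {
			return true // a value the reader may have causally missed
		}
	}
	return false
}
```
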
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue May 9, 2022
craig bot pushed a commit that referenced this issue May 9, 2022
81125: storage: use provisional version value to determine uncertainty of intents r=sumeerbhola a=nvanbenschoten


Co-authored-by: Nathan VanBenschoten <[email protected]>
aayushshah15 added a commit to aayushshah15/cockroach that referenced this issue Aug 20, 2022
…luation

This commit introduces a change to the way certain types of read-only requests
are evaluated. Traditionally, read-only requests have held their latches
throughout their execution. This commit allows certain qualifying reads to be
able to release their latches earlier.

At a high level, reads may attempt to resolve all conflicts upfront by
performing a sort of "validation" phase before they perform their MVCC scan.
This validation phase performs a scan of the lock table keyspace in order to
find any conflicting intents that may need to be resolved before the actual
evaluation of the request over the MVCC keyspace. If no conflicting intents are
found, then (since cockroachdb#76312), the
request is guaranteed to be fully isolated against all other concurrent
requests and can be allowed to release its latches at this point. This allows
the actual evaluation of the read (over the MVCC part of the keyspace) to
proceed without latches being held, which is the main motivation of this work.
This validation phase could be thought of as an extension to the validation
that the concurrency manager already performs when requests are sequenced
through it, by trying to detect any conflicting intents that have already been
pulled into the in-memory lock table.

Additionally, for certain types of requests that can drop their latches early,
and do not need to access the `IntentHistory` for any of their parent txn's
intents, this commit attempts to make their MVCC scan cheaper by eliminating
the need for an `intentInterleavingIterator`. This is enabled by the
observation that once the validation phase is complete, the only remaining
intents in the read's declared span must be intents belonging to the reader's
transaction. So if the reader doesn't need to read an intent that isn't the
latest intent on a key, then it doesn't need access to the key's
`IntentHistory` (which lives in the lock-table keyspace), and doesn't need to
use an `intentInterleavingIterator`.

Release note (performance improvement): certain types of reads will now have a
far smaller contention footprint with conflicting concurrent writers

Resolves cockroachdb#66485

Release justification: high benefit change to existing functionality, part of
22.2 roadmap
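
As a hedged summary of the iterator optimization described above (hypothetical names; the real decision lives in CockroachDB's storage layer and is more involved):

```go
package sketch

// Hypothetical sketch of the iterator choice described in the commit
// message above.
type iterKind int

const (
	plainMVCCIter          iterKind = iota // no lock-table interleaving
	intentInterleavingIter                 // merges lock-table and MVCC keyspaces
)

// chooseIter picks the cheaper iterator when the validation phase has
// already run: any intents left in the read's declared span must belong
// to the reader's own txn, so unless the reader may need a non-latest
// intent (IntentHistory, which lives in the lock-table keyspace), the
// lock table never has to be consulted during evaluation.
func chooseIter(validated, mayNeedIntentHistory bool) iterKind {
	if validated && !mayNeedIntentHistory {
		return plainMVCCIter
	}
	return intentInterleavingIter
}
```
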
aayushshah15 added a commit to aayushshah15/cockroach that referenced this issue Aug 22, 2022
aayushshah15 added a commit to aayushshah15/cockroach that referenced this issue Aug 22, 2022
aayushshah15 added a commit to aayushshah15/cockroach that referenced this issue Aug 23, 2022
aayushshah15 added a commit to aayushshah15/cockroach that referenced this issue Aug 24, 2022
aayushshah15 added a commit to aayushshah15/cockroach that referenced this issue Aug 24, 2022
aayushshah15 added a commit to aayushshah15/cockroach that referenced this issue Aug 24, 2022
aayushshah15 added a commit to aayushshah15/cockroach that referenced this issue Oct 4, 2022
craig bot pushed a commit that referenced this issue Nov 17, 2022
85993: kvserver: allow certain read-only requests to drop latches before evaluation r=aayushshah15 a=aayushshah15



Co-authored-by: Aayush Shah <[email protected]>
@craig craig bot closed this as completed in 7ff05f1 Nov 17, 2022