
kvserver, batcheval: pin Engine state during read-only command evaluation #76312

Merged

Conversation

@aayushshah15 (Contributor) commented Feb 9, 2022

This commit eagerly pins the engine state of the Reader created during the
evaluation of read-only requests.

Today, reads generally hold latches throughout their evaluation (in particular,
while they do their MVCCScan). This commit primarily paves the way for a world
where we avoid holding latches during the MVCCScan. It also lets us make MVCC
garbage collection latchless, as described in #55293.

There are a few notable changes in this patch:

  1. Pinning the engine state eagerly runs into #70974 (kv: QueryResolvedTimestamp
    iterator snapshot may be captured before closed timestamp). To resolve this,
    the closed timestamp of the Replica is now captured at the time the
    EvalContext is created, and not during the command evaluation of
    QueryResolvedTimestampRequest.

  2. EvalContext now has an ImmutableEvalContext embedded into it. The
    ImmutableEvalContext is supposed to encapsulate state that must not change
    after the EvalContext is created. The closed timestamp of the replica is part
    of the ImmutableEvalContext.

  3. Replica no longer fully implements the EvalContext interface. Instead,
    it implements everything but GetClosedTimestamp() (which is implemented by
    ImmutableEvalContext instead); see the sketch below.
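A simplified sketch of this shape (see the diff for the full details; the
surrounding types are CockroachDB internals):

// evalContextImpl pairs the Replica (which supplies the rest of the EvalContext
// interface) with state that must not change after construction.
type evalContextImpl struct {
	*Replica
	// closedTS is captured once, when the EvalContext is created, so it can
	// never be newer than the engine state pinned for this evaluation.
	closedTS hlc.Timestamp
}

func newEvalContextImpl(r *Replica) *evalContextImpl {
	return &evalContextImpl{
		Replica:  r,
		closedTS: r.GetCurrentClosedTimestamp(context.TODO()),
	}
}

// GetClosedTimestamp implements the ImmutableEvalContext interface (embedded
// in EvalContext) by returning the captured value.
func (ec *evalContextImpl) GetClosedTimestamp(_ context.Context) hlc.Timestamp {
	return ec.closedTS
}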

Relates to #55293
Resolves #55461
Resolves #70974

Release note: None

@cockroach-teamcity (Member)

This change is Reviewable

@aayushshah15 (Contributor, Author) commented Feb 9, 2022

Hey @nvanbenschoten, does this match your expectation of what we had discussed in the last few weeks?

This may need a few tests, but I wanted to confirm that the approach is close to what you had in mind. I confirmed that pinning the engine state eagerly (and doing nothing else) fails TestNonBlockingReadsAtResolvedTimestamp under stress, and then confirmed that capturing the closed timestamp eagerly when we create the EvalContext avoids running into #70974.

cc @arulajmani

@aayushshah15 aayushshah15 force-pushed the 20220209_pinIteratorsForEvaluatingReads branch from 63fd520 to 0e68cf0 Compare February 14, 2022 18:12
@aayushshah15 (Contributor, Author)

We'd discussed that we should measure the overhead of the pinning we do in this patch, with and without the additional overhead of calling GetCurrentClosedTimestamp() for each read's command evaluation.

name \ ops/sec  master.txt  pin_with_closedts.txt  pin_without_closedts.txt
.                110k ± 0%              104k ± 0%                 105k ± 0%

name \ p50      master.txt  pin_with_closedts.txt  pin_without_closedts.txt
.                1.10 ± 0%              1.20 ± 0%                 1.10 ± 0%

name \ p95      master.txt  pin_with_closedts.txt  pin_without_closedts.txt
.                5.20 ± 0%              5.50 ± 0%                 5.50 ± 0%

name \ p99      master.txt  pin_with_closedts.txt  pin_without_closedts.txt
.                9.40 ± 0%             10.00 ± 0%                10.00 ± 0%

Looks like the pinning alone has a pretty significant cost (I recall @sumeerbhola had been mildly concerned about something like this?). I'll try diffing the CPU profiles between master and this patch (with the iterator pinning) and update here.

@sumeerbhola (Collaborator) left a comment

Looks like the pinning alone has a pretty significant cost

Which benchmark is this?
The difference is that before, we were creating a pebble.Iterator when creating the first pebbleIterator and then cloning it for future pebbleIterators, so 1 create + (n-1) clones.
Now we are doing 1 create + n clones. Most of the time n=1. An optimization would be to store a bool in pebbleReadOnly and pebbleBatch recording whether {pebbleReadOnly,pebbleBatch}.iter has been used yet and, if it hasn't, skip the clone.
There is also an existing bug where we may not close the iter:

	// Setting iter to nil is sufficient since it will be closed by one of the
	// subsequent destroy calls.
	p.iter = nil

This is not true if no one has created an MVCCIterator or EngineIterator. The bool will also help fix this.
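Roughly, the suggested bookkeeping could look like this (hypothetical sketch; the
real pebbleReadOnly has many more fields, and the helper below is illustrative
rather than part of any patch):

type pebbleReadOnly struct {
	parent     *Pebble
	iter       cloneableIter
	iterUnused bool // set when iter is eagerly pinned, cleared once an iterator takes it over
	// ... (other fields elided)
}

// takePinnedIter is a hypothetical helper: the first MVCCIterator/EngineIterator
// takes over the pinned iterator (keeping us at 1 create + (n-1) clones), while
// later iterators clone it as before. The destroy path can likewise consult
// iterUnused to know that a pinned-but-never-used iter still needs to be closed.
func (p *pebbleReadOnly) takePinnedIter() (iter cloneableIter, reuse bool) {
	if p.iterUnused {
		p.iterUnused = false
		return p.iter, true
	}
	return p.iter, false
}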

Reviewed 4 of 14 files at r1, 4 of 6 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @sumeerbhola)


pkg/kv/kvserver/replica_eval_context.go, line 55 at r2 (raw file):

// GetClosedTimestamp implements the EvalContext interface.
func (ec *evalContextImpl) GetClosedTimestamp(_ context.Context) hlc.Timestamp {

this is implementing ImmutableEvalContext, yes?
Odd that it is named immutable, but the closedTS field is protected by a mutex. Who modifies it?

@aayushshah15 aayushshah15 force-pushed the 20220209_pinIteratorsForEvaluatingReads branch 2 times, most recently from 1fa3a1d to 099c920 Compare March 23, 2022 17:16
@aayushshah15 (Contributor, Author) left a comment

Which benchmark is this?

I should've mentioned it before, but it was just kv95 running for 5 mins.

./cockroach workload run kv --init --tolerate-errors --concurrency=192 --splits=1000 --ramp=1m --duration=5m --read-percent=95 --min-block-bytes=4096 --max-block-bytes=4096 {pgurl:1-5}

I added the bookkeeping you mentioned to avoid the redundant extra clone of the iterator. Here are the numbers with that change (I'm not sure they're directly comparable to the numbers I posted above, since I don't remember what cluster size I ran the previous numbers on):

name \ ops/sec  master.txt  pin-with-closedts.txt  pin-wo-closedts.txt
.                108k ± 2%              112k ± 1%            114k ± 0%

name \ p50      master.txt  pin-with-closedts.txt  pin-wo-closedts.txt
.                1.17 ± 6%              1.10 ± 0%            1.10 ± 0%

name \ p95      master.txt  pin-with-closedts.txt  pin-wo-closedts.txt
.                5.93 ± 2%              5.58 ± 4%            5.50 ± 0%

name \ p99      master.txt  pin-with-closedts.txt  pin-wo-closedts.txt
.                11.7 ± 3%              11.0 ± 0%            11.0 ± 0%

Surprisingly, it doesn't seem like any of the locking inside Replica.GetCurrentClosedTimestamp() has much of an effect on the workload, cc @nvanbenschoten.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)


pkg/kv/kvserver/replica_eval_context.go, line 55 at r2 (raw file):

Previously, sumeerbhola wrote…

this is implementing ImmutableEvalContext, yes?
Odd that it is named immutable, but the closedTS field is protected by a mutex. Who modifies it?

You're right, we didn't need the mutex here.

@aayushshah15 aayushshah15 marked this pull request as ready for review March 23, 2022 17:30
@aayushshah15 aayushshah15 requested review from a team as code owners March 23, 2022 17:30
@aayushshah15 aayushshah15 requested review from nvanbenschoten and removed request for a team March 23, 2022 17:34
@aayushshah15 (Contributor, Author)

@nvanbenschoten mentioned that we should re-run the numbers with and without the call into Replica.GetCurrentClosedTimestamp for more iterations.

I ran them again for 10 iterations this time, and here they are. old is with the call to GetCurrentClosedTimestamp and new is without:

name  old ops/sec  new ops/sec  delta
.       109k ± 3%    115k ± 1%  +5.23%  (p=0.000 n=10+9)

name  old p50      new p50      delta
.       1.14 ± 5%    1.10 ± 0%    ~     (p=0.087 n=10+10)

name  old p95      new p95      delta
.       5.80 ± 0%    5.50 ± 0%  -5.17%  (p=0.002 n=8+10)

name  old p99      new p99      delta
.       11.4 ± 7%    11.0 ± 0%  -3.17%  (p=0.003 n=10+8)

@aayushshah15 aayushshah15 force-pushed the 20220209_pinIteratorsForEvaluatingReads branch from c0a1980 to b5a3994 Compare March 26, 2022 21:14
@aayushshah15 aayushshah15 force-pushed the 20220209_pinIteratorsForEvaluatingReads branch from b5a3994 to ac8da77 Compare March 26, 2022 21:42
aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request Mar 26, 2022
In cockroachdb#76312, we're making it such that a replica's closed timestamp is computed
during the creation of the batch request's evaluation context. This is to
ensure that a replica's closed timestamp is captured before we pin the state of
its storage engine (since otherwise, the closed timestamp could be _newer_ than
the state of the engine the command is evaluating over). See: cockroachdb#70974.

However, computing the replica's current closed timestamp on every command
evaluation can be expensive due to the mutual exclusion involved (it requires
that we hold `Replica.mu` as well as a few locks inside the closed timestamp
side transport). Since only the `QueryResolvedTimestampRequest` and
`SubsumeRequest` need to read a replica's closed timestamp, this commit makes
it such that we elide this work for every other request.

Release note: None
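In other words, the elision could look roughly like this (illustrative sketch;
the requiresClosedTS parameter and how it is derived from the batch are
assumptions, not the actual code):

func newEvalContextImpl(ctx context.Context, r *Replica, requiresClosedTS bool) *evalContextImpl {
	var closedTS hlc.Timestamp
	if requiresClosedTS {
		// Only QueryResolvedTimestamp and Subsume read the replica's closed
		// timestamp, so every other request skips the Replica.mu and
		// side-transport locking entirely.
		closedTS = r.GetCurrentClosedTimestamp(ctx)
	}
	return &evalContextImpl{
		Replica:  r,
		closedTS: closedTS,
	}
}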
@aayushshah15 aayushshah15 force-pushed the 20220209_pinIteratorsForEvaluatingReads branch from ac8da77 to 5795da6 Compare March 26, 2022 23:31
@nvanbenschoten (Member)

Thanks for gathering those extra numbers. Is the only difference between old and new the call to GetCurrentClosedTimestamp? Put differently, is new still pinning iterators? If so, that indicates that calling GetCurrentClosedTimestamp indiscriminately is too expensive, right?

@aayushshah15 (Contributor, Author) commented Mar 28, 2022

Is the only difference between old and new the call to GetCurrentClosedTimestamp? Put differently, is new still pinning iterators?

Yep.

If so, that indicates that calling GetCurrentClosedTimestamp indiscriminately is too expensive, right?

Yes, I've added a commit to this PR that makes it such that most requests won't call into GetCurrentClosedTimestamp() anymore. PTAL. For the sake of completeness, here are the numbers comparing the first commit of this PR (i.e. with pinning and with the call into GetCurrentClosedTimestamp() on every request) against the last commit of this PR.

> benchstat ../benchlogs/pin-with-closedts.txt ../benchlogs/pin-elide-closedts.txt
name  old ops/sec  new ops/sec  delta
.       109k ± 3%    117k ± 2%  +7.54%  (p=0.000 n=10+10)

name  old p50      new p50      delta
.       1.14 ± 5%    1.10 ± 0%    ~     (p=0.065 n=10+9)

name  old p95      new p95      delta
.       5.80 ± 0%    5.41 ± 4%  -6.72%  (p=0.000 n=8+10)

name  old p99      new p99      delta
.       11.4 ± 7%    10.5 ± 0%  -7.57%  (p=0.000 n=10+8)

@nvanbenschoten (Member)

For even more completeness, do you have this direct comparison between master and the last commit of this PR?

@aayushshah15 (Contributor, Author)

For even more completeness, do you have this direct comparison between master and the last commit of this PR?

Here is master against the last commit in this PR; I'm not sure what to make of the slight perf improvement:

benchstat  ../benchlogs/master.txt ../benchlogs/pin-elide-closedts.txt
name  old ops/sec  new ops/sec  delta
.       114k ± 3%    117k ± 2%  +3.31%  (p=0.001 n=10+10)

name  old p50      new p50      delta
.       1.10 ± 0%    1.10 ± 0%    ~     (all equal)

name  old p95      new p95      delta
.       5.50 ± 0%    5.41 ± 4%    ~     (p=0.071 n=6+10)

name  old p99      new p99      delta
.       11.0 ± 0%    10.5 ± 0%  -4.55%  (p=0.000 n=8+8)

@nvanbenschoten (Member) left a comment

Reviewed 2 of 14 files at r1, 1 of 6 files at r2, 11 of 15 files at r3, 4 of 4 files at r4, 4 of 4 files at r5, 7 of 7 files at r6, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @sumeerbhola)


pkg/kv/kvserver/replica_eval_context.go, line 39 at r4 (raw file):

func newEvalContextImpl(r *Replica) *evalContextImpl {
	return &evalContextImpl{

Should we pool these objects to avoid the new heap allocation? Their lifetime should be pretty well defined.
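One way the suggested pooling could look, purely as an illustration (sync.Pool
and the release helper are assumptions about how this might be done, not part of
this PR):

var evalContextPool = sync.Pool{
	New: func() interface{} { return &evalContextImpl{} },
}

// newEvalContextImpl reuses a pooled object instead of allocating a fresh one.
func newEvalContextImpl(ctx context.Context, r *Replica) *evalContextImpl {
	ec := evalContextPool.Get().(*evalContextImpl)
	*ec = evalContextImpl{
		Replica:  r,
		closedTS: r.GetCurrentClosedTimestamp(ctx),
	}
	return ec
}

// release zeroes the object and returns it to the pool; it would be called once
// command evaluation finishes, matching the well-defined lifetime noted above.
func (ec *evalContextImpl) release() {
	*ec = evalContextImpl{}
	evalContextPool.Put(ec)
}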


pkg/kv/kvserver/replica_eval_context.go, line 41 at r4 (raw file):

	return &evalContextImpl{
		Replica:  r,
		closedTS: r.GetCurrentClosedTimestamp(context.TODO()),

Let's plumb a ctx to here.


pkg/kv/kvserver/batcheval/eval_context.go, line 142 at r4 (raw file):

}

// ImmutableEvalContext is like EvalContext, but it encapsulates state that

The idea was to pass only this interface to read-only command evaluation functions, right? To do that, we may need to pull the EvalContext out of CommandArgs. Even if that is the plan, we can do it in a separate PR.


pkg/kv/kvserver/batcheval/eval_context.go, line 147 at r4 (raw file):

	// GetClosedTimestamp returns the current closed timestamp on the range.
	// It is expected that a caller will have performed some action (either
	// calling RevokeLease or WatchForMerge) to freeze further progression of

This comment reminds me that GetClosedTimestamp is also used during range subsumption and lease transfers. In those cases, we want the latest closed timestamp, not the closed timestamp at the time of the storage snapshot. The ordering is very important for those cases, as they need the closed timestamp to be computed after their call to WaitForMerge or RevokeLease. Do you have thoughts on what we should do here? Maybe expose two methods?
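If two methods end up being the answer, the split might look roughly like this
(method names and placement are hypothetical):

type ImmutableEvalContext interface {
	// GetClosedTimestamp returns the closed timestamp captured when the
	// EvalContext was created, i.e. one that is no newer than the pinned
	// engine state. QueryResolvedTimestamp would use this.
	GetClosedTimestamp(ctx context.Context) hlc.Timestamp
}

type EvalContext interface {
	ImmutableEvalContext
	// GetCurrentClosedTimestamp recomputes the replica's closed timestamp at
	// call time. Subsume and lease transfers would use this, since they need a
	// closed timestamp taken after WatchForMerge / RevokeLease has frozen its
	// progression.
	GetCurrentClosedTimestamp(ctx context.Context) hlc.Timestamp
	// ... (remaining EvalContext methods elided)
}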


pkg/storage/pebble.go, line 1681 at r4 (raw file):

	iter       cloneableIter
	unused     bool

nit: should this be called iterUnused? Is it clear what it's referring to? Same question below in pebbleBatch.

@sumeerbhola (Collaborator) left a comment

Reviewed 5 of 15 files at r3, 3 of 4 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @nvanbenschoten, and @sumeerbhola)


pkg/storage/pebble.go, line 1681 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: should this be called iterUnused? Is it clear what it's referring to? Same question below in pebbleBatch.

+1 to iterUnused

aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request Aug 20, 2022
kvserver: allow certain read-only requests to drop latches before evaluation

This commit introduces a change to the way certain types of read-only requests
are evaluated. Traditionally, read-only requests have held their latches
throughout their execution. This commit allows certain qualifying reads to be
able to release their latches earlier.

At a high level, reads may attempt to resolve all conflicts upfront by
performing a sort of "validation" phase before they perform their MVCC scan.
This validation phase performs a scan of the lock table keyspace in order to
find any conflicting intents that may need to be resolved before the actual
evaluation of the request over the MVCC keyspace. If no conflicting intents are
found, then (since cockroachdb#76312), the
request is guaranteed to be fully isolated against all other concurrent
requests and can be allowed to release its latches at this point. This allows
the actual evaluation of the read (over the MVCC part of the keyspace) to
proceed without latches being held, which is the main motivation of this work.
This validation phase could be thought of as an extension to the validation
that the concurrency manager already performs when requests are sequenced
through it, by trying to detect any conflicting intents that have already been
pulled into the in-memory lock table.

Additionally, for certain types of requests that can drop their latches early,
and do not need to access the `IntentHistory` for any of their parent txn's
intents, this commit attempts to make their MVCC scan cheaper by eliminating
the need for an `intentInterleavingIterator`. This is enabled by the
observation that once the validation phase is complete, the only remaining
intents in the read's declared span must be intents belonging to the reader's
transaction. So if the reader doesn't need to read an intent that isn't the
latest intent on a key, then it doesn't need access to the key's
`IntentHistory` (which lives in the lock-table keyspace), and doesn't need to
use an `intentInterleavingIterator`.

Release note (performance improvement): certain types of reads will now have a
far smaller contention footprint with conflicting concurrent writers

Resolves cockroachdb#66485

Release justification: high benefit change to existing functionality, part of
22.2 roadmap
craig bot pushed a commit that referenced this pull request Nov 17, 2022
85993: kvserver: allow certain read-only requests to drop latches before evaluation r=aayushshah15 a=aayushshah15

Co-authored-by: Aayush Shah <[email protected]>