
kv: pipeline replicated lock acquisition #117978

Closed
nvanbenschoten opened this issue Jan 19, 2024 · 1 comment · Fixed by #121088
Assignees
Labels
A-kv-transactions Relating to MVCC and the transactional model. A-read-committed Related to the introduction of Read Committed branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 C-performance Perf of queries or internals. Solution not expected to change functional behavior. GA-blocker T-kv KV Team

Comments

@nvanbenschoten
Member

nvanbenschoten commented Jan 19, 2024

Replicated lock acquisition currently replicates through Raft synchronously. This can impose a high latency on read committed transactions which perform a large number of SELECT FOR UPDATE operations or a large number of FK lookups.

This was discussed in the Read Committed RFC under the Row-Level Locks > Reliability and Enforcement section. That section also outlined a collection of possible optimizations that we could make to avoid this latency penalty.

The most promising of the optimizations is to pipeline this lock acquisition using the AsyncConsensus infrastructure. This will require client-side tracking and a new protocol to validate pipelined lock acquisition. It is analogous to our existing write intent pipelining approach.

Jira issue: CRDB-35440

@nvanbenschoten nvanbenschoten added C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-kv-transactions Relating to MVCC and the transactional model. T-kv KV Team A-read-committed Related to the introduction of Read Committed labels Jan 19, 2024
@arulajmani
Collaborator

@nvanbenschoten, @miraradeva and I just spoke about this issue at length. I'll capture some of this discussion below.

We should be able to use most of our existing infrastructure for write pipelining here. One tricky point is how this issue interacts with parallel commits. A transaction qualifies for a parallel commit if all in-flight intents in the last batch (the one which includes the EndTxn request) successfully replicate at a timestamp at or below the transaction's commit timestamp. When we kick off recovery for a STAGING transaction, and we find an intent did not successfully evaluate (yet), we also bump the timestamp cache to ensure it can't land in the future.

We discussed two options here:

  1. We would need something similar for locks -- that is, a way to check whether any locking requests in the last batch were successfully able to acquire replicated locks at or below the transaction's commit timestamp. This should be doable, given we store the entire TxnMeta for the lock. Additionally, if we found that a lock hadn't successfully been acquired, we would need some way to ensure it doesn't happen later. This would probably involve bumping the timestamp cache for replicated locking requests. The room had mixed to negative feelings about this special case handling for replicated locks.

  2. Disallow parallel commits entirely if there are locking requests in the EndTxn batch. This should be fine given we never expect SQL to construct a batch which contains both a locking read and EndTxn in it. Crucially, it allows us to side-step the concerns about replicated lock acquisition attempts coming along after the transaction has been recovered. This is because we know they've already been evaluated and proposed to Raft -- they can't be in-flight concurrent to the EndTxn batch. We're only validating whether the Raft proposal succeeded or not.

We settled on approach 2 as the way to go here.
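Approach 2 can be sketched as a simple eligibility check. The request descriptor and flag names below are hypothetical stand-ins for the real per-request flags in the KV API; only the decision logic is being illustrated.

```go
package main

import "fmt"

// req is a hypothetical request descriptor; the real flags live on each
// request type in the KV layer.
type req struct {
	isIntentWrite bool // point intent write (Put, ConditionalPut, Delete, ...)
	isLockingRead bool // replicated-locking Get/Scan/ReverseScan
	isEndTxn      bool
}

// canParallelCommit sketches option 2 from the discussion above: the final
// (EndTxn-carrying) batch qualifies for a parallel commit only if, besides
// the EndTxn, it contains nothing but point intent writes. Any replicated
// locking read in that batch disqualifies it outright.
func canParallelCommit(batch []req) bool {
	hasEndTxn := false
	for _, r := range batch {
		switch {
		case r.isEndTxn:
			hasEndTxn = true
		case r.isLockingRead:
			return false // option 2: never parallel commit with locking reads
		case !r.isIntentWrite:
			return false
		}
	}
	return hasEndTxn
}

func main() {
	writes := []req{{isIntentWrite: true}, {isEndTxn: true}}
	locking := []req{{isLockingRead: true}, {isEndTxn: true}}
	fmt.Println(canParallelCommit(writes), canParallelCommit(locking)) // true false
}
```

Since SQL never constructs a batch containing both a locking read and an EndTxn, this check should fire only in unexpected cases, making it a cheap way to side-step the recovery concerns.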


We also discussed #64723, and how it is tangentially related to the issue at hand. The reason we don't pipeline DeleteRange requests is that we don't know all the keys that'll be deleted before evaluation[*]. This means we can't construct the parallel commit condition if the last batch also includes a DeleteRange request. However, we're being overly aggressive here by disallowing pipelining of DeleteRange requests entirely -- if we follow the logic from point 2 above, we could pipeline batches containing DeleteRange requests as long as we don't try to perform a parallel commit when the last batch contains a DeleteRange request.

[*] The same logic will apply to Scan and ReverseScan requests, where we won't know all the locks that will be acquired. This means we're not gaining too much by trying to follow the approach described in option 1 above.


The other thing we noted on the call is there might be some trickiness involved with lock re-acquisitions. If we've got a pipelined replicated lock, and it's being re-acquired, we want the lock re-acquisition to ensure the pipelined lock replication was successful. We do something similar for intents when a key is being written to multiple times.

We noted that replicated lock re-acquisition can sometimes be no-ops, so we'll have to be careful to think through some of those cases.
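The re-acquisition concern above can be modeled with a small sketch. Names and the strength ordering are hypothetical; in the real code the `txnPipeliner` would chain a QueryIntent-style verification rather than return a boolean.

```go
package main

import "fmt"

// rank orders lock strengths so we can tell when a re-acquisition would be
// a no-op on the server (new strength not stronger than what is held).
// The ordering is illustrative.
var rank = map[string]int{"Shared": 1, "Exclusive": 2, "Intent": 3}

// reacquire decides, for a key with a pipelined in-flight lock at
// heldStrength, whether the re-acquisition must first verify the earlier
// replication, and whether the re-acquisition itself would be a no-op.
func reacquire(heldStrength, newStrength string) (mustVerify, noop bool) {
	// Even a no-op re-acquisition must prove the pipelined lock replicated:
	// a server-side no-op tells us nothing about whether the earlier
	// proposal actually made it through Raft.
	mustVerify = true
	noop = rank[newStrength] <= rank[heldStrength]
	return mustVerify, noop
}

func main() {
	v, n := reacquire("Exclusive", "Shared")
	fmt.Println(v, n) // true true: verify even though the acquisition is a no-op
}
```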

@nvanbenschoten, @miraradeva, feel free to add more details I may have missed here!

arulajmani added a commit to arulajmani/cockroach that referenced this issue Mar 5, 2024
Previously, ranged requests could not be pipelined. However, there is no
good reason to not allow them to be pipelined -- we just have to take
extra care to correctly update in-flight writes tracking on the response
path. We do so now.

As part of this patch, we introduce two new flags -- canPipeline and
canParallelCommit. We use these flags to determine whether batches can
be pipelined or committed using parallel commits. This is in contrast to
before, where we derived this information from other flags
(isIntentWrite, !isRange). This wasn't strictly necessary for this
change, but helps clean up the concepts.

As a consequence of this change, we now have a distinction between
requests that can be pipelined and requests that can be part of a batch
that can be committed in parallel. Notably, this applies to
DeleteRangeRequests -- they can be pipelined, but not committed in
parallel. That's because we need to have the entire write set upfront
when performing a parallel commit, lest we need to perform recovery --
we don't have this for DeleteRange requests.

In the future, we'll extend the concept of canPipeline
(and !canParallelCommit) to other locking ranged requests as well. In
particular, (replicated) locking {,Reverse}ScanRequests that want to
pipeline their lock acquisitions.

Closes cockroachdb#64723
Informs cockroachdb#117978

Release note: None
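The canPipeline / canParallelCommit split this commit introduces can be tabulated for a few request types. The table below is an illustrative sketch, not the authoritative flag definitions in the KV request code.

```go
package main

import "fmt"

// flags models the two new per-request capabilities described above.
type flags struct {
	canPipeline       bool
	canParallelCommit bool
}

// requestFlags is a hypothetical flag table. The key property: DeleteRange
// can be pipelined, but it can't participate in a parallel commit because
// its full key set is unknown before evaluation.
var requestFlags = map[string]flags{
	"Put":         {canPipeline: true, canParallelCommit: true},
	"Delete":      {canPipeline: true, canParallelCommit: true},
	"DeleteRange": {canPipeline: true, canParallelCommit: false},
	"EndTxn":      {canPipeline: false, canParallelCommit: false},
}

func main() {
	f := requestFlags["DeleteRange"]
	fmt.Println(f.canPipeline, f.canParallelCommit) // true false
}
```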
@nvanbenschoten nvanbenschoten added GA-blocker branch-release-24.1 Used to mark GA and release blockers, technical advisories, and bugs for 24.1 labels Mar 6, 2024
@arulajmani arulajmani self-assigned this Mar 13, 2024
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Mar 25, 2024
Informs cockroachdb#117978.

This commit simplifies the logic in txnPipeliner.chainToInFlightWrites
which ensures that we only add a single QueryIntent request to the
BatchRequest per overlapping in-flight write. In doing so, it eliminates
an unnecessary hash-map by piggy-backing tracking onto the existing btree.
It also avoids making an assumption that we only track a single in-flight
write per key.

Release note: None
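The deduplication this commit describes -- at most one QueryIntent per overlapping in-flight write, even when a key is tracked more than once -- can be sketched as follows. The real code walks a btree of in-flight writes; a sorted slice stands in for it here, and the function name is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// queryIntentsFor returns the keys needing a QueryIntent before a request
// over [spanStart, spanEnd) may proceed, emitting each key at most once
// even if multiple in-flight writes (e.g. different seq nums) share it.
func queryIntentsFor(inFlightKeys []string, spanStart, spanEnd string) []string {
	sort.Strings(inFlightKeys)
	seen := make(map[string]bool)
	var out []string
	for _, k := range inFlightKeys {
		if k >= spanStart && k < spanEnd && !seen[k] {
			seen[k] = true
			out = append(out, k)
		}
	}
	return out
}

func main() {
	keys := []string{"b", "b", "c", "e"} // "b" tracked twice (two seq nums)
	fmt.Println(queryIntentsFor(keys, "a", "d")) // [b c]
}
```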
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Mar 25, 2024
Informs cockroachdb#117978.

This commit simplifies the logic in `inFlightWriteSet` to track
in-flight writes with the same key but different sequence numbers
separately. This simplification is done to avoid confusion around
in-flight writes with different seq nums and/or different strengths
(which will be added shortly), and whether any of these in-flight
writes should imply that the others are no longer in-flight.

Release note: None
@nvanbenschoten nvanbenschoten self-assigned this Mar 25, 2024
arulajmani added a commit to arulajmani/cockroach that referenced this issue Mar 25, 2024
Previously, ranged requests could not be pipelined. However, there is no
good reason to not allow them to be pipelined -- we just have to take
extra care to correctly update in-flight writes tracking on the response
path. We do so now.

As part of this patch, we introduce two new flags -- canPipeline and
canParallelCommit. We use these flags to determine whether batches can
be pipelined or committed using parallel commits. This is in contrast to
before, where we derived this information from other flags
(isIntentWrite, !isRange). This wasn't strictly necessary for this
change, but helps clean up the concepts.

As a consequence of this change, we now have a distinction between
requests that can be pipelined and requests that can be part of a batch
that can be committed in parallel. Notably, this applies to
DeleteRangeRequests -- they can be pipelined, but not committed in
parallel. That's because we need to have the entire write set upfront
when performing a parallel commit, lest we need to perform recovery --
we don't have this for DeleteRange requests.

In the future, we'll extend the concept of canPipeline
(and !canParallelCommit) to other locking ranged requests as well. In
particular, (replicated) locking {,Reverse}ScanRequests that want to
pipeline their lock acquisitions.

Closes cockroachdb#64723
Informs cockroachdb#117978

Release note: None
craig bot pushed a commit that referenced this issue Mar 26, 2024
119975: kv: allow DeleteRangeRequests to be pipelined  r=nvanbenschoten a=arulajmani

Previously, ranged requests could not be pipelined. However, there is no
good reason to not allow them to be pipelined -- we just have to take
extra care to correctly update in-flight writes tracking on the response
path. We do so now.

As part of this patch, we introduce two new flags -- canPipeline and
canParallelCommit. We use these flags to determine whether batches can
be pipelined or committed using parallel commits. This is in contrast to
before, where we derived this information from other flags
(isIntentWrite, !isRange). This wasn't strictly necessary for this
change, but helps clean up the concepts.

As a consequence of this change, we now have a distinction between
requests that can be pipelined and requests that can be part of a batch
that can be committed in parallel. Notably, this applies to
DeleteRangeRequests -- they can be pipelined, but not committed in
parallel. That's because we need to have the entire write set upfront
when performing a parallel commit, lest we need to perform recovery --
we don't have this for DeleteRange requests.

In the future, we'll extend the concept of canPipeline
(and !canParallelCommit) to other locking ranged requests as well. In
particular, (replicated) locking {,Reverse}ScanRequests that want to
pipeline their lock acquisitions.

Closes #64723
Informs #117978

Release note: None

120812:  changefeedccl: deflake TestAlterChangefeedAddTargetsDuringBackfill r=rharding6373 a=andyyang890

This patch deflakes `TestAlterChangefeedAddTargetsDuringBackfill`
by increasing the max batch size used for changefeed initial scans.
Previously, if we were unlucky, the batch sizes could be too small
leading to a timeout while waiting for the initial scan to complete.

Fixes #120744

Release note: None

121023: roachtest: add disk-stalled/wal-failover/among-stores test r=sumeerbhola a=jbowens

Introduce a new roachtest that simulates disk stalls on one store of a 3-node cluster with two stores per node, and the --wal-failover=among-stores configuration set. The WAL failover configuration should ensure the workload continues uninterrupted until it becomes blocked on disk reads.

Informs #119418.
Informs cockroachdb/pebble#3230
Epic: CRDB-35401

121073: master: Update pkg/testutils/release/cockroach_releases.yaml r=rail a=github-actions[bot]

Update pkg/testutils/release/cockroach_releases.yaml with recent values.

Epic: None
Release note: None
Release justification: test-only updates

Co-authored-by: Arul Ajmani <[email protected]>
Co-authored-by: Andy Yang <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
Co-authored-by: CRL Release bot <[email protected]>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Mar 26, 2024
Informs cockroachdb#117978.

This commit updates the `txnPipeliner` to use the response keys from
`Get`, `Scan`, `ReverseScan`, and `DeleteRange` requests to track
pipelined and non-pipelined lock acquisitions / intent writes, instead
of assuming that the requests could have left intents anywhere in their
request span. This more precise tracking avoids broad ranged intent
resolution when narrower intent resolution is possible. It will also
be used by cockroachdb#117978 to track in-flight replicated lock acquisition.

This is a WIP. Some tests need to be updated.

Release note: None
craig bot pushed a commit that referenced this issue Mar 26, 2024
120865: workload/schemachange: correct error code for referencing dropping enum r=rafiss a=annrpom

This patch changes the error code expected for referencing an enum that's in the process of being dropped to the proper one.

Epic: none

Release note: None

121062: orchestration: released CockroachDB version 23.2.3. Next version: 23.2.4 r=yecs1999 a=cockroach-teamcity

Release note: None
Epic: None
Release justification: non-production (release infra) change.


121065: kv: prep in-flight write tracking for replicated locks r=nvanbenschoten a=nvanbenschoten

Informs #117978.

This PR includes a pair of simplifications to the `txnPipeliner` to make fewer assumptions about in-flight writes, as a way to prepare for #117978.

#### kv: remove chainedKeys hash-map from chainToInFlightWrites, simplify

This commit simplifies the logic in txnPipeliner.chainToInFlightWrites which ensures that we only add a single QueryIntent request to the BatchRequest per overlapping in-flight write. In doing so, it eliminates an unnecessary hash-map by piggy-backing tracking onto the existing btree. It also avoids making an assumption that we only track a single in-flight write per key.

#### kv: track in-flight writes with same key and different seq nums

This commit simplifies the logic in `inFlightWriteSet` to track in-flight writes with the same key but different sequence numbers separately. This simplification is done to avoid confusion around in-flight writes with different seq nums and/or different strengths (which will be added shortly), and whether any of these in-flight writes should imply that the others are no longer in-flight.

Release note: None

Co-authored-by: Annie Pompa <[email protected]>
Co-authored-by: Justin Beaver <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
arulajmani added a commit to arulajmani/cockroach that referenced this issue Mar 27, 2024
Previously, QueryIntent requests were only used to verify whether an
intent was successfully evaluated and replicated. This patch extends
QueryIntent request to also be able to verify whether a pipelined
shared or exclusive lock was successfully replicated or not.

Informs cockroachdb#117978

Release note: None
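The extended QueryIntent check this patch describes can be sketched as: verification succeeds when a replicated lock by the same transaction is found at the expected strength or stronger. The strength ordering and names below are illustrative, not the actual KV API.

```go
package main

import "fmt"

// strengthRank orders lock strengths for comparison; illustrative only.
var strengthRank = map[string]int{"Shared": 1, "Exclusive": 2, "Intent": 3}

// foundLock is what replicas actually hold for a key.
type foundLock struct {
	txnID    string
	strength string
}

// verifyAcquisition reports whether a pipelined acquisition at
// expectedStrength by txnID has durably replicated, generalizing the old
// intent-only check to shared and exclusive locks.
func verifyAcquisition(held []foundLock, txnID, expectedStrength string) bool {
	for _, l := range held {
		if l.txnID == txnID &&
			strengthRank[l.strength] >= strengthRank[expectedStrength] {
			return true
		}
	}
	return false
}

func main() {
	held := []foundLock{{txnID: "t1", strength: "Exclusive"}}
	fmt.Println(
		verifyAcquisition(held, "t1", "Shared"), // exclusive satisfies shared
		verifyAcquisition(held, "t1", "Intent"), // but not an intent write
	) // true false
}
```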
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Mar 27, 2024
Informs cockroachdb#117978.

This commit updates the `txnPipeliner` to use the response keys from
`Get`, `Scan`, `ReverseScan`, and `DeleteRange` requests to track
pipelined and non-pipelined lock acquisitions / intent writes, instead
of assuming that the requests could have left intents anywhere in their
request span. This more precise tracking avoids broad ranged intent
resolution when narrower intent resolution is possible. It will also
be used by cockroachdb#117978 to track in-flight replicated lock acquisition.

Release note: None
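The response-key tracking this commit describes can be sketched as follows: rather than assuming a ranged locking request may have locked its entire request span, record only the keys actually returned in the response. Names are hypothetical; only the narrowing logic is illustrated.

```go
package main

import "fmt"

// trackedSpan is what the pipeliner records; end == "" denotes a point key.
type trackedSpan struct {
	start, end string
}

// lockSpansToTrack returns what should be recorded for a ranged locking
// request, given its request span and its response keys. A nil response-key
// slice models the old behavior of tracking the whole span.
func lockSpansToTrack(reqStart, reqEnd string, responseKeys []string) []trackedSpan {
	if responseKeys == nil {
		// Without response keys we must fall back to the entire request
		// span, forcing broad ranged intent resolution later.
		return []trackedSpan{{start: reqStart, end: reqEnd}}
	}
	spans := make([]trackedSpan, 0, len(responseKeys))
	for _, k := range responseKeys {
		spans = append(spans, trackedSpan{start: k}) // precise point tracking
	}
	return spans
}

func main() {
	fmt.Println(len(lockSpansToTrack("a", "z", []string{"c", "m"}))) // 2
	fmt.Println(len(lockSpansToTrack("a", "z", nil)))                // 1
}
```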
craig bot pushed a commit that referenced this issue Mar 27, 2024
121086: kv: use response keys of ranged writes for lock tracking r=nvanbenschoten a=nvanbenschoten

Informs #117978.

This PR updates the `txnPipeliner` to use the response keys from `Get`, `Scan`, `ReverseScan`, and `DeleteRange` requests to track pipelined and non-pipelined lock acquisitions / intent writes, instead of assuming that the requests could have left intents anywhere in their request span. This more precise tracking avoids broad ranged intent resolution when narrower intent resolution is possible. It will also be used by #117978 to track in-flight replicated lock acquisition.

This is a WIP. Some tests need to be updated.

Release note: None

121152: roachtest: adjust WAL failover disk stall roachtest's logging config r=sumeerbhola a=jbowens

The previous logging config outputted all logs to stderr too, which is not buffered and could become blocked on the stalled disk.

Epic: none
Release note: none

121156: sqlstats: fix reset in-memory sql stats on flush r=xinhaoz a=xinhaoz

After flushing in-memory sql stats to disk, we reset and prep each app container for reuse by:
- Decrementing the per-node fingerprint counter by the size of the app container. This counter prevents us from writing more sql stats when we reach the maximum amount of fingerprints stored in memory.
- Clearing the container and reducing its capacity to 1/2.

When introducing atomic flushing, we swapped the 2 ops above in the reset step, resulting in the decrement step being a no-op. The counter never resets, which eventually results in each attempt to write sql stats being throttled, which then also signals the sql stats flush worker.

Epic: none
Fixes: #121134

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
Co-authored-by: Xin Hao Zhang <[email protected]>
craig bot pushed a commit that referenced this issue Mar 27, 2024
119933: kv: add ability to verify pipelined replicated shared/exclusive locks  r=nvanbenschoten a=arulajmani

Previously, QueryIntent requests were only used to verify whether an
intent was successfully evaluated and replicated. This patch extends
QueryIntent request to also be able to verify whether a pipelined
shared or exclusive lock was successfully replicated or not.

Informs #117978

Release note: None

120787: ui: use max aggregator for commit latency on changefeed dashboard r=rharding6373 a=rharding6373

Previously, the commit latency in the changefeed dashboard in the db console would be aggregated by sum across all nodes. This was confusing for users who might see unexpectedly high commit latency.

In this change, we use max aggregation for the commit latency so that users see the max commit latency from all the nodes instead of the sum. This provides more useful observability into changefeed behavior.

Fixes: #119246
Fixes: #112947
Epic: None

Release note (ui change): The "Commit Latency" chart in the changefeed dashboard now aggregates by max instead of by sum for multi-node changefeeds. This more accurately reflects the amount of time for events to be acknowledged by the downstream sink.

121217: backupccl: fix data race with admission pacer r=msbutler a=aadityasondhi

We now use one pacer per fileSSTSink.

Fixes #121199.
Fixes #121202.
Fixes #121201.
Fixes #121200.
Fixes #121198.
Fixes #121197.
Fixes #121196.
Fixes #121195.
Fixes #121194.
Fixes #121193.
Fixes #121192.
Fixes #121191.
Fixes #121190.
Fixes #121189.
Fixes #121188.
Fixes #121187.

Release note: None

121222: optbuilder: fix recently introduced nil pointer in error case r=yuzefovich a=yuzefovich

This commit fixes a recently introduced nil pointer internal error when attempting to CALL something that is not a procedure and is specified by something other than its name. `ResolvableFunctionReference` might not have `ReferenceByName`, so this commit switches to using `FunctionReference`, which is always set.

Fixes: #121095.

Release note: None

Co-authored-by: Arul Ajmani <[email protected]>
Co-authored-by: rharding6373 <[email protected]>
Co-authored-by: Aaditya Sondhi <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Mar 28, 2024
Fixes cockroachdb#117978.

TODO: Needs testing.

This commit completes the client-side handling of replicated lock acquisition
pipelining. Replicated lock acquisition through Get, Scan, and ReverseScan
requests now qualifies to be pipelined. The `txnPipeliner` is updated to track
the strength associated with each in-flight write and pass that along to the
corresponding QueryIntentRequest.

Release note: None
craig bot pushed a commit that referenced this issue Apr 3, 2024
121088: kv: pipeline replicated lock acquisition r=nvanbenschoten a=nvanbenschoten

Fixes #117978.

Builds upon the foundation laid in [#119975](#119975), [#119933](#119933), [#121065](#121065), and [#121086](#121086).

This commit completes the client-side handling of replicated lock acquisition pipelining. Replicated lock acquisition through `Get`, `Scan`, and `ReverseScan` requests now qualifies to be pipelined. The `txnPipeliner` is updated to track the strength associated with each in-flight write and pass that along to the corresponding `QueryIntentRequest`.

See [benchmark with TPC-C results here](#121088 (comment)).

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
@craig craig bot closed this as completed in af394f7 Apr 3, 2024
nvanbenschoten added a commit that referenced this issue Apr 4, 2024
Fixes #117978.

This commit completes the client-side handling of replicated lock acquisition
pipelining. Replicated lock acquisition through Get, Scan, and ReverseScan
requests now qualifies to be pipelined. The `txnPipeliner` is updated to track
the strength associated with each in-flight write and pass that along to the
corresponding QueryIntentRequest.

Release note: None
@github-project-automation github-project-automation bot moved this to Incoming in KV Aug 28, 2024