kv: pipeline replicated lock acquisition #117978
@nvanbenschoten, @miraradeva and I just spoke about this issue at length. I'll capture some of that discussion below.

We should be able to use most of our existing infrastructure for write pipelining here. One tricky point is how this issue interacts with parallel commits. A transaction qualifies for a parallel commit if all in-flight intents are in the last batch (the one which includes the

We discussed two options here:
We settled on approach 2 as the way to go here. We also discussed #64723, and how it is tangentially related to the issue at hand. The reason we don't pipeline

[*] The same logic will apply to

The other thing we noted on the call is that there might be some trickiness involved with lock re-acquisitions. If we've got a pipelined replicated lock and it's being re-acquired, we want the lock re-acquisition to ensure the pipelined lock replication was successful. We do something similar for intents when a key is written to multiple times. We noted that replicated lock re-acquisitions can sometimes be no-ops, so we'll have to be careful to think through some of those cases. @nvanbenschoten, @miraradeva, feel free to add more details I may have missed here!
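The re-acquisition concern above can be sketched as follows. This is a minimal illustration, not CockroachDB's actual types: `inFlightWrite` and `chainVerification` are hypothetical names standing in for the pipeliner's tracking of unproven writes, where a batch that touches a key with a pending pipelined write must first chain a QueryIntent-style verification for it.

```go
package main

import "fmt"

// inFlightWrite models a pipelined write or lock whose replication has
// not yet been proven. Illustrative only.
type inFlightWrite struct {
	key string
	seq int
}

// chainVerification returns the keys whose pipelined replication must
// be proven before a new batch touching them may proceed (e.g. a lock
// re-acquisition on "a" must first prove the earlier acquisition).
func chainVerification(inFlight []inFlightWrite, batchKeys []string) []string {
	pending := make(map[string]bool, len(inFlight))
	for _, w := range inFlight {
		pending[w.key] = true
	}
	var toVerify []string
	seen := make(map[string]bool)
	for _, k := range batchKeys {
		if pending[k] && !seen[k] {
			seen[k] = true
			toVerify = append(toVerify, k)
		}
	}
	return toVerify
}

func main() {
	inFlight := []inFlightWrite{{key: "a", seq: 1}, {key: "c", seq: 2}}
	// Re-acquiring a lock on "a" must first verify the pipelined
	// acquisition at seq 1; "b" has nothing in-flight to prove.
	fmt.Println(chainVerification(inFlight, []string{"a", "b"})) // [a]
}
```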
Previously, ranged requests could not be pipelined. However, there is no good reason to disallow this -- we just have to take extra care to correctly update in-flight write tracking on the response path. We do so now.

As part of this patch, we introduce two new flags -- canPipeline and canParallelCommit. We use these flags to determine whether batches can be pipelined or committed using parallel commits. This is in contrast to before, where we derived this information from other flags (isIntentWrite, !isRange). This wasn't strictly necessary for this change, but it helps clean up the concepts.

As a consequence of this change, we now distinguish between requests that can be pipelined and requests that can be part of a batch that is committed in parallel. Notably, this applies to DeleteRangeRequests -- they can be pipelined, but not committed in parallel. That's because we need the entire write set upfront when performing a parallel commit, in case we need to perform recovery, and we don't have this for DeleteRange requests.

In the future, we'll extend the concept of canPipeline (and !canParallelCommit) to other locking ranged requests as well. In particular, (replicated) locking {,Reverse}ScanRequests that want to pipeline their lock acquisitions.

Closes cockroachdb#64723
Informs cockroachdb#117978

Release note: None
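The canPipeline / canParallelCommit split can be sketched with a toy flag set. The flag values and request catalog below are illustrative, not CockroachDB's actual declarations; the point is that DeleteRange carries canPipeline but not canParallelCommit, because its write set is unknown upfront.

```go
package main

import "fmt"

// Illustrative request flags modeled on the distinction described
// above; not the actual CockroachDB flag constants.
type flag int

const (
	isRange flag = 1 << iota
	isIntentWrite
	canPipeline
	canParallelCommit
)

var requestFlags = map[string]flag{
	// Point intent writes can be pipelined and parallel committed.
	"Put": isIntentWrite | canPipeline | canParallelCommit,
	// DeleteRange can be pipelined, but its final write set is unknown
	// upfront, so it cannot participate in a parallel commit.
	"DeleteRange": isRange | isIntentWrite | canPipeline,
}

// batchCanParallelCommit reports whether every request in the batch
// supports parallel commit.
func batchCanParallelCommit(reqs []string) bool {
	for _, r := range reqs {
		if requestFlags[r]&canParallelCommit == 0 {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(batchCanParallelCommit([]string{"Put"}))                // true
	fmt.Println(batchCanParallelCommit([]string{"Put", "DeleteRange"})) // false
}
```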
Informs cockroachdb#117978. This commit simplifies the logic in txnPipeliner.chainToInFlightWrites which ensures that we only add a single QueryIntent request to the BatchRequest per overlapping in-flight write. In doing so, it eliminates an unnecessary hash-map by piggy-backing tracking onto the existing btree. It also avoids making an assumption that we only track a single in-flight write per key. Release note: None
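The hash-map elimination described above relies on ordered iteration: if overlapping in-flight writes are visited in key order (as a btree provides), a duplicate key is always the immediately preceding entry, so no auxiliary set is needed to chain at most one QueryIntent per key. A sketch, with a sorted slice standing in for the btree:

```go
package main

import (
	"fmt"
	"sort"
)

// chainQueryIntents returns the keys to attach a QueryIntent for,
// chaining at most one per key without a hash map: because iteration
// is in key order, an already-chained key is always the previous
// element. Illustrative; not the actual txnPipeliner code.
func chainQueryIntents(inFlightKeys []string) []string {
	sort.Strings(inFlightKeys) // stand-in for ordered btree iteration
	var chained []string
	for i, k := range inFlightKeys {
		if i > 0 && inFlightKeys[i-1] == k {
			continue // a QueryIntent for this key was already chained
		}
		chained = append(chained, k)
	}
	return chained
}

func main() {
	// Two in-flight writes on "b" (different seq nums) still yield a
	// single QueryIntent for "b".
	fmt.Println(chainQueryIntents([]string{"b", "a", "b"})) // [a b]
}
```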
Informs cockroachdb#117978. This commit simplifies the logic in `inFlightWriteSet` to track in-flight writes with the same key but different sequence numbers separately. This simplification is done to avoid confusion around in-flight writes with different seq nums and/or different strengths (which will be added shortly), and whether any of these in-flight writes should imply that the others are no longer in-flight. Release note: None
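A minimal sketch of the tracking change: keying the set on (key, seq) rather than key alone lets two writes to the same key at different sequence numbers be tracked, and proven, independently. Type and method names are illustrative, not the real `inFlightWriteSet` API.

```go
package main

import "fmt"

// writeID identifies an in-flight write by key and sequence number.
type writeID struct {
	key string
	seq int
}

// inFlightWriteSet tracks unproven pipelined writes per (key, seq).
type inFlightWriteSet map[writeID]bool

func (s inFlightWriteSet) insert(key string, seq int) {
	s[writeID{key, seq}] = true
}

// prove removes a single in-flight write once its replication has been
// verified; other writes on the same key remain in-flight.
func (s inFlightWriteSet) prove(key string, seq int) {
	delete(s, writeID{key, seq})
}

func main() {
	s := make(inFlightWriteSet)
	s.insert("a", 1)
	s.insert("a", 3) // same key, later seq num: tracked separately
	s.prove("a", 1)  // proving seq 1 does not imply seq 3 replicated
	fmt.Println(len(s)) // 1: the write at seq 3 is still in-flight
}
```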
119975: kv: allow DeleteRangeRequests to be pipelined r=nvanbenschoten a=arulajmani

Previously, ranged requests could not be pipelined. However, there is no good reason to disallow this -- we just have to take extra care to correctly update in-flight write tracking on the response path. We do so now. As part of this patch, we introduce two new flags -- canPipeline and canParallelCommit. We use these flags to determine whether batches can be pipelined or committed using parallel commits. This is in contrast to before, where we derived this information from other flags (isIntentWrite, !isRange). This wasn't strictly necessary for this change, but it helps clean up the concepts. As a consequence of this change, we now distinguish between requests that can be pipelined and requests that can be part of a batch that is committed in parallel. Notably, this applies to DeleteRangeRequests -- they can be pipelined, but not committed in parallel. That's because we need the entire write set upfront when performing a parallel commit, in case we need to perform recovery, and we don't have this for DeleteRange requests. In the future, we'll extend the concept of canPipeline (and !canParallelCommit) to other locking ranged requests as well. In particular, (replicated) locking {,Reverse}ScanRequests that want to pipeline their lock acquisitions. Closes #64723 Informs #117978 Release note: None

120812: changefeedccl: deflake TestAlterChangefeedAddTargetsDuringBackfill r=rharding6373 a=andyyang890

This patch deflakes `TestAlterChangefeedAddTargetsDuringBackfill` by increasing the max batch size used for changefeed initial scans. Previously, if we were unlucky, the batch sizes could be too small, leading to a timeout while waiting for the initial scan to complete. Fixes #120744 Release note: None

121023: roachtest: add disk-stalled/wal-failover/among-stores test r=sumeerbhola a=jbowens

Introduce a new roachtest that simulates disk stalls on one store of a 3-node cluster with two stores per node and the --wal-failover=among-stores configuration set. The WAL failover configuration should ensure the workload continues uninterrupted until it becomes blocked on disk reads. Informs #119418. Informs cockroachdb/pebble#3230 Epic: CRDB-35401

121073: master: Update pkg/testutils/release/cockroach_releases.yaml r=rail a=github-actions[bot]

Update pkg/testutils/release/cockroach_releases.yaml with recent values. Epic: None Release note: None Release justification: test-only updates

Co-authored-by: Arul Ajmani <[email protected]> Co-authored-by: Andy Yang <[email protected]> Co-authored-by: Jackson Owens <[email protected]> Co-authored-by: CRL Release bot <[email protected]>
Informs cockroachdb#117978. This commit updates the `txnPipeliner` to use the response keys from `Get`, `Scan`, `ReverseScan`, and `DeleteRange` requests to track pipelined and non-pipelined lock acquisitions / intent writes, instead of assuming that the requests could have left intents anywhere in their request span. This more precise tracking avoids broad ranged intent resolution when narrower intent resolution is possible. It will also be used by cockroachdb#117978 to track in-flight replicated lock acquisition. This is a WIP. Some tests need to be updated. Release note: None
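The precision win can be sketched as follows. The `span` type and `trackedSpans` helper are hypothetical stand-ins for the pipeliner's bookkeeping: with response keys available, only the keys the request actually touched need later intent resolution, rather than the full request span.

```go
package main

import "fmt"

// span is a simplified key span; end == "" denotes a point span.
type span struct{ start, end string }

// trackedSpans returns what must later be resolved: the full request
// span when response keys are unknown, or point spans at just the
// keys the response reported. Illustrative only.
func trackedSpans(req span, respKeys []string) []span {
	if respKeys == nil {
		return []span{req} // coarse: resolve the whole request span
	}
	spans := make([]span, 0, len(respKeys))
	for _, k := range respKeys {
		spans = append(spans, span{start: k}) // precise: one per key
	}
	return spans
}

func main() {
	req := span{start: "a", end: "z"}
	// A DeleteRange over [a, z) that only deleted "c" and "j" needs
	// intent resolution at just those two keys.
	fmt.Println(len(trackedSpans(req, []string{"c", "j"}))) // 2
	fmt.Println(len(trackedSpans(req, nil)))                // 1
}
```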
120865: workload/schemachange: correct error code for referencing dropping enum r=rafiss a=annrpom

This patch changes the error code expected for referencing an enum that's in the process of being dropped to the proper one. Epic: none Release note: None

121062: orchestration: released CockroachDB version 23.2.3. Next version: 23.2.4 r=yecs1999 a=cockroach-teamcity

Release note: None Epic: None Release justification: non-production (release infra) change.

121065: kv: prep in-flight write tracking for replicated locks r=nvanbenschoten a=nvanbenschoten

Informs #117978. This PR includes a pair of simplifications to the `txnPipeliner` to make fewer assumptions about in-flight writes, as a way to prepare for #117978.

#### kv: remove chainedKeys hash-map from chainToInFlightWrites, simplify

This commit simplifies the logic in txnPipeliner.chainToInFlightWrites which ensures that we only add a single QueryIntent request to the BatchRequest per overlapping in-flight write. In doing so, it eliminates an unnecessary hash-map by piggy-backing tracking onto the existing btree. It also avoids making an assumption that we only track a single in-flight write per key.

#### kv: track in-flight writes with same key and different seq nums

This commit simplifies the logic in `inFlightWriteSet` to track in-flight writes with the same key but different sequence numbers separately. This simplification is done to avoid confusion around in-flight writes with different seq nums and/or different strengths (which will be added shortly), and whether any of these in-flight writes should imply that the others are no longer in-flight.

Release note: None

Co-authored-by: Annie Pompa <[email protected]> Co-authored-by: Justin Beaver <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]>
Previously, QueryIntent requests were only used to verify whether an intent was successfully evaluated and replicated. This patch extends the QueryIntent request so it can also verify whether a pipelined shared or exclusive lock was successfully replicated. Informs cockroachdb#117978 Release note: None
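The strength check this enables can be sketched as a simple ordering comparison. The `strength` constants and `satisfies` helper below are illustrative (the real strength enum lives in CockroachDB's concurrency package); the sketch assumes a lock found at the queried strength or stronger proves the pipelined acquisition.

```go
package main

import "fmt"

// strength models lock strengths, ordered weakest to strongest.
// Illustrative only; not the actual CockroachDB enum values.
type strength int

const (
	Shared strength = iota
	Exclusive
	Intent
)

// satisfies models the QueryIntent-style verification: a replicated
// lock proves a pipelined acquisition if it was found at the queried
// strength or stronger.
func satisfies(found, queried strength) bool {
	return found >= queried
}

func main() {
	fmt.Println(satisfies(Exclusive, Shared)) // true
	fmt.Println(satisfies(Shared, Exclusive)) // false
}
```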
121086: kv: use response keys of ranged writes for lock tracking r=nvanbenschoten a=nvanbenschoten

Informs #117978. This PR updates the `txnPipeliner` to use the response keys from `Get`, `Scan`, `ReverseScan`, and `DeleteRange` requests to track pipelined and non-pipelined lock acquisitions / intent writes, instead of assuming that the requests could have left intents anywhere in their request span. This more precise tracking avoids broad ranged intent resolution when narrower intent resolution is possible. It will also be used by #117978 to track in-flight replicated lock acquisition. This is a WIP. Some tests need to be updated. Release note: None

121152: roachtest: adjust WAL failover disk stall roachtest's logging config r=sumeerbhola a=jbowens

The previous logging config output all logs to stderr too, which is not buffered and could become blocked on the stalled disk. Epic: none Release note: none

121156: sqlstats: fix reset in-memory sql stats on flush r=xinhaoz a=xinhaoz

After flushing in-memory sql stats to disk, we reset and prep each app container for reuse by:

- Decrementing the per-node fingerprint counter by the size of the app container. This counter prevents us from writing more sql stats when we reach the maximum number of fingerprints stored in memory.
- Clearing the container and reducing its capacity to 1/2.

When introducing atomic flushing, we swapped the two ops above in the reset step, resulting in the decrement step being a no-op. The counter never resets, which eventually causes each attempt at writing sql stats to be throttled, which then also signals the sql stats flush worker.

Epic: none Fixes: #121134 Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Jackson Owens <[email protected]> Co-authored-by: Xin Hao Zhang <[email protected]>
119933: kv: add ability to verify pipelined replicated shared/exclusive locks r=nvanbenschoten a=arulajmani

Previously, QueryIntent requests were only used to verify whether an intent was successfully evaluated and replicated. This patch extends the QueryIntent request so it can also verify whether a pipelined shared or exclusive lock was successfully replicated. Informs #117978 Release note: None

120787: ui: use max aggregator for commit latency on changefeed dashboard r=rharding6373 a=rharding6373

Previously, the commit latency in the changefeed dashboard in the db console was aggregated by sum across all nodes. This was confusing for users, who might see unexpectedly high commit latency. With this change, we use max aggregation for the commit latency, so that users see the max commit latency across all nodes instead of the sum. This provides more useful observability into changefeed behavior. Fixes: #119246 Fixes: #112947 Epic: None Release note (ui change): The "Commit Latency" chart in the changefeed dashboard now aggregates by max instead of by sum for multi-node changefeeds. This more accurately reflects the amount of time for events to be acknowledged by the downstream sink.

121217: backupccl: fix data race with admission pacer r=msbutler a=aadityasondhi

We now use one pacer per fileSSTSink. Fixes #121199. Fixes #121202. Fixes #121201. Fixes #121200. Fixes #121198. Fixes #121197. Fixes #121196. Fixes #121195. Fixes #121194. Fixes #121193. Fixes #121192. Fixes #121191. Fixes #121190. Fixes #121189. Fixes #121188. Fixes #121187. Release note: None

121222: optbuilder: fix recently introduced nil pointer in error case r=yuzefovich a=yuzefovich

This commit fixes a recently introduced nil pointer internal error when attempting to CALL something that is not a procedure and is specified by something other than its name. `ResolvableFunctionReference` might not have `ReferenceByName`, so this commit switches to using `FunctionReference`, which is always set. Fixes: #121095.

Release note: None

Co-authored-by: Arul Ajmani <[email protected]> Co-authored-by: rharding6373 <[email protected]> Co-authored-by: Aaditya Sondhi <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]>
Fixes cockroachdb#117978. TODO: Needs testing. This commit completes the client-side handling of replicated lock acquisition pipelining. Replicated lock acquisition through Get, Scan, and ReverseScan requests now qualifies to be pipelined. The `txnPipeliner` is updated to track the strength associated with each in-flight write and pass that along to the corresponding QueryIntentRequest. Release note: None
121088: kv: pipeline replicated lock acquisition r=nvanbenschoten a=nvanbenschoten

Fixes #117978. Builds upon the foundation laid in #119975, #119933, #121065, and #121086.

This commit completes the client-side handling of replicated lock acquisition pipelining. Replicated lock acquisition through `Get`, `Scan`, and `ReverseScan` requests now qualifies to be pipelined. The `txnPipeliner` is updated to track the strength associated with each in-flight write and pass that along to the corresponding `QueryIntentRequest`.

See the benchmark with TPC-C results in #121088.

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
Replicated lock acquisition currently replicates through Raft synchronously. This can impose a high latency on read committed transactions which perform a large number of SELECT FOR UPDATE operations or a large number of FK lookups.
This was discussed in the Read Committed RFC under the Row-Level Locks > Reliability and Enforcement section. That section also outlined a collection of possible optimizations that we could make to avoid this latency penalty.
The most promising of these optimizations is to pipeline this lock acquisition using the `AsyncConsensus` infrastructure. This will require client-side tracking and a new protocol to validate pipelined lock acquisition. It is analogous to our existing write intent pipelining approach.

Jira issue: CRDB-35440
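The shape of the proposal can be shown with a toy model of async consensus, assuming nothing about CockroachDB's actual API: each lock acquisition is issued without waiting for replication, recorded as in-flight, and the commit waits once, at the end, for all of them to be proven.

```go
package main

import (
	"fmt"
	"sync"
)

// A toy model of write/lock pipelining via async consensus: the
// client fires each replicated operation without waiting for Raft
// consensus, tracks it as in-flight, and only the commit waits for
// all of them. Names and structure are illustrative.
func main() {
	ops := []string{"SFU(a)", "SFU(b)", "Put(c)"}
	var wg sync.WaitGroup
	for _, op := range ops {
		wg.Add(1)
		go func(op string) {
			defer wg.Done()
			_ = op // replication happens off the latency-critical path
		}(op)
	}
	// The commit must not succeed until every in-flight operation has
	// been proven to have replicated.
	wg.Wait()
	fmt.Println("proven in-flight ops:", len(ops))
}
```

The latency win is that N SELECT FOR UPDATE statements pay one round of consensus waiting (at commit) instead of N sequential rounds, mirroring the existing write intent pipelining described in the issue.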