kv: perform PushTxn(PUSH_TIMESTAMP) without Raft consensus #95911

nvanbenschoten · 2023-01-26T06:25:18Z

This PR contains a sequence of three commits that combine to resolve #94728.

check txn push marker on commit, not txn record creation

The first commit moves the point when a transaction checks the timestamp cache for its minimum commit timestamp from transaction record creation time back to commit time. This allows us to use the timestamp cache to communicate successful PushTxn(TIMESTAMP) to a pushee with an existing record without rewriting its transaction record.

For details, see the changes to the state machine diagram attached to Replica.CanCreateTxnRecord for a visual depiction of this change.

always promote PUSH_TIMESTAMP to PUSH_ABORT on failed staging record

The second commit simplifies logic in PushTxnRequest that promoted a PUSH_TIMESTAMP to a PUSH_ABORT when it
found a STAGING transaction record that it knew to be part of a failed parallel commit attempt.

The logic tried to be smart and minimize the cases where it needed to promote a PUSH_TIMESTAMP to a PUSH_ABORT. It was avoiding doing so if it had previously found an intent with a higher epoch. In practice, this optimization doesn't seem to matter. It was also making logic in a following commit harder to write because it was preserving cases where a PUSH_TIMESTAMP would succeed against a STAGING transaction record. We don't want to support such state transitions, so eliminate them.

don't rewrite txn record on PushTxn(TIMESTAMP)

With the previous two commits, transactions will check the timestamp cache before committing to determine whether they have had their commit timestamp pushed. The final commit exploits this to avoid ever rewriting a transaction's record on a timestamp push. Instead, the timestamp cache is used, regardless of whether the record already existed or not. Doing so avoids consensus.

Release note: None

cockroach-teamcity · 2023-01-26T06:25:28Z

This change is

arulajmani

Reviewed 9 of 9 files at r1, 3 of 3 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)

-- commits line 6 at r1:
nit: "back to"

pkg/kv/kvserver/replica_tscache.go line 410 at r1 (raw file):

//     PushTxn(ABORT). As a result of this mechanism, a pusher transaction never
//     needs to create the transaction record for contending transactions that
//     it forms write-write conflicts with.

The phrasing here leaves the handling of write-read conflicts ambiguous. Consider adding a reference to MinTxnCommitTS here, so that readers know where to look if they're curious about Txn record creation semantics when there's a write-read conflict.

pkg/kv/kvserver/replica_tscache.go line 449 at r1 (raw file):

//	                   v            | v    EndTxn(!STAGING)        v  v            |
//	               +--------------------+  or PushTxn(ABORT)     +--------------------+
//	               |                    |  then: txn.ts -> v1    |                    |

txn.ts -> v1 should only happens when EndTxn(!Staging), right? It doesn't make sense in the PushTxn(ABORT) case -- should we split these cases out now?

This commit moves the point when a transaction checks the timestamp cache for its minimum commit timestamp from transaction record creation time back to commit time. This allows us to use the timestamp cache to communicate successful PushTxn(TIMESTAMP) to a pushee with an existing record without re-writing its transaction record. See the changes to the state machine diagram attached to `Replica.CanCreateTxnRecord` for a visual depiction of this change.

This commit simplifies logic in PushTxnRequest that promoted a PUSH_TIMESTAMP to a PUSH_ABORT when it found a STAGING transaction record which it knew to be part of a failed parallel commit attempt. The logic was trying to be smart and minimize the cases where it needed to promote a PUSH_TIMESTAMP to a PUSH_ABORT. It was avoiding doing so if it had previously found an intent with a higher epoch. In practice, this optimization doesn't seem to matter. It was also making logic in a following commit harder to write because it was preserving cases where a PUSH_TIMESTAMP would succeed against a STAGING transaction record. We don't want to support such state transitions, so eliminate them.

Fixes cockroachdb#94728. With the previous commit, transactions will check the timestamp cache before committing to determine whether they have had their commit timestamp pushed. This commit exploits this to avoid ever rewriting a transaction's record on a timestamp push. Instead, the timestamp cache is used, regardless of whether the record already existed or not. Doing so avoids consensus.

nvanbenschoten

TFTR!

bors r=arulajmani

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani)

-- commits line 6 at r1:

Previously, arulajmani (Arul Ajmani) wrote…

nit: "back to"

Done.

pkg/kv/kvserver/replica_tscache.go line 410 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

The phrasing here leaves the handling of write-read conflicts ambiguous. Consider adding a reference to MinTxnCommitTS here, so that readers know where to look if they're curious about Txn record creation semantics when there's a write-read conflict.

Good idea, done.

pkg/kv/kvserver/replica_tscache.go line 449 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

txn.ts -> v1 should only happens when EndTxn(!Staging), right? It doesn't make sense in the PushTxn(ABORT) case -- should we split these cases out now?

Yeah, you're correct about that, good catch. We could forward the record's WriteTimestamp to MinTxnCommitTS() when aborting the txn on a PushTxn(ABORT), but we don't because the write timestamp is meaningless once the txn is aborted. I'm going to leave this as is out of a desire not to make this diagram any more crowded and harder to read than it already is while acknowledging that it is not entirely accurate in this specific case.

craig · 2023-02-02T01:51:56Z

Build failed (retrying...):

Bazel Essential CI (Cockroach)

craig · 2023-02-02T02:29:30Z

Build failed (retrying...):

Bazel Essential CI (Cockroach)

craig · 2023-02-02T05:22:45Z

Build succeeded:

Bazel Essential CI (Cockroach)

nvanbenschoten requested a review from arulajmani January 26, 2023 06:25

nvanbenschoten requested a review from a team as a code owner January 26, 2023 06:25

nvanbenschoten force-pushed the nvanbenschoten/pushNoConsensus branch 2 times, most recently from ea417d0 to 9128ec7 Compare January 26, 2023 22:09

arulajmani approved these changes Jan 31, 2023

View reviewed changes

nvanbenschoten added 3 commits February 1, 2023 18:56

nvanbenschoten force-pushed the nvanbenschoten/pushNoConsensus branch from 9128ec7 to 3dbb321 Compare February 1, 2023 23:57

nvanbenschoten commented Feb 1, 2023

View reviewed changes

craig bot merged commit 22244a7 into cockroachdb:master Feb 2, 2023

nvanbenschoten deleted the nvanbenschoten/pushNoConsensus branch February 2, 2023 23:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kv: perform PushTxn(PUSH_TIMESTAMP) without Raft consensus #95911

kv: perform PushTxn(PUSH_TIMESTAMP) without Raft consensus #95911

nvanbenschoten commented Jan 26, 2023 •

edited

Loading

cockroach-teamcity commented Jan 26, 2023

arulajmani left a comment

nvanbenschoten left a comment

craig bot commented Feb 2, 2023

craig bot commented Feb 2, 2023

craig bot commented Feb 2, 2023

kv: perform PushTxn(PUSH_TIMESTAMP) without Raft consensus #95911

kv: perform PushTxn(PUSH_TIMESTAMP) without Raft consensus #95911

Conversation

nvanbenschoten commented Jan 26, 2023 • edited Loading

check txn push marker on commit, not txn record creation

always promote PUSH_TIMESTAMP to PUSH_ABORT on failed staging record

don't rewrite txn record on PushTxn(TIMESTAMP)

cockroach-teamcity commented Jan 26, 2023

arulajmani left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

craig bot commented Feb 2, 2023

craig bot commented Feb 2, 2023

craig bot commented Feb 2, 2023

nvanbenschoten commented Jan 26, 2023 •

edited

Loading