Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: perform PushTxn(PUSH_TIMESTAMP) without Raft consensus #95911

Merged

Conversation

nvanbenschoten
Copy link
Member

@nvanbenschoten nvanbenschoten commented Jan 26, 2023

This PR contains a sequence of three commits that combine to resolve #94728.

check txn push marker on commit, not txn record creation

The first commit moves the point when a transaction checks the timestamp cache for its minimum commit timestamp from transaction record creation time back to commit time. This allows us to use the timestamp cache to communicate successful PushTxn(TIMESTAMP) to a pushee with an existing record without rewriting its transaction record.

For details, see the changes to the state machine diagram attached to Replica.CanCreateTxnRecord for a visual depiction of this change.

always promote PUSH_TIMESTAMP to PUSH_ABORT on failed staging record

The second commit simplifies logic in PushTxnRequest that promoted a PUSH_TIMESTAMP to a PUSH_ABORT when it
found a STAGING transaction record that it knew to be part of a failed parallel commit attempt.

The logic tried to be smart and minimize the cases where it needed to promote a PUSH_TIMESTAMP to a PUSH_ABORT. It was avoiding doing so if it had previously found an intent with a higher epoch. In practice, this optimization doesn't seem to matter. It was also making logic in a following commit harder to write because it was preserving cases where a PUSH_TIMESTAMP would succeed against a STAGING transaction record. We don't want to support such state transitions, so eliminate them.

don't rewrite txn record on PushTxn(TIMESTAMP)

With the previous two commits, transactions will check the timestamp cache before committing to determine whether they have had their commit timestamp pushed. The final commit exploits this to avoid ever rewriting a transaction's record on a timestamp push. Instead, the timestamp cache is used, regardless of whether the record already existed or not. Doing so avoids consensus.

Release note: None

@nvanbenschoten nvanbenschoten requested a review from a team as a code owner January 26, 2023 06:25
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/pushNoConsensus branch 2 times, most recently from ea417d0 to 9128ec7 Compare January 26, 2023 22:09
Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 9 of 9 files at r1, 3 of 3 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


-- commits line 6 at r1:
nit: "back to"


pkg/kv/kvserver/replica_tscache.go line 410 at r1 (raw file):

//     PushTxn(ABORT). As a result of this mechanism, a pusher transaction never
//     needs to create the transaction record for contending transactions that
//     it forms write-write conflicts with.

The phrasing here leaves the handling of write-read conflicts ambiguous. Consider adding a reference to MinTxnCommitTS here, so that readers know where to look if they're curious about Txn record creation semantics when there's a write-read conflict.


pkg/kv/kvserver/replica_tscache.go line 449 at r1 (raw file):

//	                   v            | v    EndTxn(!STAGING)        v  v            |
//	               +--------------------+  or PushTxn(ABORT)     +--------------------+
//	               |                    |  then: txn.ts -> v1    |                    |

txn.ts -> v1 should only happens when EndTxn(!Staging), right? It doesn't make sense in the PushTxn(ABORT) case -- should we split these cases out now?

This commit moves the point when a transaction checks the timestamp
cache for its minimum commit timestamp from transaction record creation
time back to commit time. This allows us to use the timestamp cache to
communicate successful PushTxn(TIMESTAMP) to a pushee with an existing
record without re-writing its transaction record.

See the changes to the state machine diagram attached to
`Replica.CanCreateTxnRecord` for a visual depiction of this change.
This commit simplifies logic in PushTxnRequest that promoted a
PUSH_TIMESTAMP to a PUSH_ABORT when it found a STAGING transaction
record which it knew to be part of a failed parallel commit attempt.

The logic was trying to be smart and minimize the cases where it needed
to promote a PUSH_TIMESTAMP to a PUSH_ABORT. It was avoiding doing so if
it had previously found an intent with a higher epoch. In practice, this
optimization doesn't seem to matter. It was also making logic in a
following commit harder to write because it was preserving cases where a
PUSH_TIMESTAMP would succeed against a STAGING transaction record. We
don't want to support such state transitions, so eliminate them.
Fixes cockroachdb#94728.

With the previous commit, transactions will check the timestamp cache
before committing to determine whether they have had their commit
timestamp pushed. This commit exploits this to avoid ever rewriting a
transaction's record on a timestamp push. Instead, the timestamp cache
is used, regardless of whether the record already existed or not. Doing
so avoids consensus.
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/pushNoConsensus branch from 9128ec7 to 3dbb321 Compare February 1, 2023 23:57
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r=arulajmani

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani)


-- commits line 6 at r1:

Previously, arulajmani (Arul Ajmani) wrote…

nit: "back to"

Done.


pkg/kv/kvserver/replica_tscache.go line 410 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

The phrasing here leaves the handling of write-read conflicts ambiguous. Consider adding a reference to MinTxnCommitTS here, so that readers know where to look if they're curious about Txn record creation semantics when there's a write-read conflict.

Good idea, done.


pkg/kv/kvserver/replica_tscache.go line 449 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

txn.ts -> v1 should only happens when EndTxn(!Staging), right? It doesn't make sense in the PushTxn(ABORT) case -- should we split these cases out now?

Yeah, you're correct about that, good catch. We could forward the record's WriteTimestamp to MinTxnCommitTS() when aborting the txn on a PushTxn(ABORT), but we don't because the write timestamp is meaningless once the txn is aborted. I'm going to leave this as is out of a desire not to make this diagram any more crowded and harder to read than it already is while acknowledging that it is not entirely accurate in this specific case.

@craig
Copy link
Contributor

craig bot commented Feb 2, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Feb 2, 2023

Build failed (retrying...):

@craig craig bot merged commit 22244a7 into cockroachdb:master Feb 2, 2023
@craig
Copy link
Contributor

craig bot commented Feb 2, 2023

Build succeeded:

@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/pushNoConsensus branch February 2, 2023 23:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kv: PushTxn(PUSH_TIMESTAMP) without Raft consensus
3 participants