-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kv: support STAGING transactions in kv protocol and client #36012
kv: support STAGING transactions in kv protocol and client #36012
Conversation
742f533
to
07f6a10
Compare
07f6a10
to
06f74b4
Compare
Ok, this is ready for review when possible. The one major questions I have that I'd appreciate input from @tbg, @ajwerner, and maybe @andreimatei on is about how we should represent a transaction's "intent spans" and "in-flight requests". This comes up both in the Also, I'm not planning on merging this until after the DistSender changes are also ready to go in because on its own this will cause a performance hit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r1, 2 of 2 files at r2, 28 of 28 files at r3, 17 of 30 files at r4, 2 of 2 files at r5, 28 of 29 files at r6.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @nvanbenschoten, and @tbg)
pkg/kv/txn_interceptor_committer.go, line 147 at r3 (raw file):
// the status back to PENDING. if txn := pErr.GetTxn(); txn != nil && txn.Status == roachpb.STAGING { pErr.SetTxn(cloneWithStatus(txn, roachpb.PENDING))
Just to check my understanding, at this point the serialized Txn record will be STAGING but we know here that it's not committed so our in-memory view of the Txn is PENDING?
pkg/kv/txn_interceptor_committer.go, line 151 at r3 (raw file):
// Same deal with MixedSuccessErrors. // TODO(nvanbenschoten): Merge code? Wait for MixedSuccessError to go away? if aPSErr, ok := pErr.GetDetail().(*roachpb.MixedSuccessError); ok {
What is aPSErr
supposed to mean?
pkg/kv/txn_interceptor_committer.go, line 255 at r3 (raw file):
// committing EndTransaction in parallel with other writes. If so, it returns // the set of writes that may still be in-flight when the EndTransaction is // evaluated. These writes should be declared are requirements for the
either should be declared as
or are
pkg/kv/txn_interceptor_committer.go, line 405 at r6 (raw file):
ba.Add(&et) const maxAttempts = 5
Comment on why retries might happen and why it is a good idea?
pkg/kv/txn_interceptor_committer_test.go, line 194 at r3 (raw file):
mockSender.MockSend(func(ba roachpb.BatchRequest) (*roachpb.BatchResponse, *roachpb.Error) { require.Equal(t, 3, len(ba.Requests))
require.Len
is a thing
pkg/kv/txn_interceptor_pipeliner.go, line 68 at r3 (raw file):
// that have been proposed but not proven to have succeeded are first checked // before considering the transaction committed. These async writes are referred // to as "outstanding writes" and this process of proving that an outstanding
should we call them "in-flight writes" for consistency? Can you help elucidate the difference between "outstanding" and "in-flight" for me? Both terms are used in a sentence below. From what I gather one is the set of writes stored in the BTree of the txnPipeliner and another is the map of the EndTransaction populated by the txnComitter. The mixed terminology is a little bit confusing.
pkg/roachpb/data.go, line 980 at r3 (raw file):
} if !t.Status.IsFinalized() { if (t.Epoch < o.Epoch) || (t.Epoch == o.Epoch && o.Status != PENDING) {
putting a pin in this to remember to think harder about this condition
pkg/storage/replica_test.go, line 10672 at r3 (raw file):
return sendWrappedWithErr(etH, &et) }, expError: "TransactionAbortedError(ABORT_REASON_ALREADY_COMMITTED_OR_ROLLED_BACK_POSSIBLE_REPLAY)",
Is TransactionAbortedError
a good name for this error?
pkg/storage/replica_test.go, line 11114 at r3 (raw file):
}, run: func(txn *roachpb.Transaction, _ hlc.Timestamp) error { rt := recoverTxnArgs(txn, true /* implicitlyCommitted */)
This got renamed to allWritesFound
pkg/storage/replica_write.go, line 506 at r3 (raw file):
et = etClone reqsClone := append([]roachpb.RequestUnion(nil), ba.Requests...)
preallocate this because you know the size
pkg/storage/replica_write.go, line 494 at r6 (raw file):
} origET := et
Can you comment about what's going on here?
pkg/storage/batcheval/cmd_end_transaction.go, line 242 at r3 (raw file):
case roachpb.PENDING: if h.Txn.Epoch < reply.Txn.Epoch {
Given the code duplication you could make this case PENDING, STAGING and put a
if reply.Txn.Status == roachpb.STAGING {
// Remove the existing in-flight writes. They may be replaced.
reply.Txn.InFlightWrites = nil
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comments are questions more than anything
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @nvanbenschoten, and @tbg)
f9d3fc5
to
2f1d26c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @tbg)
pkg/kv/txn_interceptor_committer.go, line 147 at r3 (raw file):
Previously, ajwerner wrote…
Just to check my understanding, at this point the serialized Txn record will be STAGING but we know here that it's not committed so our in-memory view of the Txn is PENDING?
Yes, exactly! This is really subtle so I'll improve the comment.
pkg/kv/txn_interceptor_committer.go, line 151 at r3 (raw file):
Previously, ajwerner wrote…
What is
aPSErr
supposed to mean?
I think it's "a partial success error". I got that from here. I really need to just address the TODO and get rid of MixedSuccessError
s entirely.
pkg/kv/txn_interceptor_committer.go, line 255 at r3 (raw file):
Previously, ajwerner wrote…
either
should be declared as
orare
Done.
pkg/kv/txn_interceptor_committer.go, line 405 at r6 (raw file):
Previously, ajwerner wrote…
Comment on why retries might happen and why it is a good idea?
I removed this. I don't have a strong enough understanding of when retries would be a good idea. We can add them back in if we see a need for them.
pkg/kv/txn_interceptor_committer_test.go, line 194 at r3 (raw file):
Previously, ajwerner wrote…
require.Len
is a thing
Nice! Switched where appropriate.
pkg/kv/txn_interceptor_pipeliner.go, line 68 at r3 (raw file):
Previously, ajwerner wrote…
should we call them "in-flight writes" for consistency? Can you help elucidate the difference between "outstanding" and "in-flight" for me? Both terms are used in a sentence below. From what I gather one is the set of writes stored in the BTree of the txnPipeliner and another is the map of the EndTransaction populated by the txnComitter. The mixed terminology is a little bit confusing.
Yeah, I think this is a good point. The distinction here is that:
"outstanding" = writes that have not completed by the time an EndTransaction
is issued
"in-flight" = writes that are in-flight at the same time as an EndTransaction
This distinction here makes sense given the development progression of txn pipelining and now parallel commits, but as a reader of this code I can see how the distinction would feel artificial. There's definitely a draw to unifying the concepts.
If we were to unify them, which name would you like better? "Outstanding"? "In-flight"? "Concurrent"? Something else? I'm into the idea of making the change, I just don't want to change the 200 instances of "outstanding" to something else before we've settled on something better. @tbg do you have an opinion?
pkg/storage/replica_test.go, line 10672 at r3 (raw file):
Previously, ajwerner wrote…
Is
TransactionAbortedError
a good name for this error?
TransactionAbortedError
s arise in these situations because the transaction record is committed and already GCed by the time the second request reaches it. At that point, there's no way to tell whether the transaction was committed or aborted (unlike in the "without eager gc" cases below). All we really know if that the transaction record can't be created again in the future.
Fortunately, it doesn't really matter because transaction records can't be GCed until after all of their intents have already been resolved. This is explained in pretty good detail here.
pkg/storage/replica_test.go, line 11114 at r3 (raw file):
Previously, ajwerner wrote…
This got renamed to
allWritesFound
Ah good catch, but other way around. I renamed it to implicitlyCommitted
in the previous PR and didn't catch all of these during the rebase. Done.
pkg/storage/replica_write.go, line 506 at r3 (raw file):
Previously, ajwerner wrote…
preallocate this because you know the size
That doesn't seem to make a difference:
var slice = make([]int, 512)
func BenchmarkPreAlloc(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = append(make([]int, 0, len(slice)), slice...)
}
}
func BenchmarkNoPreAlloc(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = append([]int(nil), slice...)
}
}
name time/op
PreAlloc-4 722ns ± 1%
NoPreAlloc-4 723ns ± 4%
name alloc/op
PreAlloc-4 4.10kB ± 0%
NoPreAlloc-4 4.10kB ± 0%
name allocs/op
PreAlloc-4 1.00 ± 0%
NoPreAlloc-4 1.00 ± 0%
pkg/storage/replica_write.go, line 494 at r6 (raw file):
Previously, ajwerner wrote…
Can you comment about what's going on here?
Done.
pkg/storage/batcheval/cmd_end_transaction.go, line 242 at r3 (raw file):
Previously, ajwerner wrote…
Given the code duplication you could make this case PENDING, STAGING and put a
if reply.Txn.Status == roachpb.STAGING { // Remove the existing in-flight writes. They may be replaced. reply.Txn.InFlightWrites = nil }
Done.
2f1d26c
to
5791929
Compare
5791929
to
a635e1c
Compare
This was suggested in cockroachdb#36012. The following commit will make a distinction between "in-flight" writes and a "write footprint", so it makes sense to first perform this global rename. Release note: None
The final two points to address here before it is ready to review are:
Both of these are addressed in #37302, so that PR is the final precursor to this one. |
This commit addresses the impedance mismatch discussed in cockroachdb#35763 (review) between tracking in-flight writes and tracking a transaction's intent "footprint". These are two related but fundamentally different concepts and it's best to be explicit about how they relate and how they differ. To this end, this commit moves all intent tracking into the txnPipeliner. The pipeliner now tracks two separate sets: the "in-flight writes" and the "write footprint". The former is morally a subset of the latter, but for the sake of avoiding duplication, in-flight writes are not included in the write footprint. Instead, the write footprint includes: 1. write from the current epoch that have already succeeded (point writes that were pipelined and already proved and ranged writes) 2. writes from the current epoch that may or may not have succeeded 3. all writes from prior epochs Structuring the two sets this way has a major benefit -- it is exactly how we would like to represent the writes that a transaction has performed on `EndTransactionRequests` and on STAGING `Transaction` records. This will simplify client-logic in cockroachdb#36012 significantly. Release note: None
This was suggested in cockroachdb#36012. The following commit will make a distinction between "in-flight" writes and a "write footprint", so it makes sense to first perform this global rename. Release note: None
1811d8e
to
d7ed002
Compare
This allows QueryIntent to behave correctly even as a transaction's provisional commit timestamp grows. Release note: None
Parallel commits will want intents that are found to be missing to be prevented from ever being written. This is true both for the committing transaction itself and other transactions attempting to recover a STAGING transaction. In fact, all use cases of QueryIntent will now want the PREVENT IfMissingBehavior. However, a committing transaction will still also want the existing behavior of returning an error if the intent is missing, which is provided by the RETURN_ERROR IfMissingBehavior. This indicates that the options in the IfMissingBehavior are not sufficiently orthogonal to justify an enum. This change removes the enum and replaces it with an `ErrorIfMissing` bool. This is backwards compatible because up until now, no has used the PREVENT behavior. Release note: None
These checks usually go hand-in-hand, but the output of the error check is almost always more useful when tests fail. Release note: None
Release note: None
roachpb.EndTransactionRequest is 120 bytes large. We shouldn't pass it by value to these leaf functions. Release note: None
This commit adjusts the representation of `InFlightWrites` in Transaction protos in accordance with the thinking from cockroachdb#37302. The structure no longer points into the `IntentSpans`. Instead, it lives alongside the `IntentSpans`, which logically extends to include all of the `InFlightWrites` but doesn't in practice to avoid duplication. Release note: None
d7ed002
to
c7bad2b
Compare
@tbg I decided I didn't want to sit on this until #37469 is ready because this large of a PR tends to rot and I don't intend to finalize that PR until next week. I added in one more commit that temporarily disables parallel commits, which we can revert when #37469 goes in. Here's a note from that commit message:
With that extra commit, this PR is now eligible to be merged into master. |
Sounds good, I didn't think that was easy enough to do or I would've suggested so myself during review. Going to give this a look later today. |
c7bad2b
to
987cc5b
Compare
This commit allows us to merge cockroachdb#36012 without waiting for cockroachdb#37469. We can revert this commit once the changes in cockroachdb#37469 land. The fear of merging cockroachdb#36012 without cockroachdb#37469 and without disabling parallel commits is that we would then be adding a new transaction state transition without actually performing implicit commits in parallel with the rest of the transaction's writes. This appears to slow down nightly benchmarks by about 5-10%. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 9 files at r15, 3 of 3 files at r16.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/storage/replica_write.go, line 527 at r15 (raw file):
// Ranged write or read. See below. } }
assert len(inflight) >= len(writes)
pkg/storage/replica_write.go, line 572 at r15 (raw file):
// // This reduces the complexity of this loop from O(n*log(m)) down to // O(n) where n is the number of requests in the batch and m is the
point writes
#batch + #pointwritesinbatch * log inflight
This PR polishes off the protocol and client level changes from cockroachdb#35165 to allow transactions to be moved to the STAGING state during parallel commits. To do so, the change makes two major contributions. First, it adds support to EndTransaction request and response for moving a transaction record to the STAGING status in addition to the ABORTED and COMMITTED status. This is done in a way that interacts elegantly with a transaction's in-flight writes and allows the request to move a transaction record directly to the COMMITTED status when all of the in-flight writes are on the same range as the transaction record. The second contribution is that the change adds full support to the txnCommitter interceptor to issue parallel commits when possible and then to launch the asynchronous portion of marking a transaction as explicitly committed if it determines that a transaction is implicitly committed. This is likely the area of the change where reviewers will want to look first, as it incorporates the bulk of the parallel commits algorithm and also serves as its primary documentation. The final remaining piece of work after this change is to adjust DistSender to actually send EndTransaction requests in parallel with writes in the same batch and with QueryIntents for previous writes. This change makes it safe to do so, but doesn't actually add any logic to DistSender. There are a few complications to that piece and it's also the least interesting part of this change, which is why I have saved it for last. Release note: None
Release note: None
Just code movement. Release note: None
Release note: None
This commit allows us to merge cockroachdb#36012 without waiting for cockroachdb#37469. We can revert this commit once the changes in cockroachdb#37469 land. The fear of merging cockroachdb#36012 without cockroachdb#37469 and without disabling parallel commits is that we would then be adding a new transaction state transition without actually performing implicit commits in parallel with the rest of the transaction's writes. This appears to slow down nightly benchmarks by about 5-10%. Release note: None
987cc5b
to
9d874cc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @tbg)
pkg/storage/replica_write.go, line 527 at r15 (raw file):
Previously, tbg (Tobias Grieger) wrote…
assert len(inflight) >= len(writes)
Done.
pkg/storage/replica_write.go, line 572 at r15 (raw file):
Previously, tbg (Tobias Grieger) wrote…
point writes
#batch + #pointwritesinbatch * log inflight
Done.
bors r+ |
36012: kv: support STAGING transactions in kv protocol and client r=nvanbenschoten a=nvanbenschoten This PR polishes off the protocol and client level changes from #35165 to allow transactions to be moved to the STAGING state during parallel commits. To do so, the change makes two major contributions. First, it adds support to `EndTransaction` request and response for moving a transaction record to the STAGING status in addition to the ABORTED and COMMITTED status. This is done in a way that interacts elegantly with a transaction's in-flight writes and allows the request to move a transaction record directly to the COMMITTED status when all of the in-flight writes are on the same range as the transaction record. The second contribution is that the change adds full support to the `txnCommitter` interceptor to issue parallel commits when possible and then to launch the asynchronous portion of marking a transaction as explicitly committed if it determines that a transaction is implicitly committed. This is likely the area of the change where reviewers will want to look first, as it incorporates the bulk of the parallel commits algorithm and also serves as its primary documentation. The final remaining piece of work after this change is to adjust DistSender to actually send `EndTransaction` requests in parallel with writes in the same batch and with `QueryIntents` for previous writes. This change makes it safe to do so, but doesn't actually add any logic to DistSender. There are a few complications to that piece and it's also the least interesting part of this change, which is why I have saved it for last. Co-authored-by: Nathan VanBenschoten <[email protected]>
Build succeeded |
This PR polishes off the protocol and client level changes from #35165 to allow transactions to be moved to the STAGING state during parallel commits. To do so, the change makes two major contributions.
First, it adds support to
EndTransaction
request and response for moving a transaction record to the STAGING status in addition to the ABORTED and COMMITTED status. This is done in a way that interacts elegantly with a transaction's in-flight writes and allows the request to move a transaction record directly to the COMMITTED status when all of the in-flight writes are on the same range as the transaction record.The second contribution is that the change adds full support to the
txnCommitter
interceptor to issue parallel commits when possible and then to launch the asynchronous portion of marking a transaction as explicitly committed if it determines that a transaction is implicitly committed. This is likely the area of the change where reviewers will want to look first, as it incorporates the bulk of the parallel commits algorithm and also serves as its primary documentation.The final remaining piece of work after this change is to adjust DistSender to actually send
EndTransaction
requests in parallel with writes in the same batch and withQueryIntents
for previous writes. This change makes it safe to do so, but doesn't actually add any logic to DistSender. There are a few complications to that piece and it's also the least interesting part of this change, which is why I have saved it for last.