-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kv: add ability to verify pipelined replicated shared/exclusive locks #119933
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 9 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)
pkg/kv/kvpb/api.proto
line 1440 at r2 (raw file):
// with the supplied lock strength or something stronger at the sequence // number. kv.kvserver.concurrency.lock.Strength lock_strength = 4;
Do we need to set this field in resolveIndeterminateCommitForTxnProbe
?
pkg/kv/kvserver/batcheval/cmd_query_intent.go
line 149 at r2 (raw file):
// in a meaningful way; see if we can get rid of it, or, at the least, not // return locks from other transactions for Shared and Exclusive locks. return result.Result{}, kvpb.NewIntentMissingError(args.Key, nil /* intent */)
Should we fix up the NewIntentMissingError
interface so that we can pass in the missing lock?
pkg/roachpb/data.go
line 1004 at r2 (raw file):
} func (t *Transaction) GetTxnMeta() *enginepb.TxnMeta {
This doesn't feel like it belongs as a method on Transaction
. Rather, it feels like it should be a method on MVCCWriteOptions
.
pkg/roachpb/data.proto
line 630 at r2 (raw file):
// The key that the write was made at. bytes key = 1 [(gogoproto.casttype) = "Key"]; // The sequence number of the request that created the write.
This will only be set if lock_strength
is Intent, right?
EDIT: or maybe not. Maybe this can be used to avoid the need to handle IgnoredSeqNums. But then we need to consider cases where an acquisition is a no-op because an existing lock exists with the same or a higher strength.
pkg/storage/lock_table_key_scanner.go
line 131 at r2 (raw file):
ctx context.Context, reader Reader, txn *enginepb.TxnMeta,
If we're moving in this direction, consider just passing in a txnID
. We'd need to adjust the comment above.
pkg/storage/mvcc.go
line 5902 at r2 (raw file):
} // VerifyLock returns true if the supplied transaction holds a replicated lock
nit: there are parallels between MVCCCheckForAcquireLock
and MVCCAcquireLock
that are lost by putting this chunk of code between them. Let's move it below validateLockAcquisitionStrength
.
pkg/storage/mvcc.go
line 5913 at r2 (raw file):
} ltScanner, err := newLockTableKeyScanner( ctx, reader, txn, str, 0, 0, BatchEvalReadCategory)
We're passing in str
and then catching LockConflictErrors
as an indication that a matching lock doesn't exist for txn
. I guess that works, but it's kind of subtle and it's not clear that it's needed. Should we update lockTableKeyScanner
to work with str = lock.None
? I think the underlying LockTableIterator
is built such that MatchMinStr = lock.None
will not match with any other txn IDs. So the change might be as simple as:
diff --git a/pkg/storage/lock_table_key_scanner.go b/pkg/storage/lock_table_key_scanner.go
index e7516f86310..1ad424e1607 100644
--- a/pkg/storage/lock_table_key_scanner.go
+++ b/pkg/storage/lock_table_key_scanner.go
@@ -62,6 +62,9 @@ func strongerOrEqualStrengths(str lock.Strength) []lock.Strength {
// the provided lock strength.
func minConflictLockStrength(str lock.Strength) (lock.Strength, error) {
switch str {
+ case lock.None:
+ // Don't conflict with any locks held by other transactions.
+ return lock.None, nil
case lock.Shared:
return lock.Exclusive, nil
case lock.Exclusive, lock.Intent:
Then we can remove the LockConflictError
handling below.
pkg/storage/mvcc.go
line 5938 at r2 (raw file):
} // TODO(arul): add a comment explaining this.
Should we also check IgnoredSeqNums
? I guess we'll need to handle cases where the lock is held but should have been rolled back and replaced by another lock. I haven't thought all the way through what that case means though.
pkg/storage/mvcc.go
Outdated
} | ||
|
||
// TODO(arul): add a comment explaining this. | ||
if foundLock.Txn.Sequence <= txn.Sequence { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussion in person:
- we need a
if enginepb.TxnSeqIsIgnored(foundLock.Txn.Sequence, txn.IgnoredSeqNums) { continue } else { return true }
- we don't need a
foundLock.Txn.Sequence <=> txn.Sequence
comparison
@@ -541,6 +564,7 @@ func (tp *txnPipeliner) chainToInFlightWrites(ba *kvpb.BatchRequest) *kvpb.Batch | |||
}, | |||
Txn: meta, | |||
ErrorIfMissing: true, | |||
LockStrength: w.LockStrength, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed in person — we don't want to send QueryIntentRequests for intents+locks at ignored sequence numbers. Just delete them from ifWrites
immediately and don't chain on to them.
5269438
to
de9994b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Marking this as ready for review.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvpb/api.proto
line 1440 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do we need to set this field in
resolveIndeterminateCommitForTxnProbe
?
Done. We'll need to make more changes to resolveIndeterminateCommitForTxnProbe
in the client side tracking PR as well; for now, I'm just hard coding the lock strength to lock.Intent
.
pkg/kv/kvserver/batcheval/cmd_query_intent.go
line 149 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should we fix up the
NewIntentMissingError
interface so that we can pass in the missing lock?
NewIntentMissingError
accepts the wrong intent that was found instead of what the QueryIntent
request was expecting. Is it meaningful in the case of replicated locks? For example, if we're querying for an exclusive lock, and an exclusive lock exists from a different transaction, does it matter?
We'd have to jump through a few loops to return any/all locks held by other transactions if the QueryIntent
didn't find the expected replicated lock. I'm not sure it's worth it. Wdyt?
pkg/kv/kvserver/batcheval/cmd_query_intent.go
line 50 at r3 (raw file):
// TODO(arul): add a test. // // TODO(XXX): Do we really need this? We're saying that we don't need to
@nvanbenschoten wanted to call this bit out for discussion. I may be missing something here.
pkg/storage/lock_table_key_scanner.go
line 131 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
If we're moving in this direction, consider just passing in a
txnID
. We'd need to adjust the comment above.
Done.
pkg/storage/mvcc.go
line 5913 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We're passing in
str
and then catchingLockConflictErrors
as an indication that a matching lock doesn't exist fortxn
. I guess that works, but it's kind of subtle and it's not clear that it's needed. Should we updatelockTableKeyScanner
to work withstr = lock.None
? I think the underlyingLockTableIterator
is built such thatMatchMinStr = lock.None
will not match with any other txn IDs. So the change might be as simple as:diff --git a/pkg/storage/lock_table_key_scanner.go b/pkg/storage/lock_table_key_scanner.go index e7516f86310..1ad424e1607 100644 --- a/pkg/storage/lock_table_key_scanner.go +++ b/pkg/storage/lock_table_key_scanner.go @@ -62,6 +62,9 @@ func strongerOrEqualStrengths(str lock.Strength) []lock.Strength { // the provided lock strength. func minConflictLockStrength(str lock.Strength) (lock.Strength, error) { switch str { + case lock.None: + // Don't conflict with any locks held by other transactions. + return lock.None, nil case lock.Shared: return lock.Exclusive, nil case lock.Exclusive, lock.Intent:Then we can remove the
LockConflictError
handling below.
Done.
pkg/storage/mvcc.go
line 5938 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should we also check
IgnoredSeqNums
? I guess we'll need to handle cases where the lock is held but should have been rolled back and replaced by another lock. I haven't thought all the way through what that case means though.
Done.
pkg/storage/mvcc.go
line 5939 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Discussion in person:
- we need a
if enginepb.TxnSeqIsIgnored(foundLock.Txn.Sequence, txn.IgnoredSeqNums) { continue } else { return true }
- we don't need a
foundLock.Txn.Sequence <=> txn.Sequence
comparison
Done.
pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go
line 567 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Discussed in person — we don't want to send QueryIntentRequests for intents+locks at ignored sequence numbers. Just delete them from
ifWrites
immediately and don't chain on to them.
More of a note to you, now that you're doing the client side piece.
pkg/roachpb/data.go
line 1004 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This doesn't feel like it belongs as a method on
Transaction
. Rather, it feels like it should be a method onMVCCWriteOptions
.
Got rid of this.
pkg/roachpb/data.proto
line 630 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This will only be set if
lock_strength
is Intent, right?EDIT: or maybe not. Maybe this can be used to avoid the need to handle IgnoredSeqNums. But then we need to consider cases where an acquisition is a no-op because an existing lock exists with the same or a higher strength.
No longer part of this PR, but you're right we're not using this on the server. It's up to us if we want to special case this tracking on the client or if we want to track this for all lock strengths but just never use it during evaluation.
de9994b
to
70b7031
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job on this! Leaving a handful of comments, but none are substantial or should require much effort to address. Will stamp right after they're addressed.
Reviewed 8 of 15 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @jbowens)
pkg/kv/kvpb/api.proto
line 1444 at r4 (raw file):
// with the supplied lock strength or something stronger at the sequence // number. kv.kvserver.concurrency.lock.Strength lock_strength = 4;
nit: we're going to rename this request type to QueryLock
at some point, so we might as well just name this strength
to avoid QueryLock.LockStrength
.
pkg/kv/kvpb/api.proto
line 1450 at r4 (raw file):
// ignored will be treated as "not found"; that's because they can be removed // at any time. repeated storage.enginepb.IgnoredSeqNumRange ignored_seqnums = 5
Now that I see this spelled out, it probably does make sense to add validation to cmd_query_intent.go
that !enginepb.TxnSeqIsIgnored(args.Txn.Sequence, args.IgnoredSeqNums)
. Otherwise, return an AssertionFailed
error.
pkg/kv/kvserver/replica_tscache.go
line 315 at r4 (raw file):
addToTSCache(start, end, t.Txn.WriteTimestamp, uuid.UUID{}) } // NB: If this QueryIntentRequest was querying a replicated lock instead
Given all this, should we make the previous logic conditional on t.Strength == lock.Intent
(with the corresponding mixed version logic, which we might want to pull behind a StrengthOrDefault
method on QueryIntentRequest
)?
pkg/kv/kvserver/batcheval/cmd_query_intent.go
line 149 at r2 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
NewIntentMissingError
accepts the wrong intent that was found instead of what theQueryIntent
request was expecting. Is it meaningful in the case of replicated locks? For example, if we're querying for an exclusive lock, and an exclusive lock exists from a different transaction, does it matter?We'd have to jump through a few loops to return any/all locks held by other transactions if the
QueryIntent
didn't find the expected replicated lock. I'm not sure it's worth it. Wdyt?
Oh you're right, I was confused and thinking about RefreshFailedError
instead. Agreed that we want to pass nil
.
Should we hoist the intent
variable up a scope and then avoid the duplicate !reply.FoundIntent && args.ErrorIfMissing
checks?
pkg/kv/kvserver/batcheval/cmd_query_intent.go
line 50 at r3 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
@nvanbenschoten wanted to call this bit out for discussion. I may be missing something here.
This isn't to prevent the replicated locks from landing after the query intent request, it's to ensure that the query intent request waits on the lock acquisition's replication. Otherwise, it would slip ahead and fail to find the replicated lock.
pkg/kv/kvserver/batcheval/cmd_query_intent.go
line 57 at r4 (raw file):
// them. Does that mean this latching isn't required? txnID := req.(*kvpb.QueryIntentRequest).Txn.ID latchSpans.AddNonMVCC(spanset.SpanReadWrite, roachpb.Span{
s/SpanReadWrite/SpanReadOnly/
pkg/kv/kvserver/batcheval/cmd_query_intent.go
line 100 at r4 (raw file):
// TODO(arul): We should be able to remove the lock.None case once // compatibility with 24.1 is no longer an issue. if args.LockStrength == lock.Intent || args.LockStrength == lock.None {
See comment above about hiding this behind a method to centralize the compatibility logic.
pkg/kv/kvserver/txnrecovery/manager.go
line 210 at r4 (raw file):
}, Txn: meta, LockStrength: lock.Intent,
Want to leave me a TODO here? Just TODO(nvanbenschoten): pass in the correct strength.
pkg/storage/lock_table_key_scanner.go
line 145 at r4 (raw file):
readCategory ReadCategory, ) (*lockTableKeyScanner, error) { if txnID.Equal(uuid.UUID{}) && str == lock.None {
We can probably skip this. It's already checked in LockTableIteratorOptions.validate
, so no need for the noise here.
pkg/storage/mvcc.go
line 5967 at r4 (raw file):
return err } txnID := txn.ID
small nit: might as well just inline this instead of pulling out the variable, like you do below. It's saving one character.
pkg/storage/mvcc.go
line 6165 at r4 (raw file):
// nothing stopping another request from rolling back the lock even though // it exists right now. if !enginepb.TxnSeqIsIgnored(foundLock.Txn.Sequence, ignoredSeqNums) {
In my opinion, structuring this like if enginepb.TxnSeqIsIgnored(... { continue }
and then return true
below would be easier to read. That way the happy case is split out from the various exceptional cases, which each result in an early continue
.
pkg/storage/testdata/mvcc_histories/verify_locks
line 59 at r4 (raw file):
with t=A txn_step seq=10 verify_lock k=k1 str=shared
Should we test the base case somewhere of a key with no lock?
pkg/storage/testdata/mvcc_histories/verify_locks
line 59 at r4 (raw file):
with t=A txn_step seq=10 verify_lock k=k1 str=shared
Also, should we test the case where the key has an intent on it?
70b7031
to
27138c3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RFAL
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jbowens and @nvanbenschoten)
pkg/kv/kvserver/replica_tscache.go
line 315 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Given all this, should we make the previous logic conditional on
t.Strength == lock.Intent
(with the corresponding mixed version logic, which we might want to pull behind aStrengthOrDefault
method onQueryIntentRequest
)?
Done. Reworded the comment here too to work better with the new structure.
pkg/kv/kvserver/batcheval/cmd_query_intent.go
line 149 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Oh you're right, I was confused and thinking about
RefreshFailedError
instead. Agreed that we want to passnil
.Should we hoist the
intent
variable up a scope and then avoid the duplicate!reply.FoundIntent && args.ErrorIfMissing
checks?
Good call, done.
pkg/kv/kvserver/batcheval/cmd_query_intent.go
line 50 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This isn't to prevent the replicated locks from landing after the query intent request, it's to ensure that the query intent request waits on the lock acquisition's replication. Otherwise, it would slip ahead and fail to find the replicated lock.
Ah right!
pkg/kv/kvserver/batcheval/cmd_query_intent.go
line 57 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/SpanReadWrite/SpanReadOnly/
Copy pasta. Fixed.
pkg/kv/kvserver/batcheval/cmd_query_intent.go
line 100 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
See comment above about hiding this behind a method to centralize the compatibility logic.
Done.
pkg/storage/mvcc.go
line 5967 at r4 (raw file):
It's saving one character.
Every little bit counts, but okay 😂
pkg/storage/mvcc.go
line 6165 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
In my opinion, structuring this like
if enginepb.TxnSeqIsIgnored(... { continue }
and thenreturn true
below would be easier to read. That way the happy case is split out from the various exceptional cases, which each result in an earlycontinue
.
Agreed. Also not sure what I was thinking with the "continue" there :P
pkg/storage/testdata/mvcc_histories/verify_locks
line 59 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Also, should we test the case where the key has an intent on it?
Done
pkg/storage/testdata/mvcc_histories/verify_locks
line 59 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should we test the base case somewhere of a key with no lock?
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 8 of 8 files at r5, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani and @jbowens)
pkg/kv/kvpb/api.proto
line 1450 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Now that I see this spelled out, it probably does make sense to add validation to
cmd_query_intent.go
that!enginepb.TxnSeqIsIgnored(args.Txn.Sequence, args.IgnoredSeqNums)
. Otherwise, return anAssertionFailed
error.
Did we want to add something like this to cmd_query_intent.go
?
pkg/kv/kvserver/replica_tscache.go
line 313 at r5 (raw file):
// the request evaluated successfully (so it can't land later) -- it's // only checking whether the replication succeeded or not. if req.(*kvpb.QueryIntentRequest).StrengthOrDefault() == lock.Intent {
s/req.(*kvpb.QueryIntentRequest)/t/
pkg/kv/kvserver/batcheval/cmd_query_intent.go
line 46 at r5 (raw file):
latchSpans.AddNonMVCC(spanset.SpanReadOnly, req.Header().Span()) // They also acquire a read latch on the per-transaction local key that all // replicated shared lock acquisitions acquire latches on to isolate them
"to isolate them To isolate themselves"
27138c3
to
8cacf5e
Compare
8cacf5e
to
c29ac18
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the version gate as well.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani, @jbowens, and @nvanbenschoten)
pkg/kv/kvpb/api.proto
line 1450 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Did we want to add something like this to
cmd_query_intent.go
?
Sorry for missing the comment earlier. Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r6, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani and @jbowens)
pkg/kv/kvserver/batcheval/cmd_query_intent.go
line 91 at r6 (raw file):
} if !enginepb.TxnSeqIsIgnored(args.Txn.Sequence, args.IgnoredSeqNums) {
Shouldn't this assertion fire if enginepb.TxnSeqIsIgnored(args.Txn.Sequence, args.IgnoredSeqNums)
(no negation)?
Looks like CI is hitting:
|
c29ac18
to
bc24bf5
Compare
Previously, QueryIntent requests were only used to verify whether an intent was successfully evaluated and replicated. This patch extends QueryIntent request to also be able to verify whether a pipelined shared or exclusive lock was successfully replicated or not. Informs cockroachdb#117978 Release note: None
bc24bf5
to
5cb2264
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick reviews and all the help here @nvanbenschoten!
bors r=nvanbenschoten
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jbowens and @nvanbenschoten)
pkg/kv/kvserver/batcheval/cmd_query_intent.go
line 91 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Shouldn't this assertion fire
if enginepb.TxnSeqIsIgnored(args.Txn.Sequence, args.IgnoredSeqNums)
(no negation)?
It should have. Trying to work too fast :/
121088: kv: pipeline replicated lock acquisition r=nvanbenschoten a=nvanbenschoten Fixes #117978. Builds upon the foundation laid in [#119975](#119975), [#119933](#119933), [#121065](#121065), and [#121086](#121086). This commit completes the client-side handling of replicated lock acquisition pipelining. Replicated lock acquisition through `Get`, `Scan`, and `ReverseScan` requests now qualifies to be pipelined. The `txnPipeliner` is updated to track the strength associated with each in-flight write and pass that along to the corresponding `QueryIntentRequest`. See [benchmark with TPC-C results here](#121088 (comment)). Release note: None Co-authored-by: Nathan VanBenschoten <[email protected]>
Previously, QueryIntent requests were only used to verify whether an
intent was successfully evaluated and replicated. This patch extends
QueryIntent request to also be able to verify whether a pipelined
shared or exclusive lock was successfully replicated or not.
Informs #117978
Release note: None