-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: extend resolverQueue to support waiter transactions #76719
Conversation
f2d2629
to
5130e56
Compare
271a67a
to
6cf0c75
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I like where this is going.
Reviewed 2 of 5 files at r3, 1 of 7 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @matthewtodd)
pkg/sql/contention/resolver.go, line 107 at r4 (raw file):
// remainingRetries stores a mapping of each contention event to its // remaining number of retries attempts. The key in the map is the // transaction ID of the blocking transaction.
this comment is no longer true
pkg/sql/contention/resolver.go, line 172 at r4 (raw file):
// self-throttling once that QPS value exceed certain value. remoteReq, localReq := makeRPCRequestsFromBatch(currentBatch)
Now I understand that the remoteReq is to resolve blocking txn ids, and the localReq is for waiting txn ids. But it took me a while. Maybe consider renaming these variables?
pkg/sql/contention/resolver.go, line 188 at r4 (raw file):
for _, event := range currentBatch { needToRetryDueToBlockingTxnID, initialRetryBudget1 :=
I wonder if a name rather than a numeric suffix would help? Could get too long, though.
pkg/sql/contention/resolver.go, line 189 at r4 (raw file):
for _, event := range currentBatch { needToRetryDueToBlockingTxnID, initialRetryBudget1 := maybeUpdateBlockingTxnFingerprintID(
drop the word blocking
pkg/sql/contention/resolver.go, line 233 at r4 (raw file):
} if uuid.Nil.Equal(txnID) {
comment here?
pkg/sql/contention/resolver.go, line 242 at r4 (raw file):
if _, ok := inProgressTxnIDs[txnID]; ok { return true /* needToRetry */, retryBudgetInfinite
nit: retryBudgetForTxnInProgress
pkg/sql/contention/resolver.go, line 335 at r4 (raw file):
func makeRPCRequestsFromBatch( batch []contentionpb.ExtendedContentionEvent, ) (remoteReq, localReq *serverpb.TxnIDResolutionRequest) {
consider the same renaming here
pkg/sql/contentionpb/contention.go, line 90 at r4 (raw file):
} func hashUUID(u uuid.UUID, fnv *util.FNV64) {
nit: since this code is a little clever, it's probably worth a direct unit test that it works? (You can craft UUID bytes to show things like first half same, second half different -> different hash, etc.)
pkg/sql/contentionpb/contention.go, line 92 at r4 (raw file):
func hashUUID(u uuid.UUID, fnv *util.FNV64) { b := u.GetBytes() // Hash the blocking txn id.
maybe say more about what's happening here, why 2 calls to DecodeUint64Descending
pkg/sql/contention/resolver_test.go, line 415 at r4 (raw file):
f.setTxnIDEntry(tc.coordinatorNodeID, tc.TxnID, tc.TxnFingerprintID) // TODO(azhng): wip: this is kind of a hack heh? f.setTxnIDEntry("local", tc.TxnID, tc.TxnFingerprintID)
delete this? Also add some more test cases: want to populate remote & local coordinators with different data? Also test the retry behavior.
34a5a87
to
2e3e487
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @matthewtodd)
pkg/sql/contention/resolver.go, line 107 at r4 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
this comment is no longer true
Done.
pkg/sql/contention/resolver.go, line 172 at r4 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
Now I understand that the remoteReq is to resolve blocking txn ids, and the localReq is for waiting txn ids. But it took me a while. Maybe consider renaming these variables?
Done.
pkg/sql/contention/resolver.go, line 188 at r4 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
I wonder if a name rather than a numeric suffix would help? Could get too long, though.
Done. renamed
pkg/sql/contention/resolver.go, line 189 at r4 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
drop the word blocking
Done.
pkg/sql/contention/resolver.go, line 233 at r4 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
comment here?
Done.
pkg/sql/contention/resolver.go, line 242 at r4 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
nit: retryBudgetForTxnInProgress
Done.
pkg/sql/contention/resolver.go, line 335 at r4 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
consider the same renaming here
Done.
pkg/sql/contentionpb/contention.go, line 90 at r4 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
nit: since this code is a little clever, it's probably worth a direct unit test that it works? (You can craft UUID bytes to show things like first half same, second half different -> different hash, etc.)
Done.
pkg/sql/contentionpb/contention.go, line 92 at r4 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
maybe say more about what's happening here, why 2 calls to DecodeUint64Descending
Done.
pkg/sql/contention/resolver_test.go, line 415 at r4 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
delete this? Also add some more test cases: want to populate remote & local coordinators with different data? Also test the retry behavior.
Done. Extended test harness to support have different waiting/blocking txn metadata.
2e3e487
to
a72f19b
Compare
@@ -274,12 +272,12 @@ func (s *eventStore) forEachEvent( | |||
} | |||
|
|||
func (s *eventStore) getEventByBlockingTxnID( | |||
txnID uuid.UUID, | |||
hash uint64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function needs renaming (by Blocking + Waiting + Timestamp hash ID)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed.
f0c5aa8
to
c82d6a7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebasing these stacked PRs makes finding just the new code confusing -- maybe something to strategize around in the future? But I think I found it here! Just a few nits, but otherwise
Reviewed 1 of 5 files at r10, 5 of 8 files at r11, 1 of 5 files at r14, 1 of 8 files at r15.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @Azhng, @matthewtodd, and @THardy98)
pkg/sql/contention/resolver.go, line 172 at r15 (raw file):
// self-throttling once that QPS value exceed certain value. blockTxnIDsRPCReq, waitingTxnIDsRPCReq := makeRPCRequestsFromBatch(currentBatch)
nit: maybe block -> blocking
pkg/sql/contention/resolver.go, line 172 at r15 (raw file):
// self-throttling once that QPS value exceed certain value. blockTxnIDsRPCReq, waitingTxnIDsRPCReq := makeRPCRequestsFromBatch(currentBatch)
nit: maybe remove the RPC
from these names? It's redundant with Req
IMO.
pkg/sql/contention/resolver.go, line 239 at r15 (raw file):
} // Sometimes DistSQL engine is used by in weird ways. It is possible for a
nit: "is used by in"
pkg/sql/contention/resolver.go, line 343 at r15 (raw file):
func makeRPCRequestsFromBatch( batch []contentionpb.ExtendedContentionEvent, ) (blockingTxnIDRPCReq, waitingTxnIDRPCReq *serverpb.TxnIDResolutionRequest) {
nit: ditto re: the redundancy of RPC
here.
c82d6a7
to
fd8f40a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @matthewtodd and @THardy98)
pkg/sql/contention/resolver.go, line 172 at r15 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
nit: maybe block -> blocking
Done.
pkg/sql/contention/resolver.go, line 172 at r15 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
nit: maybe remove the
RPC
from these names? It's redundant withReq
IMO.
Done.
pkg/sql/contention/resolver.go, line 239 at r15 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
nit: "is used by in"
Done.
pkg/sql/contention/resolver.go, line 343 at r15 (raw file):
Previously, matthewtodd (Matthew Todd) wrote…
nit: ditto re: the redundancy of
RPC
here.
Done.
This commit introduces contention event resolver. The new resolver implements a queue-like interface, where unresolved contention events will be queued into the resolver. The resolution is performed by invoking the `resolve()` method on the resolver queue, and the result of the resolution can be retrieved by calling `dequeue()` method. Release note: None
This commit introduces contention event store, an in-memory bounded FIFO store that stores historical contention events and map collected contention events to their corresponding transaction fingerprint IDs. Release note (sql change): A new cluster setting `sql.contention.event_store.capacity` is added . This cluster setting can be used to control the in-memory capacity of contention event store. When this setting is set to zero, the contention event store is disabled.
Previously, contention event only include the txnID of the transaction that held the lock (a.k.a. blocking transaction). The txnID of the waiter transaction is missing from the contention event. This commit includes the txnID of the waiter transaction into contention events. Release note: None
Previously, the resovlerQueue used in the contention event store only resolved the txnID of the blocking transaction. This commit, the resolverQueue would also resolve the txnID of the waiting transaction. Release note: None
fd8f40a
to
a943872
Compare
TFTR! bors r+ |
Build succeeded: |
Reviewer note: only last two commits are relevant
First commit
sql: add waiter txn id to extended contention event
Previously, contention event only include the txnID of the transaction
that held the lock (a.k.a. blocking transaction). The txnID of the
waiter transaction is missing from the contention event.
This commit includes the txnID of the waiter transaction into contention
events.
Release note: None
Second commit
sql: extend resolverQueue to resolve waiter txn id
Previously, the resovlerQueue used in the contention event store only
resolved the txnID of the blocking transaction.
This commit, the resolverQueue would also resolve the txnID of the
waiting transaction.
Release note: None