-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: Fix performance regression due to new call to collectSpansRead #91462
kvserver: Fix performance regression due to new call to collectSpansRead #91462
Conversation
6023a92
to
0ed1846
Compare
Could you add the bench diff for comparison. |
Added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nits / questions - the results look good.
Reviewed 3 of 4 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @KaiSun314)
pkg/kv/kvserver/replica_send.go
line 404 at r1 (raw file):
defer func() { // Handle load-based splitting, if necessary. if br != nil {
Is there a reason you swapped to checking the br as opposed to the error - are there cases where we would return a pErr and also a br here?
pkg/kv/kvserver/replica_split_load.go
line 51 at r1 (raw file):
// getResponseBoundarySpan computes the union span of the true spans that were // iterated over (using the request span and the response's resumeSpan).
nit : drop the parens.
0ed1846
to
68b0f01
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @kvoli)
pkg/kv/kvserver/replica_send.go
line 404 at r1 (raw file):
Previously, kvoli (Austen) wrote…
Is there a reason you swapped to checking the br as opposed to the error - are there cases where we would return a pErr and also a br here?
It seems that executeBatchWithConcurrencyRetries returns either a non-nil br and nil pErr, or a nil br and a non-nil pErr.
To be safe though, I added both checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the review Austen!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @kvoli)
68b0f01
to
343085f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @KaiSun314)
pkg/kv/kvserver/replica_split_load.go
line 77 at r3 (raw file):
} // TODO(kaisun): There are a few situations where the request did not
Could we add a tracking issue for this as well? Just to mention it is known behavior and was apparent in the previous request based splitter too.
343085f
to
80078ec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @kvoli)
pkg/kv/kvserver/replica_split_load.go
line 77 at r3 (raw file):
Previously, kvoli (Austen) wrote…
Could we add a tracking issue for this as well? Just to mention it is known behavior and was apparent in the previous request based splitter too.
Done.
Drive-by comment, the release note seems much too technical. Release notes are consumed by the docs team who will prepare them for consumption by customers. It's unclear to me what they could conceivably make of the release note at hand. Also, I'm not sure a release note is even necessary: the perf regression never made it into a release, right? |
80078ec
to
9589632
Compare
Ah true good point Tobi, thanks! I have changed to none for the release note. |
9589632
to
abb7aa5
Compare
When we incorporated the use of response data in the load-based splitter, we called collectSpansRead, which is allocation heavy and computationally expensive, resulting in a performance regression. To address this, this patch performs 3 optimizations: 1. Remove the call to collectSpansRead; instead, add a custom function to iterate over the batch of requests / responses and calculate the true spans 2. Instead of constructing a *spanset.SpanSet and finding the union of spans (which uses O(batch_size) memory), we directly compute the union of spans while iterating over the batch resulting in only O(1) memory used 3. Lazily compute the union of true spans only when it is truly needed i.e. we are under heavy load (e.g. >2500QPS) and a load-based splitter has been initialized Release note: None
abb7aa5
to
ca76c28
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating the test structure.
Reviewed 3 of 3 files at r6, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @KaiSun314)
Thank you so much for the review! bors r+ |
Build failed (retrying...): |
Build succeeded: |
Fixes: #91374
Fixes: #91723
When we incorporated the use of response data in the load-based splitter, we called collectSpansRead, which is allocation heavy and computationally expensive, resulting in a performance regression.
To address this, this patch performs 3 optimizations:
Cherry-picking this commit to the commit right before we incorporated response data in the load-based splitter (068845f) and running
the output is:
Release note: None