kvserver: Fix performance regression due to new call to collectSpansRead #91462

Merged

Conversation

KaiSun314
Contributor

@KaiSun314 KaiSun314 commented Nov 8, 2022

Fixes: #91374
Fixes: #91723

When we incorporated the use of response data in the load-based splitter, we called collectSpansRead, which is allocation heavy and computationally expensive, resulting in a performance regression.

To address this, this patch performs 3 optimizations:

  1. Remove the call to collectSpansRead; instead, add a custom function that iterates over the batch of requests and responses and calculates the true spans.
  2. Instead of constructing a *spanset.SpanSet and finding the union of spans (which uses O(batch_size) memory), directly compute the union of spans while iterating over the batch, which uses only O(1) memory.
  3. Lazily compute the union of true spans only when it is needed, i.e. when we are under heavy load (e.g. >2500 QPS) and a load-based splitter has been initialized.
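
The second optimization can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this patch: the `Span` and `requestResult` types and the `trueSpan`, `boundarySpan`, and `sp` helpers are hypothetical stand-ins. It also assumes forward scans only, where a non-nil resume span means the request actually read from the request's start key up to the resume span's start key.

```go
package main

import (
	"bytes"
	"fmt"
)

// Span is a hypothetical stand-in for a key span [Key, EndKey).
type Span struct {
	Key, EndKey []byte
}

// requestResult is a hypothetical pairing of a request's span with the
// resume span from its response (nil if the request ran to completion).
type requestResult struct {
	Req    Span
	Resume *Span
}

// trueSpan returns the span that was actually iterated over. Assumption:
// forward scans only, so a resume span truncates the end of the request span.
func trueSpan(r requestResult) (Span, bool) {
	s := r.Req
	if r.Resume != nil {
		s.EndKey = r.Resume.Key
	}
	// An empty span means nothing was read for this request.
	return s, bytes.Compare(s.Key, s.EndKey) < 0
}

// boundarySpan folds the true spans into a single union span while iterating,
// keeping only one running Span rather than a set of all spans.
func boundarySpan(batch []requestResult) (Span, bool) {
	var union Span
	found := false
	for _, r := range batch {
		s, ok := trueSpan(r)
		if !ok {
			continue
		}
		if !found {
			union, found = s, true
			continue
		}
		if bytes.Compare(s.Key, union.Key) < 0 {
			union.Key = s.Key
		}
		if bytes.Compare(s.EndKey, union.EndKey) > 0 {
			union.EndKey = s.EndKey
		}
	}
	return union, found
}

// sp is a small helper for building spans from string literals.
func sp(start, end string) Span {
	return Span{Key: []byte(start), EndKey: []byte(end)}
}

func main() {
	resume := sp("h", "k")
	batch := []requestResult{
		{Req: sp("a", "c")},
		{Req: sp("f", "k"), Resume: &resume}, // only [f, h) was read
		{Req: sp("b", "d")},
	}
	union, ok := boundarySpan(batch)
	fmt.Printf("%s %s %t\n", union.Key, union.EndKey, ok) // prints "a h true"
}
```

The running `union` value is the only state kept across iterations, which is where the O(1) memory claim comes from; a set-based approach would hold every span in the batch before reducing them.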

Cherry-picking this commit onto the commit right before we incorporated response data in the load-based splitter (068845f) and running

~/benchdiff/benchdiff --old=068845ff72315f8b64f0e930c17c48f078203bc4 --new=abf61ce75c47e16bc39ed0e714f2e46f1d97eb7c --count=20 --post-checkout='./dev generate go' --run='KV/././rows=1$$' ./pkg/sql/tests

the output is:
[Image: More Efficient Response Data Benchdiff]

Release note: None

@KaiSun314 KaiSun314 requested a review from a team as a code owner November 8, 2022 04:56

@KaiSun314 KaiSun314 force-pushed the use-response-data-more-efficient branch from 6023a92 to 0ed1846 Compare November 8, 2022 05:12
@KaiSun314 KaiSun314 requested a review from kvoli November 8, 2022 15:08
@kvoli
Collaborator

kvoli commented Nov 8, 2022

Could you add the bench diff for comparison.

@KaiSun314
Contributor Author

KaiSun314 commented Nov 8, 2022

Could you add the bench diff for comparison.

Added

Collaborator

@kvoli kvoli left a comment


:lgtm:

Some nits / questions - the results look good.

Reviewed 3 of 4 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @KaiSun314)


pkg/kv/kvserver/replica_send.go line 404 at r1 (raw file):

	defer func() {
		// Handle load-based splitting, if necessary.
		if br != nil {

Is there a reason you swapped to checking the br as opposed to the error - are there cases where we would return a pErr and also a br here?


pkg/kv/kvserver/replica_split_load.go line 51 at r1 (raw file):

// getResponseBoundarySpan computes the union span of the true spans that were
// iterated over (using the request span and the response's resumeSpan).

nit: drop the parens.

@KaiSun314 KaiSun314 force-pushed the use-response-data-more-efficient branch from 0ed1846 to 68b0f01 Compare November 9, 2022 22:34
Contributor Author

@KaiSun314 KaiSun314 left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @kvoli)


pkg/kv/kvserver/replica_send.go line 404 at r1 (raw file):

Previously, kvoli (Austen) wrote…

Is there a reason you swapped to checking the br as opposed to the error - are there cases where we would return a pErr and also a br here?

It seems that executeBatchWithConcurrencyRetries returns either a non-nil br and nil pErr, or a nil br and a non-nil pErr.

To be safe though, I added both checks.
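
A rough sketch of the deferred check discussed here, combined with the lazy span computation from the PR description. All types and names below (`response`, `batchError`, `splitter`, `record`, `execute`, `send`, `computeSpan`) are hypothetical and only illustrate the control flow; passing a closure to the splitter is one assumed way to defer the work, not necessarily how the patch does it.

```go
package main

import "fmt"

// response and batchError are hypothetical stand-ins for br and pErr.
type response struct{ span string }
type batchError struct{ msg string }

// spanComputations counts how often the stand-in expensive work runs.
var spanComputations int

// computeSpan stands in for the costly union-span computation.
func computeSpan(br *response) string {
	spanComputations++
	return br.span
}

// splitter is a hypothetical load-based splitter that is only active
// under heavy load.
type splitter struct {
	active   bool
	recorded []string
}

// record calls spanFn only when the splitter is active, so the span is
// never computed when nobody will consume it.
func (s *splitter) record(spanFn func() string) {
	if s == nil || !s.active {
		return
	}
	s.recorded = append(s.recorded, spanFn())
}

// execute returns either (non-nil br, nil pErr) or (nil br, non-nil pErr).
func execute(fail bool) (*response, *batchError) {
	if fail {
		return nil, &batchError{msg: "boom"}
	}
	return &response{span: "[a, h)"}, nil
}

func send(s *splitter, fail bool) (br *response, pErr *batchError) {
	defer func() {
		// Check both values, even though only one is expected to be set.
		if pErr == nil && br != nil {
			s.record(func() string { return computeSpan(br) })
		}
	}()
	return execute(fail)
}

func main() {
	idle := &splitter{active: false}
	busy := &splitter{active: true}
	send(idle, false) // splitter inactive: span is not computed
	send(busy, false) // splitter active: span is computed once
	send(busy, true)  // error path: nothing is recorded
	fmt.Println(spanComputations, busy.recorded) // prints: 1 [[a, h)]
}
```

Because `br` and `pErr` are named results, the deferred function sees the values returned by `execute`, and checking both guards against either invariant being broken later.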

Contributor Author

@KaiSun314 KaiSun314 left a comment


Thank you for the review Austen!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @kvoli)

@KaiSun314 KaiSun314 force-pushed the use-response-data-more-efficient branch from 68b0f01 to 343085f Compare November 10, 2022 17:12
Collaborator

@kvoli kvoli left a comment


:lgtm:

Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @KaiSun314)


pkg/kv/kvserver/replica_split_load.go line 77 at r3 (raw file):

		}

		// TODO(kaisun): There are a few situations where the request did not

Could we add a tracking issue for this as well? Just to mention it is known behavior and was apparent in the previous request based splitter too.

@KaiSun314 KaiSun314 force-pushed the use-response-data-more-efficient branch from 343085f to 80078ec Compare November 11, 2022 03:45
Contributor Author

@KaiSun314 KaiSun314 left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @kvoli)


pkg/kv/kvserver/replica_split_load.go line 77 at r3 (raw file):

Previously, kvoli (Austen) wrote…

Could we add a tracking issue for this as well? Just to mention it is known behavior and was apparent in the previous request based splitter too.

Done.

@tbg
Member

tbg commented Nov 13, 2022

Drive-by comment, the release note seems much too technical. Release notes are consumed by the docs team who will prepare them for consumption by customers. It's unclear to me what they could conceivably make of the release note at hand. Also, I'm not sure a release note is even necessary: the perf regression never made it into a release, right?

@KaiSun314 KaiSun314 force-pushed the use-response-data-more-efficient branch from 80078ec to 9589632 Compare November 14, 2022 20:35
@KaiSun314
Contributor Author

Ah, true, good point Tobi, thanks! I have changed the release note to None.

@KaiSun314 KaiSun314 force-pushed the use-response-data-more-efficient branch from abb7aa5 to ca76c28 Compare November 15, 2022 00:57
Collaborator

@kvoli kvoli left a comment


Thanks for updating the test structure.

:lgtm:

Reviewed 3 of 3 files at r6, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @KaiSun314)

@KaiSun314
Contributor Author

Thank you so much for the review!

bors r+

@craig
Contributor

craig bot commented Nov 22, 2022

Build failed (retrying...):

@craig
Contributor

craig bot commented Nov 23, 2022

Build succeeded:

@craig craig bot merged commit 0d9669a into cockroachdb:master Nov 23, 2022