-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvcoord: optimize batch truncation loop #82865
Conversation
26d656e
to
92d7e0b
Compare
0500074
to
924f278
Compare
Added the support for the descending scan direction too. I also tried performing some bounds-checks elimination in the optimized strategies, but they didn't show any improvement in the benchmark, so I abandoned that idea. RFAL. |
Some macro benchmark numbers: here is the comparison of all TPCH queries (via 20 runs of
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't looked at the meaty third commit yet but cool results. For the second commit splitting out the TestTruncate move and the subsequent changes would help make reviewing it a bit easier.
Reviewed 1 of 1 files at r1, 3 of 6 files at r2, 3 of 6 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @tbg, and @yuzefovich)
pkg/kv/kvclient/kvcoord/batch.go
line 34 at r2 (raw file):
// ... // } // helper.Init(scanDir, requests)
[nit] Instead of helper.Init, how about a constructor for the helper that takes in whatever parameters you need? Looks like we'r always having to call .Init anyway.
pkg/kv/kvclient/kvcoord/batch.go
line 203 at r3 (raw file):
// We're dealing with a range-spanning request. if l, r := keys.IsLocal(h.headers[i].Key), keys.IsLocal(h.headers[i].EndKey); (l && !r) || (!l && r) { return errors.Errorf("local key mixed with global key in range")
AssertionFailedf for this and the one below?
pkg/kv/kvclient/kvcoord/batch.go
line 210 at r3 (raw file):
} if scanDir == Ascending { sort.Sort(ascBatchTruncationHelper{BatchTruncationHelper: h})
[nit] Here and below, would sort.Slice let us avoid defining two additional helper types?
pkg/kv/kvclient/kvcoord/dist_sender.go
line 1540 at r1 (raw file):
// descriptor. If the underlying range seems to have split, we // recursively invoke divideAndSendBatchToRanges to re-enumerate the // ranges in the span and resend to each.
[nit] Add some commentary about the positions parameter (looks like you have some in the subsequent commit within the helper struct).
pkg/kv/kvclient/kvcoord/batch_test.go
line 257 at r2 (raw file):
// TestTruncate verifies the truncation logic of the BatchTruncationHelper over // a single range as well as truncateLegacy() function directly. func TestTruncate(t *testing.T) {
Entirely optional, but a randomized form of this test that asserts on the invariants (ordering of truncated requests, valid position indexes, etc.) we expect for a given set of random-but-well-formed inputs (keys, query span, scan direction, etc.) would give me a lot of confidence. Ditto for the test below with different seek keys. I'm thinking of something like this, which shook out a lot of bugs:
cockroach/pkg/spanconfig/spanconfigstore/span_store_test.go
Lines 29 to 41 in 4124128
// TestRandomized randomly sets/deletes span configs for arbitrary keyspans | |
// within some alphabet. For a test span, it then asserts that the config we | |
// retrieve is what we expect to find from the store. It also ensures that all | |
// ranges are non-overlapping, and that coalesced split-keys works as expected | |
// (adjacent configs, if identical, don't induce split points). | |
func TestRandomized(t *testing.T) { | |
defer leaktest.AfterTest(t)() | |
randutil.SeedForTests() | |
ctx := context.Background() | |
alphabet := "abcdefghijklmnopqrstuvwxyz" | |
configs := "ABCDEF" | |
ops := []string{"set", "del"} |
924f278
to
d7f981a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split out the test code movement into a separate commit.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @nvanbenschoten, and @tbg)
pkg/kv/kvclient/kvcoord/batch.go
line 34 at r2 (raw file):
Previously, irfansharif (irfan sharif) wrote…
[nit] Instead of helper.Init, how about a constructor for the helper that takes in whatever parameters you need? Looks like we'r always having to call .Init anyway.
Done.
pkg/kv/kvclient/kvcoord/batch.go
line 203 at r3 (raw file):
Previously, irfansharif (irfan sharif) wrote…
AssertionFailedf for this and the one below?
Done.
pkg/kv/kvclient/kvcoord/batch.go
line 210 at r3 (raw file):
Previously, irfansharif (irfan sharif) wrote…
[nit] Here and below, would sort.Slice let us avoid defining two additional helper types?
No, I don't think so - sort.Slice
operates on a single slice whereas we have many slices which we want to order according to the headers and make the swap atomic.
pkg/kv/kvclient/kvcoord/dist_sender.go
line 1540 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
[nit] Add some commentary about the positions parameter (looks like you have some in the subsequent commit within the helper struct).
Done.
pkg/kv/kvclient/kvcoord/batch_test.go
line 257 at r2 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Entirely optional, but a randomized form of this test that asserts on the invariants (ordering of truncated requests, valid position indexes, etc.) we expect for a given set of random-but-well-formed inputs (keys, query span, scan direction, etc.) would give me a lot of confidence. Ditto for the test below with different seek keys. I'm thinking of something like this, which shook out a lot of bugs:
cockroach/pkg/spanconfig/spanconfigstore/span_store_test.go
Lines 29 to 41 in 4124128
// TestRandomized randomly sets/deletes span configs for arbitrary keyspans // within some alphabet. For a test span, it then asserts that the config we // retrieve is what we expect to find from the store. It also ensures that all // ranges are non-overlapping, and that coalesced split-keys works as expected // (adjacent configs, if identical, don't induce split points). func TestRandomized(t *testing.T) { defer leaktest.AfterTest(t)() randutil.SeedForTests() ctx := context.Background() alphabet := "abcdefghijklmnopqrstuvwxyz" configs := "ABCDEF" ops := []string{"set", "del"}
Done.
Friendly ping @irfansharif @nvanbenschoten. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty unfamiliar with this code and I tried understanding + reviewing it as much as I could, it . You'd definitely benefit from better reviewers than me, but that's usually always true. The benchmark results are pretty compelling!
Reviewed 1 of 3 files at r5, 2 of 5 files at r6, 6 of 6 files at r7, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @yuzefovich)
pkg/kv/kvclient/kvcoord/dist_sender.go
line 1529 at r7 (raw file):
// sendPartialBatch sends the supplied batch to the range specified by desc. // // The batch request is supposed to be truncated already so that it contains
Is this property worth an assertion? Perhaps under test builds if too expensive to do underotherwise.
pkg/kv/kvclient/kvcoord/batch_test.go
line 305 at r7 (raw file):
keys: [][2]string{{loc("b"), loc("e") + "\x00"}}, expKeys: [][2]string{{loc("b"), loc("e") + "\x00"}}, from: "b", to: "e\x00",
Is this right? Isn't the test above setting up "e\x00" key and retrieving it with a non-inclusive bound on "e\x00"? I'm probably reading things wrong.
pkg/kv/kvclient/kvcoord/dist_sender_server_test.go
line 562 at r7 (raw file):
{ name: "unsorted, non-overlapping, neither satisfied", bound: 6,
For my own understanding, why change some of these tests instead of adding new test cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR! I'll wait till the end of the week hoping that @nvanbenschoten will have some time to review this too.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif and @nvanbenschoten)
pkg/kv/kvclient/kvcoord/dist_sender.go
line 1529 at r7 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Is this property worth an assertion? Perhaps under test builds if too expensive to do underotherwise.
Hm, it's a good question. We could add something like
diff --git a/pkg/kv/kvclient/kvcoord/dist_sender.go b/pkg/kv/kvclient/kvcoord/dist_sender.go
index 44cdbfca8b..4f6dac24f2 100644
--- a/pkg/kv/kvclient/kvcoord/dist_sender.go
+++ b/pkg/kv/kvclient/kvcoord/dist_sender.go
@@ -32,6 +32,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/rpc/nodedialer"
"github.com/cockroachdb/cockroach/pkg/settings"
"github.com/cockroachdb/cockroach/pkg/settings/cluster"
+ "github.com/cockroachdb/cockroach/pkg/util/buildutil"
"github.com/cockroachdb/cockroach/pkg/util/grpcutil"
"github.com/cockroachdb/cockroach/pkg/util/hlc"
"github.com/cockroachdb/cockroach/pkg/util/log"
@@ -1541,6 +1542,15 @@ func (ds *DistSender) sendPartialBatch(
routingTok rangecache.EvictionToken,
positions []int,
) response {
+ if buildutil.CrdbTestBuild {
+ baRS, err := keys.Range(ba.Requests)
+ if err != nil {
+ return response{pErr: roachpb.NewError(err)}
+ }
+ if !routingTok.Desc().ContainsKeyRange(baRS.Key, baRS.EndKey) {
+ return response{pErr: roachpb.NewErrorf("the batch wasn't truncated for sendPartialBatch")}
+ }
+ }
if batchIdx == 1 {
ds.metrics.PartialBatchCount.Inc(2) // account for first batch
} else if batchIdx > 1 {
does this check seem correct? I'm a bit afraid of using routingTok.Desc
in case it has become stale or something like that - the DistSender internally can re-execute batches from scratch in some of the scenarios instead of returning an error, but here we'd short-circuit that. Do you have an idea of such a check that would not introduce any regressions?
pkg/kv/kvclient/kvcoord/batch_test.go
line 305 at r7 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Is this right? Isn't the test above setting up "e\x00" key and retrieving it with a non-inclusive bound on "e\x00"? I'm probably reading things wrong.
This hasn't changed in this PR (last update in 2015), so I'm assuming it's right 😅 It's definitely something about local vs global keys distinction which I don't quite understand either.
pkg/kv/kvclient/kvcoord/dist_sender_server_test.go
line 562 at r7 (raw file):
Previously, irfansharif (irfan sharif) wrote…
For my own understanding, why change some of these tests instead of adding new test cases?
All changes in this commit to the tests were made to make the tests pass - previously, before the requests were sorted, the test cases assumed that partial response (because of the bound
limit) would contain response to the first requests in the batch (i.e. with the smallest positions). But now since we're sorting the requests internally, that might no longer be the case. So this commit tries to change the test cases so that regardless of how requests are ordered, the expected results are always the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif, @nvanbenschoten, and @yuzefovich)
pkg/kv/kvclient/kvcoord/dist_sender.go
line 1529 at r7 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Hm, it's a good question. We could add something like
diff --git a/pkg/kv/kvclient/kvcoord/dist_sender.go b/pkg/kv/kvclient/kvcoord/dist_sender.go index 44cdbfca8b..4f6dac24f2 100644 --- a/pkg/kv/kvclient/kvcoord/dist_sender.go +++ b/pkg/kv/kvclient/kvcoord/dist_sender.go @@ -32,6 +32,7 @@ import ( "github.com/cockroachdb/cockroach/pkg/rpc/nodedialer" "github.com/cockroachdb/cockroach/pkg/settings" "github.com/cockroachdb/cockroach/pkg/settings/cluster" + "github.com/cockroachdb/cockroach/pkg/util/buildutil" "github.com/cockroachdb/cockroach/pkg/util/grpcutil" "github.com/cockroachdb/cockroach/pkg/util/hlc" "github.com/cockroachdb/cockroach/pkg/util/log" @@ -1541,6 +1542,15 @@ func (ds *DistSender) sendPartialBatch( routingTok rangecache.EvictionToken, positions []int, ) response { + if buildutil.CrdbTestBuild { + baRS, err := keys.Range(ba.Requests) + if err != nil { + return response{pErr: roachpb.NewError(err)} + } + if !routingTok.Desc().ContainsKeyRange(baRS.Key, baRS.EndKey) { + return response{pErr: roachpb.NewErrorf("the batch wasn't truncated for sendPartialBatch")} + } + } if batchIdx == 1 { ds.metrics.PartialBatchCount.Inc(2) // account for first batch } else if batchIdx > 1 {
does this check seem correct? I'm a bit afraid of using
routingTok.Desc
in case it has become stale or something like that - the DistSender internally can re-execute batches from scratch in some of the scenarios instead of returning an error, but here we'd short-circuit that. Do you have an idea of such a check that would not introduce any regressions?
Oh, hm, forgot that this is possibly a stale view of the range descriptor. I'm not sure and I'll defer to you/Nathan, but the check was roughly what I was expecting
This commit refactors the DistSender's loop of iterating over the range descriptors so that the truncation of the BatchRequest happens serially. This incurs a minor performance hit when the requests are sent in parallel, but it makes it possible to apply the optimizations to this iteration in the following commits. Release note: None
This commit moves `TestTruncate` as well as `BenchmarkTruncate` into `batch_test.go` file in the preparation for the refactor done in the following commit. The code itself wasn't changed at all. Release note: None
This commit introduces a batch truncation helper that encompasses logic of truncating requests to the boundaries of a single range as well as returning the next key to seek the range iterator to. The helper is now used both in the DistSender as well as in the Streamer. No modification to the actual logic of `Truncate` nor `Next`/`prev` functions has been done other than incorporating the return of the next seek key into `Truncate` function itself. This is needed since the following commit will tightly couple the truncation process with the next seek key determination in order to optimize it. The helper can be configured with a knob indicating whether `Truncate` needs to return requests in the original order. This behavior is necessary by the BatchRequests that contain writes since in several spots we rely on the ordering assumptions (e.g. of increasing values of `Sequence`). The following adjustments were made to the tests: - `BenchmarkTruncate` has been renamed to `BenchmarkTruncateLegacy` - `TestTruncate` has been refactored to exercise the new and the old code-paths - `TestBatchPrevNext` has been refactored to run through the new code path, also a few test cases have been adjusted slightly. This commit also introduces some unit tests for the new code path when it runs in a loop over multiple ranges as well as a corresponding benchmark. Release note: None
This commit optimizes the batch truncation loop for the case when the requests only use global keys. The optimized approach sorts the requests according to their start keys (with the Ascending scan direction) or in the reverse order of their end keys (with the Descending scan direction). Then on each `Truncate` call it looks only at a subset of the requests (that haven't been fully processed yet and don't start after the current range), allowing us to avoid many unnecessary key comparisons. Please see a comment on the new `truncateAsc` and `truncateDesc` functions for more details and examples. The optimized approach actually has a worse time complexity in the absolute worst case (when each request is a range-spanning one and actually spans all of the ranges against which the requests are truncated) - because of the need to sort the requests upfront - but in practice, it is much faster, especially with point requests. ``` name old time/op new time/op delta TruncateLoop/asc/reqs=128/ranges=4/type=get-24 59.8µs ± 1% 33.9µs ± 3% -43.38% (p=0.000 n=10+10) TruncateLoop/asc/reqs=128/ranges=4/type=scan-24 83.0µs ± 4% 69.7µs ± 7% -15.98% (p=0.000 n=10+10) TruncateLoop/asc/reqs=128/ranges=64/type=get-24 865µs ± 1% 62µs ± 1% -92.84% (p=0.000 n=10+10) TruncateLoop/asc/reqs=128/ranges=64/type=scan-24 1.07ms ± 5% 0.46ms ± 8% -56.95% (p=0.000 n=10+10) TruncateLoop/asc/reqs=16384/ranges=4/type=get-24 7.09ms ± 0% 5.99ms ± 1% -15.56% (p=0.000 n=10+9) TruncateLoop/asc/reqs=16384/ranges=4/type=scan-24 9.22ms ± 0% 10.52ms ± 1% +14.08% (p=0.000 n=9+10) TruncateLoop/asc/reqs=16384/ranges=64/type=get-24 108ms ± 0% 6ms ± 2% -94.50% (p=0.000 n=8+10) TruncateLoop/asc/reqs=16384/ranges=64/type=scan-24 129ms ± 1% 71ms ± 1% -45.06% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=get-24 60.2µs ± 1% 36.0µs ± 2% -40.14% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=scan-24 82.4µs ± 8% 72.1µs ± 4% -12.51% (p=0.000 n=10+9) TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=get-24 862µs ± 1% 72µs ± 1% -91.65% (p=0.000 n=10+9) TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=scan-24 1.06ms ± 4% 0.49ms ± 8% -53.39% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=get-24 7.24ms ± 0% 6.46ms ± 1% -10.74% (p=0.000 n=10+9) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=scan-24 10.1ms ± 2% 9.8ms ± 2% -2.36% (p=0.000 n=10+9) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=get-24 107ms ± 0% 7ms ± 0% -93.57% (p=0.000 n=8+10) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=scan-24 122ms ± 0% 80ms ± 1% -34.32% (p=0.000 n=9+9) TruncateLoop/desc/reqs=128/ranges=4/type=get-24 78.9µs ± 1% 36.4µs ± 3% -53.81% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=4/type=scan-24 79.4µs ± 4% 52.3µs ± 5% -34.14% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=64/type=get-24 1.16ms ± 1% 0.07ms ± 1% -94.39% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=64/type=scan-24 1.01ms ± 4% 0.46ms ± 5% -54.56% (p=0.000 n=10+10) TruncateLoop/desc/reqs=16384/ranges=4/type=get-24 9.42ms ± 0% 6.26ms ± 1% -33.52% (p=0.000 n=10+10) TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24 8.41ms ± 1% 9.22ms ± 1% +9.54% (p=0.000 n=10+9) TruncateLoop/desc/reqs=16384/ranges=64/type=get-24 145ms ± 0% 6ms ± 1% -95.63% (p=0.000 n=9+9) TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24 125ms ± 1% 67ms ± 1% -46.31% (p=0.000 n=9+9) TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=get-24 77.8µs ± 1% 39.6µs ± 2% -49.10% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=scan-24 74.0µs ± 3% 63.0µs ± 6% -14.92% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=get-24 1.16ms ± 1% 0.08ms ± 1% -93.47% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=scan-24 1.04ms ± 5% 0.47ms ± 7% -54.65% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=get-24 9.50ms ± 0% 6.73ms ± 1% -29.21% (p=0.000 n=9+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=scan-24 8.88ms ± 1% 13.24ms ± 1% +49.04% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=get-24 146ms ± 0% 7ms ± 1% -94.98% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=scan-24 125ms ± 1% 74ms ± 1% -40.75% (p=0.000 n=10+9) name old alloc/op new alloc/op delta TruncateLoop/asc/reqs=128/ranges=4/type=get-24 7.58kB ± 0% 21.00kB ±11% +176.84% (p=0.000 n=7+10) TruncateLoop/asc/reqs=128/ranges=4/type=scan-24 39.8kB ± 6% 49.1kB ±15% +23.46% (p=0.000 n=9+10) TruncateLoop/asc/reqs=128/ranges=64/type=get-24 6.48kB ± 5% 18.25kB ± 2% +181.79% (p=0.000 n=10+9) TruncateLoop/asc/reqs=128/ranges=64/type=scan-24 428kB ±20% 368kB ±13% -13.85% (p=0.003 n=10+10) TruncateLoop/asc/reqs=16384/ranges=4/type=get-24 1.60MB ± 0% 2.91MB ± 0% +82.49% (p=0.000 n=8+10) TruncateLoop/asc/reqs=16384/ranges=4/type=scan-24 5.24MB ± 0% 5.89MB ± 0% +12.41% (p=0.000 n=10+8) TruncateLoop/asc/reqs=16384/ranges=64/type=get-24 1.15MB ± 4% 2.41MB ± 1% +110.09% (p=0.000 n=10+10) TruncateLoop/asc/reqs=16384/ranges=64/type=scan-24 69.8MB ± 1% 64.6MB ± 1% -7.55% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=get-24 9.77kB ±22% 21.98kB ± 4% +125.07% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=scan-24 38.9kB ±23% 49.9kB ± 2% +28.28% (p=0.000 n=10+8) TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=get-24 6.56kB ± 4% 20.20kB ± 3% +208.11% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=scan-24 407kB ±15% 372kB ±13% -8.68% (p=0.043 n=10+10) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=get-24 1.65MB ± 0% 3.62MB ± 0% +118.55% (p=0.000 n=8+8) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=scan-24 6.60MB ± 2% 5.65MB ± 1% -14.38% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=get-24 1.10MB ± 5% 2.77MB ± 1% +152.58% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=scan-24 60.5MB ± 1% 67.9MB ± 1% +12.19% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=4/type=get-24 17.2kB ±10% 21.5kB ± 1% +24.91% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=4/type=scan-24 35.5kB ±12% 29.1kB ± 4% -17.83% (p=0.000 n=10+9) TruncateLoop/desc/reqs=128/ranges=64/type=get-24 138kB ± 1% 20kB ± 3% -85.34% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=64/type=scan-24 344kB ±15% 363kB ±10% +5.50% (p=0.035 n=10+10) TruncateLoop/desc/reqs=16384/ranges=4/type=get-24 2.78MB ± 0% 3.35MB ± 0% +20.24% (p=0.000 n=8+10) TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24 4.42MB ± 6% 5.29MB ± 0% +19.67% (p=0.000 n=10+8) TruncateLoop/desc/reqs=16384/ranges=64/type=get-24 17.9MB ± 0% 2.7MB ± 2% -85.21% (p=0.000 n=10+10) TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24 65.3MB ± 0% 61.0MB ± 1% -6.65% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=get-24 15.9kB ± 3% 26.7kB ± 1% +67.87% (p=0.000 n=10+9) TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=scan-24 29.4kB ± 6% 41.6kB ± 5% +41.50% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=get-24 138kB ± 0% 23kB ± 3% -83.61% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=scan-24 390kB ±19% 350kB ±11% -10.16% (p=0.015 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=get-24 2.69MB ± 4% 3.51MB ± 1% +30.22% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=scan-24 4.89MB ± 1% 8.19MB ± 0% +67.68% (p=0.000 n=10+9) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=get-24 17.9MB ± 0% 3.0MB ± 2% -83.34% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=scan-24 65.4MB ± 1% 62.9MB ± 1% -3.81% (p=0.000 n=10+10) name old allocs/op new allocs/op delta TruncateLoop/asc/reqs=128/ranges=4/type=get-24 50.0 ± 0% 52.4 ±10% +4.80% (p=0.017 n=8+10) TruncateLoop/asc/reqs=128/ranges=4/type=scan-24 569 ±12% 557 ±15% ~ (p=0.617 n=10+10) TruncateLoop/asc/reqs=128/ranges=64/type=get-24 207 ± 4% 210 ± 5% ~ (p=0.380 n=10+10) TruncateLoop/asc/reqs=128/ranges=64/type=scan-24 6.97k ±13% 6.02k ± 9% -13.64% (p=0.000 n=10+10) TruncateLoop/asc/reqs=16384/ranges=4/type=get-24 126 ± 0% 122 ± 0% -3.17% (p=0.002 n=8+10) TruncateLoop/asc/reqs=16384/ranges=4/type=scan-24 51.9k ± 1% 42.8k ± 1% -17.59% (p=0.000 n=10+9) TruncateLoop/asc/reqs=16384/ranges=64/type=get-24 1.12k ± 1% 1.12k ± 1% +0.43% (p=0.027 n=10+10) TruncateLoop/asc/reqs=16384/ranges=64/type=scan-24 786k ± 1% 714k ± 1% -9.13% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=get-24 41.2 ± 3% 58.0 ± 3% +40.78% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=scan-24 574 ±18% 532 ±11% ~ (p=0.143 n=10+10) TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=get-24 205 ± 2% 234 ± 3% +14.26% (p=0.000 n=9+9) TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=scan-24 6.70k ± 9% 6.00k ± 9% -10.40% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=get-24 127 ± 0% 125 ± 0% -1.57% (p=0.001 n=8+9) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=scan-24 63.7k ± 1% 27.8k ± 2% -56.34% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=get-24 1.14k ± 0% 1.14k ± 1% ~ (p=0.515 n=10+10) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=scan-24 696k ± 1% 752k ± 1% +7.97% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=4/type=get-24 554 ± 1% 169 ± 2% -69.52% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=4/type=scan-24 519 ± 9% 268 ± 9% -48.32% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=64/type=get-24 8.38k ± 0% 0.33k ± 3% -96.06% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=64/type=scan-24 5.90k ±10% 5.89k ± 5% ~ (p=0.796 n=10+10) TruncateLoop/desc/reqs=16384/ranges=4/type=get-24 65.7k ± 0% 16.5k ± 0% -74.87% (p=0.002 n=8+10) TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24 39.5k ± 1% 27.8k ± 2% -29.54% (p=0.000 n=10+10) TruncateLoop/desc/reqs=16384/ranges=64/type=get-24 1.05M ± 0% 0.02M ± 0% -98.33% (p=0.000 n=9+10) TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24 741k ± 0% 679k ± 1% -8.32% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=get-24 559 ± 0% 182 ± 2% -67.42% (p=0.000 n=9+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=scan-24 438 ± 8% 404 ±13% -7.76% (p=0.014 n=9+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=get-24 8.39k ± 0% 0.36k ± 5% -95.75% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=scan-24 6.38k ±11% 5.56k ± 9% -12.85% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=get-24 65.7k ± 0% 16.5k ± 0% -74.84% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=scan-24 46.8k ± 1% 67.6k ± 1% +44.65% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=get-24 1.05M ± 0% 0.02M ± 0% -98.33% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=scan-24 739k ± 1% 694k ± 1% -6.08% (p=0.000 n=10+10) ``` The truncation loops for the optimized strategy are very similar in both directions, so I tried to extract the differences out into an interface. However, this showed non-trivial slow down and increase in allocations, so I chose to have some duplicated code to get the best performance. Here is a snippet of the comparison when the interface was prototyped: ``` name old time/op new time/op delta TruncateLoop/desc/reqs=128/ranges=4/type=get-24 36.9µs ± 3% 44.5µs ± 3% +20.55% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=4/type=scan-24 74.8µs ± 5% 88.8µs ± 3% +18.73% (p=0.000 n=9+10) TruncateLoop/desc/reqs=128/ranges=64/type=get-24 64.9µs ± 1% 78.3µs ± 1% +20.72% (p=0.000 n=10+9) TruncateLoop/desc/reqs=128/ranges=64/type=scan-24 471µs ± 8% 682µs ±13% +44.73% (p=0.000 n=10+10) TruncateLoop/desc/reqs=16384/ranges=4/type=get-24 6.34ms ± 1% 7.39ms ± 0% +16.47% (p=0.000 n=10+9) TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24 11.2ms ± 1% 12.4ms ± 1% +10.36% (p=0.000 n=10+9) TruncateLoop/desc/reqs=16384/ranges=64/type=get-24 6.40ms ± 2% 7.39ms ± 1% +15.47% (p=0.000 n=10+10) TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24 70.9ms ± 1% 102.0ms ± 2% +43.87% (p=0.000 n=9+9) name old alloc/op new alloc/op delta TruncateLoop/desc/reqs=128/ranges=4/type=get-24 22.2kB ± 9% 30.4kB ± 0% +36.55% (p=0.000 n=10+7) TruncateLoop/desc/reqs=128/ranges=4/type=scan-24 52.2kB ± 5% 67.6kB ± 4% +29.47% (p=0.000 n=8+10) TruncateLoop/desc/reqs=128/ranges=64/type=get-24 20.0kB ± 2% 32.2kB ± 1% +60.86% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=64/type=scan-24 372kB ±13% 600kB ±10% +61.29% (p=0.000 n=10+8) TruncateLoop/desc/reqs=16384/ranges=4/type=get-24 3.24MB ± 0% 4.45MB ± 0% +37.42% (p=0.000 n=8+7) TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24 6.61MB ± 0% 7.86MB ± 0% +18.90% (p=0.000 n=10+9) TruncateLoop/desc/reqs=16384/ranges=64/type=get-24 2.75MB ± 2% 3.74MB ± 1% +36.03% (p=0.000 n=10+10) TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24 65.7MB ± 1% 97.2MB ± 1% +47.95% (p=0.000 n=10+10) name old allocs/op new allocs/op delta TruncateLoop/desc/reqs=128/ranges=4/type=get-24 177 ± 2% 314 ± 0% +77.40% (p=0.000 n=10+8) TruncateLoop/desc/reqs=128/ranges=4/type=scan-24 597 ± 8% 847 ± 8% +41.89% (p=0.000 n=9+10) TruncateLoop/desc/reqs=128/ranges=64/type=get-24 329 ± 3% 531 ± 2% +61.40% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=64/type=scan-24 6.02k ± 9% 9.56k ± 6% +58.80% (p=0.000 n=10+8) TruncateLoop/desc/reqs=16384/ranges=4/type=get-24 16.5k ± 0% 32.9k ± 0% +99.17% (p=0.000 n=8+8) TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24 53.5k ± 1% 73.1k ± 1% +36.69% (p=0.000 n=10+9) TruncateLoop/desc/reqs=16384/ranges=64/type=get-24 17.5k ± 0% 33.9k ± 0% +94.02% (p=0.000 n=6+10) TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24 727k ± 1% 1194k ± 1% +64.31% (p=0.000 n=10+10) ``` An additional knob is introduced to the batch truncation helper to indicate whether the helper can take ownership of the passed-in requests slice and reorder it as it pleases. This is the case for the Streamer, but the DistSender relies on the order of requests not being modified, so the helper makes a copy of the slice. Some tests needed an adjustment since now we process requests not necessarily in the original order, so the population of ResumeSpans might be different. Release note: None
d7f981a
to
067db69
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I'll go ahead and merge this. Thanks for the review Irfan!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif and @nvanbenschoten)
pkg/kv/kvclient/kvcoord/dist_sender.go
line 1529 at r7 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Oh, hm, forgot that this is possibly a stale view of the range descriptor. I'm not sure and I'll defer to you/Nathan, but the check was roughly what I was expecting
I feel like it's ok to not have an assertion here given that this function is not exported, and it seems unlikely that a new caller will be introduced any time soon.
Build failed (retrying...): |
Build failed: |
Unrelated flake. bors r+ |
Build failed (retrying...): |
Build failed: |
Unrelated flakes. bors r+ |
Build succeeded: |
kvcoord: truncate BatchRequest to range boundaries serially
This commit refactors the DistSender's loop of iterating over the range
descriptors so that the truncation of the BatchRequest happens
serially. This incurs a minor performance hit when the requests are sent
in parallel, but it makes it possible to apply the optimizations to this
iteration in the following commits.
Release note: None
kvcoord: merge truncate_test into batch_test
This commit moves
TestTruncate
as well asBenchmarkTruncate
intobatch_test.go
file in the preparation for the refactor done in thefollowing commit. The code itself wasn't changed at all.
Release note: None
kvcoord: introduce batch truncation helper
This commit introduces a batch truncation helper that encompasses logic
of truncating requests to the boundaries of a single range as well as
returning the next key to seek the range iterator to. The helper is now
used both in the DistSender as well as in the Streamer. No modification
to the actual logic of
Truncate
norNext
/prev
functions has beendone other than incorporating the return of the next seek key into
Truncate
function itself. This is needed since the following commitwill tightly couple the truncation process with the next seek key
determination in order to optimize it.
The helper can be configured with a knob indicating whether
Truncate
needs to return requests in the original order. This behavior is
necessary by the BatchRequests that contain writes since in several
spots we rely on the ordering assumptions (e.g. of increasing values of
Sequence
).The following adjustments were made to the tests:
BenchmarkTruncate
has been renamed toBenchmarkTruncateLegacy
TestTruncate
has been refactored to exercise the new and the oldcode-paths
TestBatchPrevNext
has been refactored to run through the new codepath, also a few test cases have been adjusted slightly.
This commit also introduces some unit tests for the new code path when
it runs in a loop over multiple ranges as well as a corresponding
benchmark.
Release note: None
kvcoord: optimize batch truncation loop
This commit optimizes the batch truncation loop for the case when the
requests only use global keys.
The optimized approach sorts the requests according to their start keys
(with the Ascending scan direction) or in the reverse order of their end
keys (with the Descending scan direction). Then on each
Truncate
callit looks only at a subset of the requests (that haven't been fully
processed yet and don't start after the current range), allowing us to
avoid many unnecessary key comparisons. Please see a comment on the new
truncateAsc
andtruncateDesc
functions for more details and examples.The optimized approach actually has a worse time complexity in the
absolute worst case (when each request is a range-spanning one and
actually spans all of the ranges against which the requests are
truncated) - because of the need to sort the requests upfront - but in
practice, it is much faster, especially with point requests.
The truncation loops for the optimized strategy are very similar in both
directions, so I tried to extract the differences out into an interface.
However, this showed non-trivial slow down and increase in allocations,
so I chose to have some duplicated code to get the best performance.
Here is a snippet of the comparison when the interface was prototyped:
An additional knob is introduced to the batch truncation helper to
indicate whether the helper can take ownership of the passed-in requests
slice and reorder it as it pleases. This is the case for the Streamer,
but the DistSender relies on the order of requests not being modified,
so the helper makes a copy of the slice.
Some tests needed an adjustment since now we process requests not
necessarily in the original order, so the population of ResumeSpans
might be different.
Fixes: #68536.
Addresses: #82159.
Release note: None