Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvcoord: optimize batch truncation loop #82865

Merged
merged 4 commits into from
Jul 6, 2022

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Jun 14, 2022

kvcoord: truncate BatchRequest to range boundaries serially

This commit refactors the DistSender's loop of iterating over the range
descriptors so that the truncation of the BatchRequest happens
serially. This incurs a minor performance hit when the requests are sent
in parallel, but it makes it possible to apply the optimizations to this
iteration in the following commits.

Release note: None

kvcoord: merge truncate_test into batch_test

This commit moves TestTruncate as well as BenchmarkTruncate into
batch_test.go file in the preparation for the refactor done in the
following commit. The code itself wasn't changed at all.

Release note: None

kvcoord: introduce batch truncation helper

This commit introduces a batch truncation helper that encompasses logic
of truncating requests to the boundaries of a single range as well as
returning the next key to seek the range iterator to. The helper is now
used both in the DistSender as well as in the Streamer. No modification
to the actual logic of Truncate nor Next/prev functions has been
done other than incorporating the return of the next seek key into
Truncate function itself. This is needed since the following commit
will tightly couple the truncation process with the next seek key
determination in order to optimize it.

The helper can be configured with a knob indicating whether Truncate
needs to return requests in the original order. This behavior is
necessary by the BatchRequests that contain writes since in several
spots we rely on the ordering assumptions (e.g. of increasing values of
Sequence).

The following adjustments were made to the tests:

  • BenchmarkTruncate has been renamed to BenchmarkTruncateLegacy
  • TestTruncate has been refactored to exercise the new and the old
    code-paths
  • TestBatchPrevNext has been refactored to run through the new code
    path, also a few test cases have been adjusted slightly.

This commit also introduces some unit tests for the new code path when
it runs in a loop over multiple ranges as well as a corresponding
benchmark.

Release note: None

kvcoord: optimize batch truncation loop

This commit optimizes the batch truncation loop for the case when the
requests only use global keys.

The optimized approach sorts the requests according to their start keys
(with the Ascending scan direction) or in the reverse order of their end
keys (with the Descending scan direction). Then on each Truncate call
it looks only at a subset of the requests (that haven't been fully
processed yet and don't start after the current range), allowing us to
avoid many unnecessary key comparisons. Please see a comment on the new
truncateAsc and truncateDesc functions for more details and examples.

The optimized approach actually has a worse time complexity in the
absolute worst case (when each request is a range-spanning one and
actually spans all of the ranges against which the requests are
truncated) - because of the need to sort the requests upfront - but in
practice, it is much faster, especially with point requests.

name                                                               old time/op    new time/op    delta
TruncateLoop/asc/reqs=128/ranges=4/type=get-24                       59.8µs ± 1%    33.9µs ± 3%   -43.38%  (p=0.000 n=10+10)
TruncateLoop/asc/reqs=128/ranges=4/type=scan-24                      83.0µs ± 4%    69.7µs ± 7%   -15.98%  (p=0.000 n=10+10)
TruncateLoop/asc/reqs=128/ranges=64/type=get-24                       865µs ± 1%      62µs ± 1%   -92.84%  (p=0.000 n=10+10)
TruncateLoop/asc/reqs=128/ranges=64/type=scan-24                     1.07ms ± 5%    0.46ms ± 8%   -56.95%  (p=0.000 n=10+10)
TruncateLoop/asc/reqs=16384/ranges=4/type=get-24                     7.09ms ± 0%    5.99ms ± 1%   -15.56%  (p=0.000 n=10+9)
TruncateLoop/asc/reqs=16384/ranges=4/type=scan-24                    9.22ms ± 0%   10.52ms ± 1%   +14.08%  (p=0.000 n=9+10)
TruncateLoop/asc/reqs=16384/ranges=64/type=get-24                     108ms ± 0%       6ms ± 2%   -94.50%  (p=0.000 n=8+10)
TruncateLoop/asc/reqs=16384/ranges=64/type=scan-24                    129ms ± 1%      71ms ± 1%   -45.06%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=get-24         60.2µs ± 1%    36.0µs ± 2%   -40.14%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=scan-24        82.4µs ± 8%    72.1µs ± 4%   -12.51%  (p=0.000 n=10+9)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=get-24         862µs ± 1%      72µs ± 1%   -91.65%  (p=0.000 n=10+9)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=scan-24       1.06ms ± 4%    0.49ms ± 8%   -53.39%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=get-24       7.24ms ± 0%    6.46ms ± 1%   -10.74%  (p=0.000 n=10+9)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=scan-24      10.1ms ± 2%     9.8ms ± 2%    -2.36%  (p=0.000 n=10+9)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=get-24       107ms ± 0%       7ms ± 0%   -93.57%  (p=0.000 n=8+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=scan-24      122ms ± 0%      80ms ± 1%   -34.32%  (p=0.000 n=9+9)
TruncateLoop/desc/reqs=128/ranges=4/type=get-24                      78.9µs ± 1%    36.4µs ± 3%   -53.81%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=4/type=scan-24                     79.4µs ± 4%    52.3µs ± 5%   -34.14%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=get-24                     1.16ms ± 1%    0.07ms ± 1%   -94.39%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=scan-24                    1.01ms ± 4%    0.46ms ± 5%   -54.56%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=get-24                    9.42ms ± 0%    6.26ms ± 1%   -33.52%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24                   8.41ms ± 1%    9.22ms ± 1%    +9.54%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=16384/ranges=64/type=get-24                    145ms ± 0%       6ms ± 1%   -95.63%  (p=0.000 n=9+9)
TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24                   125ms ± 1%      67ms ± 1%   -46.31%  (p=0.000 n=9+9)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=get-24        77.8µs ± 1%    39.6µs ± 2%   -49.10%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=scan-24       74.0µs ± 3%    63.0µs ± 6%   -14.92%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=get-24       1.16ms ± 1%    0.08ms ± 1%   -93.47%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=scan-24      1.04ms ± 5%    0.47ms ± 7%   -54.65%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=get-24      9.50ms ± 0%    6.73ms ± 1%   -29.21%  (p=0.000 n=9+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=scan-24     8.88ms ± 1%   13.24ms ± 1%   +49.04%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=get-24      146ms ± 0%       7ms ± 1%   -94.98%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=scan-24     125ms ± 1%      74ms ± 1%   -40.75%  (p=0.000 n=10+9)

name                                                               old alloc/op   new alloc/op   delta
TruncateLoop/asc/reqs=128/ranges=4/type=get-24                       7.58kB ± 0%   21.00kB ±11%  +176.84%  (p=0.000 n=7+10)
TruncateLoop/asc/reqs=128/ranges=4/type=scan-24                      39.8kB ± 6%    49.1kB ±15%   +23.46%  (p=0.000 n=9+10)
TruncateLoop/asc/reqs=128/ranges=64/type=get-24                      6.48kB ± 5%   18.25kB ± 2%  +181.79%  (p=0.000 n=10+9)
TruncateLoop/asc/reqs=128/ranges=64/type=scan-24                      428kB ±20%     368kB ±13%   -13.85%  (p=0.003 n=10+10)
TruncateLoop/asc/reqs=16384/ranges=4/type=get-24                     1.60MB ± 0%    2.91MB ± 0%   +82.49%  (p=0.000 n=8+10)
TruncateLoop/asc/reqs=16384/ranges=4/type=scan-24                    5.24MB ± 0%    5.89MB ± 0%   +12.41%  (p=0.000 n=10+8)
TruncateLoop/asc/reqs=16384/ranges=64/type=get-24                    1.15MB ± 4%    2.41MB ± 1%  +110.09%  (p=0.000 n=10+10)
TruncateLoop/asc/reqs=16384/ranges=64/type=scan-24                   69.8MB ± 1%    64.6MB ± 1%    -7.55%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=get-24         9.77kB ±22%   21.98kB ± 4%  +125.07%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=scan-24        38.9kB ±23%    49.9kB ± 2%   +28.28%  (p=0.000 n=10+8)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=get-24        6.56kB ± 4%   20.20kB ± 3%  +208.11%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=scan-24        407kB ±15%     372kB ±13%    -8.68%  (p=0.043 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=get-24       1.65MB ± 0%    3.62MB ± 0%  +118.55%  (p=0.000 n=8+8)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=scan-24      6.60MB ± 2%    5.65MB ± 1%   -14.38%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=get-24      1.10MB ± 5%    2.77MB ± 1%  +152.58%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=scan-24     60.5MB ± 1%    67.9MB ± 1%   +12.19%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=4/type=get-24                      17.2kB ±10%    21.5kB ± 1%   +24.91%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=4/type=scan-24                     35.5kB ±12%    29.1kB ± 4%   -17.83%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=128/ranges=64/type=get-24                      138kB ± 1%      20kB ± 3%   -85.34%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=scan-24                     344kB ±15%     363kB ±10%    +5.50%  (p=0.035 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=get-24                    2.78MB ± 0%    3.35MB ± 0%   +20.24%  (p=0.000 n=8+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24                   4.42MB ± 6%    5.29MB ± 0%   +19.67%  (p=0.000 n=10+8)
TruncateLoop/desc/reqs=16384/ranges=64/type=get-24                   17.9MB ± 0%     2.7MB ± 2%   -85.21%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24                  65.3MB ± 0%    61.0MB ± 1%    -6.65%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=get-24        15.9kB ± 3%    26.7kB ± 1%   +67.87%  (p=0.000 n=10+9)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=scan-24       29.4kB ± 6%    41.6kB ± 5%   +41.50%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=get-24        138kB ± 0%      23kB ± 3%   -83.61%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=scan-24       390kB ±19%     350kB ±11%   -10.16%  (p=0.015 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=get-24      2.69MB ± 4%    3.51MB ± 1%   +30.22%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=scan-24     4.89MB ± 1%    8.19MB ± 0%   +67.68%  (p=0.000 n=10+9)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=get-24     17.9MB ± 0%     3.0MB ± 2%   -83.34%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=scan-24    65.4MB ± 1%    62.9MB ± 1%    -3.81%  (p=0.000 n=10+10)

name                                                               old allocs/op  new allocs/op  delta
TruncateLoop/asc/reqs=128/ranges=4/type=get-24                         50.0 ± 0%      52.4 ±10%    +4.80%  (p=0.017 n=8+10)
TruncateLoop/asc/reqs=128/ranges=4/type=scan-24                         569 ±12%       557 ±15%      ~     (p=0.617 n=10+10)
TruncateLoop/asc/reqs=128/ranges=64/type=get-24                         207 ± 4%       210 ± 5%      ~     (p=0.380 n=10+10)
TruncateLoop/asc/reqs=128/ranges=64/type=scan-24                      6.97k ±13%     6.02k ± 9%   -13.64%  (p=0.000 n=10+10)
TruncateLoop/asc/reqs=16384/ranges=4/type=get-24                        126 ± 0%       122 ± 0%    -3.17%  (p=0.002 n=8+10)
TruncateLoop/asc/reqs=16384/ranges=4/type=scan-24                     51.9k ± 1%     42.8k ± 1%   -17.59%  (p=0.000 n=10+9)
TruncateLoop/asc/reqs=16384/ranges=64/type=get-24                     1.12k ± 1%     1.12k ± 1%    +0.43%  (p=0.027 n=10+10)
TruncateLoop/asc/reqs=16384/ranges=64/type=scan-24                     786k ± 1%      714k ± 1%    -9.13%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=get-24           41.2 ± 3%      58.0 ± 3%   +40.78%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=scan-24           574 ±18%       532 ±11%      ~     (p=0.143 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=get-24           205 ± 2%       234 ± 3%   +14.26%  (p=0.000 n=9+9)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=scan-24        6.70k ± 9%     6.00k ± 9%   -10.40%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=get-24          127 ± 0%       125 ± 0%    -1.57%  (p=0.001 n=8+9)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=scan-24       63.7k ± 1%     27.8k ± 2%   -56.34%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=get-24       1.14k ± 0%     1.14k ± 1%      ~     (p=0.515 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=scan-24       696k ± 1%      752k ± 1%    +7.97%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=4/type=get-24                         554 ± 1%       169 ± 2%   -69.52%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=4/type=scan-24                        519 ± 9%       268 ± 9%   -48.32%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=get-24                      8.38k ± 0%     0.33k ± 3%   -96.06%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=scan-24                     5.90k ±10%     5.89k ± 5%      ~     (p=0.796 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=get-24                     65.7k ± 0%     16.5k ± 0%   -74.87%  (p=0.002 n=8+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24                    39.5k ± 1%     27.8k ± 2%   -29.54%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=64/type=get-24                    1.05M ± 0%     0.02M ± 0%   -98.33%  (p=0.000 n=9+10)
TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24                    741k ± 0%      679k ± 1%    -8.32%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=get-24           559 ± 0%       182 ± 2%   -67.42%  (p=0.000 n=9+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=scan-24          438 ± 8%       404 ±13%    -7.76%  (p=0.014 n=9+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=get-24        8.39k ± 0%     0.36k ± 5%   -95.75%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=scan-24       6.38k ±11%     5.56k ± 9%   -12.85%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=get-24       65.7k ± 0%     16.5k ± 0%   -74.84%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=scan-24      46.8k ± 1%     67.6k ± 1%   +44.65%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=get-24      1.05M ± 0%     0.02M ± 0%   -98.33%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=scan-24      739k ± 1%      694k ± 1%    -6.08%  (p=0.000 n=10+10)

The truncation loops for the optimized strategy are very similar in both
directions, so I tried to extract the differences out into an interface.
However, this showed non-trivial slow down and increase in allocations,
so I chose to have some duplicated code to get the best performance.
Here is a snippet of the comparison when the interface was prototyped:

name                                                 old time/op    new time/op    delta
TruncateLoop/desc/reqs=128/ranges=4/type=get-24        36.9µs ± 3%    44.5µs ± 3%  +20.55%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=4/type=scan-24       74.8µs ± 5%    88.8µs ± 3%  +18.73%  (p=0.000 n=9+10)
TruncateLoop/desc/reqs=128/ranges=64/type=get-24       64.9µs ± 1%    78.3µs ± 1%  +20.72%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=128/ranges=64/type=scan-24       471µs ± 8%     682µs ±13%  +44.73%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=get-24      6.34ms ± 1%    7.39ms ± 0%  +16.47%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24     11.2ms ± 1%    12.4ms ± 1%  +10.36%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=16384/ranges=64/type=get-24     6.40ms ± 2%    7.39ms ± 1%  +15.47%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24    70.9ms ± 1%   102.0ms ± 2%  +43.87%  (p=0.000 n=9+9)

name                                                 old alloc/op   new alloc/op   delta
TruncateLoop/desc/reqs=128/ranges=4/type=get-24        22.2kB ± 9%    30.4kB ± 0%  +36.55%  (p=0.000 n=10+7)
TruncateLoop/desc/reqs=128/ranges=4/type=scan-24       52.2kB ± 5%    67.6kB ± 4%  +29.47%  (p=0.000 n=8+10)
TruncateLoop/desc/reqs=128/ranges=64/type=get-24       20.0kB ± 2%    32.2kB ± 1%  +60.86%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=scan-24       372kB ±13%     600kB ±10%  +61.29%  (p=0.000 n=10+8)
TruncateLoop/desc/reqs=16384/ranges=4/type=get-24      3.24MB ± 0%    4.45MB ± 0%  +37.42%  (p=0.000 n=8+7)
TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24     6.61MB ± 0%    7.86MB ± 0%  +18.90%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=16384/ranges=64/type=get-24     2.75MB ± 2%    3.74MB ± 1%  +36.03%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24    65.7MB ± 1%    97.2MB ± 1%  +47.95%  (p=0.000 n=10+10)

name                                                 old allocs/op  new allocs/op  delta
TruncateLoop/desc/reqs=128/ranges=4/type=get-24           177 ± 2%       314 ± 0%  +77.40%  (p=0.000 n=10+8)
TruncateLoop/desc/reqs=128/ranges=4/type=scan-24          597 ± 8%       847 ± 8%  +41.89%  (p=0.000 n=9+10)
TruncateLoop/desc/reqs=128/ranges=64/type=get-24          329 ± 3%       531 ± 2%  +61.40%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=scan-24       6.02k ± 9%     9.56k ± 6%  +58.80%  (p=0.000 n=10+8)
TruncateLoop/desc/reqs=16384/ranges=4/type=get-24       16.5k ± 0%     32.9k ± 0%  +99.17%  (p=0.000 n=8+8)
TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24      53.5k ± 1%     73.1k ± 1%  +36.69%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=16384/ranges=64/type=get-24      17.5k ± 0%     33.9k ± 0%  +94.02%  (p=0.000 n=6+10)
TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24      727k ± 1%     1194k ± 1%  +64.31%  (p=0.000 n=10+10)

An additional knob is introduced to the batch truncation helper to
indicate whether the helper can take ownership of the passed-in requests
slice and reorder it as it pleases. This is the case for the Streamer,
but the DistSender relies on the order of requests not being modified,
so the helper makes a copy of the slice.

Some tests needed an adjustment since now we process requests not
necessarily in the original order, so the population of ResumeSpans
might be different.

Fixes: #68536.
Addresses: #82159.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich yuzefovich force-pushed the truncate-next-ci branch 10 times, most recently from 26d656e to 92d7e0b Compare June 15, 2022 04:50
@yuzefovich yuzefovich requested review from nvanbenschoten, irfansharif and a team June 15, 2022 04:53
@yuzefovich yuzefovich marked this pull request as ready for review June 15, 2022 04:53
@yuzefovich yuzefovich force-pushed the truncate-next-ci branch 2 times, most recently from 0500074 to 924f278 Compare June 15, 2022 18:39
@yuzefovich yuzefovich requested a review from tbg June 15, 2022 18:40
@yuzefovich
Copy link
Member Author

Added the support for the descending scan direction too. I also tried performing some bounds-checks elimination in the optimized strategies, but they didn't show any improvement in the benchmark, so I abandoned that idea.

RFAL.

@yuzefovich
Copy link
Member Author

yuzefovich commented Jun 16, 2022

Some macro benchmark numbers: here is the comparison of all TPCH queries (via 20 runs of tpchvec roachtest), before this PR and after this PR (with the current default settings, meaning the streamer is enabled):

Q1: 3.24s	3.25s	0.30%
Q2: 0.19s	0.19s	0.47%
Q3: 1.39s	1.35s	-2.86%
Q4: 1.04s	1.04s	0.29%
Q5: 1.81s	1.70s	-5.95%
Q6: 6.62s	5.18s	-21.78%
Q7: 5.50s	4.72s	-14.19%
Q8: 0.81s	0.71s	-12.11%
Q9: 4.26s	4.04s	-5.17%
Q10: 1.39s	1.34s	-3.37%
Q11: 0.51s	0.48s	-6.08%
Q12: 6.77s	5.23s	-22.71%
Q13: 1.03s	0.98s	-5.57%
Q14: 0.55s	0.44s	-19.58%
Q15: 4.24s	3.42s	-19.51%
Q16: 0.76s	0.74s	-2.20%
Q17: 0.70s	0.68s	-2.98%
Q18: 5.45s	5.61s	3.09%
Q19: 3.36s	3.33s	-0.91%
Q20: 11.18s	9.52s	-14.79%
Q21: 5.75s	4.84s	-15.81%
Q22: 0.53s	0.54s	1.70%

Copy link
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't looked at the meaty third commit yet but cool results. For the second commit splitting out the TestTruncate move and the subsequent changes would help make reviewing it a bit easier.

Reviewed 1 of 1 files at r1, 3 of 6 files at r2, 3 of 6 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @tbg, and @yuzefovich)


pkg/kv/kvclient/kvcoord/batch.go line 34 at r2 (raw file):

//     ...
//   }
//   helper.Init(scanDir, requests)

[nit] Instead of helper.Init, how about a constructor for the helper that takes in whatever parameters you need? Looks like we'r always having to call .Init anyway.


pkg/kv/kvclient/kvcoord/batch.go line 203 at r3 (raw file):

			// We're dealing with a range-spanning request.
			if l, r := keys.IsLocal(h.headers[i].Key), keys.IsLocal(h.headers[i].EndKey); (l && !r) || (!l && r) {
				return errors.Errorf("local key mixed with global key in range")

AssertionFailedf for this and the one below?


pkg/kv/kvclient/kvcoord/batch.go line 210 at r3 (raw file):

	}
	if scanDir == Ascending {
		sort.Sort(ascBatchTruncationHelper{BatchTruncationHelper: h})

[nit] Here and below, would sort.Slice let us avoid defining two additional helper types?


pkg/kv/kvclient/kvcoord/dist_sender.go line 1540 at r1 (raw file):

// descriptor. If the underlying range seems to have split, we
// recursively invoke divideAndSendBatchToRanges to re-enumerate the
// ranges in the span and resend to each.

[nit] Add some commentary about the positions parameter (looks like you have some in the subsequent commit within the helper struct).


pkg/kv/kvclient/kvcoord/batch_test.go line 257 at r2 (raw file):

// TestTruncate verifies the truncation logic of the BatchTruncationHelper over
// a single range as well as truncateLegacy() function directly.
func TestTruncate(t *testing.T) {

Entirely optional, but a randomized form of this test that asserts on the invariants (ordering of truncated requests, valid position indexes, etc.) we expect for a given set of random-but-well-formed inputs (keys, query span, scan direction, etc.) would give me a lot of confidence. Ditto for the test below with different seek keys. I'm thinking of something like this, which shook out a lot of bugs:

// TestRandomized randomly sets/deletes span configs for arbitrary keyspans
// within some alphabet. For a test span, it then asserts that the config we
// retrieve is what we expect to find from the store. It also ensures that all
// ranges are non-overlapping, and that coalesced split-keys works as expected
// (adjacent configs, if identical, don't induce split points).
func TestRandomized(t *testing.T) {
defer leaktest.AfterTest(t)()
randutil.SeedForTests()
ctx := context.Background()
alphabet := "abcdefghijklmnopqrstuvwxyz"
configs := "ABCDEF"
ops := []string{"set", "del"}

@blathers-crl blathers-crl bot requested a review from irfansharif June 22, 2022 00:55
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split out the test code movement into a separate commit.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @nvanbenschoten, and @tbg)


pkg/kv/kvclient/kvcoord/batch.go line 34 at r2 (raw file):

Previously, irfansharif (irfan sharif) wrote…

[nit] Instead of helper.Init, how about a constructor for the helper that takes in whatever parameters you need? Looks like we'r always having to call .Init anyway.

Done.


pkg/kv/kvclient/kvcoord/batch.go line 203 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

AssertionFailedf for this and the one below?

Done.


pkg/kv/kvclient/kvcoord/batch.go line 210 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

[nit] Here and below, would sort.Slice let us avoid defining two additional helper types?

No, I don't think so - sort.Slice operates on a single slice whereas we have many slices which we want to order according to the headers and make the swap atomic.


pkg/kv/kvclient/kvcoord/dist_sender.go line 1540 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

[nit] Add some commentary about the positions parameter (looks like you have some in the subsequent commit within the helper struct).

Done.


pkg/kv/kvclient/kvcoord/batch_test.go line 257 at r2 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Entirely optional, but a randomized form of this test that asserts on the invariants (ordering of truncated requests, valid position indexes, etc.) we expect for a given set of random-but-well-formed inputs (keys, query span, scan direction, etc.) would give me a lot of confidence. Ditto for the test below with different seek keys. I'm thinking of something like this, which shook out a lot of bugs:

// TestRandomized randomly sets/deletes span configs for arbitrary keyspans
// within some alphabet. For a test span, it then asserts that the config we
// retrieve is what we expect to find from the store. It also ensures that all
// ranges are non-overlapping, and that coalesced split-keys works as expected
// (adjacent configs, if identical, don't induce split points).
func TestRandomized(t *testing.T) {
defer leaktest.AfterTest(t)()
randutil.SeedForTests()
ctx := context.Background()
alphabet := "abcdefghijklmnopqrstuvwxyz"
configs := "ABCDEF"
ops := []string{"set", "del"}

Done.

@tbg tbg removed their request for review June 22, 2022 19:50
@yuzefovich
Copy link
Member Author

Friendly ping @irfansharif @nvanbenschoten.

Copy link
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty unfamiliar with this code and I tried understanding + reviewing it as much as I could, it :lgtm:. You'd definitely benefit from better reviewers than me, but that's usually always true. The benchmark results are pretty compelling!

Reviewed 1 of 3 files at r5, 2 of 5 files at r6, 6 of 6 files at r7, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @yuzefovich)


pkg/kv/kvclient/kvcoord/dist_sender.go line 1529 at r7 (raw file):

// sendPartialBatch sends the supplied batch to the range specified by desc.
//
// The batch request is supposed to be truncated already so that it contains

Is this property worth an assertion? Perhaps under test builds if too expensive to do underotherwise.


pkg/kv/kvclient/kvcoord/batch_test.go line 305 at r7 (raw file):

			keys:    [][2]string{{loc("b"), loc("e") + "\x00"}},
			expKeys: [][2]string{{loc("b"), loc("e") + "\x00"}},
			from:    "b", to: "e\x00",

Is this right? Isn't the test above setting up "e\x00" key and retrieving it with a non-inclusive bound on "e\x00"? I'm probably reading things wrong.


pkg/kv/kvclient/kvcoord/dist_sender_server_test.go line 562 at r7 (raw file):

		{
			name:  "unsorted, non-overlapping, neither satisfied",
			bound: 6,

For my own understanding, why change some of these tests instead of adding new test cases?

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR! I'll wait till the end of the week hoping that @nvanbenschoten will have some time to review this too.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif and @nvanbenschoten)


pkg/kv/kvclient/kvcoord/dist_sender.go line 1529 at r7 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Is this property worth an assertion? Perhaps under test builds if too expensive to do underotherwise.

Hm, it's a good question. We could add something like

diff --git a/pkg/kv/kvclient/kvcoord/dist_sender.go b/pkg/kv/kvclient/kvcoord/dist_sender.go
index 44cdbfca8b..4f6dac24f2 100644
--- a/pkg/kv/kvclient/kvcoord/dist_sender.go
+++ b/pkg/kv/kvclient/kvcoord/dist_sender.go
@@ -32,6 +32,7 @@ import (
        "github.com/cockroachdb/cockroach/pkg/rpc/nodedialer"
        "github.com/cockroachdb/cockroach/pkg/settings"
        "github.com/cockroachdb/cockroach/pkg/settings/cluster"
+       "github.com/cockroachdb/cockroach/pkg/util/buildutil"
        "github.com/cockroachdb/cockroach/pkg/util/grpcutil"
        "github.com/cockroachdb/cockroach/pkg/util/hlc"
        "github.com/cockroachdb/cockroach/pkg/util/log"
@@ -1541,6 +1542,15 @@ func (ds *DistSender) sendPartialBatch(
        routingTok rangecache.EvictionToken,
        positions []int,
 ) response {
+       if buildutil.CrdbTestBuild {
+               baRS, err := keys.Range(ba.Requests)
+               if err != nil {
+                       return response{pErr: roachpb.NewError(err)}
+               }
+               if !routingTok.Desc().ContainsKeyRange(baRS.Key, baRS.EndKey) {
+                       return response{pErr: roachpb.NewErrorf("the batch wasn't truncated for sendPartialBatch")}
+               }
+       }
        if batchIdx == 1 {
                ds.metrics.PartialBatchCount.Inc(2) // account for first batch
        } else if batchIdx > 1 {

does this check seem correct? I'm a bit afraid of using routingTok.Desc in case it has become stale or something like that - the DistSender internally can re-execute batches from scratch in some of the scenarios instead of returning an error, but here we'd short-circuit that. Do you have an idea of such a check that would not introduce any regressions?


pkg/kv/kvclient/kvcoord/batch_test.go line 305 at r7 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Is this right? Isn't the test above setting up "e\x00" key and retrieving it with a non-inclusive bound on "e\x00"? I'm probably reading things wrong.

This hasn't changed in this PR (last update in 2015), so I'm assuming it's right 😅 It's definitely something about local vs global keys distinction which I don't quite understand either.


pkg/kv/kvclient/kvcoord/dist_sender_server_test.go line 562 at r7 (raw file):

Previously, irfansharif (irfan sharif) wrote…

For my own understanding, why change some of these tests instead of adding new test cases?

All changes in this commit to the tests were made to make the tests pass - previously, before the requests were sorted, the test cases assumed that partial response (because of the bound limit) would contain response to the first requests in the batch (i.e. with the smallest positions). But now since we're sorting the requests internally, that might no longer be the case. So this commit tries to change the test cases so that regardless of how requests are ordered, the expected results are always the same.

Copy link
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif, @nvanbenschoten, and @yuzefovich)


pkg/kv/kvclient/kvcoord/dist_sender.go line 1529 at r7 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Hm, it's a good question. We could add something like

diff --git a/pkg/kv/kvclient/kvcoord/dist_sender.go b/pkg/kv/kvclient/kvcoord/dist_sender.go
index 44cdbfca8b..4f6dac24f2 100644
--- a/pkg/kv/kvclient/kvcoord/dist_sender.go
+++ b/pkg/kv/kvclient/kvcoord/dist_sender.go
@@ -32,6 +32,7 @@ import (
        "github.com/cockroachdb/cockroach/pkg/rpc/nodedialer"
        "github.com/cockroachdb/cockroach/pkg/settings"
        "github.com/cockroachdb/cockroach/pkg/settings/cluster"
+       "github.com/cockroachdb/cockroach/pkg/util/buildutil"
        "github.com/cockroachdb/cockroach/pkg/util/grpcutil"
        "github.com/cockroachdb/cockroach/pkg/util/hlc"
        "github.com/cockroachdb/cockroach/pkg/util/log"
@@ -1541,6 +1542,15 @@ func (ds *DistSender) sendPartialBatch(
        routingTok rangecache.EvictionToken,
        positions []int,
 ) response {
+       if buildutil.CrdbTestBuild {
+               baRS, err := keys.Range(ba.Requests)
+               if err != nil {
+                       return response{pErr: roachpb.NewError(err)}
+               }
+               if !routingTok.Desc().ContainsKeyRange(baRS.Key, baRS.EndKey) {
+                       return response{pErr: roachpb.NewErrorf("the batch wasn't truncated for sendPartialBatch")}
+               }
+       }
        if batchIdx == 1 {
                ds.metrics.PartialBatchCount.Inc(2) // account for first batch
        } else if batchIdx > 1 {

does this check seem correct? I'm a bit afraid of using routingTok.Desc in case it has become stale or something like that - the DistSender internally can re-execute batches from scratch in some of the scenarios instead of returning an error, but here we'd short-circuit that. Do you have an idea of such a check that would not introduce any regressions?

Oh, hm, forgot that this is possibly a stale view of the range descriptor. I'm not sure and I'll defer to you/Nathan, but the check was roughly what I was expecting

This commit refactors the DistSender's loop of iterating over the range
descriptors so that the truncation of the BatchRequest happens
serially. This incurs a minor performance hit when the requests are sent
in parallel, but it makes it possible to apply the optimizations to this
iteration in the following commits.

Release note: None
This commit moves `TestTruncate` as well as `BenchmarkTruncate` into
`batch_test.go` file in the preparation for the refactor done in the
following commit. The code itself wasn't changed at all.

Release note: None
This commit introduces a batch truncation helper that encompasses logic
of truncating requests to the boundaries of a single range as well as
returning the next key to seek the range iterator to. The helper is now
used both in the DistSender as well as in the Streamer. No modification
to the actual logic of `Truncate` nor `Next`/`prev` functions has been
done other than incorporating the return of the next seek key into
`Truncate` function itself. This is needed since the following commit
will tightly couple the truncation process with the next seek key
determination in order to optimize it.

The helper can be configured with a knob indicating whether `Truncate`
needs to return requests in the original order. This behavior is
necessary by the BatchRequests that contain writes since in several
spots we rely on the ordering assumptions (e.g. of increasing values of
`Sequence`).

The following adjustments were made to the tests:
- `BenchmarkTruncate` has been renamed to `BenchmarkTruncateLegacy`
- `TestTruncate` has been refactored to exercise the new and the old
code-paths
- `TestBatchPrevNext` has been refactored to run through the new code
path, also a few test cases have been adjusted slightly.

This commit also introduces some unit tests for the new code path when
it runs in a loop over multiple ranges as well as a corresponding
benchmark.

Release note: None
This commit optimizes the batch truncation loop for the case when the
requests only use global keys.

The optimized approach sorts the requests according to their start keys
(with the Ascending scan direction) or in the reverse order of their end
keys (with the Descending scan direction).  Then on each `Truncate` call
it looks only at a subset of the requests (that haven't been fully
processed yet and don't start after the current range), allowing us to
avoid many unnecessary key comparisons. Please see a comment on the new
`truncateAsc` and `truncateDesc` functions for more details and examples.

The optimized approach actually has a worse time complexity in the
absolute worst case (when each request is a range-spanning one and
actually spans all of the ranges against which the requests are
truncated) - because of the need to sort the requests upfront - but in
practice, it is much faster, especially with point requests.

```
name                                                               old time/op    new time/op    delta
TruncateLoop/asc/reqs=128/ranges=4/type=get-24                       59.8µs ± 1%    33.9µs ± 3%   -43.38%  (p=0.000 n=10+10)
TruncateLoop/asc/reqs=128/ranges=4/type=scan-24                      83.0µs ± 4%    69.7µs ± 7%   -15.98%  (p=0.000 n=10+10)
TruncateLoop/asc/reqs=128/ranges=64/type=get-24                       865µs ± 1%      62µs ± 1%   -92.84%  (p=0.000 n=10+10)
TruncateLoop/asc/reqs=128/ranges=64/type=scan-24                     1.07ms ± 5%    0.46ms ± 8%   -56.95%  (p=0.000 n=10+10)
TruncateLoop/asc/reqs=16384/ranges=4/type=get-24                     7.09ms ± 0%    5.99ms ± 1%   -15.56%  (p=0.000 n=10+9)
TruncateLoop/asc/reqs=16384/ranges=4/type=scan-24                    9.22ms ± 0%   10.52ms ± 1%   +14.08%  (p=0.000 n=9+10)
TruncateLoop/asc/reqs=16384/ranges=64/type=get-24                     108ms ± 0%       6ms ± 2%   -94.50%  (p=0.000 n=8+10)
TruncateLoop/asc/reqs=16384/ranges=64/type=scan-24                    129ms ± 1%      71ms ± 1%   -45.06%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=get-24         60.2µs ± 1%    36.0µs ± 2%   -40.14%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=scan-24        82.4µs ± 8%    72.1µs ± 4%   -12.51%  (p=0.000 n=10+9)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=get-24         862µs ± 1%      72µs ± 1%   -91.65%  (p=0.000 n=10+9)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=scan-24       1.06ms ± 4%    0.49ms ± 8%   -53.39%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=get-24       7.24ms ± 0%    6.46ms ± 1%   -10.74%  (p=0.000 n=10+9)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=scan-24      10.1ms ± 2%     9.8ms ± 2%    -2.36%  (p=0.000 n=10+9)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=get-24       107ms ± 0%       7ms ± 0%   -93.57%  (p=0.000 n=8+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=scan-24      122ms ± 0%      80ms ± 1%   -34.32%  (p=0.000 n=9+9)
TruncateLoop/desc/reqs=128/ranges=4/type=get-24                      78.9µs ± 1%    36.4µs ± 3%   -53.81%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=4/type=scan-24                     79.4µs ± 4%    52.3µs ± 5%   -34.14%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=get-24                     1.16ms ± 1%    0.07ms ± 1%   -94.39%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=scan-24                    1.01ms ± 4%    0.46ms ± 5%   -54.56%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=get-24                    9.42ms ± 0%    6.26ms ± 1%   -33.52%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24                   8.41ms ± 1%    9.22ms ± 1%    +9.54%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=16384/ranges=64/type=get-24                    145ms ± 0%       6ms ± 1%   -95.63%  (p=0.000 n=9+9)
TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24                   125ms ± 1%      67ms ± 1%   -46.31%  (p=0.000 n=9+9)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=get-24        77.8µs ± 1%    39.6µs ± 2%   -49.10%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=scan-24       74.0µs ± 3%    63.0µs ± 6%   -14.92%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=get-24       1.16ms ± 1%    0.08ms ± 1%   -93.47%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=scan-24      1.04ms ± 5%    0.47ms ± 7%   -54.65%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=get-24      9.50ms ± 0%    6.73ms ± 1%   -29.21%  (p=0.000 n=9+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=scan-24     8.88ms ± 1%   13.24ms ± 1%   +49.04%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=get-24      146ms ± 0%       7ms ± 1%   -94.98%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=scan-24     125ms ± 1%      74ms ± 1%   -40.75%  (p=0.000 n=10+9)

name                                                               old alloc/op   new alloc/op   delta
TruncateLoop/asc/reqs=128/ranges=4/type=get-24                       7.58kB ± 0%   21.00kB ±11%  +176.84%  (p=0.000 n=7+10)
TruncateLoop/asc/reqs=128/ranges=4/type=scan-24                      39.8kB ± 6%    49.1kB ±15%   +23.46%  (p=0.000 n=9+10)
TruncateLoop/asc/reqs=128/ranges=64/type=get-24                      6.48kB ± 5%   18.25kB ± 2%  +181.79%  (p=0.000 n=10+9)
TruncateLoop/asc/reqs=128/ranges=64/type=scan-24                      428kB ±20%     368kB ±13%   -13.85%  (p=0.003 n=10+10)
TruncateLoop/asc/reqs=16384/ranges=4/type=get-24                     1.60MB ± 0%    2.91MB ± 0%   +82.49%  (p=0.000 n=8+10)
TruncateLoop/asc/reqs=16384/ranges=4/type=scan-24                    5.24MB ± 0%    5.89MB ± 0%   +12.41%  (p=0.000 n=10+8)
TruncateLoop/asc/reqs=16384/ranges=64/type=get-24                    1.15MB ± 4%    2.41MB ± 1%  +110.09%  (p=0.000 n=10+10)
TruncateLoop/asc/reqs=16384/ranges=64/type=scan-24                   69.8MB ± 1%    64.6MB ± 1%    -7.55%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=get-24         9.77kB ±22%   21.98kB ± 4%  +125.07%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=scan-24        38.9kB ±23%    49.9kB ± 2%   +28.28%  (p=0.000 n=10+8)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=get-24        6.56kB ± 4%   20.20kB ± 3%  +208.11%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=scan-24        407kB ±15%     372kB ±13%    -8.68%  (p=0.043 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=get-24       1.65MB ± 0%    3.62MB ± 0%  +118.55%  (p=0.000 n=8+8)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=scan-24      6.60MB ± 2%    5.65MB ± 1%   -14.38%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=get-24      1.10MB ± 5%    2.77MB ± 1%  +152.58%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=scan-24     60.5MB ± 1%    67.9MB ± 1%   +12.19%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=4/type=get-24                      17.2kB ±10%    21.5kB ± 1%   +24.91%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=4/type=scan-24                     35.5kB ±12%    29.1kB ± 4%   -17.83%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=128/ranges=64/type=get-24                      138kB ± 1%      20kB ± 3%   -85.34%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=scan-24                     344kB ±15%     363kB ±10%    +5.50%  (p=0.035 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=get-24                    2.78MB ± 0%    3.35MB ± 0%   +20.24%  (p=0.000 n=8+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24                   4.42MB ± 6%    5.29MB ± 0%   +19.67%  (p=0.000 n=10+8)
TruncateLoop/desc/reqs=16384/ranges=64/type=get-24                   17.9MB ± 0%     2.7MB ± 2%   -85.21%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24                  65.3MB ± 0%    61.0MB ± 1%    -6.65%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=get-24        15.9kB ± 3%    26.7kB ± 1%   +67.87%  (p=0.000 n=10+9)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=scan-24       29.4kB ± 6%    41.6kB ± 5%   +41.50%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=get-24        138kB ± 0%      23kB ± 3%   -83.61%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=scan-24       390kB ±19%     350kB ±11%   -10.16%  (p=0.015 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=get-24      2.69MB ± 4%    3.51MB ± 1%   +30.22%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=scan-24     4.89MB ± 1%    8.19MB ± 0%   +67.68%  (p=0.000 n=10+9)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=get-24     17.9MB ± 0%     3.0MB ± 2%   -83.34%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=scan-24    65.4MB ± 1%    62.9MB ± 1%    -3.81%  (p=0.000 n=10+10)

name                                                               old allocs/op  new allocs/op  delta
TruncateLoop/asc/reqs=128/ranges=4/type=get-24                         50.0 ± 0%      52.4 ±10%    +4.80%  (p=0.017 n=8+10)
TruncateLoop/asc/reqs=128/ranges=4/type=scan-24                         569 ±12%       557 ±15%      ~     (p=0.617 n=10+10)
TruncateLoop/asc/reqs=128/ranges=64/type=get-24                         207 ± 4%       210 ± 5%      ~     (p=0.380 n=10+10)
TruncateLoop/asc/reqs=128/ranges=64/type=scan-24                      6.97k ±13%     6.02k ± 9%   -13.64%  (p=0.000 n=10+10)
TruncateLoop/asc/reqs=16384/ranges=4/type=get-24                        126 ± 0%       122 ± 0%    -3.17%  (p=0.002 n=8+10)
TruncateLoop/asc/reqs=16384/ranges=4/type=scan-24                     51.9k ± 1%     42.8k ± 1%   -17.59%  (p=0.000 n=10+9)
TruncateLoop/asc/reqs=16384/ranges=64/type=get-24                     1.12k ± 1%     1.12k ± 1%    +0.43%  (p=0.027 n=10+10)
TruncateLoop/asc/reqs=16384/ranges=64/type=scan-24                     786k ± 1%      714k ± 1%    -9.13%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=get-24           41.2 ± 3%      58.0 ± 3%   +40.78%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=scan-24           574 ±18%       532 ±11%      ~     (p=0.143 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=get-24           205 ± 2%       234 ± 3%   +14.26%  (p=0.000 n=9+9)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=scan-24        6.70k ± 9%     6.00k ± 9%   -10.40%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=get-24          127 ± 0%       125 ± 0%    -1.57%  (p=0.001 n=8+9)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=scan-24       63.7k ± 1%     27.8k ± 2%   -56.34%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=get-24       1.14k ± 0%     1.14k ± 1%      ~     (p=0.515 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=scan-24       696k ± 1%      752k ± 1%    +7.97%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=4/type=get-24                         554 ± 1%       169 ± 2%   -69.52%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=4/type=scan-24                        519 ± 9%       268 ± 9%   -48.32%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=get-24                      8.38k ± 0%     0.33k ± 3%   -96.06%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=scan-24                     5.90k ±10%     5.89k ± 5%      ~     (p=0.796 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=get-24                     65.7k ± 0%     16.5k ± 0%   -74.87%  (p=0.002 n=8+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24                    39.5k ± 1%     27.8k ± 2%   -29.54%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=64/type=get-24                    1.05M ± 0%     0.02M ± 0%   -98.33%  (p=0.000 n=9+10)
TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24                    741k ± 0%      679k ± 1%    -8.32%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=get-24           559 ± 0%       182 ± 2%   -67.42%  (p=0.000 n=9+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=scan-24          438 ± 8%       404 ±13%    -7.76%  (p=0.014 n=9+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=get-24        8.39k ± 0%     0.36k ± 5%   -95.75%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=scan-24       6.38k ±11%     5.56k ± 9%   -12.85%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=get-24       65.7k ± 0%     16.5k ± 0%   -74.84%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=scan-24      46.8k ± 1%     67.6k ± 1%   +44.65%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=get-24      1.05M ± 0%     0.02M ± 0%   -98.33%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=scan-24      739k ± 1%      694k ± 1%    -6.08%  (p=0.000 n=10+10)
```

The truncation loops for the optimized strategy are very similar in both
directions, so I tried to extract the differences out into an interface.
However, this showed non-trivial slow down and increase in allocations,
so I chose to have some duplicated code to get the best performance.
Here is a snippet of the comparison when the interface was prototyped:

```
name                                                 old time/op    new time/op    delta
TruncateLoop/desc/reqs=128/ranges=4/type=get-24        36.9µs ± 3%    44.5µs ± 3%  +20.55%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=4/type=scan-24       74.8µs ± 5%    88.8µs ± 3%  +18.73%  (p=0.000 n=9+10)
TruncateLoop/desc/reqs=128/ranges=64/type=get-24       64.9µs ± 1%    78.3µs ± 1%  +20.72%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=128/ranges=64/type=scan-24       471µs ± 8%     682µs ±13%  +44.73%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=get-24      6.34ms ± 1%    7.39ms ± 0%  +16.47%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24     11.2ms ± 1%    12.4ms ± 1%  +10.36%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=16384/ranges=64/type=get-24     6.40ms ± 2%    7.39ms ± 1%  +15.47%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24    70.9ms ± 1%   102.0ms ± 2%  +43.87%  (p=0.000 n=9+9)

name                                                 old alloc/op   new alloc/op   delta
TruncateLoop/desc/reqs=128/ranges=4/type=get-24        22.2kB ± 9%    30.4kB ± 0%  +36.55%  (p=0.000 n=10+7)
TruncateLoop/desc/reqs=128/ranges=4/type=scan-24       52.2kB ± 5%    67.6kB ± 4%  +29.47%  (p=0.000 n=8+10)
TruncateLoop/desc/reqs=128/ranges=64/type=get-24       20.0kB ± 2%    32.2kB ± 1%  +60.86%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=scan-24       372kB ±13%     600kB ±10%  +61.29%  (p=0.000 n=10+8)
TruncateLoop/desc/reqs=16384/ranges=4/type=get-24      3.24MB ± 0%    4.45MB ± 0%  +37.42%  (p=0.000 n=8+7)
TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24     6.61MB ± 0%    7.86MB ± 0%  +18.90%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=16384/ranges=64/type=get-24     2.75MB ± 2%    3.74MB ± 1%  +36.03%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24    65.7MB ± 1%    97.2MB ± 1%  +47.95%  (p=0.000 n=10+10)

name                                                 old allocs/op  new allocs/op  delta
TruncateLoop/desc/reqs=128/ranges=4/type=get-24           177 ± 2%       314 ± 0%  +77.40%  (p=0.000 n=10+8)
TruncateLoop/desc/reqs=128/ranges=4/type=scan-24          597 ± 8%       847 ± 8%  +41.89%  (p=0.000 n=9+10)
TruncateLoop/desc/reqs=128/ranges=64/type=get-24          329 ± 3%       531 ± 2%  +61.40%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=scan-24       6.02k ± 9%     9.56k ± 6%  +58.80%  (p=0.000 n=10+8)
TruncateLoop/desc/reqs=16384/ranges=4/type=get-24       16.5k ± 0%     32.9k ± 0%  +99.17%  (p=0.000 n=8+8)
TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24      53.5k ± 1%     73.1k ± 1%  +36.69%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=16384/ranges=64/type=get-24      17.5k ± 0%     33.9k ± 0%  +94.02%  (p=0.000 n=6+10)
TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24      727k ± 1%     1194k ± 1%  +64.31%  (p=0.000 n=10+10)
```

An additional knob is introduced to the batch truncation helper to
indicate whether the helper can take ownership of the passed-in requests
slice and reorder it as it pleases. This is the case for the Streamer,
but the DistSender relies on the order of requests not being modified,
so the helper makes a copy of the slice.

Some tests needed an adjustment since now we process requests not
necessarily in the original order, so the population of ResumeSpans
might be different.

Release note: None
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I'll go ahead and merge this. Thanks for the review Irfan!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif and @nvanbenschoten)


pkg/kv/kvclient/kvcoord/dist_sender.go line 1529 at r7 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Oh, hm, forgot that this is possibly a stale view of the range descriptor. I'm not sure and I'll defer to you/Nathan, but the check was roughly what I was expecting

I feel like it's ok to not have an assertion here given that this function is not exported, and it seems unlikely that a new caller will be introduced any time soon.

@craig
Copy link
Contributor

craig bot commented Jul 5, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Jul 5, 2022

Build failed:

@yuzefovich
Copy link
Member Author

Unrelated flake.

bors r+

@craig
Copy link
Contributor

craig bot commented Jul 6, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Jul 6, 2022

Build failed:

@yuzefovich
Copy link
Member Author

Unrelated flakes.

bors r+

@craig
Copy link
Contributor

craig bot commented Jul 6, 2022

Build succeeded:

@craig craig bot merged commit 90b15cf into cockroachdb:master Jul 6, 2022
@yuzefovich yuzefovich deleted the truncate-next-ci branch July 6, 2022 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

kv: DistSender batch slicing is quadratic with batch size
3 participants