kvcoord: optimize batch truncation loop #82865

yuzefovich · 2022-06-14T01:35:04Z

kvcoord: truncate BatchRequest to range boundaries serially

This commit refactors the DistSender's loop of iterating over the range
descriptors so that the truncation of the BatchRequest happens
serially. This incurs a minor performance hit when the requests are sent
in parallel, but it makes it possible to apply the optimizations to this
iteration in the following commits.

Release note: None

kvcoord: merge truncate_test into batch_test

This commit moves TestTruncate as well as BenchmarkTruncate into
batch_test.go file in the preparation for the refactor done in the
following commit. The code itself wasn't changed at all.

Release note: None

kvcoord: introduce batch truncation helper

This commit introduces a batch truncation helper that encompasses logic
of truncating requests to the boundaries of a single range as well as
returning the next key to seek the range iterator to. The helper is now
used both in the DistSender as well as in the Streamer. No modification
to the actual logic of Truncate nor Next/prev functions has been
done other than incorporating the return of the next seek key into
Truncate function itself. This is needed since the following commit
will tightly couple the truncation process with the next seek key
determination in order to optimize it.

The helper can be configured with a knob indicating whether Truncate
needs to return requests in the original order. This behavior is
necessary by the BatchRequests that contain writes since in several
spots we rely on the ordering assumptions (e.g. of increasing values of
Sequence).

The following adjustments were made to the tests:

BenchmarkTruncate has been renamed to BenchmarkTruncateLegacy
TestTruncate has been refactored to exercise the new and the old
code-paths
TestBatchPrevNext has been refactored to run through the new code
path, also a few test cases have been adjusted slightly.

This commit also introduces some unit tests for the new code path when
it runs in a loop over multiple ranges as well as a corresponding
benchmark.

Release note: None

kvcoord: optimize batch truncation loop

This commit optimizes the batch truncation loop for the case when the
requests only use global keys.

The optimized approach sorts the requests according to their start keys
(with the Ascending scan direction) or in the reverse order of their end
keys (with the Descending scan direction). Then on each Truncate call
it looks only at a subset of the requests (that haven't been fully
processed yet and don't start after the current range), allowing us to
avoid many unnecessary key comparisons. Please see a comment on the new
truncateAsc and truncateDesc functions for more details and examples.

The optimized approach actually has a worse time complexity in the
absolute worst case (when each request is a range-spanning one and
actually spans all of the ranges against which the requests are
truncated) - because of the need to sort the requests upfront - but in
practice, it is much faster, especially with point requests.

name                                                               old time/op    new time/op    delta
TruncateLoop/asc/reqs=128/ranges=4/type=get-24                       59.8µs ± 1%    33.9µs ± 3%   -43.38%  (p=0.000 n=10+10)
TruncateLoop/asc/reqs=128/ranges=4/type=scan-24                      83.0µs ± 4%    69.7µs ± 7%   -15.98%  (p=0.000 n=10+10)
TruncateLoop/asc/reqs=128/ranges=64/type=get-24                       865µs ± 1%      62µs ± 1%   -92.84%  (p=0.000 n=10+10)
TruncateLoop/asc/reqs=128/ranges=64/type=scan-24                     1.07ms ± 5%    0.46ms ± 8%   -56.95%  (p=0.000 n=10+10)
TruncateLoop/asc/reqs=16384/ranges=4/type=get-24                     7.09ms ± 0%    5.99ms ± 1%   -15.56%  (p=0.000 n=10+9)
TruncateLoop/asc/reqs=16384/ranges=4/type=scan-24                    9.22ms ± 0%   10.52ms ± 1%   +14.08%  (p=0.000 n=9+10)
TruncateLoop/asc/reqs=16384/ranges=64/type=get-24                     108ms ± 0%       6ms ± 2%   -94.50%  (p=0.000 n=8+10)
TruncateLoop/asc/reqs=16384/ranges=64/type=scan-24                    129ms ± 1%      71ms ± 1%   -45.06%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=get-24         60.2µs ± 1%    36.0µs ± 2%   -40.14%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=scan-24        82.4µs ± 8%    72.1µs ± 4%   -12.51%  (p=0.000 n=10+9)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=get-24         862µs ± 1%      72µs ± 1%   -91.65%  (p=0.000 n=10+9)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=scan-24       1.06ms ± 4%    0.49ms ± 8%   -53.39%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=get-24       7.24ms ± 0%    6.46ms ± 1%   -10.74%  (p=0.000 n=10+9)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=scan-24      10.1ms ± 2%     9.8ms ± 2%    -2.36%  (p=0.000 n=10+9)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=get-24       107ms ± 0%       7ms ± 0%   -93.57%  (p=0.000 n=8+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=scan-24      122ms ± 0%      80ms ± 1%   -34.32%  (p=0.000 n=9+9)
TruncateLoop/desc/reqs=128/ranges=4/type=get-24                      78.9µs ± 1%    36.4µs ± 3%   -53.81%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=4/type=scan-24                     79.4µs ± 4%    52.3µs ± 5%   -34.14%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=get-24                     1.16ms ± 1%    0.07ms ± 1%   -94.39%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=scan-24                    1.01ms ± 4%    0.46ms ± 5%   -54.56%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=get-24                    9.42ms ± 0%    6.26ms ± 1%   -33.52%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24                   8.41ms ± 1%    9.22ms ± 1%    +9.54%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=16384/ranges=64/type=get-24                    145ms ± 0%       6ms ± 1%   -95.63%  (p=0.000 n=9+9)
TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24                   125ms ± 1%      67ms ± 1%   -46.31%  (p=0.000 n=9+9)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=get-24        77.8µs ± 1%    39.6µs ± 2%   -49.10%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=scan-24       74.0µs ± 3%    63.0µs ± 6%   -14.92%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=get-24       1.16ms ± 1%    0.08ms ± 1%   -93.47%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=scan-24      1.04ms ± 5%    0.47ms ± 7%   -54.65%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=get-24      9.50ms ± 0%    6.73ms ± 1%   -29.21%  (p=0.000 n=9+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=scan-24     8.88ms ± 1%   13.24ms ± 1%   +49.04%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=get-24      146ms ± 0%       7ms ± 1%   -94.98%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=scan-24     125ms ± 1%      74ms ± 1%   -40.75%  (p=0.000 n=10+9)

name                                                               old alloc/op   new alloc/op   delta
TruncateLoop/asc/reqs=128/ranges=4/type=get-24                       7.58kB ± 0%   21.00kB ±11%  +176.84%  (p=0.000 n=7+10)
TruncateLoop/asc/reqs=128/ranges=4/type=scan-24                      39.8kB ± 6%    49.1kB ±15%   +23.46%  (p=0.000 n=9+10)
TruncateLoop/asc/reqs=128/ranges=64/type=get-24                      6.48kB ± 5%   18.25kB ± 2%  +181.79%  (p=0.000 n=10+9)
TruncateLoop/asc/reqs=128/ranges=64/type=scan-24                      428kB ±20%     368kB ±13%   -13.85%  (p=0.003 n=10+10)
TruncateLoop/asc/reqs=16384/ranges=4/type=get-24                     1.60MB ± 0%    2.91MB ± 0%   +82.49%  (p=0.000 n=8+10)
TruncateLoop/asc/reqs=16384/ranges=4/type=scan-24                    5.24MB ± 0%    5.89MB ± 0%   +12.41%  (p=0.000 n=10+8)
TruncateLoop/asc/reqs=16384/ranges=64/type=get-24                    1.15MB ± 4%    2.41MB ± 1%  +110.09%  (p=0.000 n=10+10)
TruncateLoop/asc/reqs=16384/ranges=64/type=scan-24                   69.8MB ± 1%    64.6MB ± 1%    -7.55%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=get-24         9.77kB ±22%   21.98kB ± 4%  +125.07%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=scan-24        38.9kB ±23%    49.9kB ± 2%   +28.28%  (p=0.000 n=10+8)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=get-24        6.56kB ± 4%   20.20kB ± 3%  +208.11%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=scan-24        407kB ±15%     372kB ±13%    -8.68%  (p=0.043 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=get-24       1.65MB ± 0%    3.62MB ± 0%  +118.55%  (p=0.000 n=8+8)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=scan-24      6.60MB ± 2%    5.65MB ± 1%   -14.38%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=get-24      1.10MB ± 5%    2.77MB ± 1%  +152.58%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=scan-24     60.5MB ± 1%    67.9MB ± 1%   +12.19%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=4/type=get-24                      17.2kB ±10%    21.5kB ± 1%   +24.91%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=4/type=scan-24                     35.5kB ±12%    29.1kB ± 4%   -17.83%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=128/ranges=64/type=get-24                      138kB ± 1%      20kB ± 3%   -85.34%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=scan-24                     344kB ±15%     363kB ±10%    +5.50%  (p=0.035 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=get-24                    2.78MB ± 0%    3.35MB ± 0%   +20.24%  (p=0.000 n=8+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24                   4.42MB ± 6%    5.29MB ± 0%   +19.67%  (p=0.000 n=10+8)
TruncateLoop/desc/reqs=16384/ranges=64/type=get-24                   17.9MB ± 0%     2.7MB ± 2%   -85.21%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24                  65.3MB ± 0%    61.0MB ± 1%    -6.65%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=get-24        15.9kB ± 3%    26.7kB ± 1%   +67.87%  (p=0.000 n=10+9)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=scan-24       29.4kB ± 6%    41.6kB ± 5%   +41.50%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=get-24        138kB ± 0%      23kB ± 3%   -83.61%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=scan-24       390kB ±19%     350kB ±11%   -10.16%  (p=0.015 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=get-24      2.69MB ± 4%    3.51MB ± 1%   +30.22%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=scan-24     4.89MB ± 1%    8.19MB ± 0%   +67.68%  (p=0.000 n=10+9)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=get-24     17.9MB ± 0%     3.0MB ± 2%   -83.34%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=scan-24    65.4MB ± 1%    62.9MB ± 1%    -3.81%  (p=0.000 n=10+10)

name                                                               old allocs/op  new allocs/op  delta
TruncateLoop/asc/reqs=128/ranges=4/type=get-24                         50.0 ± 0%      52.4 ±10%    +4.80%  (p=0.017 n=8+10)
TruncateLoop/asc/reqs=128/ranges=4/type=scan-24                         569 ±12%       557 ±15%      ~     (p=0.617 n=10+10)
TruncateLoop/asc/reqs=128/ranges=64/type=get-24                         207 ± 4%       210 ± 5%      ~     (p=0.380 n=10+10)
TruncateLoop/asc/reqs=128/ranges=64/type=scan-24                      6.97k ±13%     6.02k ± 9%   -13.64%  (p=0.000 n=10+10)
TruncateLoop/asc/reqs=16384/ranges=4/type=get-24                        126 ± 0%       122 ± 0%    -3.17%  (p=0.002 n=8+10)
TruncateLoop/asc/reqs=16384/ranges=4/type=scan-24                     51.9k ± 1%     42.8k ± 1%   -17.59%  (p=0.000 n=10+9)
TruncateLoop/asc/reqs=16384/ranges=64/type=get-24                     1.12k ± 1%     1.12k ± 1%    +0.43%  (p=0.027 n=10+10)
TruncateLoop/asc/reqs=16384/ranges=64/type=scan-24                     786k ± 1%      714k ± 1%    -9.13%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=get-24           41.2 ± 3%      58.0 ± 3%   +40.78%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=scan-24           574 ±18%       532 ±11%      ~     (p=0.143 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=get-24           205 ± 2%       234 ± 3%   +14.26%  (p=0.000 n=9+9)
TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=scan-24        6.70k ± 9%     6.00k ± 9%   -10.40%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=get-24          127 ± 0%       125 ± 0%    -1.57%  (p=0.001 n=8+9)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=scan-24       63.7k ± 1%     27.8k ± 2%   -56.34%  (p=0.000 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=get-24       1.14k ± 0%     1.14k ± 1%      ~     (p=0.515 n=10+10)
TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=scan-24       696k ± 1%      752k ± 1%    +7.97%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=4/type=get-24                         554 ± 1%       169 ± 2%   -69.52%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=4/type=scan-24                        519 ± 9%       268 ± 9%   -48.32%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=get-24                      8.38k ± 0%     0.33k ± 3%   -96.06%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=scan-24                     5.90k ±10%     5.89k ± 5%      ~     (p=0.796 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=get-24                     65.7k ± 0%     16.5k ± 0%   -74.87%  (p=0.002 n=8+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24                    39.5k ± 1%     27.8k ± 2%   -29.54%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=64/type=get-24                    1.05M ± 0%     0.02M ± 0%   -98.33%  (p=0.000 n=9+10)
TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24                    741k ± 0%      679k ± 1%    -8.32%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=get-24           559 ± 0%       182 ± 2%   -67.42%  (p=0.000 n=9+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=scan-24          438 ± 8%       404 ±13%    -7.76%  (p=0.014 n=9+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=get-24        8.39k ± 0%     0.36k ± 5%   -95.75%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=scan-24       6.38k ±11%     5.56k ± 9%   -12.85%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=get-24       65.7k ± 0%     16.5k ± 0%   -74.84%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=scan-24      46.8k ± 1%     67.6k ± 1%   +44.65%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=get-24      1.05M ± 0%     0.02M ± 0%   -98.33%  (p=0.000 n=10+10)
TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=scan-24      739k ± 1%      694k ± 1%    -6.08%  (p=0.000 n=10+10)

The truncation loops for the optimized strategy are very similar in both
directions, so I tried to extract the differences out into an interface.
However, this showed non-trivial slow down and increase in allocations,
so I chose to have some duplicated code to get the best performance.
Here is a snippet of the comparison when the interface was prototyped:

name                                                 old time/op    new time/op    delta
TruncateLoop/desc/reqs=128/ranges=4/type=get-24        36.9µs ± 3%    44.5µs ± 3%  +20.55%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=4/type=scan-24       74.8µs ± 5%    88.8µs ± 3%  +18.73%  (p=0.000 n=9+10)
TruncateLoop/desc/reqs=128/ranges=64/type=get-24       64.9µs ± 1%    78.3µs ± 1%  +20.72%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=128/ranges=64/type=scan-24       471µs ± 8%     682µs ±13%  +44.73%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=4/type=get-24      6.34ms ± 1%    7.39ms ± 0%  +16.47%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24     11.2ms ± 1%    12.4ms ± 1%  +10.36%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=16384/ranges=64/type=get-24     6.40ms ± 2%    7.39ms ± 1%  +15.47%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24    70.9ms ± 1%   102.0ms ± 2%  +43.87%  (p=0.000 n=9+9)

name                                                 old alloc/op   new alloc/op   delta
TruncateLoop/desc/reqs=128/ranges=4/type=get-24        22.2kB ± 9%    30.4kB ± 0%  +36.55%  (p=0.000 n=10+7)
TruncateLoop/desc/reqs=128/ranges=4/type=scan-24       52.2kB ± 5%    67.6kB ± 4%  +29.47%  (p=0.000 n=8+10)
TruncateLoop/desc/reqs=128/ranges=64/type=get-24       20.0kB ± 2%    32.2kB ± 1%  +60.86%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=scan-24       372kB ±13%     600kB ±10%  +61.29%  (p=0.000 n=10+8)
TruncateLoop/desc/reqs=16384/ranges=4/type=get-24      3.24MB ± 0%    4.45MB ± 0%  +37.42%  (p=0.000 n=8+7)
TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24     6.61MB ± 0%    7.86MB ± 0%  +18.90%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=16384/ranges=64/type=get-24     2.75MB ± 2%    3.74MB ± 1%  +36.03%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24    65.7MB ± 1%    97.2MB ± 1%  +47.95%  (p=0.000 n=10+10)

name                                                 old allocs/op  new allocs/op  delta
TruncateLoop/desc/reqs=128/ranges=4/type=get-24           177 ± 2%       314 ± 0%  +77.40%  (p=0.000 n=10+8)
TruncateLoop/desc/reqs=128/ranges=4/type=scan-24          597 ± 8%       847 ± 8%  +41.89%  (p=0.000 n=9+10)
TruncateLoop/desc/reqs=128/ranges=64/type=get-24          329 ± 3%       531 ± 2%  +61.40%  (p=0.000 n=10+10)
TruncateLoop/desc/reqs=128/ranges=64/type=scan-24       6.02k ± 9%     9.56k ± 6%  +58.80%  (p=0.000 n=10+8)
TruncateLoop/desc/reqs=16384/ranges=4/type=get-24       16.5k ± 0%     32.9k ± 0%  +99.17%  (p=0.000 n=8+8)
TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24      53.5k ± 1%     73.1k ± 1%  +36.69%  (p=0.000 n=10+9)
TruncateLoop/desc/reqs=16384/ranges=64/type=get-24      17.5k ± 0%     33.9k ± 0%  +94.02%  (p=0.000 n=6+10)
TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24      727k ± 1%     1194k ± 1%  +64.31%  (p=0.000 n=10+10)

An additional knob is introduced to the batch truncation helper to
indicate whether the helper can take ownership of the passed-in requests
slice and reorder it as it pleases. This is the case for the Streamer,
but the DistSender relies on the order of requests not being modified,
so the helper makes a copy of the slice.

Some tests needed an adjustment since now we process requests not
necessarily in the original order, so the population of ResumeSpans
might be different.

Fixes: #68536.
Addresses: #82159.

Release note: None

cockroach-teamcity · 2022-06-14T01:35:11Z

This change is

yuzefovich · 2022-06-15T18:41:33Z

Added the support for the descending scan direction too. I also tried performing some bounds-checks elimination in the optimized strategies, but they didn't show any improvement in the benchmark, so I abandoned that idea.

RFAL.

yuzefovich · 2022-06-16T02:00:11Z

Some macro benchmark numbers: here is the comparison of all TPCH queries (via 20 runs of tpchvec roachtest), before this PR and after this PR (with the current default settings, meaning the streamer is enabled):

Q1: 3.24s	3.25s	0.30%
Q2: 0.19s	0.19s	0.47%
Q3: 1.39s	1.35s	-2.86%
Q4: 1.04s	1.04s	0.29%
Q5: 1.81s	1.70s	-5.95%
Q6: 6.62s	5.18s	-21.78%
Q7: 5.50s	4.72s	-14.19%
Q8: 0.81s	0.71s	-12.11%
Q9: 4.26s	4.04s	-5.17%
Q10: 1.39s	1.34s	-3.37%
Q11: 0.51s	0.48s	-6.08%
Q12: 6.77s	5.23s	-22.71%
Q13: 1.03s	0.98s	-5.57%
Q14: 0.55s	0.44s	-19.58%
Q15: 4.24s	3.42s	-19.51%
Q16: 0.76s	0.74s	-2.20%
Q17: 0.70s	0.68s	-2.98%
Q18: 5.45s	5.61s	3.09%
Q19: 3.36s	3.33s	-0.91%
Q20: 11.18s	9.52s	-14.79%
Q21: 5.75s	4.84s	-15.81%
Q22: 0.53s	0.54s	1.70%

irfansharif

Haven't looked at the meaty third commit yet but cool results. For the second commit splitting out the TestTruncate move and the subsequent changes would help make reviewing it a bit easier.

Reviewed 1 of 1 files at r1, 3 of 6 files at r2, 3 of 6 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @tbg, and @yuzefovich)

pkg/kv/kvclient/kvcoord/batch.go line 34 at r2 (raw file):

//     ...
//   }
//   helper.Init(scanDir, requests)

[nit] Instead of helper.Init, how about a constructor for the helper that takes in whatever parameters you need? Looks like we'r always having to call .Init anyway.

pkg/kv/kvclient/kvcoord/batch.go line 203 at r3 (raw file):

			// We're dealing with a range-spanning request.
			if l, r := keys.IsLocal(h.headers[i].Key), keys.IsLocal(h.headers[i].EndKey); (l && !r) || (!l && r) {
				return errors.Errorf("local key mixed with global key in range")

AssertionFailedf for this and the one below?

pkg/kv/kvclient/kvcoord/batch.go line 210 at r3 (raw file):

	}
	if scanDir == Ascending {
		sort.Sort(ascBatchTruncationHelper{BatchTruncationHelper: h})

[nit] Here and below, would sort.Slice let us avoid defining two additional helper types?

pkg/kv/kvclient/kvcoord/dist_sender.go line 1540 at r1 (raw file):

// descriptor. If the underlying range seems to have split, we
// recursively invoke divideAndSendBatchToRanges to re-enumerate the
// ranges in the span and resend to each.

[nit] Add some commentary about the positions parameter (looks like you have some in the subsequent commit within the helper struct).

pkg/kv/kvclient/kvcoord/batch_test.go line 257 at r2 (raw file):

// TestTruncate verifies the truncation logic of the BatchTruncationHelper over
// a single range as well as truncateLegacy() function directly.
func TestTruncate(t *testing.T) {

Entirely optional, but a randomized form of this test that asserts on the invariants (ordering of truncated requests, valid position indexes, etc.) we expect for a given set of random-but-well-formed inputs (keys, query span, scan direction, etc.) would give me a lot of confidence. Ditto for the test below with different seek keys. I'm thinking of something like this, which shook out a lot of bugs:

cockroach/pkg/spanconfig/spanconfigstore/span_store_test.go

Lines 29 to 41 in 4124128

    
           // TestRandomized randomly sets/deletes span configs for arbitrary keyspans 
        
           // within some alphabet. For a test span, it then asserts that the config we 
        
           // retrieve is what we expect to find from the store. It also ensures that all 
        
           // ranges are non-overlapping, and that coalesced split-keys works as expected 
        
           // (adjacent configs, if identical, don't induce split points). 
        
           func TestRandomized(t *testing.T) { 
        
           	defer leaktest.AfterTest(t)() 
        
           	randutil.SeedForTests() 
        
           	ctx := context.Background() 
        
           	alphabet := "abcdefghijklmnopqrstuvwxyz" 
        
           	configs := "ABCDEF" 
        
           	ops := []string{"set", "del"}

yuzefovich

Split out the test code movement into a separate commit.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @nvanbenschoten, and @tbg)

pkg/kv/kvclient/kvcoord/batch.go line 34 at r2 (raw file):

Previously, irfansharif (irfan sharif) wrote…

[nit] Instead of helper.Init, how about a constructor for the helper that takes in whatever parameters you need? Looks like we'r always having to call .Init anyway.

Done.

pkg/kv/kvclient/kvcoord/batch.go line 203 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

AssertionFailedf for this and the one below?

Done.

pkg/kv/kvclient/kvcoord/batch.go line 210 at r3 (raw file):

Previously, irfansharif (irfan sharif) wrote…

[nit] Here and below, would sort.Slice let us avoid defining two additional helper types?

No, I don't think so - sort.Slice operates on a single slice whereas we have many slices which we want to order according to the headers and make the swap atomic.

pkg/kv/kvclient/kvcoord/dist_sender.go line 1540 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

[nit] Add some commentary about the positions parameter (looks like you have some in the subsequent commit within the helper struct).

Done.

pkg/kv/kvclient/kvcoord/batch_test.go line 257 at r2 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Entirely optional, but a randomized form of this test that asserts on the invariants (ordering of truncated requests, valid position indexes, etc.) we expect for a given set of random-but-well-formed inputs (keys, query span, scan direction, etc.) would give me a lot of confidence. Ditto for the test below with different seek keys. I'm thinking of something like this, which shook out a lot of bugs:

cockroach/pkg/spanconfig/spanconfigstore/span_store_test.go

Lines 29 to 41 in 4124128

// TestRandomized randomly sets/deletes span configs for arbitrary keyspans

// within some alphabet. For a test span, it then asserts that the config we

// retrieve is what we expect to find from the store. It also ensures that all

// ranges are non-overlapping, and that coalesced split-keys works as expected

// (adjacent configs, if identical, don't induce split points).

func TestRandomized(t *testing.T) {

defer leaktest.AfterTest(t)()

randutil.SeedForTests()

ctx := context.Background()

alphabet := "abcdefghijklmnopqrstuvwxyz"

configs := "ABCDEF"

ops := []string{"set", "del"}

Done.

yuzefovich · 2022-06-29T01:17:10Z

Friendly ping @irfansharif @nvanbenschoten.

irfansharif

I'm pretty unfamiliar with this code and I tried understanding + reviewing it as much as I could, it . You'd definitely benefit from better reviewers than me, but that's usually always true. The benchmark results are pretty compelling!

Reviewed 1 of 3 files at r5, 2 of 5 files at r6, 6 of 6 files at r7, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @yuzefovich)

pkg/kv/kvclient/kvcoord/dist_sender.go line 1529 at r7 (raw file):

// sendPartialBatch sends the supplied batch to the range specified by desc.
//
// The batch request is supposed to be truncated already so that it contains

Is this property worth an assertion? Perhaps under test builds if too expensive to do underotherwise.

pkg/kv/kvclient/kvcoord/batch_test.go line 305 at r7 (raw file):

			keys:    [][2]string{{loc("b"), loc("e") + "\x00"}},
			expKeys: [][2]string{{loc("b"), loc("e") + "\x00"}},
			from:    "b", to: "e\x00",

Is this right? Isn't the test above setting up "e\x00" key and retrieving it with a non-inclusive bound on "e\x00"? I'm probably reading things wrong.

pkg/kv/kvclient/kvcoord/dist_sender_server_test.go line 562 at r7 (raw file):

		{
			name:  "unsorted, non-overlapping, neither satisfied",
			bound: 6,

For my own understanding, why change some of these tests instead of adding new test cases?

yuzefovich

TFTR! I'll wait till the end of the week hoping that @nvanbenschoten will have some time to review this too.

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif and @nvanbenschoten)

pkg/kv/kvclient/kvcoord/dist_sender.go line 1529 at r7 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Is this property worth an assertion? Perhaps under test builds if too expensive to do underotherwise.

Hm, it's a good question. We could add something like

diff --git a/pkg/kv/kvclient/kvcoord/dist_sender.go b/pkg/kv/kvclient/kvcoord/dist_sender.go
index 44cdbfca8b..4f6dac24f2 100644
--- a/pkg/kv/kvclient/kvcoord/dist_sender.go
+++ b/pkg/kv/kvclient/kvcoord/dist_sender.go
@@ -32,6 +32,7 @@ import (
        "github.com/cockroachdb/cockroach/pkg/rpc/nodedialer"
        "github.com/cockroachdb/cockroach/pkg/settings"
        "github.com/cockroachdb/cockroach/pkg/settings/cluster"
+       "github.com/cockroachdb/cockroach/pkg/util/buildutil"
        "github.com/cockroachdb/cockroach/pkg/util/grpcutil"
        "github.com/cockroachdb/cockroach/pkg/util/hlc"
        "github.com/cockroachdb/cockroach/pkg/util/log"
@@ -1541,6 +1542,15 @@ func (ds *DistSender) sendPartialBatch(
        routingTok rangecache.EvictionToken,
        positions []int,
 ) response {
+       if buildutil.CrdbTestBuild {
+               baRS, err := keys.Range(ba.Requests)
+               if err != nil {
+                       return response{pErr: roachpb.NewError(err)}
+               }
+               if !routingTok.Desc().ContainsKeyRange(baRS.Key, baRS.EndKey) {
+                       return response{pErr: roachpb.NewErrorf("the batch wasn't truncated for sendPartialBatch")}
+               }
+       }
        if batchIdx == 1 {
                ds.metrics.PartialBatchCount.Inc(2) // account for first batch
        } else if batchIdx > 1 {

does this check seem correct? I'm a bit afraid of using routingTok.Desc in case it has become stale or something like that - the DistSender internally can re-execute batches from scratch in some of the scenarios instead of returning an error, but here we'd short-circuit that. Do you have an idea of such a check that would not introduce any regressions?

pkg/kv/kvclient/kvcoord/batch_test.go line 305 at r7 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Is this right? Isn't the test above setting up "e\x00" key and retrieving it with a non-inclusive bound on "e\x00"? I'm probably reading things wrong.

This hasn't changed in this PR (last update in 2015), so I'm assuming it's right 😅 It's definitely something about local vs global keys distinction which I don't quite understand either.

pkg/kv/kvclient/kvcoord/dist_sender_server_test.go line 562 at r7 (raw file):

Previously, irfansharif (irfan sharif) wrote…

For my own understanding, why change some of these tests instead of adding new test cases?

All changes in this commit to the tests were made to make the tests pass - previously, before the requests were sorted, the test cases assumed that partial response (because of the bound limit) would contain response to the first requests in the batch (i.e. with the smallest positions). But now since we're sorting the requests internally, that might no longer be the case. So this commit tries to change the test cases so that regardless of how requests are ordered, the expected results are always the same.

irfansharif

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif, @nvanbenschoten, and @yuzefovich)

pkg/kv/kvclient/kvcoord/dist_sender.go line 1529 at r7 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Hm, it's a good question. We could add something like

diff --git a/pkg/kv/kvclient/kvcoord/dist_sender.go b/pkg/kv/kvclient/kvcoord/dist_sender.go
index 44cdbfca8b..4f6dac24f2 100644
--- a/pkg/kv/kvclient/kvcoord/dist_sender.go
+++ b/pkg/kv/kvclient/kvcoord/dist_sender.go
@@ -32,6 +32,7 @@ import (
        "github.com/cockroachdb/cockroach/pkg/rpc/nodedialer"
        "github.com/cockroachdb/cockroach/pkg/settings"
        "github.com/cockroachdb/cockroach/pkg/settings/cluster"
+       "github.com/cockroachdb/cockroach/pkg/util/buildutil"
        "github.com/cockroachdb/cockroach/pkg/util/grpcutil"
        "github.com/cockroachdb/cockroach/pkg/util/hlc"
        "github.com/cockroachdb/cockroach/pkg/util/log"
@@ -1541,6 +1542,15 @@ func (ds *DistSender) sendPartialBatch(
        routingTok rangecache.EvictionToken,
        positions []int,
 ) response {
+       if buildutil.CrdbTestBuild {
+               baRS, err := keys.Range(ba.Requests)
+               if err != nil {
+                       return response{pErr: roachpb.NewError(err)}
+               }
+               if !routingTok.Desc().ContainsKeyRange(baRS.Key, baRS.EndKey) {
+                       return response{pErr: roachpb.NewErrorf("the batch wasn't truncated for sendPartialBatch")}
+               }
+       }
        if batchIdx == 1 {
                ds.metrics.PartialBatchCount.Inc(2) // account for first batch
        } else if batchIdx > 1 {

does this check seem correct? I'm a bit afraid of using routingTok.Desc in case it has become stale or something like that - the DistSender internally can re-execute batches from scratch in some of the scenarios instead of returning an error, but here we'd short-circuit that. Do you have an idea of such a check that would not introduce any regressions?

Oh, hm, forgot that this is possibly a stale view of the range descriptor. I'm not sure and I'll defer to you/Nathan, but the check was roughly what I was expecting

This commit refactors the DistSender's loop of iterating over the range descriptors so that the truncation of the BatchRequest happens serially. This incurs a minor performance hit when the requests are sent in parallel, but it makes it possible to apply the optimizations to this iteration in the following commits. Release note: None

This commit moves `TestTruncate` as well as `BenchmarkTruncate` into `batch_test.go` file in the preparation for the refactor done in the following commit. The code itself wasn't changed at all. Release note: None

This commit introduces a batch truncation helper that encompasses logic of truncating requests to the boundaries of a single range as well as returning the next key to seek the range iterator to. The helper is now used both in the DistSender as well as in the Streamer. No modification to the actual logic of `Truncate` nor `Next`/`prev` functions has been done other than incorporating the return of the next seek key into `Truncate` function itself. This is needed since the following commit will tightly couple the truncation process with the next seek key determination in order to optimize it. The helper can be configured with a knob indicating whether `Truncate` needs to return requests in the original order. This behavior is necessary by the BatchRequests that contain writes since in several spots we rely on the ordering assumptions (e.g. of increasing values of `Sequence`). The following adjustments were made to the tests: - `BenchmarkTruncate` has been renamed to `BenchmarkTruncateLegacy` - `TestTruncate` has been refactored to exercise the new and the old code-paths - `TestBatchPrevNext` has been refactored to run through the new code path, also a few test cases have been adjusted slightly. This commit also introduces some unit tests for the new code path when it runs in a loop over multiple ranges as well as a corresponding benchmark. Release note: None

This commit optimizes the batch truncation loop for the case when the requests only use global keys. The optimized approach sorts the requests according to their start keys (with the Ascending scan direction) or in the reverse order of their end keys (with the Descending scan direction). Then on each `Truncate` call it looks only at a subset of the requests (that haven't been fully processed yet and don't start after the current range), allowing us to avoid many unnecessary key comparisons. Please see a comment on the new `truncateAsc` and `truncateDesc` functions for more details and examples. The optimized approach actually has a worse time complexity in the absolute worst case (when each request is a range-spanning one and actually spans all of the ranges against which the requests are truncated) - because of the need to sort the requests upfront - but in practice, it is much faster, especially with point requests. ``` name old time/op new time/op delta TruncateLoop/asc/reqs=128/ranges=4/type=get-24 59.8µs ± 1% 33.9µs ± 3% -43.38% (p=0.000 n=10+10) TruncateLoop/asc/reqs=128/ranges=4/type=scan-24 83.0µs ± 4% 69.7µs ± 7% -15.98% (p=0.000 n=10+10) TruncateLoop/asc/reqs=128/ranges=64/type=get-24 865µs ± 1% 62µs ± 1% -92.84% (p=0.000 n=10+10) TruncateLoop/asc/reqs=128/ranges=64/type=scan-24 1.07ms ± 5% 0.46ms ± 8% -56.95% (p=0.000 n=10+10) TruncateLoop/asc/reqs=16384/ranges=4/type=get-24 7.09ms ± 0% 5.99ms ± 1% -15.56% (p=0.000 n=10+9) TruncateLoop/asc/reqs=16384/ranges=4/type=scan-24 9.22ms ± 0% 10.52ms ± 1% +14.08% (p=0.000 n=9+10) TruncateLoop/asc/reqs=16384/ranges=64/type=get-24 108ms ± 0% 6ms ± 2% -94.50% (p=0.000 n=8+10) TruncateLoop/asc/reqs=16384/ranges=64/type=scan-24 129ms ± 1% 71ms ± 1% -45.06% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=get-24 60.2µs ± 1% 36.0µs ± 2% -40.14% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=scan-24 82.4µs ± 8% 72.1µs ± 4% -12.51% (p=0.000 n=10+9) TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=get-24 862µs ± 1% 72µs ± 1% -91.65% (p=0.000 n=10+9) TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=scan-24 1.06ms ± 4% 0.49ms ± 8% -53.39% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=get-24 7.24ms ± 0% 6.46ms ± 1% -10.74% (p=0.000 n=10+9) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=scan-24 10.1ms ± 2% 9.8ms ± 2% -2.36% (p=0.000 n=10+9) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=get-24 107ms ± 0% 7ms ± 0% -93.57% (p=0.000 n=8+10) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=scan-24 122ms ± 0% 80ms ± 1% -34.32% (p=0.000 n=9+9) TruncateLoop/desc/reqs=128/ranges=4/type=get-24 78.9µs ± 1% 36.4µs ± 3% -53.81% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=4/type=scan-24 79.4µs ± 4% 52.3µs ± 5% -34.14% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=64/type=get-24 1.16ms ± 1% 0.07ms ± 1% -94.39% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=64/type=scan-24 1.01ms ± 4% 0.46ms ± 5% -54.56% (p=0.000 n=10+10) TruncateLoop/desc/reqs=16384/ranges=4/type=get-24 9.42ms ± 0% 6.26ms ± 1% -33.52% (p=0.000 n=10+10) TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24 8.41ms ± 1% 9.22ms ± 1% +9.54% (p=0.000 n=10+9) TruncateLoop/desc/reqs=16384/ranges=64/type=get-24 145ms ± 0% 6ms ± 1% -95.63% (p=0.000 n=9+9) TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24 125ms ± 1% 67ms ± 1% -46.31% (p=0.000 n=9+9) TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=get-24 77.8µs ± 1% 39.6µs ± 2% -49.10% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=scan-24 74.0µs ± 3% 63.0µs ± 6% -14.92% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=get-24 1.16ms ± 1% 0.08ms ± 1% -93.47% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=scan-24 1.04ms ± 5% 0.47ms ± 7% -54.65% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=get-24 9.50ms ± 0% 6.73ms ± 1% -29.21% (p=0.000 n=9+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=scan-24 8.88ms ± 1% 13.24ms ± 1% +49.04% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=get-24 146ms ± 0% 7ms ± 1% -94.98% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=scan-24 125ms ± 1% 74ms ± 1% -40.75% (p=0.000 n=10+9) name old alloc/op new alloc/op delta TruncateLoop/asc/reqs=128/ranges=4/type=get-24 7.58kB ± 0% 21.00kB ±11% +176.84% (p=0.000 n=7+10) TruncateLoop/asc/reqs=128/ranges=4/type=scan-24 39.8kB ± 6% 49.1kB ±15% +23.46% (p=0.000 n=9+10) TruncateLoop/asc/reqs=128/ranges=64/type=get-24 6.48kB ± 5% 18.25kB ± 2% +181.79% (p=0.000 n=10+9) TruncateLoop/asc/reqs=128/ranges=64/type=scan-24 428kB ±20% 368kB ±13% -13.85% (p=0.003 n=10+10) TruncateLoop/asc/reqs=16384/ranges=4/type=get-24 1.60MB ± 0% 2.91MB ± 0% +82.49% (p=0.000 n=8+10) TruncateLoop/asc/reqs=16384/ranges=4/type=scan-24 5.24MB ± 0% 5.89MB ± 0% +12.41% (p=0.000 n=10+8) TruncateLoop/asc/reqs=16384/ranges=64/type=get-24 1.15MB ± 4% 2.41MB ± 1% +110.09% (p=0.000 n=10+10) TruncateLoop/asc/reqs=16384/ranges=64/type=scan-24 69.8MB ± 1% 64.6MB ± 1% -7.55% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=get-24 9.77kB ±22% 21.98kB ± 4% +125.07% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=scan-24 38.9kB ±23% 49.9kB ± 2% +28.28% (p=0.000 n=10+8) TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=get-24 6.56kB ± 4% 20.20kB ± 3% +208.11% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=scan-24 407kB ±15% 372kB ±13% -8.68% (p=0.043 n=10+10) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=get-24 1.65MB ± 0% 3.62MB ± 0% +118.55% (p=0.000 n=8+8) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=scan-24 6.60MB ± 2% 5.65MB ± 1% -14.38% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=get-24 1.10MB ± 5% 2.77MB ± 1% +152.58% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=scan-24 60.5MB ± 1% 67.9MB ± 1% +12.19% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=4/type=get-24 17.2kB ±10% 21.5kB ± 1% +24.91% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=4/type=scan-24 35.5kB ±12% 29.1kB ± 4% -17.83% (p=0.000 n=10+9) TruncateLoop/desc/reqs=128/ranges=64/type=get-24 138kB ± 1% 20kB ± 3% -85.34% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=64/type=scan-24 344kB ±15% 363kB ±10% +5.50% (p=0.035 n=10+10) TruncateLoop/desc/reqs=16384/ranges=4/type=get-24 2.78MB ± 0% 3.35MB ± 0% +20.24% (p=0.000 n=8+10) TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24 4.42MB ± 6% 5.29MB ± 0% +19.67% (p=0.000 n=10+8) TruncateLoop/desc/reqs=16384/ranges=64/type=get-24 17.9MB ± 0% 2.7MB ± 2% -85.21% (p=0.000 n=10+10) TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24 65.3MB ± 0% 61.0MB ± 1% -6.65% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=get-24 15.9kB ± 3% 26.7kB ± 1% +67.87% (p=0.000 n=10+9) TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=scan-24 29.4kB ± 6% 41.6kB ± 5% +41.50% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=get-24 138kB ± 0% 23kB ± 3% -83.61% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=scan-24 390kB ±19% 350kB ±11% -10.16% (p=0.015 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=get-24 2.69MB ± 4% 3.51MB ± 1% +30.22% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=scan-24 4.89MB ± 1% 8.19MB ± 0% +67.68% (p=0.000 n=10+9) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=get-24 17.9MB ± 0% 3.0MB ± 2% -83.34% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=scan-24 65.4MB ± 1% 62.9MB ± 1% -3.81% (p=0.000 n=10+10) name old allocs/op new allocs/op delta TruncateLoop/asc/reqs=128/ranges=4/type=get-24 50.0 ± 0% 52.4 ±10% +4.80% (p=0.017 n=8+10) TruncateLoop/asc/reqs=128/ranges=4/type=scan-24 569 ±12% 557 ±15% ~ (p=0.617 n=10+10) TruncateLoop/asc/reqs=128/ranges=64/type=get-24 207 ± 4% 210 ± 5% ~ (p=0.380 n=10+10) TruncateLoop/asc/reqs=128/ranges=64/type=scan-24 6.97k ±13% 6.02k ± 9% -13.64% (p=0.000 n=10+10) TruncateLoop/asc/reqs=16384/ranges=4/type=get-24 126 ± 0% 122 ± 0% -3.17% (p=0.002 n=8+10) TruncateLoop/asc/reqs=16384/ranges=4/type=scan-24 51.9k ± 1% 42.8k ± 1% -17.59% (p=0.000 n=10+9) TruncateLoop/asc/reqs=16384/ranges=64/type=get-24 1.12k ± 1% 1.12k ± 1% +0.43% (p=0.027 n=10+10) TruncateLoop/asc/reqs=16384/ranges=64/type=scan-24 786k ± 1% 714k ± 1% -9.13% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=get-24 41.2 ± 3% 58.0 ± 3% +40.78% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=scan-24 574 ±18% 532 ±11% ~ (p=0.143 n=10+10) TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=get-24 205 ± 2% 234 ± 3% +14.26% (p=0.000 n=9+9) TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=scan-24 6.70k ± 9% 6.00k ± 9% -10.40% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=get-24 127 ± 0% 125 ± 0% -1.57% (p=0.001 n=8+9) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=scan-24 63.7k ± 1% 27.8k ± 2% -56.34% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=get-24 1.14k ± 0% 1.14k ± 1% ~ (p=0.515 n=10+10) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=scan-24 696k ± 1% 752k ± 1% +7.97% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=4/type=get-24 554 ± 1% 169 ± 2% -69.52% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=4/type=scan-24 519 ± 9% 268 ± 9% -48.32% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=64/type=get-24 8.38k ± 0% 0.33k ± 3% -96.06% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=64/type=scan-24 5.90k ±10% 5.89k ± 5% ~ (p=0.796 n=10+10) TruncateLoop/desc/reqs=16384/ranges=4/type=get-24 65.7k ± 0% 16.5k ± 0% -74.87% (p=0.002 n=8+10) TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24 39.5k ± 1% 27.8k ± 2% -29.54% (p=0.000 n=10+10) TruncateLoop/desc/reqs=16384/ranges=64/type=get-24 1.05M ± 0% 0.02M ± 0% -98.33% (p=0.000 n=9+10) TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24 741k ± 0% 679k ± 1% -8.32% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=get-24 559 ± 0% 182 ± 2% -67.42% (p=0.000 n=9+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=scan-24 438 ± 8% 404 ±13% -7.76% (p=0.014 n=9+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=get-24 8.39k ± 0% 0.36k ± 5% -95.75% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=scan-24 6.38k ±11% 5.56k ± 9% -12.85% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=get-24 65.7k ± 0% 16.5k ± 0% -74.84% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=scan-24 46.8k ± 1% 67.6k ± 1% +44.65% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=get-24 1.05M ± 0% 0.02M ± 0% -98.33% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=scan-24 739k ± 1% 694k ± 1% -6.08% (p=0.000 n=10+10) ``` The truncation loops for the optimized strategy are very similar in both directions, so I tried to extract the differences out into an interface. However, this showed non-trivial slow down and increase in allocations, so I chose to have some duplicated code to get the best performance. Here is a snippet of the comparison when the interface was prototyped: ``` name old time/op new time/op delta TruncateLoop/desc/reqs=128/ranges=4/type=get-24 36.9µs ± 3% 44.5µs ± 3% +20.55% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=4/type=scan-24 74.8µs ± 5% 88.8µs ± 3% +18.73% (p=0.000 n=9+10) TruncateLoop/desc/reqs=128/ranges=64/type=get-24 64.9µs ± 1% 78.3µs ± 1% +20.72% (p=0.000 n=10+9) TruncateLoop/desc/reqs=128/ranges=64/type=scan-24 471µs ± 8% 682µs ±13% +44.73% (p=0.000 n=10+10) TruncateLoop/desc/reqs=16384/ranges=4/type=get-24 6.34ms ± 1% 7.39ms ± 0% +16.47% (p=0.000 n=10+9) TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24 11.2ms ± 1% 12.4ms ± 1% +10.36% (p=0.000 n=10+9) TruncateLoop/desc/reqs=16384/ranges=64/type=get-24 6.40ms ± 2% 7.39ms ± 1% +15.47% (p=0.000 n=10+10) TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24 70.9ms ± 1% 102.0ms ± 2% +43.87% (p=0.000 n=9+9) name old alloc/op new alloc/op delta TruncateLoop/desc/reqs=128/ranges=4/type=get-24 22.2kB ± 9% 30.4kB ± 0% +36.55% (p=0.000 n=10+7) TruncateLoop/desc/reqs=128/ranges=4/type=scan-24 52.2kB ± 5% 67.6kB ± 4% +29.47% (p=0.000 n=8+10) TruncateLoop/desc/reqs=128/ranges=64/type=get-24 20.0kB ± 2% 32.2kB ± 1% +60.86% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=64/type=scan-24 372kB ±13% 600kB ±10% +61.29% (p=0.000 n=10+8) TruncateLoop/desc/reqs=16384/ranges=4/type=get-24 3.24MB ± 0% 4.45MB ± 0% +37.42% (p=0.000 n=8+7) TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24 6.61MB ± 0% 7.86MB ± 0% +18.90% (p=0.000 n=10+9) TruncateLoop/desc/reqs=16384/ranges=64/type=get-24 2.75MB ± 2% 3.74MB ± 1% +36.03% (p=0.000 n=10+10) TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24 65.7MB ± 1% 97.2MB ± 1% +47.95% (p=0.000 n=10+10) name old allocs/op new allocs/op delta TruncateLoop/desc/reqs=128/ranges=4/type=get-24 177 ± 2% 314 ± 0% +77.40% (p=0.000 n=10+8) TruncateLoop/desc/reqs=128/ranges=4/type=scan-24 597 ± 8% 847 ± 8% +41.89% (p=0.000 n=9+10) TruncateLoop/desc/reqs=128/ranges=64/type=get-24 329 ± 3% 531 ± 2% +61.40% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=64/type=scan-24 6.02k ± 9% 9.56k ± 6% +58.80% (p=0.000 n=10+8) TruncateLoop/desc/reqs=16384/ranges=4/type=get-24 16.5k ± 0% 32.9k ± 0% +99.17% (p=0.000 n=8+8) TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24 53.5k ± 1% 73.1k ± 1% +36.69% (p=0.000 n=10+9) TruncateLoop/desc/reqs=16384/ranges=64/type=get-24 17.5k ± 0% 33.9k ± 0% +94.02% (p=0.000 n=6+10) TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24 727k ± 1% 1194k ± 1% +64.31% (p=0.000 n=10+10) ``` An additional knob is introduced to the batch truncation helper to indicate whether the helper can take ownership of the passed-in requests slice and reorder it as it pleases. This is the case for the Streamer, but the DistSender relies on the order of requests not being modified, so the helper makes a copy of the slice. Some tests needed an adjustment since now we process requests not necessarily in the original order, so the population of ResumeSpans might be different. Release note: None

yuzefovich

Alright, I'll go ahead and merge this. Thanks for the review Irfan!

bors r+

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif and @nvanbenschoten)

pkg/kv/kvclient/kvcoord/dist_sender.go line 1529 at r7 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Oh, hm, forgot that this is possibly a stale view of the range descriptor. I'm not sure and I'll defer to you/Nathan, but the check was roughly what I was expecting

I feel like it's ok to not have an assertion here given that this function is not exported, and it seems unlikely that a new caller will be introduced any time soon.

craig · 2022-07-05T20:55:42Z

Build failed (retrying...):

GitHub CI (Cockroach)

craig · 2022-07-05T21:32:57Z

Build failed:

GitHub CI (Cockroach)

yuzefovich · 2022-07-05T21:34:13Z

Unrelated flake.

bors r+

craig · 2022-07-06T00:49:55Z

Build failed (retrying...):

GitHub CI (Cockroach)

craig · 2022-07-06T05:16:30Z

Build failed:

GitHub CI (Cockroach)

yuzefovich · 2022-07-06T05:19:31Z

Unrelated flakes.

bors r+

craig · 2022-07-06T06:36:54Z

Build succeeded:

GitHub CI (Cockroach)

yuzefovich force-pushed the truncate-next-ci branch 10 times, most recently from 26d656e to 92d7e0b Compare June 15, 2022 04:50

yuzefovich requested review from nvanbenschoten, irfansharif and a team June 15, 2022 04:53

yuzefovich marked this pull request as ready for review June 15, 2022 04:53

This was referenced Jun 15, 2022

[wip] kvcoord: improve truncate() and next() #68564

Closed

kv: DistSender batch slicing is quadratic with batch size #68536

Closed

yuzefovich force-pushed the truncate-next-ci branch 2 times, most recently from 0500074 to 924f278 Compare June 15, 2022 18:39

yuzefovich requested a review from tbg June 15, 2022 18:40

irfansharif reviewed Jun 20, 2022

View reviewed changes

yuzefovich force-pushed the truncate-next-ci branch from 924f278 to d7f981a Compare June 22, 2022 00:55

blathers-crl bot requested a review from irfansharif June 22, 2022 00:55

yuzefovich commented Jun 22, 2022

View reviewed changes

tbg removed their request for review June 22, 2022 19:50

This was referenced Jun 23, 2022

kvstreamer: improve the performance #82159

Closed

kvstreamer: improve the stability #82160

Closed

irfansharif approved these changes Jun 29, 2022

View reviewed changes

yuzefovich commented Jun 29, 2022

View reviewed changes

irfansharif reviewed Jun 30, 2022

View reviewed changes

yuzefovich added 4 commits July 5, 2022 11:03

kvcoord: merge truncate_test into batch_test

33dfff0

This commit moves `TestTruncate` as well as `BenchmarkTruncate` into `batch_test.go` file in the preparation for the refactor done in the following commit. The code itself wasn't changed at all. Release note: None

yuzefovich force-pushed the truncate-next-ci branch from d7f981a to 067db69 Compare July 5, 2022 18:07

yuzefovich commented Jul 5, 2022

View reviewed changes

craig bot merged commit 90b15cf into cockroachdb:master Jul 6, 2022

yuzefovich deleted the truncate-next-ci branch July 6, 2022 14:40

renatolabs mentioned this pull request Jul 22, 2022

roachperf: regression around 2022-07-07 #84882

Closed

30 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kvcoord: optimize batch truncation loop #82865

kvcoord: optimize batch truncation loop #82865

yuzefovich commented Jun 14, 2022 •

edited

Loading

cockroach-teamcity commented Jun 14, 2022

yuzefovich commented Jun 15, 2022

yuzefovich commented Jun 16, 2022 •

edited

Loading

irfansharif left a comment

yuzefovich left a comment

yuzefovich commented Jun 29, 2022

irfansharif left a comment

yuzefovich left a comment

irfansharif left a comment

yuzefovich left a comment

craig bot commented Jul 5, 2022

craig bot commented Jul 5, 2022

yuzefovich commented Jul 5, 2022

craig bot commented Jul 6, 2022

craig bot commented Jul 6, 2022

yuzefovich commented Jul 6, 2022

craig bot commented Jul 6, 2022

	// TestRandomized randomly sets/deletes span configs for arbitrary keyspans
	// within some alphabet. For a test span, it then asserts that the config we
	// retrieve is what we expect to find from the store. It also ensures that all
	// ranges are non-overlapping, and that coalesced split-keys works as expected
	// (adjacent configs, if identical, don't induce split points).
	func TestRandomized(t *testing.T) {
	defer leaktest.AfterTest(t)()

	randutil.SeedForTests()
	ctx := context.Background()
	alphabet := "abcdefghijklmnopqrstuvwxyz"
	configs := "ABCDEF"
	ops := []string{"set", "del"}

kvcoord: optimize batch truncation loop #82865

kvcoord: optimize batch truncation loop #82865

Conversation

yuzefovich commented Jun 14, 2022 • edited Loading

cockroach-teamcity commented Jun 14, 2022

yuzefovich commented Jun 15, 2022

yuzefovich commented Jun 16, 2022 • edited Loading

irfansharif left a comment

Choose a reason for hiding this comment

yuzefovich left a comment

Choose a reason for hiding this comment

yuzefovich commented Jun 29, 2022

irfansharif left a comment

Choose a reason for hiding this comment

yuzefovich left a comment

Choose a reason for hiding this comment

irfansharif left a comment

Choose a reason for hiding this comment

yuzefovich left a comment

Choose a reason for hiding this comment

craig bot commented Jul 5, 2022

craig bot commented Jul 5, 2022

yuzefovich commented Jul 5, 2022

craig bot commented Jul 6, 2022

craig bot commented Jul 6, 2022

yuzefovich commented Jul 6, 2022

craig bot commented Jul 6, 2022

yuzefovich commented Jun 14, 2022 •

edited

Loading

yuzefovich commented Jun 16, 2022 •

edited

Loading