Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
kvcoord: optimize batch truncation loop
This commit optimizes the batch truncation loop for the case when the requests only use global keys. The optimized approach sorts the requests according to their start keys (with the Ascending scan direction) or in the reverse order of their end keys (with the Descending scan direction). Then on each `Truncate` call it looks only at a subset of the requests (that haven't been fully processed yet and don't start after the current range), allowing us to avoid many unnecessary key comparisons. Please see a comment on the new `truncateAsc` and `truncateDesc` functions for more details and examples. The optimized approach actually has a worse time complexity in the absolute worst case (when each request is a range-spanning one and actually spans all of the ranges against which the requests are truncated) - because of the need to sort the requests upfront - but in practice, it is much faster, especially with point requests. ``` name old time/op new time/op delta TruncateLoop/asc/reqs=128/ranges=4/type=get-24 59.8µs ± 1% 33.9µs ± 3% -43.38% (p=0.000 n=10+10) TruncateLoop/asc/reqs=128/ranges=4/type=scan-24 83.0µs ± 4% 69.7µs ± 7% -15.98% (p=0.000 n=10+10) TruncateLoop/asc/reqs=128/ranges=64/type=get-24 865µs ± 1% 62µs ± 1% -92.84% (p=0.000 n=10+10) TruncateLoop/asc/reqs=128/ranges=64/type=scan-24 1.07ms ± 5% 0.46ms ± 8% -56.95% (p=0.000 n=10+10) TruncateLoop/asc/reqs=16384/ranges=4/type=get-24 7.09ms ± 0% 5.99ms ± 1% -15.56% (p=0.000 n=10+9) TruncateLoop/asc/reqs=16384/ranges=4/type=scan-24 9.22ms ± 0% 10.52ms ± 1% +14.08% (p=0.000 n=9+10) TruncateLoop/asc/reqs=16384/ranges=64/type=get-24 108ms ± 0% 6ms ± 2% -94.50% (p=0.000 n=8+10) TruncateLoop/asc/reqs=16384/ranges=64/type=scan-24 129ms ± 1% 71ms ± 1% -45.06% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=get-24 60.2µs ± 1% 36.0µs ± 2% -40.14% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=scan-24 82.4µs ± 8% 72.1µs ± 4% -12.51% (p=0.000 n=10+9) TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=get-24 862µs ± 1% 72µs ± 1% -91.65% (p=0.000 n=10+9) TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=scan-24 1.06ms ± 4% 0.49ms ± 8% -53.39% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=get-24 7.24ms ± 0% 6.46ms ± 1% -10.74% (p=0.000 n=10+9) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=scan-24 10.1ms ± 2% 9.8ms ± 2% -2.36% (p=0.000 n=10+9) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=get-24 107ms ± 0% 7ms ± 0% -93.57% (p=0.000 n=8+10) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=scan-24 122ms ± 0% 80ms ± 1% -34.32% (p=0.000 n=9+9) TruncateLoop/desc/reqs=128/ranges=4/type=get-24 78.9µs ± 1% 36.4µs ± 3% -53.81% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=4/type=scan-24 79.4µs ± 4% 52.3µs ± 5% -34.14% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=64/type=get-24 1.16ms ± 1% 0.07ms ± 1% -94.39% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=64/type=scan-24 1.01ms ± 4% 0.46ms ± 5% -54.56% (p=0.000 n=10+10) TruncateLoop/desc/reqs=16384/ranges=4/type=get-24 9.42ms ± 0% 6.26ms ± 1% -33.52% (p=0.000 n=10+10) TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24 8.41ms ± 1% 9.22ms ± 1% +9.54% (p=0.000 n=10+9) TruncateLoop/desc/reqs=16384/ranges=64/type=get-24 145ms ± 0% 6ms ± 1% -95.63% (p=0.000 n=9+9) TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24 125ms ± 1% 67ms ± 1% -46.31% (p=0.000 n=9+9) TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=get-24 77.8µs ± 1% 39.6µs ± 2% -49.10% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=scan-24 74.0µs ± 3% 63.0µs ± 6% -14.92% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=get-24 1.16ms ± 1% 0.08ms ± 1% -93.47% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=scan-24 1.04ms ± 5% 0.47ms ± 7% -54.65% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=get-24 9.50ms ± 0% 6.73ms ± 1% -29.21% (p=0.000 n=9+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=scan-24 8.88ms ± 1% 13.24ms ± 1% +49.04% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=get-24 146ms ± 0% 7ms ± 1% -94.98% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=scan-24 125ms ± 1% 74ms ± 1% -40.75% (p=0.000 n=10+9) name old alloc/op new alloc/op delta TruncateLoop/asc/reqs=128/ranges=4/type=get-24 7.58kB ± 0% 21.00kB ±11% +176.84% (p=0.000 n=7+10) TruncateLoop/asc/reqs=128/ranges=4/type=scan-24 39.8kB ± 6% 49.1kB ±15% +23.46% (p=0.000 n=9+10) TruncateLoop/asc/reqs=128/ranges=64/type=get-24 6.48kB ± 5% 18.25kB ± 2% +181.79% (p=0.000 n=10+9) TruncateLoop/asc/reqs=128/ranges=64/type=scan-24 428kB ±20% 368kB ±13% -13.85% (p=0.003 n=10+10) TruncateLoop/asc/reqs=16384/ranges=4/type=get-24 1.60MB ± 0% 2.91MB ± 0% +82.49% (p=0.000 n=8+10) TruncateLoop/asc/reqs=16384/ranges=4/type=scan-24 5.24MB ± 0% 5.89MB ± 0% +12.41% (p=0.000 n=10+8) TruncateLoop/asc/reqs=16384/ranges=64/type=get-24 1.15MB ± 4% 2.41MB ± 1% +110.09% (p=0.000 n=10+10) TruncateLoop/asc/reqs=16384/ranges=64/type=scan-24 69.8MB ± 1% 64.6MB ± 1% -7.55% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=get-24 9.77kB ±22% 21.98kB ± 4% +125.07% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=scan-24 38.9kB ±23% 49.9kB ± 2% +28.28% (p=0.000 n=10+8) TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=get-24 6.56kB ± 4% 20.20kB ± 3% +208.11% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=scan-24 407kB ±15% 372kB ±13% -8.68% (p=0.043 n=10+10) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=get-24 1.65MB ± 0% 3.62MB ± 0% +118.55% (p=0.000 n=8+8) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=scan-24 6.60MB ± 2% 5.65MB ± 1% -14.38% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=get-24 1.10MB ± 5% 2.77MB ± 1% +152.58% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=scan-24 60.5MB ± 1% 67.9MB ± 1% +12.19% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=4/type=get-24 17.2kB ±10% 21.5kB ± 1% +24.91% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=4/type=scan-24 35.5kB ±12% 29.1kB ± 4% -17.83% (p=0.000 n=10+9) TruncateLoop/desc/reqs=128/ranges=64/type=get-24 138kB ± 1% 20kB ± 3% -85.34% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=64/type=scan-24 344kB ±15% 363kB ±10% +5.50% (p=0.035 n=10+10) TruncateLoop/desc/reqs=16384/ranges=4/type=get-24 2.78MB ± 0% 3.35MB ± 0% +20.24% (p=0.000 n=8+10) TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24 4.42MB ± 6% 5.29MB ± 0% +19.67% (p=0.000 n=10+8) TruncateLoop/desc/reqs=16384/ranges=64/type=get-24 17.9MB ± 0% 2.7MB ± 2% -85.21% (p=0.000 n=10+10) TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24 65.3MB ± 0% 61.0MB ± 1% -6.65% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=get-24 15.9kB ± 3% 26.7kB ± 1% +67.87% (p=0.000 n=10+9) TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=scan-24 29.4kB ± 6% 41.6kB ± 5% +41.50% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=get-24 138kB ± 0% 23kB ± 3% -83.61% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=scan-24 390kB ±19% 350kB ±11% -10.16% (p=0.015 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=get-24 2.69MB ± 4% 3.51MB ± 1% +30.22% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=scan-24 4.89MB ± 1% 8.19MB ± 0% +67.68% (p=0.000 n=10+9) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=get-24 17.9MB ± 0% 3.0MB ± 2% -83.34% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=scan-24 65.4MB ± 1% 62.9MB ± 1% -3.81% (p=0.000 n=10+10) name old allocs/op new allocs/op delta TruncateLoop/asc/reqs=128/ranges=4/type=get-24 50.0 ± 0% 52.4 ±10% +4.80% (p=0.017 n=8+10) TruncateLoop/asc/reqs=128/ranges=4/type=scan-24 569 ±12% 557 ±15% ~ (p=0.617 n=10+10) TruncateLoop/asc/reqs=128/ranges=64/type=get-24 207 ± 4% 210 ± 5% ~ (p=0.380 n=10+10) TruncateLoop/asc/reqs=128/ranges=64/type=scan-24 6.97k ±13% 6.02k ± 9% -13.64% (p=0.000 n=10+10) TruncateLoop/asc/reqs=16384/ranges=4/type=get-24 126 ± 0% 122 ± 0% -3.17% (p=0.002 n=8+10) TruncateLoop/asc/reqs=16384/ranges=4/type=scan-24 51.9k ± 1% 42.8k ± 1% -17.59% (p=0.000 n=10+9) TruncateLoop/asc/reqs=16384/ranges=64/type=get-24 1.12k ± 1% 1.12k ± 1% +0.43% (p=0.027 n=10+10) TruncateLoop/asc/reqs=16384/ranges=64/type=scan-24 786k ± 1% 714k ± 1% -9.13% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=get-24 41.2 ± 3% 58.0 ± 3% +40.78% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=128/ranges=4/type=scan-24 574 ±18% 532 ±11% ~ (p=0.143 n=10+10) TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=get-24 205 ± 2% 234 ± 3% +14.26% (p=0.000 n=9+9) TruncateLoop/asc/preserveOrder/reqs=128/ranges=64/type=scan-24 6.70k ± 9% 6.00k ± 9% -10.40% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=get-24 127 ± 0% 125 ± 0% -1.57% (p=0.001 n=8+9) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=4/type=scan-24 63.7k ± 1% 27.8k ± 2% -56.34% (p=0.000 n=10+10) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=get-24 1.14k ± 0% 1.14k ± 1% ~ (p=0.515 n=10+10) TruncateLoop/asc/preserveOrder/reqs=16384/ranges=64/type=scan-24 696k ± 1% 752k ± 1% +7.97% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=4/type=get-24 554 ± 1% 169 ± 2% -69.52% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=4/type=scan-24 519 ± 9% 268 ± 9% -48.32% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=64/type=get-24 8.38k ± 0% 0.33k ± 3% -96.06% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=64/type=scan-24 5.90k ±10% 5.89k ± 5% ~ (p=0.796 n=10+10) TruncateLoop/desc/reqs=16384/ranges=4/type=get-24 65.7k ± 0% 16.5k ± 0% -74.87% (p=0.002 n=8+10) TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24 39.5k ± 1% 27.8k ± 2% -29.54% (p=0.000 n=10+10) TruncateLoop/desc/reqs=16384/ranges=64/type=get-24 1.05M ± 0% 0.02M ± 0% -98.33% (p=0.000 n=9+10) TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24 741k ± 0% 679k ± 1% -8.32% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=get-24 559 ± 0% 182 ± 2% -67.42% (p=0.000 n=9+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=4/type=scan-24 438 ± 8% 404 ±13% -7.76% (p=0.014 n=9+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=get-24 8.39k ± 0% 0.36k ± 5% -95.75% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=128/ranges=64/type=scan-24 6.38k ±11% 5.56k ± 9% -12.85% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=get-24 65.7k ± 0% 16.5k ± 0% -74.84% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=4/type=scan-24 46.8k ± 1% 67.6k ± 1% +44.65% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=get-24 1.05M ± 0% 0.02M ± 0% -98.33% (p=0.000 n=10+10) TruncateLoop/desc/preserveOrder/reqs=16384/ranges=64/type=scan-24 739k ± 1% 694k ± 1% -6.08% (p=0.000 n=10+10) ``` The truncation loops for the optimized strategy are very similar in both directions, so I tried to extract the differences out into an interface. However, this showed non-trivial slow down and increase in allocations, so I chose to have some duplicated code to get the best performance. Here is a snippet of the comparison when the interface was prototyped: ``` name old time/op new time/op delta TruncateLoop/desc/reqs=128/ranges=4/type=get-24 36.9µs ± 3% 44.5µs ± 3% +20.55% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=4/type=scan-24 74.8µs ± 5% 88.8µs ± 3% +18.73% (p=0.000 n=9+10) TruncateLoop/desc/reqs=128/ranges=64/type=get-24 64.9µs ± 1% 78.3µs ± 1% +20.72% (p=0.000 n=10+9) TruncateLoop/desc/reqs=128/ranges=64/type=scan-24 471µs ± 8% 682µs ±13% +44.73% (p=0.000 n=10+10) TruncateLoop/desc/reqs=16384/ranges=4/type=get-24 6.34ms ± 1% 7.39ms ± 0% +16.47% (p=0.000 n=10+9) TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24 11.2ms ± 1% 12.4ms ± 1% +10.36% (p=0.000 n=10+9) TruncateLoop/desc/reqs=16384/ranges=64/type=get-24 6.40ms ± 2% 7.39ms ± 1% +15.47% (p=0.000 n=10+10) TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24 70.9ms ± 1% 102.0ms ± 2% +43.87% (p=0.000 n=9+9) name old alloc/op new alloc/op delta TruncateLoop/desc/reqs=128/ranges=4/type=get-24 22.2kB ± 9% 30.4kB ± 0% +36.55% (p=0.000 n=10+7) TruncateLoop/desc/reqs=128/ranges=4/type=scan-24 52.2kB ± 5% 67.6kB ± 4% +29.47% (p=0.000 n=8+10) TruncateLoop/desc/reqs=128/ranges=64/type=get-24 20.0kB ± 2% 32.2kB ± 1% +60.86% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=64/type=scan-24 372kB ±13% 600kB ±10% +61.29% (p=0.000 n=10+8) TruncateLoop/desc/reqs=16384/ranges=4/type=get-24 3.24MB ± 0% 4.45MB ± 0% +37.42% (p=0.000 n=8+7) TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24 6.61MB ± 0% 7.86MB ± 0% +18.90% (p=0.000 n=10+9) TruncateLoop/desc/reqs=16384/ranges=64/type=get-24 2.75MB ± 2% 3.74MB ± 1% +36.03% (p=0.000 n=10+10) TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24 65.7MB ± 1% 97.2MB ± 1% +47.95% (p=0.000 n=10+10) name old allocs/op new allocs/op delta TruncateLoop/desc/reqs=128/ranges=4/type=get-24 177 ± 2% 314 ± 0% +77.40% (p=0.000 n=10+8) TruncateLoop/desc/reqs=128/ranges=4/type=scan-24 597 ± 8% 847 ± 8% +41.89% (p=0.000 n=9+10) TruncateLoop/desc/reqs=128/ranges=64/type=get-24 329 ± 3% 531 ± 2% +61.40% (p=0.000 n=10+10) TruncateLoop/desc/reqs=128/ranges=64/type=scan-24 6.02k ± 9% 9.56k ± 6% +58.80% (p=0.000 n=10+8) TruncateLoop/desc/reqs=16384/ranges=4/type=get-24 16.5k ± 0% 32.9k ± 0% +99.17% (p=0.000 n=8+8) TruncateLoop/desc/reqs=16384/ranges=4/type=scan-24 53.5k ± 1% 73.1k ± 1% +36.69% (p=0.000 n=10+9) TruncateLoop/desc/reqs=16384/ranges=64/type=get-24 17.5k ± 0% 33.9k ± 0% +94.02% (p=0.000 n=6+10) TruncateLoop/desc/reqs=16384/ranges=64/type=scan-24 727k ± 1% 1194k ± 1% +64.31% (p=0.000 n=10+10) ``` An additional knob is introduced to the batch truncation helper to indicate whether the helper can take ownership of the passed-in requests slice and reorder it as it pleases. This is the case for the Streamer, but the DistSender relies on the order of requests not being modified, so the helper makes a copy of the slice. Some tests needed an adjustment since now we process requests not necessarily in the original order, so the population of ResumeSpans might be different. Release note: None
- Loading branch information