storage: expand TBI support to include with diff #78974

amruss · 2022-03-29T17:42:29Z

TBI has been a huge help for changefeed users, as it significantly improves the performance of scans. This is especially important during times of instability or at a high scale of data. For example: When resuming a changefeed after 20 hrs of interrupted service, turning off with diff and using TBI caused catch up scan to go from 10 hours to 3 seconds for one workload. This was incredibly important to restore service quickly and ensure system stability during catch up. TBI support is one of the reasons we recommend anyone using changefeeds in production upgrade to 21.2 (example).

Unfortunately, changefeed users who use the WITH diff option do not benefit from this performance improvement. This came up recently with a prospect, who needs WITH diff but was looking for similar performance improvements.

From MVCCIncrementalIterator:
// TODO(ssd): The withDiff option requires us to iterate over
// values arbitrarily in the past so that we can populate the
// previous value of a key. This is possible since the
// IncrementalIterator has a non-timebound iterator
// internally, but it is not yet implemented.

Crude implementation started here: #69191

Jira issue: CRDB-14469

The text was updated successfully, but these errors were encountered:

The MVCCIncrementalIterator already has support for selectively seeing versions beyond the time bounds, via the NextIgnoringTime method. This method is sufficient for with-diff catchup scans -- the changes to MVCCIncrementalIterator here are limited to comment improvements. To utilize NextIgnoringTime in the CatchUpIterator we need to distinguish between the reasons that a catchup scan wants to iterate to an older version of a key. These are now handled by different methods via the simpleCatchupIter interface. There is a simpleCatchupIterAdapter to provide a trivial implementation when the underlying iterator is a SimpleMVCCIterator. There is more testing of MVCCIncrementalIterator's NextIgnoringTime, and there is now a TestCatchupScan that tests with and without diff. BenchmarkCatchUpScan now also benchmarks with-diff. Results for linear-keys, where time-bound iteration shows an improvement: BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=0.00-16 3 397238811 ns/op 253609173 B/op 3006262 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=50.00-16 6 192314659 ns/op 126847284 B/op 1503167 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=75.00-16 12 93243340 ns/op 63433929 B/op 751606 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=95.00-16 66 17636059 ns/op 12691501 B/op 150356 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=99.00-16 296 3566355 ns/op 2560979 B/op 30126 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=0.00-16 2 587807376 ns/op 253689672 B/op 3006770 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=50.00-16 4 286110763 ns/op 126893106 B/op 1503413 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=75.00-16 8 140012384 ns/op 63458643 B/op 751740 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=95.00-16 43 26210520 ns/op 12695466 B/op 150380 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=99.00-16 200 5381206 ns/op 2561444 B/op 30133 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=true/perc=0.00-16 3 379387200 ns/op 253577896 B/op 3006228 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=true/perc=50.00-16 4 294533788 ns/op 130799366 B/op 1503677 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=true/perc=75.00-16 4 254943713 ns/op 69418826 B/op 752487 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=true/perc=95.00-16 5 220269848 ns/op 20291392 B/op 151411 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=true/perc=99.00-16 5 211813333 ns/op 10473760 B/op 31231 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=false/perc=0.00-16 3 379575618 ns/op 253591349 B/op 3006275 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=false/perc=50.00-16 4 282490494 ns/op 126852270 B/op 1503388 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=false/perc=75.00-16 5 241938049 ns/op 63463176 B/op 751985 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=false/perc=95.00-16 5 210848534 ns/op 12741640 B/op 150944 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=false/perc=99.00-16 6 196175587 ns/op 2602912 B/op 30652 allocs/op Fixes cockroachdb#78974 Release note: None

The MVCCIncrementalIterator already has support for selectively seeing versions beyond the time bounds, via the NextIgnoringTime method. This method is sufficient for with-diff catchup scans -- the changes to MVCCIncrementalIterator here are limited to comment improvements. To utilize NextIgnoringTime in the CatchUpIterator we need to distinguish between the reasons that a catchup scan wants to iterate to an older version of a key. These are now handled by different methods via the simpleCatchupIter interface. There is a simpleCatchupIterAdapter to provide a trivial implementation when the underlying iterator is a SimpleMVCCIterator. There is more testing of MVCCIncrementalIterator's NextIgnoringTime, and there is now a TestCatchupScan that tests with and without diff. BenchmarkCatchUpScan now also benchmarks with-diff. Results for linear-keys, where time-bound iteration shows an improvement: BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=0.00-16 3 397238811 ns/op 253609173 B/op 3006262 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=50.00-16 6 192314659 ns/op 126847284 B/op 1503167 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=75.00-16 12 93243340 ns/op 63433929 B/op 751606 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=95.00-16 66 17636059 ns/op 12691501 B/op 150356 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=99.00-16 296 3566355 ns/op 2560979 B/op 30126 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=0.00-16 2 587807376 ns/op 253689672 B/op 3006770 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=50.00-16 4 286110763 ns/op 126893106 B/op 1503413 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=75.00-16 8 140012384 ns/op 63458643 B/op 751740 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=95.00-16 43 26210520 ns/op 12695466 B/op 150380 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=99.00-16 200 5381206 ns/op 2561444 B/op 30133 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=true/perc=0.00-16 3 379387200 ns/op 253577896 B/op 3006228 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=true/perc=50.00-16 4 294533788 ns/op 130799366 B/op 1503677 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=true/perc=75.00-16 4 254943713 ns/op 69418826 B/op 752487 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=true/perc=95.00-16 5 220269848 ns/op 20291392 B/op 151411 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=true/perc=99.00-16 5 211813333 ns/op 10473760 B/op 31231 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=false/perc=0.00-16 3 379575618 ns/op 253591349 B/op 3006275 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=false/perc=50.00-16 4 282490494 ns/op 126852270 B/op 1503388 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=false/perc=75.00-16 5 241938049 ns/op 63463176 B/op 751985 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=false/perc=95.00-16 5 210848534 ns/op 12741640 B/op 150944 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=false/perc=99.00-16 6 196175587 ns/op 2602912 B/op 30652 allocs/op Fixes #78974 Release note: None

The MVCCIncrementalIterator already has support for selectively seeing versions beyond the time bounds, via the NextIgnoringTime method. This method is sufficient for with-diff catchup scans -- the changes to MVCCIncrementalIterator here are limited to comment improvements. To utilize NextIgnoringTime in the CatchUpIterator we need to distinguish between the reasons that a catchup scan wants to iterate to an older version of a key. These are now handled by different methods via the simpleCatchupIter interface. There is a simpleCatchupIterAdapter to provide a trivial implementation when the underlying iterator is a SimpleMVCCIterator. There is more testing of MVCCIncrementalIterator's NextIgnoringTime, and there is now a TestCatchupScan that tests with and without diff. BenchmarkCatchUpScan now also benchmarks with-diff. Results for linear-keys, where time-bound iteration shows an improvement: BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=0.00-16 3 397238811 ns/op 253609173 B/op 3006262 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=50.00-16 6 192314659 ns/op 126847284 B/op 1503167 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=75.00-16 12 93243340 ns/op 63433929 B/op 751606 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=95.00-16 66 17636059 ns/op 12691501 B/op 150356 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=true/perc=99.00-16 296 3566355 ns/op 2560979 B/op 30126 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=0.00-16 2 587807376 ns/op 253689672 B/op 3006770 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=50.00-16 4 286110763 ns/op 126893106 B/op 1503413 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=75.00-16 8 140012384 ns/op 63458643 B/op 751740 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=95.00-16 43 26210520 ns/op 12695466 B/op 150380 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=true/withDiff=false/perc=99.00-16 200 5381206 ns/op 2561444 B/op 30133 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=true/perc=0.00-16 3 379387200 ns/op 253577896 B/op 3006228 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=true/perc=50.00-16 4 294533788 ns/op 130799366 B/op 1503677 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=true/perc=75.00-16 4 254943713 ns/op 69418826 B/op 752487 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=true/perc=95.00-16 5 220269848 ns/op 20291392 B/op 151411 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=true/perc=99.00-16 5 211813333 ns/op 10473760 B/op 31231 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=false/perc=0.00-16 3 379575618 ns/op 253591349 B/op 3006275 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=false/perc=50.00-16 4 282490494 ns/op 126852270 B/op 1503388 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=false/perc=75.00-16 5 241938049 ns/op 63463176 B/op 751985 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=false/perc=95.00-16 5 210848534 ns/op 12741640 B/op 150944 allocs/op BenchmarkCatchUpScan/linear-keys/useTBI=false/withDiff=false/perc=99.00-16 6 196175587 ns/op 2602912 B/op 30652 allocs/op Fixes cockroachdb#78974 Release note: None

amruss added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Mar 29, 2022

sumeerbhola added A-storage Relating to our storage engine (Pebble) on-disk storage. T-storage Storage Team labels Mar 29, 2022

sumeerbhola self-assigned this Apr 25, 2022

sumeerbhola mentioned this issue Apr 26, 2022

rangefeed,storage: use time-bound iterator for with-diff catchup scan #80591

Merged

craig bot closed this as completed in 0782bef Apr 27, 2022

blathers-crl bot mentioned this issue Apr 27, 2022

release-22.1: rangefeed,storage: use time-bound iterator for with-diff catchup scan #80673

Merged

jlinder added the sync-me-3 label May 24, 2022

jayshrivastava mentioned this issue Oct 24, 2022

release-21.2: rangefeed,storage: use time-bound iterator for with-diff catchup scan #90565

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

storage: expand TBI support to include with diff #78974

storage: expand TBI support to include with diff #78974

amruss commented Mar 29, 2022 •

edited

Loading

storage: expand TBI support to include with diff #78974

storage: expand TBI support to include with diff #78974

Comments

amruss commented Mar 29, 2022 • edited Loading

amruss commented Mar 29, 2022 •

edited

Loading