Improved performance of concatenating non-aligned validities (15x) #291

jorgecarleitao · 2021-08-16T23:57:20Z

This PR significantly improves the performance of concatenating arrays whose lengths are not a multiple of 8 by improving the performance of concatenating bitmaps.

Before this PR, we concatenated bitmaps by iterating over the source bit by bit and setting each bit individually. However, byte operations are more efficient. Specifically, given a mutable bitmap [10101000, --101010] (length = 8 + 6 = 14, so the last byte has 2 free bits), another bitmap can be appended to it using shifts. E.g. [00000011, a, b, ..., c] can be concatenated to it by something like:

00000011 << 6 and OR it into the last byte
merge a with 00000011 with an offset of 2 and append
merge b with a with an offset of 2 and append
...
append c

This results in significantly fewer instructions, lookups, etc.
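
The shift-and-merge steps above can be sketched in Rust as follows. This is a minimal illustration and not the code from this PR: `BitVec`, `push` and `extend_from_bytes` are hypothetical names, and the sketch assumes bits are stored least-significant-bit first within each byte and that unused bits of the last byte are kept at zero.

```rust
/// Hypothetical minimal mutable bitmap (LSB-first within each byte).
/// Invariant: bits at positions >= `len` in the last byte are zero.
pub struct BitVec {
    pub bytes: Vec<u8>,
    pub len: usize, // length in bits
}

impl BitVec {
    pub fn new() -> Self {
        Self { bytes: Vec::new(), len: 0 }
    }

    /// Baseline approach: append a single bit.
    pub fn push(&mut self, bit: bool) {
        if self.len % 8 == 0 {
            self.bytes.push(0);
        }
        if bit {
            let last = self.bytes.last_mut().unwrap();
            *last |= 1 << (self.len % 8);
        }
        self.len += 1;
    }

    /// Append the first `bits` bits of `src` using byte-wise shifts.
    pub fn extend_from_bytes(&mut self, src: &[u8], bits: usize) {
        assert!(bits <= src.len() * 8);
        if bits == 0 {
            return;
        }
        let src = &src[..(bits + 7) / 8];
        let used = self.len % 8; // bits already occupied in the last byte
        if used == 0 {
            // Aligned case: a plain byte copy is enough.
            self.bytes.extend_from_slice(src);
        } else {
            let free = 8 - used;
            // Fill the free high bits of the last byte with the low bits of src[0].
            let last = self.bytes.last_mut().unwrap();
            *last |= src[0] << used;
            // Each new byte merges the remaining high bits of src[i]
            // with the low bits of src[i + 1] (or zeros at the end).
            for i in 0..src.len() {
                let lo = src[i] >> free;
                let hi = src.get(i + 1).map_or(0, |&b| b << used);
                self.bytes.push(lo | hi);
            }
        }
        self.len += bits;
        // Drop any surplus byte and clear bits beyond the new length.
        self.bytes.truncate((self.len + 7) / 8);
        let rem = self.len % 8;
        if rem != 0 {
            let last = self.bytes.last_mut().unwrap();
            *last &= (1u8 << rem) - 1;
        }
    }
}
```

With the example from the description (`[10101000, --101010]`, 14 bits), appending the byte `00000011` ORs `00000011 << 6` into the last byte and pushes `00000011 >> 2` as a new byte, touching two bytes instead of setting eight bits one at a time.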

This improves the performance of almost all operations that in some way concatenate validities. This includes:

Growable API (concat, filter, merge-sort)
lower-level bitmap concatenation

git checkout afb05d2511d495075180436dcd16af2e4b6ed71a
cargo bench --no-default-features --features benchmarks,compute --bench bitmap --bench concat --bench filter_kernels -- "2\^20"
git checkout improve_perf
cargo bench --no-default-features --features benchmarks,compute --bench bitmap --bench concat --bench filter_kernels -- "2\^20"

bitmap extend aligned 2^20                                                                             
                        time:   [3.7567 us 3.7847 us 3.8217 us]
                        change: [-3.0540% +1.6941% +6.6158%] (p = 0.49 > 0.05)

bitmap extend unaligned 2^20                                                                            
                        time:   [247.46 us 248.23 us 249.13 us]
                        change: [-75.411% -75.289% -75.172%] (p = 0.00 < 0.05)

bitmap extend_constant aligned 2^20                                                                             
                        time:   [2.6766 us 2.6822 us 2.6883 us]
                        change: [-99.536% -99.534% -99.532%] (p = 0.00 < 0.05)

bitmap extend_constant unaligned 2^20                                                                             
                        time:   [2.6916 us 2.6970 us 2.7026 us]
                        change: [-99.531% -99.529% -99.527%] (p = 0.00 < 0.05)

int32 concat aligned 2^20                                                                            
                        time:   [487.53 us 488.09 us 488.75 us]
                        change: [-93.566% -93.548% -93.530%] (p = 0.00 < 0.05)

int32 concat unaligned 2^20                                                                            
                        time:   [758.76 us 759.91 us 761.29 us]
                        change: [-89.977% -89.951% -89.923%] (p = 0.00 < 0.05)

boolean concat aligned 2^20                                                                            
                        time:   [224.02 us 224.50 us 225.02 us]
                        change: [-98.193% -98.187% -98.181%] (p = 0.00 < 0.05)

boolean concat unaligned 2^20                                                                            
                        time:   [708.63 us 710.23 us 712.14 us]
                        change: [-94.305% -94.286% -94.268%] (p = 0.00 < 0.05)

filter 2^20 f32         time:   [2.5137 ms 2.5199 ms 2.5274 ms]                             
                        change: [-2.6963% -2.3576% -1.9729%] (p = 0.00 < 0.05)

filter null 2^20 f32    time:   [7.6607 ms 7.6773 ms 7.6954 ms]                                 
                        change: [-12.051% -11.757% -11.473%] (p = 0.00 < 0.05)
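
To read the `change:` lines as speedup factors, a relative change in time of `c` percent corresponds to a speedup of `1 / (1 + c / 100)`. A small sketch of that arithmetic, where `speedup_from_change` is a hypothetical helper and not part of the benchmark suite:

```rust
/// Convert a relative change in run time (in percent, negative = faster)
/// into a speedup factor. Hypothetical helper for illustration only.
pub fn speedup_from_change(change_percent: f64) -> f64 {
    1.0 / (1.0 + change_percent / 100.0)
}
```

For instance, the -93.548% reported for `int32 concat aligned` is roughly a 15.5x speedup, and the -75.289% for `bitmap extend unaligned` is roughly 4x.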

codecov · 2021-08-17T00:09:07Z

Codecov Report

Merging #291 (b556812) into main (0742edd) will increase coverage by 0.10%.
The diff coverage is 97.12%.

@@            Coverage Diff             @@
##             main     #291      +/-   ##
==========================================
+ Coverage   77.25%   77.35%   +0.10%     
==========================================
  Files         315      315              
  Lines       20791    20911     +120     
==========================================
+ Hits        16062    16176     +114     
- Misses       4729     4735       +6

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/bitmap/utils/chunk_iterator/mod.rs | `85.91% <ø> (ø)` | |
| src/bitmap/utils/mod.rs | `100.00% <ø> (ø)` | |
| src/bitmap/mutable.rs | `89.13% <93.84%> (+1.06%)` | ⬆️ |
| src/array/growable/boolean.rs | `80.76% <100.00%> (ø)` | |
| src/array/growable/utils.rs | `100.00% <100.00%> (ø)` | |
| tests/it/bitmap/mutable.rs | `100.00% <100.00%> (ø)` | |
| src/io/json_integration/write.rs | `0.00% <0.00%> (-6.25%)` | ⬇️ |
| src/io/csv/write/mod.rs | `72.00% <0.00%> (-4.00%)` | ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

jorgecarleitao · 2021-08-17T08:29:31Z

cc @ritchie46 and @Dandandan , since you like these things :)

ritchie46 · 2021-08-17T08:41:38Z

Love it! I see some interesting bit comments. Have you got a summary of what you do? A memcpy instead of iterators?

Dandandan · 2021-08-17T08:46:02Z

> cc @ritchie46 and @Dandandan , since you like these things :)

Amazing 😎

jorgecarleitao · 2021-08-17T10:01:01Z

> Love it! I see some interesting bit comments. Have you got a summary of what you do? A memcpy instead of iterators?

:) Updated the description with the idea 👍

sundy-li · 2021-08-24T05:48:13Z

A bug was found in #325

jorgecarleitao added the enhancement An improvement to an existing feature label Aug 16, 2021

Added bench.

afb05d2

jorgecarleitao force-pushed the improve_perf branch 2 times, most recently from e03cde4 to 17cade2 Compare August 17, 2021 08:24

jorgecarleitao changed the title ~~Improved performance of concatenating non-aligned validities (+4x)~~ Improved performance of concatenating non-aligned validities (15x) Aug 17, 2021

Improved performance of extend bitmap.

b556812

jorgecarleitao force-pushed the improve_perf branch from 17cade2 to b556812 Compare August 17, 2021 10:00

jorgecarleitao merged commit fa3b003 into main Aug 18, 2021

jorgecarleitao deleted the improve_perf branch August 18, 2021 01:15

Dandandan mentioned this pull request Aug 20, 2021

Optimize MutableArrayData::extend for null buffers apache/arrow-rs#397

Closed
