This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Improved performance of bitmap::from_trusted (3x) #578

Merged
merged 3 commits into from
Nov 6, 2021

Conversation

jorgecarleitao
Owner

@jorgecarleitao jorgecarleitao commented Nov 5, 2021

Optimizes the creation and extension of MutableBitmap. It mostly rearranges the code to leverage bitmap operations and the TrustedLen invariant inside the hot loop.
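
For readers unfamiliar with the idea, here is a minimal sketch of packing booleans from an iterator of known length into bytes, least-significant bit first. It is an illustration under assumptions, not the code in this PR: `pack_bits` is a hypothetical name, the standard `ExactSizeIterator` stands in for the crate's `TrustedLen`, and a safe `unwrap_or(false)` replaces any unchecked access the real hot loop may use.

```rust
/// Hypothetical helper: packs booleans into bytes, least-significant bit first.
/// `ExactSizeIterator` is a safe stand-in for a trusted-length iterator.
fn pack_bits<I: ExactSizeIterator<Item = bool>>(mut iter: I) -> Vec<u8> {
    let len = iter.len();
    let mut bytes = Vec::with_capacity((len + 7) / 8);

    // Full bytes: the length is known up front, so the inner loop
    // always runs exactly 8 times and needs no end-of-iterator branch
    // in the surrounding logic.
    for _ in 0..len / 8 {
        let mut byte = 0u8;
        for i in 0..8 {
            // `unwrap_or(false)` keeps the sketch safe even if the
            // reported length were wrong.
            byte |= (iter.next().unwrap_or(false) as u8) << i;
        }
        bytes.push(byte);
    }

    // Trailing bits that do not fill a whole byte.
    let rem = len % 8;
    if rem > 0 {
        let mut byte = 0u8;
        for i in 0..rem {
            byte |= (iter.next().unwrap_or(false) as u8) << i;
        }
        bytes.push(byte);
    }
    bytes
}
```

For example, `pack_bits(vec![true, false, true].into_iter())` yields a single byte, `0b0000_0101`.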

Gnuplot not found, using plotters backend
bitmap from_trusted_len 2^10                                                                            
                        time:   [343.17 ns 344.70 ns 346.28 ns]
                        change: [-61.785% -61.465% -61.171%] (p = 0.00 < 0.05)

bitmap from_trusted_len 2^12                                                                             
                        time:   [1.2747 us 1.2815 us 1.2886 us]
                        change: [-63.765% -63.286% -62.869%] (p = 0.00 < 0.05)

bitmap from_trusted_len 2^14                                                                             
                        time:   [5.0600 us 5.0821 us 5.1083 us]
                        change: [-62.807% -62.486% -62.179%] (p = 0.00 < 0.05)

bitmap from_trusted_len 2^16                                                                             
                        time:   [20.037 us 20.147 us 20.256 us]
                        change: [-63.013% -62.660% -62.334%] (p = 0.00 < 0.05)

bitmap from_trusted_len 2^18                                                                            
                        time:   [80.056 us 80.641 us 81.273 us]
                        change: [-62.689% -62.358% -62.044%] (p = 0.00 < 0.05)

bitmap from_trusted_len 2^20                                                                            
                        time:   [317.06 us 318.74 us 320.50 us]
                        change: [-62.787% -62.497% -62.187%] (p = 0.00 < 0.05)

bitmap extend_from_trusted_len_iter 2^10                                                                            
                        time:   [454.37 ns 456.53 ns 458.83 ns]
                        change: [-54.421% -54.039% -53.670%] (p = 0.00 < 0.05)

bitmap extend_from_trusted_len_iter 2^12                                                                             
                        time:   [1.3878 us 1.3938 us 1.4003 us]
                        change: [-60.631% -60.227% -59.785%] (p = 0.00 < 0.05)

bitmap extend_from_trusted_len_iter 2^14                                                                             
                        time:   [5.1372 us 5.1573 us 5.1786 us]
                        change: [-61.752% -61.251% -60.609%] (p = 0.00 < 0.05)

bitmap extend_from_trusted_len_iter 2^16                                                                             
                        time:   [20.073 us 20.211 us 20.355 us]
                        change: [-62.762% -62.438% -62.100%] (p = 0.00 < 0.05)

bitmap extend_from_trusted_len_iter 2^18                                                                            
                        time:   [79.685 us 80.196 us 80.758 us]
                        change: [-62.818% -62.511% -62.186%] (p = 0.00 < 0.05)

bitmap extend_from_trusted_len_iter 2^20                                                                            
                        time:   [319.68 us 321.64 us 323.73 us]
                        change: [-62.872% -62.617% -62.346%] (p = 0.00 < 0.05)
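
A side note on reading these numbers: the `change` line is the relative change in time, so a speedup factor is obtained as old time divided by new time. A small sketch of that conversion, with `speedup_from_change` being a hypothetical helper name:

```rust
/// Hypothetical helper: converts a relative change in time
/// (in percent, negative means faster) into a speedup factor,
/// i.e. old_time / new_time.
fn speedup_from_change(change_percent: f64) -> f64 {
    // new_time = old_time * (1 + change / 100)
    1.0 / (1.0 + change_percent / 100.0)
}
```

Under this conversion, a change of -50% is a 2x speedup, and a change of -62.5% is about 2.67x.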

@jorgecarleitao jorgecarleitao added the "enhancement" label (An improvement to an existing feature) Nov 5, 2021
@codecov

codecov bot commented Nov 5, 2021

Codecov Report

Merging #578 (967b19a) into main (0dda942) will increase coverage by 0.02%.
The diff coverage is 90.38%.

@@            Coverage Diff             @@
##             main     #578      +/-   ##
==========================================
+ Coverage   78.88%   78.90%   +0.02%     
==========================================
  Files         395      395              
  Lines       24678    24713      +35     
==========================================
+ Hits        19467    19500      +33     
- Misses       5211     5213       +2     
Impacted Files                Coverage Δ
src/bitmap/mutable.rs         89.81% <89.58%> (+0.49%) ⬆️
tests/it/bitmap/mutable.rs    100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jorgecarleitao jorgecarleitao changed the title from "Improved performance of bitmap::from_trusted (5x)" to "Improved performance of bitmap::from_trusted" Nov 5, 2021
@jorgecarleitao jorgecarleitao marked this pull request as draft November 5, 2021 07:08
@jorgecarleitao jorgecarleitao changed the title from "Improved performance of bitmap::from_trusted" to "Improved performance of bitmap::from_trusted (2x)" Nov 5, 2021
@ritchie46
Collaborator

Yeah buddy! 🚀

@jorgecarleitao jorgecarleitao marked this pull request as ready for review November 5, 2021 08:40
@jorgecarleitao jorgecarleitao changed the title from "Improved performance of bitmap::from_trusted (2x)" to "Improved performance of bitmap::from_trusted (3x)" Nov 5, 2021
@jorgecarleitao
Owner Author

jorgecarleitao commented Nov 5, 2021

Optimized it a bit more (to 3x in total) by using chunks of 64 bits instead of chunks of 8. A tiny bit more complex, but 33% is worth it imo. 🚀
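
To make the 64-bit idea concrete, here is a rough sketch that accumulates 64 booleans into a `u64` and writes it to the buffer in one go, falling back to byte-sized chunks for the tail. It is only an illustration and may differ from the merged code: `pack_bits_u64` is a hypothetical name, `ExactSizeIterator` stands in for the crate's `TrustedLen`, and the little-endian write is an assumption that keeps bit `i` of the word at bit `i % 8` of byte `i / 8`.

```rust
/// Hypothetical helper: packs booleans into bytes (LSB first),
/// processing 64 bits at a time where possible.
fn pack_bits_u64<I: ExactSizeIterator<Item = bool>>(mut iter: I) -> Vec<u8> {
    let len = iter.len();
    let mut bytes = Vec::with_capacity((len + 7) / 8);

    // Full 64-bit words: one buffer write per 64 bits instead of one per 8.
    for _ in 0..len / 64 {
        let mut word = 0u64;
        for i in 0..64 {
            word |= (iter.next().unwrap_or(false) as u64) << i;
        }
        // Little-endian layout preserves the LSB-first bit order across bytes.
        bytes.extend_from_slice(&word.to_le_bytes());
    }

    // Remaining bits (fewer than 64), packed one byte at a time.
    let mut rem = len % 64;
    while rem > 0 {
        let n = rem.min(8);
        let mut byte = 0u8;
        for i in 0..n {
            byte |= (iter.next().unwrap_or(false) as u8) << i;
        }
        bytes.push(byte);
        rem -= n;
    }
    bytes
}
```

With 70 `true` values this produces eight `0xFF` bytes followed by `0x3F` for the 6 trailing bits.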

unsafe fn get_byte_unchecked(len: usize, iterator: &mut impl Iterator<Item = bool>) -> u8 {
    let mut byte_accum: u8 = 0;
    let mut mask: u8 = 1;
    for _ in 0..len {
Collaborator

Wondering if this could be improved by chunking the iterator, and doing something like:

    chunk.iter()
        .enumerate()
        .for_each(|(i, &b)| {
            *byte |= if b { 1 << i } else { 0 };
        });

At least for the binary comparison code, this was leading to great, unrolled, code.
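
Spelled out as a complete function, the suggestion could look roughly like the sketch below. It assumes the values are already in a `&[bool]` slice, so that `chunks_exact` is available; `pack_slice` is a hypothetical name and this is not code from the PR.

```rust
/// Hypothetical helper: packs a slice of booleans into bytes (LSB first)
/// using fixed-size chunks of 8, which gives the compiler a constant
/// trip count to unroll.
fn pack_slice(values: &[bool]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity((values.len() + 7) / 8);

    let chunks = values.chunks_exact(8);
    // Elements left over after the last full chunk of 8.
    let remainder = chunks.remainder();

    for chunk in chunks {
        let mut byte = 0u8;
        chunk.iter().enumerate().for_each(|(i, &b)| {
            byte |= if b { 1u8 << i } else { 0 };
        });
        bytes.push(byte);
    }

    if !remainder.is_empty() {
        let mut byte = 0u8;
        for (i, &b) in remainder.iter().enumerate() {
            byte |= if b { 1u8 << i } else { 0 };
        }
        bytes.push(byte);
    }
    bytes
}
```

For example, `pack_slice(&[true, false, true])` gives `[0b101]`. An arbitrary `Iterator<Item = bool>` has no equivalent of `chunks_exact`, so this form only applies when the input is a slice.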

Owner Author

Isn't chunking only available on slices, e.g. &[bool].chunks_exact (or on bitmaps, via u8 chunks)?

Collaborator

Yeah you are right. There might be some remaining optimizations possible, let me take a look :)

Labels
enhancement An improvement to an existing feature

3 participants