
Genericize RoaringBitmap and implement Roaring32 and Roaring64 #260

Draft · wants to merge 4 commits from roaring64 into main

Conversation

grim7reaper
Contributor

Roaring32 is the new name of RoaringBitmap (which is now the core generic implementation, RoaringBitmap<V>).
Roaring64 is the 64-bit version and replaces RoaringTreemap.

@grim7reaper force-pushed the roaring64 branch 5 times, most recently from d8bb37b to 494327b on August 13, 2023 at 14:37
/// A generic compressed bitmap using the [Roaring bitmap compression scheme](https://roaringbitmap.org/).
#[derive(PartialEq)]
pub struct RoaringBitmap<V: Value> {
    containers: Vec<container::Container<V>>,
}
Member

I worry that using a Vec for RoaringBitmap64 might not be the best idea. For RoaringBitmap32, in the worst case, we're moving around 64k containers, but for RoaringBitmap64, inserting a new container may involve quite a lot of copying, which is why the original used a BTreeMap instead.

Contributor Author

I guess that depends on the distribution of your dataset/workload; using a BTreeMap isn't great either in terms of locality/pointer chasing.

I would say that for most cases it should be fine; at least the paper didn't notice a large difference in memory usage.

But maybe both approaches should be offered?

This will allow us to easily implement Roaring64, a 64-bit RoaringBitmap
using the TwoLevelRoaringBitmap approach, which is up to 11x faster than
the RoaringTreemap approach and reuses most of the Roaring32 code.

Note that RoaringBitmap::full has been removed because it won't scale
for Roaring64.

Comment out RoaringTreemap for now, will be replaced in the next commit.
Benchmark results on insert_range (only bench available for 64-bit):

    group                       roaring64                               treemap
    -----                       ---------                               -------
    from_empty_1000             1.00     87.6±4.76ns 10.6 GElem/sec     1.65    144.2±1.28ns  6.5 GElem/sec
    from_empty_10000            1.00    166.8±9.32ns 55.8 GElem/sec     1.28    213.6±7.26ns 43.6 GElem/sec
    from_empty_8589934590       1.01    151.5±0.99ms 52.8 GElem/sec     1.00    149.7±1.00ms 53.5 GElem/sec
    pre_populated_1000          1.00   139.3±19.83ns  6.7 GElem/sec     1.30   180.8±20.70ns  5.2 GElem/sec
    pre_populated_10000         1.00   235.4±83.29ns 39.6 GElem/sec     1.26  295.9±106.25ns 31.5 GElem/sec
    pre_populated_8589934590    1.00     74.8±2.56ms 107.0 GElem/sec    1.01     75.3±1.82ms 106.3 GElem/sec
@Kerollmops (Member) left a comment

Thank you very much for this big PR 🤗

Would it be possible to make the IntoIter type public, please? I was trying your new branch and was unfortunately unable to use it. I don't know if there is a flag or something to make sure cargo warns about it...

@grim7reaper (Contributor Author)

> Would it be possible to make the IntoIter type public, please? I was trying your new branch and was unfortunately unable to use it. I don't know if there is a flag or something to make sure cargo warns about it...

Done in the last commit.
Dunno about a flag/clippy lint either.
