Add iter ops #139

Closed
wants to merge 1 commit into from

@saik0 (Contributor) commented Jan 12, 2022

Closes #58
Closes #109
Fixes #57

Adds operators to iterators of bitmaps

TODO

  • Treemap ops
  • Tests
  • Examples in docs
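The description is terse. As a rough illustration of what "operators on iterators of bitmaps" can look like, here is a std-only sketch of an extension trait with a blanket impl over any iterable of owned sets. `BTreeSet<u32>` stands in for `RoaringBitmap`, and `SetIterOps` and `union_all` are hypothetical names, not the API this PR adds.

```rust
use std::collections::BTreeSet;

/// Hypothetical extension trait: one set operation over every set an iterator yields.
/// `BTreeSet<u32>` is a stand-in for `RoaringBitmap` in this sketch.
pub trait SetIterOps: IntoIterator<Item = BTreeSet<u32>> + Sized {
    /// Union of all sets; an empty iterator yields the empty set.
    fn union_all(self) -> BTreeSet<u32> {
        let mut iter = self.into_iter();
        // Reuse the first owned set as the accumulator instead of allocating a new one.
        let mut acc = iter.next().unwrap_or_default();
        for set in iter {
            acc.extend(set);
        }
        acc
    }
}

// Blanket impl: any iterable of owned sets (Vec, array, iterator adapters) gets `union_all`.
impl<I: IntoIterator<Item = BTreeSet<u32>>> SetIterOps for I {}
```

With the trait in scope, `vec![a, b, c].union_all()` consumes the vector and returns a single set.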

Comment on lines +55 to +56

```rust
where
    T: private::Roaring,
```
Member:

I am not sure why you want to restrict this trait to our library types. What's dangerous about implementing it on, for example, a wrapper of RoaringBitmap/RoaringTreemap?

Contributor Author:

Adding new non-defaulted methods to an unsealed trait would be a breaking change for downstream implementors. I was thinking of sealing as a way to preserve public API compatibility.

Member:

I would say that adding new methods to a trait is only a "soft" breaking change. I would prefer to leave the trait free for users to implement, at least for a first release.
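The trade-off in this thread can be seen in a small std-only sketch of the sealed-trait pattern. It uses the common supertrait form, whereas the diff seals the type parameter through a `T: private::Roaring` bound; `bitmaps`, `Bitmap`, `Treemap`, and `Cardinality` are hypothetical stand-ins.

```rust
mod bitmaps {
    // `private` is not reachable from outside `bitmaps`, so other code can
    // name and call `Cardinality` but can never implement `private::Sealed`.
    mod private {
        pub trait Sealed {}
    }

    /// Stand-ins for `RoaringBitmap` and `RoaringTreemap`.
    pub struct Bitmap(pub Vec<u32>);
    pub struct Treemap(pub Vec<u64>);

    impl private::Sealed for Bitmap {}
    impl private::Sealed for Treemap {}

    /// Callable everywhere, implementable only inside `bitmaps`: adding a required
    /// method later cannot break outside implementors, because there are none.
    pub trait Cardinality: private::Sealed {
        fn cardinality(&self) -> u64;
    }

    impl Cardinality for Bitmap {
        fn cardinality(&self) -> u64 {
            self.0.len() as u64
        }
    }

    impl Cardinality for Treemap {
        fn cardinality(&self) -> u64 {
            self.0.len() as u64
        }
    }
}
```

Dropping `private::Sealed` from the supertrait list is what leaving the trait open to users amounts to; the cost is that each later required method becomes a breaking change for those users.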

```rust
///
/// This trait is parameterized by a sealed trait and cannot be implemented for types outside
/// of this crate
pub trait IterExt<T>: IntoIterator<Item = T>
```
@Kerollmops (Member) commented Jan 18, 2022

Maybe we could find a better name for this trait. Since it provides set operations on multiple sets at the same time, maybe MultiOps, RoaringExt, or something like that? What do you think?

Contributor Author:

Sure. We could also create a Roaring prelude that includes the iterator extensions, RoaringBitmap, and RoaringTreemap.

Member:

Hmm... I'm not sure we have enough types and traits yet for that to be worthwhile, but maybe later, once there are more.
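For reference, a prelude is usually just a module of `pub use` re-exports that downstream code glob-imports. A hypothetical, std-only sketch (`roaring_like`, `Bitmap`, `Treemap`, and the empty `IterExt` stand-in are illustrative names only):

```rust
mod roaring_like {
    #[derive(Debug, Default)]
    pub struct Bitmap(pub Vec<u32>);

    #[derive(Debug, Default)]
    pub struct Treemap(pub Vec<u64>);

    /// Empty stand-in for the iterator extension trait.
    pub trait IterExt {}

    /// Hypothetical prelude: one glob import brings every common item into scope.
    pub mod prelude {
        pub use super::{Bitmap, IterExt, Treemap};
    }
}

// Downstream code: a single glob import instead of one `use` per item.
use roaring_like::prelude::*;

fn empty_pair() -> (Bitmap, Treemap) {
    (Bitmap::default(), Treemap::default())
}
```

A prelude only saves typing in proportion to how many items it gathers, which is the point being made about having too few types for now.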

@saik0 (Contributor Author) commented Jan 19, 2022

I see value in exposing an API for lazy ops regardless of whether or not we expose the iterator extensions that use them internally, so that it generalizes to any kind of aggregation the user might need to do. The intuition is to have an API that lets users do lazy operations through a newtype that gets "repaired" (to use the same term as the other language implementations) when it is unwrapped.

Thinking out loud in pseudo-rust about the shape of the API, not the names.

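```rust
use std::ops::{BitOrAssign, BitXorAssign};

struct Lazy {
    bitmap: RoaringBitmap,
}

impl BitOrAssign<&Lazy> for Lazy {
    fn bitor_assign(&mut self, rhs: &Lazy) {
        todo!("do a lazy or")
    }
}

impl BitXorAssign<&Lazy> for Lazy {
    fn bitxor_assign(&mut self, rhs: &Lazy) {
        todo!("do a lazy xor")
    }
}

// Inherent impls on RoaringBitmap are only allowed inside the roaring crate itself.
impl RoaringBitmap {
    fn into_lazy(self) -> Lazy {
        Lazy { bitmap: self }
    }
}

impl Lazy {
    /// Ensure all the stores are the correct type before unwrapping
    /// (repairAfterLazy in other languages).
    fn into_bitmap(self) -> RoaringBitmap {
        todo!("repair the stores, then return self.bitmap")
    }
}

fn main() {
    // Roughly equivalent to the current IterExt (this does not use Cow).
    // `reduce` returns `None` for an empty iterator, hence `unwrap_or_default`.
    let bitmaps: Vec<RoaringBitmap> = todo!();
    let union = bitmaps
        .into_iter()
        .map(RoaringBitmap::into_lazy)
        .reduce(|mut acc, other| { acc |= &other; acc })
        .map(Lazy::into_bitmap)
        .unwrap_or_default();

    // Now do it in parallel! rayon's `reduce_with` is the Option-returning
    // counterpart of `Iterator::reduce`.
    use rayon::prelude::*;
    let more_bitmaps: Vec<RoaringBitmap> = todo!();
    let par_union = more_bitmaps
        .into_par_iter()
        .map(RoaringBitmap::into_lazy)
        .reduce_with(|mut acc, other| { acc |= &other; acc })
        .map(Lazy::into_bitmap)
        .unwrap_or_default();
}
```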

I'd need to think more about what it might look like for borrowed bitmaps. 🤔
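To make the "repair on unwrap" idea concrete without depending on roaring internals, here is a std-only sketch in which a sorted, deduplicated `Vec<u32>` plays the role of a bitmap. `SortedSet`, `Lazy`, `into_lazy`, and `into_set` are hypothetical names, and the deferred work here (sorting and deduplicating) only stands in for the bitmap case, where the repair step makes sure every store has the correct type.

```rust
use std::ops::BitOrAssign;

/// Canonical form, standing in for a repaired bitmap: sorted and deduplicated.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct SortedSet(Vec<u32>);

impl SortedSet {
    pub fn from_unsorted(mut values: Vec<u32>) -> Self {
        values.sort_unstable();
        values.dedup();
        SortedSet(values)
    }

    pub fn as_slice(&self) -> &[u32] {
        &self.0
    }

    /// Enter lazy mode: no work is done, the invariant is simply suspended.
    pub fn into_lazy(self) -> Lazy {
        Lazy { values: self.0 }
    }
}

/// Newtype whose contents may be unsorted and contain duplicates between operations.
pub struct Lazy {
    values: Vec<u32>,
}

impl<'a> BitOrAssign<&'a Lazy> for Lazy {
    /// Lazy union: append without sorting or deduplicating.
    fn bitor_assign(&mut self, rhs: &'a Lazy) {
        self.values.extend_from_slice(&rhs.values);
    }
}

impl Lazy {
    /// The "repair" step: restore the sorted, deduplicated invariant once, at the end.
    pub fn into_set(self) -> SortedSet {
        SortedSet::from_unsorted(self.values)
    }
}
```

Each `|=` is a cheap append; the sort and dedup run once in `into_set`, which is the same trade-off the proposed `into_bitmap` makes.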

@Kerollmops (Member)

> To generalize to any kind of aggregation the user might need to do. The intuition is to have an API that allows users to do lazy operations with a newtype, that gets "repaired" (to use the same term as other lang impls) when it's unwrapped.

I like the idea, but we absolutely must document it well: it is not obvious what a Lazy type does. I prefer wrapping the bitmaps over adding methods that perform lazy operations on them, and I like the Lazy wrapper type because it looks a lot like the `Wrapping` struct from std.
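For readers unfamiliar with the std type being referenced, `std::num::Wrapping` is a newtype whose operator impls change semantics, with the plain value recovered through `.0`. A small example (the `wrapping_sum` helper is illustrative only):

```rust
use std::num::Wrapping;

/// Sums bytes modulo 256. `Wrapping` changes what `+` means (wrap on overflow
/// instead of panicking in debug builds) without introducing new method names.
fn wrapping_sum(values: &[u8]) -> u8 {
    let total = values
        .iter()
        .fold(Wrapping(0u8), |acc, &v| acc + Wrapping(v));
    // `.0` unwraps back to the plain integer, much like the proposed `into_bitmap`.
    total.0
}
```

For instance, `wrapping_sum(&[200, 100])` returns 44, since 300 wraps around modulo 256.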

Kerollmops mentioned this pull request on Mar 9, 2022
bors (bot) closed this in 2828be5 on Aug 30, 2022
not-jan pushed a commit to not-jan/roaring-rs that referenced this pull request on Aug 31, 2022
223: Implements multioperation for the bitmaps and tree maps r=Kerollmops a=irevoire

Fixes RoaringBitmap#57, closes RoaringBitmap#58, closes RoaringBitmap#109, closes RoaringBitmap#139, and closes RoaringBitmap#219.

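There are many performance improvements; here is a before/after comparison of the operations that were already the fastest (those where we can assign between owned bitmaps). In each table, `after` is the baseline (ratio 1.00) and the number preceding each `before` timing is how many times slower the previous implementation was; `? ?/sec` means no throughput was recorded.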

## And
```
group                                                    after                                  before
-----                                                    -----                                  ------
Successive And/Multi And Owned/census-income             1.00     14.6±0.25µs        ? ?/sec    15.42   224.9±0.76µs        ? ?/sec
Successive And/Multi And Owned/census-income_srt         1.00     14.2±0.25µs        ? ?/sec    3.98     56.4±8.22µs        ? ?/sec
Successive And/Multi And Owned/census1881                1.00     20.7±0.33µs        ? ?/sec    37.18   770.1±1.62µs        ? ?/sec
Successive And/Multi And Owned/census1881_srt            1.00     25.8±1.29µs        ? ?/sec    1.12     28.8±0.09µs        ? ?/sec
Successive And/Multi And Owned/weather_sept_85           1.00     60.7±2.48µs        ? ?/sec    2.15    130.2±2.96µs        ? ?/sec
Successive And/Multi And Owned/weather_sept_85_srt       1.00     48.3±2.21µs        ? ?/sec    2.32    112.2±1.07µs        ? ?/sec
Successive And/Multi And Owned/wikileaks-noquotes        1.00     24.4±0.50µs        ? ?/sec    2.73     66.6±0.27µs        ? ?/sec
Successive And/Multi And Owned/wikileaks-noquotes_srt    1.00     20.3±0.58µs        ? ?/sec    1.09     22.0±0.30µs        ? ?/sec
```

## Or
```
group                                                    after                                  before
-----                                                    -----                                  ------
Successive Or/Multi Or Owned/census-income               1.00    629.3±4.46µs        ? ?/sec    2.29  1441.4±41.36µs        ? ?/sec
Successive Or/Multi Or Owned/census-income_srt           1.00    582.5±1.81µs        ? ?/sec    1.61    937.8±4.03µs        ? ?/sec
Successive Or/Multi Or Owned/census1881                  1.00   1143.4±4.55µs        ? ?/sec    3.48      4.0±0.07ms        ? ?/sec
Successive Or/Multi Or Owned/census1881_srt              1.00    743.4±4.40µs        ? ?/sec    3.49      2.6±0.02ms        ? ?/sec
Successive Or/Multi Or Owned/weather_sept_85             1.00      2.9±0.02ms        ? ?/sec    1.06      3.1±0.01ms        ? ?/sec
Successive Or/Multi Or Owned/weather_sept_85_srt         1.00   1344.5±7.80µs        ? ?/sec    1.06  1426.5±38.08µs        ? ?/sec
Successive Or/Multi Or Owned/wikileaks-noquotes          1.00    476.3±4.43µs        ? ?/sec    5.27      2.5±0.01ms        ? ?/sec
Successive Or/Multi Or Owned/wikileaks-noquotes_srt      1.00    259.4±3.90µs        ? ?/sec    7.17   1860.0±3.30µs        ? ?/sec
```
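The owned case benefits because the first owned operand can become the accumulator and every later operand is folded into it in place, while with borrowed operands the accumulator has to start as a copy. A std-only sketch of that difference for intersection, with `BTreeSet<u32>` standing in for `RoaringBitmap` (the helper names are hypothetical, and this is not the merged implementation):

```rust
use std::collections::BTreeSet;

/// Owned operands: reuse the first set and intersect into it in place.
fn intersect_owned(sets: Vec<BTreeSet<u32>>) -> BTreeSet<u32> {
    let mut iter = sets.into_iter();
    let mut acc = match iter.next() {
        Some(first) => first,
        None => return BTreeSet::new(),
    };
    for set in iter {
        acc.retain(|x| set.contains(x)); // filters in place; no second set is built
    }
    acc
}

/// Borrowed operands: the accumulator must start as a copy of the first set.
fn intersect_borrowed(sets: &[BTreeSet<u32>]) -> BTreeSet<u32> {
    let mut iter = sets.iter();
    let mut acc = match iter.next() {
        Some(first) => first.clone(), // the extra copy the owned version avoids
        None => return BTreeSet::new(),
    };
    for set in iter {
        acc.retain(|x| set.contains(x));
    }
    acc
}
```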

Co-authored-by: saik0 <[email protected]>
Co-authored-by: Kerollmops <[email protected]>
Co-authored-by: Tamo <[email protected]>
Co-authored-by: Irevoire <[email protected]>
Successfully merging this pull request may close these issues: Set operations for multiple sets at a time
2 participants