Branchless array-bitmap operations #232

Merged
merged 1 commit into from
Aug 15, 2022
33 changes: 25 additions & 8 deletions src/bitmap/store/array_store/mod.rs
@@ -205,6 +205,23 @@ impl ArrayStore {
     pub fn as_slice(&self) -> &[u16] {
         &self.vec
     }
+
+    /// Retains only the elements specified by the predicate.
+    pub fn retain(&mut self, mut f: impl FnMut(u16) -> bool) {

Should this function be marked `unsafe`? If the supplied `f` doesn't uphold the invariant stated in the safety comment

// SAFETY: pos is always at most i because f(val) as usize is at most 1.

I suspect this will lead to undefined behavior.

Contributor Author

`f(val)` is a `bool`, so `f(val) as usize` can only be 0 or 1, whatever `f` does.

Oops, I didn't look at the return type in the signature.

+        // Idea to avoid branching from "Engineering Fast Indexes for Big Data
+        // Applications" talk by Daniel Lemire
+        // (https://youtu.be/1QMgGxiCFWE?t=1242).
+        let slice = self.vec.as_mut_slice();
+        let mut pos = 0;
+        for i in 0..slice.len() {
+            let val = slice[i];
+            // We want to do `slice[pos] = val` but we don't need the bounds check.
+            // SAFETY: pos is always at most i because `f(val) as usize` is at most 1.
+            unsafe { *slice.get_unchecked_mut(pos) = val }
+            pos += f(val) as usize;
+        }
+        self.vec.truncate(pos);
+    }
 }

 impl Default for ArrayStore {
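
For readers following the safety discussion above, here is a minimal sketch of the same branchless compaction written with ordinary bounds-checked indexing. The free function `retain_branchless` is a hypothetical stand-in that operates on a plain `Vec<u16>`; it is not part of the crate. It illustrates why the write index can never run ahead of the read index: casting a `bool` to `usize` yields only 0 or 1.

```rust
/// Hypothetical free-function version of the branchless retain,
/// using safe indexing instead of `get_unchecked_mut`.
fn retain_branchless(vec: &mut Vec<u16>, mut f: impl FnMut(u16) -> bool) {
    // `pos` is the write cursor; `i` is the read cursor.
    let mut pos = 0;
    for i in 0..vec.len() {
        let val = vec[i];
        // Write unconditionally. If `val` is rejected, `pos` does not move
        // and the slot is simply overwritten on a later iteration.
        vec[pos] = val;
        // `true as usize` is 1 and `false as usize` is 0, so `pos` grows by
        // at most 1 per iteration and stays <= i at the time of the write.
        pos += f(val) as usize;
    }
    // Everything at index `pos` and beyond is leftover data.
    vec.truncate(pos);
}
```

Calling `retain_branchless(&mut v, |x| x % 2 == 0)` on `vec![1, 2, 3, 4, 5, 6]` leaves `[2, 4, 6]`. The bounds check this version keeps is exactly what the `unsafe` block in the diff removes.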
@@ -300,17 +317,17 @@ impl BitAndAssign<&Self> for ArrayStore {
         #[cfg(not(feature = "simd"))]
         {
             let mut i = 0;
-            self.vec.retain(|x| {
-                i += rhs.iter().skip(i).position(|y| y >= x).unwrap_or(rhs.vec.len());
-                rhs.vec.get(i).map_or(false, |y| x == y)
+            self.retain(|x| {
+                i += rhs.iter().skip(i).position(|y| *y >= x).unwrap_or(rhs.vec.len());
+                rhs.vec.get(i).map_or(false, |y| x == *y)
             });
         }
     }
 }

 impl BitAndAssign<&BitmapStore> for ArrayStore {
     fn bitand_assign(&mut self, rhs: &BitmapStore) {
-        self.vec.retain(|x| rhs.contains(*x));
+        self.retain(|x| rhs.contains(x));
     }
 }

@@ -339,17 +356,17 @@ impl SubAssign<&Self> for ArrayStore {
         #[cfg(not(feature = "simd"))]
         {
             let mut i = 0;
-            self.vec.retain(|x| {
-                i += rhs.iter().skip(i).position(|y| y >= x).unwrap_or(rhs.vec.len());
-                rhs.vec.get(i).map_or(true, |y| x != y)
+            self.retain(|x| {
+                i += rhs.iter().skip(i).position(|y| *y >= x).unwrap_or(rhs.vec.len());
+                rhs.vec.get(i).map_or(true, |y| x != *y)
             });
         }
     }
 }

 impl SubAssign<&BitmapStore> for ArrayStore {
     fn sub_assign(&mut self, rhs: &BitmapStore) {
-        self.vec.retain(|x| !rhs.contains(*x));
+        self.retain(|x| !rhs.contains(x));
     }
 }
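
The non-SIMD intersection and difference closures in this diff both walk a cursor `i` through the sorted right-hand side. The sketch below reproduces that predicate logic on plain slices so it can be read in isolation. The function names `intersect_sorted` and `difference_sorted` are hypothetical, both inputs are assumed to be sorted and free of duplicates, and the standard `Vec::retain` is used here rather than the new `ArrayStore::retain`.

```rust
/// Hypothetical helper: keep only the values of `lhs` that also occur in `rhs`.
/// Both inputs are assumed to be sorted ascending with no duplicates.
fn intersect_sorted(lhs: &mut Vec<u16>, rhs: &[u16]) {
    let mut i = 0;
    lhs.retain(|&x| {
        // Advance the cursor to the first element of `rhs` that is >= x.
        // If none is left, the cursor moves past the end of `rhs`.
        i += rhs.iter().skip(i).position(|&y| y >= x).unwrap_or(rhs.len());
        // Keep `x` only if the cursor landed on an equal value.
        rhs.get(i).map_or(false, |&y| x == y)
    });
}

/// Hypothetical helper: drop the values of `lhs` that occur in `rhs`.
/// Same assumptions as `intersect_sorted`.
fn difference_sorted(lhs: &mut Vec<u16>, rhs: &[u16]) {
    let mut i = 0;
    lhs.retain(|&x| {
        i += rhs.iter().skip(i).position(|&y| y >= x).unwrap_or(rhs.len());
        // Keep `x` when `rhs` is exhausted or the cursor points at a different value.
        rhs.get(i).map_or(true, |&y| x != y)
    });
}
```

Because the cursor only ever moves forward, each call scans `rhs` at most once across the whole pass. The only difference between the two operations is the default passed to `map_or` and the comparison inside it.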
