I'm seeing something surprising - the performance of querying the filter seems to depend somewhat on how many elements are contained within the filter. Is that expected? I would have thought lookups are O(1).
Contains performance with 1M elements in the filter:

           fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ bfuse   3.48 ms       │ 4.601 ms      │ 3.571 ms      │ 3.577 ms      │ 100     │ 100
           287.2 Mitem/s │ 217.3 Mitem/s │ 280 Mitem/s   │ 279.5 Mitem/s │         │

Contains performance with 9M elements in the filter:

           fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ bfuse   12.22 ms      │ 16.63 ms      │ 13.38 ms      │ 13.48 ms      │ 100     │ 100
           81.76 Mitem/s │ 60.1 Mitem/s  │ 74.73 Mitem/s │ 74.17 Mitem/s │         │

Contains performance with 99M elements in the filter:

           fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ bfuse   18.22 ms      │ 24.79 ms      │ 19.63 ms      │ 19.97 ms      │ 100     │ 100
           54.86 Mitem/s │ 40.32 Mitem/s │ 50.92 Mitem/s │ 50.07 Mitem/s │         │
It does seem to level off from there, but that's still a 4x reduction in speed for a 100x increase in filter size. BinaryFuse8 vs BinaryFuse32 doesn't seem to matter. The 1M-element filter is 4 MiB and the 9M-element filter is 40 MiB, while the 100M-element filter is 2 GiB, so maybe this is just an artifact of the smaller microbenchmarks sitting in the cache? Just wanted to double-check my assumptions.
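As a back-of-envelope check on those sizes (the ~1.125 slots-per-key overhead factor and the cache figures in the comments are assumptions, not measurements):

// Back-of-envelope sizing: binary fuse filters allocate roughly 1.125
// slots per key (assumed overhead factor), one fingerprint per slot.
fn approx_bytes(keys: u64, bytes_per_fingerprint: u64) -> u64 {
    keys * 9 / 8 * bytes_per_fingerprint
}

fn main() {
    // BinaryFuse32 stores 4-byte fingerprints. Note the benchmark code below
    // builds from 1_000_000..500_000_000, i.e. ~499M keys, which is where a
    // ~2 GiB footprint would come from.
    for keys in [1_000_000u64, 9_000_000, 499_000_000] {
        println!("{keys} keys -> ~{} MiB", approx_bytes(keys, 4) >> 20);
    }
    // Typical L3 caches are on the order of 8-32 MiB: the 1M-key filter fits,
    // the 9M-key filter mostly spills, and the ~2 GiB one lives in DRAM.
}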
Benchmark code:
// Assumes the xorf crate, which provides BinaryFuse32 and the Filter trait.
use xorf::{BinaryFuse32, Filter};

#[divan::bench]
pub fn bfuse(bencher: divan::Bencher) {
    // Allocation-free counting iterator, so the filter can be built straight
    // from a range without materializing a Vec of keys.
    #[derive(Clone, Copy)]
    struct Counter(u64, u64);

    impl Iterator for Counter {
        type Item = u64;
        fn next(&mut self) -> Option<Self::Item> {
            let next = self.0;
            if next >= self.1 {
                None
            } else {
                self.0 += 1;
                Some(next)
            }
        }
    }

    impl ExactSizeIterator for Counter {
        fn len(&self) -> usize {
            (self.1 - self.0) as usize
        }
    }

    let containment_range = 1_000_000..500_000_000;
    // Half of these keys fall below containment_range, so lookups are roughly
    // an even mix of hits and misses (modulo false positives).
    const fn lookup_range() -> std::ops::Range<u64> {
        500_000..1_500_000
    }

    let filter =
        BinaryFuse32::try_from_iterator(Counter(containment_range.start, containment_range.end))
            .unwrap();

    bencher
        .counter(divan::counter::ItemsCount::new(
            lookup_range().end - lookup_range().start,
        ))
        .bench_local(|| {
            for i in lookup_range() {
                // black_box prevents the lookup from being optimized away.
                std::hint::black_box(filter.contains(&i));
            }
        });
}
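If it really is the cache, hardware counters should confirm it: running the benchmark under something like perf stat -e cache-references,cache-misses (an untested suggestion) should show the miss rate climbing sharply between the 40 MiB and 2 GiB filters.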
I can't say without more investigation, but I wonder if it does indeed have to do with the cache.
It is also the case that filters with a lower false-positive rate have slower query times, so that may be part of it too.
The implementation of lookups is here. You can see that indexing into an array of fingerprints is part of the function, so if the array is large and not in the cache, I can see how that would lead to subpar results.
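Concretely, a binary-fuse lookup has roughly this shape (a simplified sketch, not this crate's actual code: mix below is a splitmix64-style stand-in, and the plain-modulo indexing replaces the real segmented index derivation):

// Illustrative shape of a binary fuse filter lookup; NOT the real
// implementation, which confines each index to its own segment.
fn mix(seed: u64, key: u64) -> u64 {
    // splitmix64-style finalizer as a stand-in hash
    let mut h = key.wrapping_add(seed);
    h = (h ^ (h >> 30)).wrapping_mul(0xbf58476d1ce4e5b9);
    h = (h ^ (h >> 27)).wrapping_mul(0x94d049bb133111eb);
    h ^ (h >> 31)
}

fn contains_sketch(fingerprints: &[u32], seed: u64, key: u64) -> bool {
    let h = mix(seed, key);
    let f = (h ^ (h >> 32)) as u32; // fingerprint derived from the hash
    let n = fingerprints.len() as u64;
    // Three data-dependent indices at effectively random positions.
    let i0 = (h % n) as usize;
    let i1 = (h.rotate_left(21) % n) as usize;
    let i2 = (h.rotate_left(42) % n) as usize;
    // A constant number of operations, so O(1) -- but the constant includes
    // three random reads. Once `fingerprints` outgrows the cache, each query
    // pays up to three memory round trips.
    f == fingerprints[i0] ^ fingerprints[i1] ^ fingerprints[i2]
}

So a query is O(1) in instruction count, but not in memory latency. Wider fingerprints lower the false-positive rate (roughly 2^-k for k-bit fingerprints) at the cost of a proportionally larger array, hence more cache pressure.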