Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binary fuse filter #4389

Merged
merged 9 commits into from
Aug 9, 2024
Merged

Binary fuse filter #4389

merged 9 commits into from
Aug 9, 2024

Conversation

SirTyson
Copy link
Contributor

@SirTyson SirTyson commented Jul 18, 2024

Description

Addresses #4393.

This PR adds a Binary Fuse Filter library and switches BucketListDB to use the new filter.

The Binary Fuse Filter library is based on this library, but I ended up essentially rewriting it in C++. The original library had a bug where some filter bytes were non-deterministic. While this did not affect correctness or the false positive rate, we need determinism because State Archival will hash these filters into the ledger. Additionally, the library did not support 32 bit filters, which are required for State Archival.

Currently a draft until the (non protocol breaking) associated XDR changes are merged: stellar/stellar-xdr#195

While Binary Fuse Filters are necessary for State Archival, they also significantly improve BucketListDB. Compared to the older bloom filter implementation, Bucket indexing time has decreased by 67% (which is blocking on startup in some cases), decreases index memory size by 20%, and reduces redundant disk reads by ~130x.

Checklist

  • Reviewed the contributing document
  • Rebased on top of master (no merge commits)
  • Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
  • Compiles
  • Ran all tests
  • If change impacts performance, include supporting evidence per the performance document

graydon
graydon previously approved these changes Jul 21, 2024
Copy link
Contributor

@graydon graydon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell this looks good! I would appreciate if you could back out a bunch of the gratuitous diffs from the original upstream code -- variable renames (Foo -> _foo) and reformatting (moving braces across lines, reflowing comments, etc.) -- just because they made it challenging to review the diff, but I am fairly confident the code is functionally the same. (If you wanted to make tracking diffs with upstream easer by integrating these superficial formatting changes, I have posted them over here)

I also was wondering if you considered a version that doesn't have the "populated" flag, but that returns an optional<> from the populate function, i.e. treat it as a sort of fallible constructor. I don't have a super strong preference for this -- it's not really idiomatic C++ the way it would be idiomatic Rust -- but I'm not thrilled with the 2-phase initialization and need to check-and-fail on the non-populated state. As far as I can tell, no client should ever have a reason to hold a non-populated filter, right? I wonder if there's a way to bake that further into the API.

Oh, also it looks like you didn't actually remove the previous bloom filter code from lib. If it's dead it should go.

(None of these are critical and I'll be out monday so I'm marking this approved as-is, but I'd appreciate a couple of these touchups as followups at some point, I don't think they change anything on-disk or functional)

src/util/BinaryFuseFilter.cpp Outdated Show resolved Hide resolved
src/util/BinaryFuseFilter.h Outdated Show resolved Hide resolved
src/util/BinaryFuseFilter.h Outdated Show resolved Hide resolved
@MonsieurNicolas
Copy link
Contributor

we'll have to put a note in the release notes that this will cause a reindexing of buckets on startup which will cause nodes to be offline for longer than normal

src/util/BinaryFuseFilter.h Show resolved Hide resolved
src/util/BinaryFuseFilter.cpp Outdated Show resolved Hide resolved
@SirTyson
Copy link
Contributor Author

I also was wondering if you considered a version that doesn't have the "populated" flag, but that returns an optional<> from the populate function, i.e. treat it as a sort of fallible constructor. I don't have a super strong preference for this -- it's not really idiomatic C++ the way it would be idiomatic Rust -- but I'm not thrilled with the 2-phase initialization and need to check-and-fail on the non-populated state. As far as I can tell, no client should ever have a reason to hold a non-populated filter, right? I wonder if there's a way to bake that further into the API.

I tried to enforce this with the [[nodiscard]] bool on the populate function. I agree the interface is not ideal, but because the caller constructing the filter may need to rotate the input seed, there is a "valid" reason that a caller may need to hold a non populated filter during construction while finding the correct seed. I'm not sure if an optional is right here. I think I'll just increase the max attempt count to a highly ridiculous number (as this is filter is part of consensus and construction should never fail) and throw in the constructor if it fails.

@SirTyson SirTyson force-pushed the binary-fuse-filter branch 2 times, most recently from 000a6bc to aa15f5d Compare July 31, 2024 05:11
@SirTyson SirTyson marked this pull request as ready for review July 31, 2024 05:12
@dmkozh dmkozh enabled auto-merge August 9, 2024 15:32
@dmkozh dmkozh added this pull request to the merge queue Aug 9, 2024
Merged via the queue into stellar:master with commit 55ec348 Aug 9, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants