-
Notifications
You must be signed in to change notification settings - Fork 973
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Binary fuse filter #4389
Binary fuse filter #4389
Conversation
ea33f1e
to
efdc609
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell this looks good! I would appreciate if you could back out a bunch of the gratuitous diffs from the original upstream code -- variable renames (Foo -> _foo) and reformatting (moving braces across lines, reflowing comments, etc.) -- just because they made it challenging to review the diff, but I am fairly confident the code is functionally the same. (If you wanted to make tracking diffs with upstream easer by integrating these superficial formatting changes, I have posted them over here)
I also was wondering if you considered a version that doesn't have the "populated" flag, but that returns an optional<> from the populate function, i.e. treat it as a sort of fallible constructor. I don't have a super strong preference for this -- it's not really idiomatic C++ the way it would be idiomatic Rust -- but I'm not thrilled with the 2-phase initialization and need to check-and-fail on the non-populated state. As far as I can tell, no client should ever have a reason to hold a non-populated filter, right? I wonder if there's a way to bake that further into the API.
Oh, also it looks like you didn't actually remove the previous bloom filter code from lib. If it's dead it should go.
(None of these are critical and I'll be out monday so I'm marking this approved as-is, but I'd appreciate a couple of these touchups as followups at some point, I don't think they change anything on-disk or functional)
we'll have to put a note in the release notes that this will cause a reindexing of buckets on startup which will cause nodes to be offline for longer than normal |
I tried to enforce this with the |
000a6bc
to
aa15f5d
Compare
aa15f5d
to
bb86caa
Compare
bb86caa
to
5e61b27
Compare
Description
Addresses #4393.
This PR adds a Binary Fuse Filter library and switches BucketListDB to use the new filter.
The Binary Fuse Filter library is based on this library, but I ended up essentially rewriting it in C++. The original library had a bug where some filter bytes were non-deterministic. While this did not affect correctness or the false positive rate, we need determinism because State Archival will hash these filters into the ledger. Additionally, the library did not support 32 bit filters, which are required for State Archival.
Currently a draft until the (non protocol breaking) associated XDR changes are merged: stellar/stellar-xdr#195
While Binary Fuse Filters are necessary for State Archival, they also significantly improve BucketListDB. Compared to the older bloom filter implementation, Bucket indexing time has decreased by 67% (which is blocking on startup in some cases), decreases index memory size by 20%, and reduces redundant disk reads by ~130x.
Checklist
clang-format
v8.0.0 (viamake format
or the Visual Studio extension)