-
Notifications
You must be signed in to change notification settings - Fork 11k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize bloomfilter issue 7346 #7494
Conversation
Refactored optimalNumOfBits and optimalNumOfHashFunctions to improve efficiency and clarity.
static int optimalNumOfHashFunctions(long n, long m) { | ||
// (m / n) * log(2), but avoid truncation due to division! | ||
return max(1, (int) Math.round((double) m / n * Math.log(2))); | ||
static int optimalNumOfHashFunctions(double p) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change here replaces the optimalNumOfHashFunctions method, which originally took parameters n (expected insertions) and m (total number of bits), with a new version that instead uses p (desired false positive probability). While this new approach aligns with cases where p is known, it reduces flexibility for use cases where m and n are more readily available or preferred for direct configuration.
To maintain backward compatibility and allow both methods of calculation, it would be more effective to overload optimalNumOfHashFunctions rather than replacing it. Overloading the method would allow users to specify either n and m or p as needed, based on their use case, without introducing breaking changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this comment is referring to. This method is not part of the public API, so why would we be concerned with backward compatibility?
for (int m = 0; m < 1000; m++) { | ||
assertTrue(BloomFilter.optimalNumOfHashFunctions(n, m) > 0); | ||
for (double p = 0.1; p > 1e-10; p/=10) { | ||
assertTrue(BloomFilter.optimalNumOfHashFunctions(p) > 0); | ||
} | ||
} | ||
} | ||
|
||
// https://github.com/google/guava/issues/1781 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still relevant?
for (int m = 0; m < 1000; m++) { | ||
assertTrue(BloomFilter.optimalNumOfHashFunctions(n, m) > 0); | ||
for (double p = 0.1; p > 1e-10; p/=10) { | ||
assertTrue(BloomFilter.optimalNumOfHashFunctions(p) > 0); | ||
} | ||
} | ||
} | ||
|
||
// https://github.com/google/guava/issues/1781 | ||
public void testOptimalNumOfHashFunctionsRounding() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this name be changed?
I realize that the existing test doesn't do this, but I think it would be good to spell out the calculation that goes from 0.03 to 5 in a comment.
… `BloomFilter`. Closes #7494. Fixes #7346. Thanks to @longlong354 for the original idea. RELNOTES=n/a PiperOrigin-RevId: 700826220
Context
This PR optimizes the calculation of the number of hash functions and bits used in BloomFilter, reducing redundant computations and improving readability.
Changes
Implemented a more efficient algorithm for
optimalNumOfBits
Replaced hash calculations in
optimalNumOfHashFunctions
Updated tests to reflect these changes
Impact
Improves performance when using
BloomFilter
.Tests affected
Tests have been modified to validate the new calculation methods.