Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize bloomfilter issue 7346 #7494

Closed

Conversation

Romain-E
Copy link
Contributor

Context

This PR optimizes the calculation of the number of hash functions and bits used in BloomFilter, reducing redundant computations and improving readability.

Changes

Implemented a more efficient algorithm for optimalNumOfBits
Replaced hash calculations in optimalNumOfHashFunctions
Updated tests to reflect these changes

Impact

Improves performance when using BloomFilter.

Tests affected

Tests have been modified to validate the new calculation methods.

Romain-E and others added 2 commits October 27, 2024 13:48
Refactored optimalNumOfBits and optimalNumOfHashFunctions to improve efficiency and clarity.
static int optimalNumOfHashFunctions(long n, long m) {
// (m / n) * log(2), but avoid truncation due to division!
return max(1, (int) Math.round((double) m / n * Math.log(2)));
static int optimalNumOfHashFunctions(double p) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change here replaces the optimalNumOfHashFunctions method, which originally took parameters n (expected insertions) and m (total number of bits), with a new version that instead uses p (desired false positive probability). While this new approach aligns with cases where p is known, it reduces flexibility for use cases where m and n are more readily available or preferred for direct configuration.

To maintain backward compatibility and allow both methods of calculation, it would be more effective to overload optimalNumOfHashFunctions rather than replacing it. Overloading the method would allow users to specify either n and m or p as needed, based on their use case, without introducing breaking changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this comment is referring to. This method is not part of the public API, so why would we be concerned with backward compatibility?

for (int m = 0; m < 1000; m++) {
assertTrue(BloomFilter.optimalNumOfHashFunctions(n, m) > 0);
for (double p = 0.1; p > 1e-10; p/=10) {
assertTrue(BloomFilter.optimalNumOfHashFunctions(p) > 0);
}
}
}

// https://github.com/google/guava/issues/1781
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still relevant?

for (int m = 0; m < 1000; m++) {
assertTrue(BloomFilter.optimalNumOfHashFunctions(n, m) > 0);
for (double p = 0.1; p > 1e-10; p/=10) {
assertTrue(BloomFilter.optimalNumOfHashFunctions(p) > 0);
}
}
}

// https://github.com/google/guava/issues/1781
public void testOptimalNumOfHashFunctionsRounding() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this name be changed?

I realize that the existing test doesn't do this, but I think it would be good to spell out the calculation that goes from 0.03 to 5 in a comment.

copybara-service bot pushed a commit that referenced this pull request Dec 8, 2024
… `BloomFilter`.

Closes #7494. Fixes #7346.

RELNOTES=n/a
PiperOrigin-RevId: 700826220
copybara-service bot pushed a commit that referenced this pull request Dec 8, 2024
… `BloomFilter`.

Closes #7494. Fixes #7346.

Thanks to @longlong354 for the original idea.

RELNOTES=n/a
PiperOrigin-RevId: 700826220
@copybara-service copybara-service bot closed this in 3bb6101 Dec 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
package=hash type=performance Related to performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants