Optimize bloomfilter issue 7346 #7494

Romain-E · 2024-10-27T12:56:01Z

Context

This PR optimizes the calculation of the number of hash functions and bits used in BloomFilter, reducing redundant computations and improving readability.

Changes

Implemented a more efficient algorithm for optimalNumOfBits
Replaced hash calculations in optimalNumOfHashFunctions
Updated tests to reflect these changes

Impact

Improves performance when using BloomFilter.

Tests affected

Tests have been modified to validate the new calculation methods.

Refactored optimalNumOfBits and optimalNumOfHashFunctions to improve efficiency and clarity.

rohitnandi12 · 2024-11-12T14:22:45Z

guava/src/com/google/common/hash/BloomFilter.java

-  static int optimalNumOfHashFunctions(long n, long m) {
-    // (m / n) * log(2), but avoid truncation due to division!
-    return max(1, (int) Math.round((double) m / n * Math.log(2)));
+  static int optimalNumOfHashFunctions(double p) {


The change here replaces the optimalNumOfHashFunctions method, which originally took parameters n (expected insertions) and m (total number of bits), with a new version that instead uses p (desired false positive probability). While this new approach aligns with cases where p is known, it reduces flexibility for use cases where m and n are more readily available or preferred for direct configuration.

To maintain backward compatibility and allow both methods of calculation, it would be more effective to overload optimalNumOfHashFunctions rather than replacing it. Overloading the method would allow users to specify either n and m or p as needed, based on their use case, without introducing breaking changes.

I'm not sure what this comment is referring to. This method is not part of the public API, so why would we be concerned with backward compatibility?
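
For anyone comparing the two signatures discussed above: they agree whenever m is sized optimally for n and p. The following is a minimal sketch, assuming the standard Bloom filter formulas m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2 = -log2(p). The class name, the method names, and the body of the p-based method are illustrative assumptions; only the (n, m) body is taken from the removed lines of the diff.

```java
final class HashCountSketch {
  // Old signature, body copied from the removed lines in the diff above.
  static int fromBits(long n, long m) {
    // (m / n) * log(2), computed in double to avoid integer truncation.
    return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
  }

  // Hypothetical body for the new p-based signature: k = -log2(p).
  // This is an assumption, not necessarily what the PR implements.
  static int fromFpp(double p) {
    return Math.max(1, (int) Math.round(-Math.log(p) / Math.log(2)));
  }

  // Standard sizing formula for the number of bits: m = -n ln(p) / (ln 2)^2.
  static long optimalBits(long n, double p) {
    return (long) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
  }
}
```

With n = 1,000,000 and p = 0.03, `fromBits(n, optimalBits(n, p))` and `fromFpp(p)` both evaluate to 5, so the p-based form drops the intermediate m without changing the result in that case.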

eamonnmcmanus · 2024-11-27T23:49:36Z

guava-tests/test/com/google/common/hash/BloomFilterTest.java

-      for (int m = 0; m < 1000; m++) {
-        assertTrue(BloomFilter.optimalNumOfHashFunctions(n, m) > 0);
+      for (double p = 0.1; p > 1e-10; p/=10) {
+        assertTrue(BloomFilter.optimalNumOfHashFunctions(p) > 0);
      }
    }
  }

  // https://github.com/google/guava/issues/1781


Is this still relevant?

eamonnmcmanus · 2024-11-27T23:50:20Z

guava-tests/test/com/google/common/hash/BloomFilterTest.java

-      for (int m = 0; m < 1000; m++) {
-        assertTrue(BloomFilter.optimalNumOfHashFunctions(n, m) > 0);
+      for (double p = 0.1; p > 1e-10; p/=10) {
+        assertTrue(BloomFilter.optimalNumOfHashFunctions(p) > 0);
      }
    }
  }

  // https://github.com/google/guava/issues/1781
  public void testOptimalNumOfHashFunctionsRounding() {


Should this name be changed?

I realize that the existing test doesn't do this, but I think it would be good to spell out the calculation that goes from 0.03 to 5 in a comment.
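
One way the requested comment could spell out the arithmetic, assuming the method computes k = -log2(p) and rounds to the nearest integer (the formula is an assumption here; the diff excerpt shows only the new signature):

```latex
k = -\log_2 p = \frac{-\ln 0.03}{\ln 2}
  \approx \frac{3.5066}{0.6931}
  \approx 5.06
  \quad\Longrightarrow\quad \operatorname{round}(k) = 5
```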

… `BloomFilter`. Closes #7494. Fixes #7346. RELNOTES=n/a PiperOrigin-RevId: 700826220

@longlong354

… `BloomFilter`. Closes #7494. Fixes #7346. Thanks to @longlong354 for the original idea. RELNOTES=n/a PiperOrigin-RevId: 700826220

Romain-E and others added 2 commits October 27, 2024 13:48

google#7346 feat(BloomFilter): Optimize hash function calculation

f4845c6

Refactored optimalNumOfBits and optimalNumOfHashFunctions to improve efficiency and clarity.

Merge branch 'google:master' into optimize-bloomfilter-issue-7346

5c182fd

rohitnandi12 reviewed Nov 12, 2024

kluever assigned eamonnmcmanus Nov 14, 2024

kluever added the package=hash and type=performance labels Nov 14, 2024

eamonnmcmanus requested changes Nov 27, 2024

copybara-service bot pushed a commit that referenced this pull request Dec 8, 2024

Improve the calculation of the optimal number of hash functions for a…

3858557

… `BloomFilter`. Closes #7494. Fixes #7346. RELNOTES=n/a PiperOrigin-RevId: 700826220

copybara-service bot mentioned this pull request Dec 8, 2024

Improve the calculation of the optimal number of hash functions for a BloomFilter. #7538

Merged

copybara-service bot closed this in 3bb6101 Dec 8, 2024
