Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better bloom filter metrics #3612

Merged
merged 1 commit into from
Dec 15, 2022

Conversation

SirTyson
Copy link
Contributor

Description

Resolves #3604

This PR provides better metrics and log messages for BucketListDB bloom filters. Previously, it appeared that the bloom filter false positive rate was too high due to element count underestimation. However, this was due to faulty metrics. In practice, the false positive rate is significantly better than our target and underestimation is not an issue.

To calculate false positive rates previously, we used the number of ledger entries queried. However, one ledger entry query may lead to several bloom filter queries since we have to check multiple levels of the BucketList for an entry. This PR adds an explicit bloom filter query metric to provide more accurate false positive rates and also logs bloom filter setup parameters.

Checklist

  • Reviewed the contributing document
  • Rebased on top of master (no merge commits)
  • Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
  • Compiles
  • Ran all tests
  • If change impacts performance, include supporting evidence per the performance document

"Underestimated bloom filter size. Estimated entry "
"count: {}, Actual: {}",
estimatedNumElems, count);
CLOG_DEBUG(Bucket,
Copy link
Contributor

@MonsieurNicolas MonsieurNicolas Nov 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reason we had this at WARNING by default was that

  • we expect this to not happen that often
  • when it does, I thought you had done research that showed that underestimating could have fairly dramatic (bad) perf impact, and should probably be surfaced

@SirTyson SirTyson force-pushed the bloom-filter-estimation branch 2 times, most recently from ee5ceb0 to 3fe913c Compare December 5, 2022 19:01
@MonsieurNicolas
Copy link
Contributor

r+ ecae078

@latobarita latobarita merged commit 5498336 into stellar:master Dec 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate impact of bloom filter element underestimation
3 participants