LogMergePolicy knob del_docs_percentage_before_merge
#1238
…hreshold of deleted docs
Codecov Report
@@ Coverage Diff @@
## main #1238 +/- ##
==========================================
- Coverage 94.19% 94.16% -0.04%
==========================================
Files 206 206
Lines 34891 34913 +22
==========================================
+ Hits 32866 32876 +10
- Misses 2025 2037 +12
src/indexer/log_merge_policy.rs (outdated)
@@ -8,6 +8,7 @@ const DEFAULT_LEVEL_LOG_SIZE: f64 = 0.75;
const DEFAULT_MIN_LAYER_SIZE: u32 = 10_000;
const DEFAULT_MIN_NUM_SEGMENTS_IN_MERGE: usize = 8;
const DEFAULT_MAX_DOCS_BEFORE_MERGE: usize = 10_000_000;
const DEFAULT_MAX_DEL_DOCS_PCT: u8 = 100;
Lucene default for a similar knob is 33% https://github.com/apache/lucene/blob/c64e5fe84c4990968844193e3a62f4ebbba638ea/lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java#L91
100% is effectively a no-op compared with the current policy. Lowering it to 33% causes some tests to fail; that is probably worth working through, though, if the approach makes sense!
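Why 100% is a no-op can be checked with a small predicate. This is a minimal sketch, not tantivy's code: the function name `exceeds_del_docs_pct` is hypothetical, it takes the percentage as a `u8` like `DEFAULT_MAX_DEL_DOCS_PCT` in the diff above, and it assumes the threshold must be strictly exceeded. Because a segment can never hold more deleted docs than `max_doc`, a strict check against 100% can never fire.

```rust
/// Hypothetical helper (not tantivy's API): true when the share of deleted
/// docs in a segment strictly exceeds `max_del_docs_pct` percent.
fn exceeds_del_docs_pct(max_doc: u32, num_deleted: u32, max_del_docs_pct: u8) -> bool {
    if max_doc == 0 {
        // Empty segment: nothing to reclaim.
        return false;
    }
    // num_deleted / max_doc > pct / 100  <=>  num_deleted * 100 > pct * max_doc.
    // Cross-multiplying in u64 avoids both overflow and integer-division rounding.
    u64::from(num_deleted) * 100 > u64::from(max_del_docs_pct) * u64::from(max_doc)
}
```

With a 100% threshold, even a fully deleted segment (10,000 of 10,000 docs) is not flagged; with Lucene's 33%, a segment with 4,000 of 10,000 docs deleted is flagged, while one with exactly 3,300 is not.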
I think we should lower it, though I'm not sure 33% isn't too early.
…icy tests" This reverts commit 425c29b.
del_docs_percentage_before_merge
@shikhar LGTM! Thank you for taking care of that very old issue!
I added a bunch of unit tests and switched from (percentage, u8) to (ratio, f32) for various "soft reasons". I had to make a change anyway, because in the previous implementation the actual first value at which we observed the switch was pretty unexpected.
Addresses #115
If this percentage of deleted documents is exceeded for a segment, the segment is always proposed for a merge, even if it is the only one at its level.
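The rule above can be sketched for a single level of the log merge policy. This is an illustration under stated assumptions, not the PR's implementation: `Segment`, `above_deletes_threshold` and `merge_candidates` are hypothetical names, segments are assumed to be already grouped into levels, `min_num_segments` plays the role of `DEFAULT_MIN_NUM_SEGMENTS_IN_MERGE`, and the threshold is an `f32` ratio in (0.0, 1.0] as discussed above.

```rust
/// Hypothetical stand-in for segment metadata (tantivy's real type is richer).
#[derive(Debug)]
struct Segment {
    id: u32,
    max_doc: u32,
    num_deleted: u32,
}

/// True when the deleted-doc ratio strictly exceeds `del_docs_ratio`.
/// Empty segments (max_doc == 0) are never flagged.
fn above_deletes_threshold(seg: &Segment, del_docs_ratio: f32) -> bool {
    seg.max_doc > 0 && del_docs_ratio * (seg.max_doc as f32) < seg.num_deleted as f32
}

/// Decides which segments of one level to propose for merging: the whole
/// level if it has enough segments, or if any segment crosses the
/// deleted-docs threshold (even when that segment is alone in the level).
fn merge_candidates(level: &[Segment], min_num_segments: usize, del_docs_ratio: f32) -> Vec<u32> {
    let big_enough = level.len() >= min_num_segments;
    let too_many_deletes = level
        .iter()
        .any(|seg| above_deletes_threshold(seg, del_docs_ratio));
    if big_enough || too_many_deletes {
        level.iter().map(|seg| seg.id).collect()
    } else {
        Vec::new()
    }
}
```

A lone segment with 400 of 1,000 docs deleted is proposed at a 0.3 ratio but not at 1.0. Merging a single segment amounts to rewriting it without its deleted docs, which is how the space gets reclaimed.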
TODOs
- determine what the default threshold should be (see #1240: Change DEFAULT_DEL_DOCS_RATIO_BEFORE_MERGE to 0.3)