Drain unnecessary levels when level_compaction_dynamic_level_bytes=true
#11340
Conversation
@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@cbi42 has updated the pull request. You must reimport the pull request before landing.
@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Great PR, thanks!
// Otherwise L5 will always be eligible for compaction and
// WaitForCompact(true) below will be stuck waiting.
Waiting means sleeping, not infinitely rewriting L5, right?
There will be an infinite rewrite of L5, since it always has an eligible compaction score but nothing is written to the last level during each compaction of L5. This causes the WaitForCompact(true) in this unit test to wait forever.
Hm that isn't ideal. We should see if there's a way to not pick a draining compaction when the output level is not necessarily lower.
Done, I've updated the lowest_unnecessary_level_ logic to exclude the penultimate level with per-key placement. Note that this problem may still exist in general with level compaction + per-key placement: it can happen when the penultimate level is over its target size but contains only hot data.
options.level_compaction_dynamic_level_bytes = true;
Reopen(options);
// note that last level (L6) should be empty
ASSERT_EQ("1,0,0,1,2", FilesPerLevel());
verify_db();

// turning the options on and off should both be safe
options.level_compaction_dynamic_level_bytes = false;
Reopen(options);
MoveFilesToLevel(1);
ASSERT_EQ("0,1,0,1,2", FilesPerLevel());
verify_db();

// newly flushed file is also pushed down
options.level_compaction_dynamic_level_bytes = true;
Reopen(options);
ASSERT_EQ("0,0,1,1,2", FilesPerLevel());
verify_db();

// files will be pushed down to last level (L6)
options.allow_ingest_behind = false;
Reopen(options);
ASSERT_EQ("0,0,0,1,1,2", FilesPerLevel());
Is this deleted because it's difficult to adapt? Maybe the assertions can just verify that the number of files in the bottom level (L5, since num_levels=6) is zero before the final Reopen(), and nonzero after it.
I did find it difficult to adapt, since there is some unnecessary level in the LSM which can be drained in the background. I could not just disable auto compaction, since that would also disallow trivially moving files down the LSM during DB open. However, I eventually found a better reason to remove the test :) Otherwise, I would probably just verify the number of files in the last level as you suggested.
Yeah, IIRC there was a comment elsewhere in this PR referring to the better reason. But allow_ingest_behind feels so close to working with leveled compaction that we might as well start testing it when convenient, noting any known risks. Maybe the test can say that support is not officially released, since on rare occasions people infer what's supported by looking at test expectations.
just verify the number of files in the bottom level (L5 since num_levels=6) is zero before the final Reopen(), and nonzero after it.
I tried fixing this, but realized it does not work (without adding support for level compaction + ingest behind), as the second-last level can become unnecessary and be compacted down even when the last level is reserved for ingest-behind. Making it work requires more changes and almost feels like supporting allow_ingest_behind with level compaction in this PR. I filed a task and plan to follow up with a separate PR for official support.
@cbi42 has updated the pull request. You must reimport the pull request before landing.
Thanks for the review!
@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from ed0e801 to 697e66f.
@cbi42 has updated the pull request. You must reimport the pull request before landing.
@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from 697e66f to 91d0486.
@cbi42 has updated the pull request. You must reimport the pull request before landing.
@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from 91d0486 to ddb7479.
@cbi42 has updated the pull request. You must reimport the pull request before landing.
@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
…11525) Summary: after #11321 and #11340 (both included in RocksDB v8.2), migration from `level_compaction_dynamic_level_bytes=false` to `level_compaction_dynamic_level_bytes=true` is handled automatically by RocksDB and requires no manual compaction from the user. Making the option true by default, as it has several advantages:
1. better space amplification guarantee (a more stable LSM shape);
2. compaction is more adaptive to write traffic;
3. automatic draining of unneeded levels.

The wiki is updated with more detail: https://github.com/facebook/rocksdb/wiki/Leveled-Compaction#option-level_compaction_dynamic_level_bytes-and-levels-target-size.

The PR mostly contains fixes for unit tests, as they assumed `level_compaction_dynamic_level_bytes=false`. The most notable change is commits f742be3 and b1928e4, which override the default option in DBTestBase to still set `level_compaction_dynamic_level_bytes=false` by default. This helps reduce the changes needed in unit tests. I think this default option override in unit tests is okay, since the behavior of `level_compaction_dynamic_level_bytes=true` is tested by explicitly setting the option. Also, `level_compaction_dynamic_level_bytes=false` may be more desirable in unit tests, as it makes it easier to create a desired LSM shape. The comment for option `level_compaction_dynamic_level_bytes` is updated to reflect this change and the change made in #10057.

Pull Request resolved: #11525
Test Plan: `make -j32 J=32 check` several times to try to catch flaky tests due to this option change.
Reviewed By: ajkr
Differential Revision: D46654256
Pulled By: cbi42
fbshipit-source-id: 6b5827dae124f6f1fdc8cca2ac6f6fcd878830e1
…rue` (facebook#11340) Summary: When a user migrates to level compaction + `level_compaction_dynamic_level_bytes=true`, or when a DB shrinks, there can be unnecessary levels in the DB. Before this PR, there was no way to remove these levels except a manual compaction. These extra unnecessary levels make it harder to guarantee max_bytes_for_level_multiplier and can cause extra space amplification. This PR boosts the compaction score for these levels to allow RocksDB to automatically drain them. Together with facebook#11321, this makes migration to `level_compaction_dynamic_level_bytes=true` automatic, without needing the user to do a one-time full manual compaction. Credit: this PR is modified from facebook#3921.

Pull Request resolved: facebook#11340
Test Plan:
- New unit tests
- `python3 tools/db_crashtest.py whitebox --simple`, which randomly sets level_compaction_dynamic_level_bytes in each run.
Reviewed By: ajkr
Differential Revision: D44563884
Pulled By: cbi42
fbshipit-source-id: e20d3620bd73dff22be18c5a91a07f340740bcc8
Signed-off-by: tabokie <[email protected]>