-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug: Fix buildup of RocksDB SST files #6606
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems both simple, and oddly scary
for (const auto& cf_name : kDomains) { | ||
if (cf_name != kEvents) { | ||
auto compact_status = compactFiles(cf_name); | ||
if (!compact_status.ok()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we can't compact, is it safe to proceed? It seems weird to consider a success.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, yes. Previously we were only automatically compacting within a background operation that would not effect the continued use. In my tests the only time this failed was if a compaction was concurrently scheduled. I'd rather let failures go here and instead catch more fatal failures when using the actual get/put/etc APIs.
|
||
if (!input_file_names.empty()) { | ||
auto s = db_->CompactFiles( | ||
rocksdb::CompactionOptions(), handle, input_file_names, level.level); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to understand https://github.com/facebook/rocksdb/wiki/Manual-Compaction and what this does. I'm specifically wondering about this be level.level
, that's the output level for the compaction, should it just be -1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal of this compaction call is to merge files of the same level and not actually gain benefits of compaction. Thus I only want to compact each level by level to that same level. We will then allow normal background/automatic compaction to make optimization decisions.
This changes how we build RocksDB such that the entire library is built (not just the "lite" version), which includes important SST file management utilities. One of these utils allows us to compact a range of smaller SST files.
Why does our usage create small SST files?
When we restart the database to flush large allocations, we flush or sync the data into an immutable SST. This happens behind the scenes as part of RocksDB's normal operating procedure. There's really no problem with this other than esthetics and occasionally lots of file IO if you build up a lot of buffered logs and then need to flush them quickly.
What does this change 'cost'?
Expect longer startup times and more initial I/O when starting as the immutable files will be scanned and checked. With multi-100MB files this should be only slightly noticeable, occur only if significant merging is required, and will complete in sub-1s wall time.
Should we still 'restart' the DB plugin?
I think so, yes, but we should do so at an hour interval instead of 5 minutes. This will not lead to small SST files as the flushed table with be compacted when it is opened. This will cause RocksDB to slow down compared to recommended usage, but will cause it to "speed up" compared to our current usage.
Are there other side effects?
I have checked that this is not creating more levels and that subsequent compactions are idempotent (if no data is written). So I feel relatively safe about this but more testing is welcomed. I will try to publish a follow up PR that adds a
--rocksdb_debug
hidden flag to enable options and stats printing.