Optimize compaction strategy to avoid merging SST files without overlapping time ranges #3416
We compact small files with a large file. e.g.
[diagram omitted]
becomes
[diagram omitted]
We might stop compacting a window if the output file is too large. However, we need to choose files carefully, since we remove deleted keys during compaction, and that is only safe when all of the overlapping files are included.
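For illustration, a minimal sketch of such a size cap, using hypothetical names rather than the actual strategy code: keep adding the smallest files to a window's inputs until the estimated output would exceed a limit.

```rust
/// Pick input files (by size) for one window, capping the estimated output.
/// Hypothetical helper, not the actual compaction picker.
fn select_until_limit(mut sizes: Vec<u64>, max_output: u64) -> Vec<u64> {
    sizes.sort_unstable(); // prefer small files first
    let mut picked = Vec::new();
    let mut total = 0u64;
    for s in sizes {
        // Stop once the estimated output would exceed the cap, but always
        // take at least one file so the window still makes progress.
        if total + s > max_output && !picked.is_empty() {
            break;
        }
        total += s;
        picked.push(s);
    }
    picked
}
```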
For active windows, we enforce max_sorted_runs (maybe applied to both L0 and L1) and max_files (applied to all files in the active window, or just to L0 in the active window?). If the current number of runs exceeds that value, we find the cheapest way to achieve the goal; for example, for an active window with max_sorted_runs=1, we would pick the merge that reaches a single sorted run at the lowest cost (see the sketch below). For inactive windows, consider that sometimes there will be out-of-order insertions. When an active window shifts to an inactive window, we check whether there is only one sorted run in both L0 and L1 (this does not require another routine; every compaction should check every time window in the current implementation).
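Below is a hedged sketch of the "cheapest way" selection, assuming a hypothetical `SortedRun` type: merging k runs into one reduces the run count by k - 1, so the rewritten bytes are minimized by merging the smallest runs.

```rust
/// One sorted run inside a time window (hypothetical type).
#[derive(Debug, Clone)]
struct SortedRun {
    /// Total size in bytes of the SST files forming this run.
    size: u64,
}

/// To go from n runs down to `max_sorted_runs`, we must merge
/// n - max_sorted_runs + 1 runs into one; choosing the smallest
/// runs minimizes the bytes rewritten.
fn pick_runs_to_merge(mut runs: Vec<SortedRun>, max_sorted_runs: usize) -> Vec<SortedRun> {
    if runs.len() <= max_sorted_runs {
        return Vec::new(); // already within the limit; nothing to compact
    }
    runs.sort_by_key(|r| r.size);
    let to_merge = runs.len() - max_sorted_runs + 1;
    runs.into_iter().take(to_merge).collect()
}

fn main() {
    let runs = vec![
        SortedRun { size: 4 << 30 },  // a 4 GiB run, left untouched
        SortedRun { size: 10 << 20 }, // 10 MiB
        SortedRun { size: 12 << 20 }, // 12 MiB
        SortedRun { size: 8 << 20 },  // 8 MiB
    ];
    // With max_sorted_runs = 2, the three smallest runs are merged into one.
    let picked = pick_runs_to_merge(runs, 2);
    println!("merging {} runs", picked.len());
}
```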
We should ensure all overlapping files in the same time window are compacted into one sorted run, since we only have 2 levels and always remove delete tombstones after compaction. Otherwise, we should add a flag to the merge reader to control whether to remove delete tombstones: we only filter deleted items when we are sure all overlapping files are being compacted.
Just like Cassandra's size-tiered strategy, tombstones are only dropped when all overlapping files are involved in the current compaction.
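To make the rule concrete, here is a minimal sketch (hypothetical `FileMeta` type and `can_filter_deleted` helper, not the actual mito2 API) of the check that decides whether deletion markers may be dropped for a given set of inputs:

```rust
/// Metadata for one SST file (hypothetical type).
#[derive(Clone, Copy)]
struct FileMeta {
    id: u64,
    time_range: (i64, i64), // inclusive min/max timestamps
}

fn overlaps(a: (i64, i64), b: (i64, i64)) -> bool {
    a.0 <= b.1 && b.0 <= a.1
}

/// Returns the value for a `filter_deleted`-style flag: true only when the
/// inputs already contain every file they overlap with, so dropping a DELETE
/// tombstone cannot resurrect a row living in a file outside this compaction.
fn can_filter_deleted(inputs: &[FileMeta], all_files: &[FileMeta]) -> bool {
    all_files.iter().all(|f| {
        inputs.iter().any(|i| i.id == f.id) // f participates in the compaction,
            || inputs.iter().all(|i| !overlaps(i.time_range, f.time_range)) // or never overlaps it
    })
}
```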
I also found a potential issue: compaction jobs might block flush jobs. When using object stores, a compaction job with lots of input files results in a large number of file purge tasks. Deleting an object in an object store is slower than deleting a file on local disk. These purge tasks use the same background scheduler as flush jobs and slow them down.
What's more, compaction jobs and flush jobs also share the same background worker pool and job limit.
I created PR #3621 for this. For flush and compaction jobs, I wonder if we should use dedicated schedulers.
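As an illustration only (an assumed design sketch, not the code in PR #3621), giving each job kind its own queue and worker means purge tasks queued behind compactions can no longer delay flushes:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

type Job = Box<dyn FnOnce() + Send>;

/// Spawn a dedicated worker with its own queue for one kind of job.
fn spawn_worker(name: &'static str) -> mpsc::Sender<Job> {
    let (tx, rx) = mpsc::channel::<Job>();
    thread::Builder::new()
        .name(name.into())
        .spawn(move || {
            // Run jobs until every sender is dropped.
            for job in rx {
                job();
            }
        })
        .unwrap();
    tx
}

fn main() {
    let flush = spawn_worker("flush"); // flushes never wait on compactions
    let compaction = spawn_worker("compaction"); // compaction and purge get their own queue
    compaction
        .send(Box::new(|| println!("compact SSTs, then purge inputs")))
        .unwrap();
    flush.send(Box::new(|| println!("flush memtable"))).unwrap();
    thread::sleep(Duration::from_millis(100)); // let the workers run (demo only)
}
```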
Great job!
@v0y4g3r Assigned a doc task to you: GreptimeTeam/docs#1009
I'll add a separate doc section to describe how compaction works in v0.9.
What type of enhancement is this?
Performance
What does the enhancement do?
With the introduction of the new merge tree memtable, SST files in L0 can be as large as several GiB. In that case, merging L0 files regardless of their size can be expensive while doing little for read performance. We should leverage a "tiered" merging manner that skips large files, as well as files that do not overlap with others in terms of time ranges (see the sketch below).
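A hedged sketch of such a picker, using hypothetical names and thresholds rather than the actual implementation: only L0 files that are below a size cap and overlap at least one other candidate in time are selected.

```rust
/// One SST file candidate (hypothetical type).
#[derive(Clone, Copy)]
struct SstFile {
    size: u64,
    time_range: (i64, i64), // inclusive min/max timestamps
}

fn overlaps(a: (i64, i64), b: (i64, i64)) -> bool {
    a.0 <= b.1 && b.0 <= a.1
}

fn pick_l0_inputs(files: &[SstFile], max_input_size: u64) -> Vec<SstFile> {
    files
        .iter()
        .enumerate()
        // Skip multi-GiB L0 files produced by the merge tree memtable.
        .filter(|(_, f)| f.size <= max_input_size)
        // Skip files whose time range overlaps no other file: merging them
        // would rewrite bytes without reducing read amplification.
        .filter(|(i, f)| {
            files
                .iter()
                .enumerate()
                .any(|(j, g)| *i != j && overlaps(f.time_range, g.time_range))
        })
        .map(|(_, f)| *f)
        .collect()
}
```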
Tasks
- filter_deleted option to avoid removing deletion markers #3707

Implementation challenges
No response