Which PikiwiDB functionalities are relevant/related to the feature request?
other
Description
Currently, pika's manual compact function monitors key operations and compacts each key individually once a configured threshold is reached (this only takes effect for the four field-containing data structures: hash, set, zset, and list). However, this design carries extra overhead for key monitoring and can generate many additional compactions. We could therefore simplify the compact logic and add a new compact type: the longest-uncompacted-file compact type.
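For reference, the existing per-key behavior is driven by key-statistics thresholds in pika.conf; a hedged sketch of the relevant knobs (option names recalled from pika's config and shown only to frame what the new compact type would replace):

```
# pika.conf (illustrative; option names/defaults may differ across versions)
# Number of keys whose operation counts are tracked (0 disables tracking).
max-cache-statistic-keys 0
# Once a tracked key accumulates this many operations, pika compacts
# that key's range individually.
small-compaction-threshold 5000
```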
Proposed solution
The benign effects of manual compaction are:
- Removing tombstones and reducing space amplification: clearing the tombstones generated by hash, set, zset, and list;
- Data rearrangement: converting the scattered on-disk layout of hash, set, zset, and list entries into a compact arrangement.
We can learn from the practices in two papers, Lethe: A Tunable Delete-Aware LSM Engine and Constructing and Analyzing the LSM Compaction Design Space:
“A naïve way to choose file(s) is at random or by using a round-robin policy [32, 35]. These data movement policies do not focus on optimizing for any particular performance metric, but help in reducing space amplification. To optimize for read throughput, many production data stores [30, 34] select the “coldest” file(s) in a level once a compaction is triggered. Another common optimization goal is to minimize write amplification. In this policy, files with the least overlap with the target level are marked for compaction [13, 27]. To reduce space amplification, some storage engines choose files with the highest number of tombstones and/or updates [30]. Another delete-aware approach introduces a tombstone-age driven file picking policy that aims to timely persist logical deletes [51].”
I suggest that pika's new compact type manually compact the SSTable that has gone the longest without being compacted. This is effectively an extension of the existing approach of monitoring the number of tombstones.
Use the GetPropertiesOfAllTables API provided by RocksDB to extract the properties of every SSTable and compute each file's age and deletion rate. When a file is too old and has a sufficient deletion rate, set the compaction boundaries to that file's start_key & end_key. If all files are young, select the file with the highest deletion rate and compact it in the same way.
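A minimal sketch of this picking policy, assuming a raw rocksdb::DB* handle (pika wraps RocksDB, so the real integration point differs) and hypothetical thresholds kMaxAgeSeconds and kMinDeleteRatio; since TableProperties does not carry a file's key range, the sketch pairs it with GetLiveFilesMetaData to obtain the start_key/end_key:

```cpp
// Sketch only, not pika's actual implementation. Assumptions: a raw
// rocksdb::DB* handle and hypothetical thresholds kMaxAgeSeconds /
// kMinDeleteRatio.
#include <cstdint>
#include <ctime>
#include <string>
#include <vector>

#include "rocksdb/db.h"
#include "rocksdb/metadata.h"
#include "rocksdb/table_properties.h"

rocksdb::Status CompactOldestDeletedFile(rocksdb::DB* db) {
  const uint64_t kMaxAgeSeconds = 24 * 60 * 60;  // hypothetical "too old"
  const double kMinDeleteRatio = 0.1;            // hypothetical delete rate

  // 1. Pull per-file properties: entry/deletion counts and creation time.
  rocksdb::TablePropertiesCollection props;
  rocksdb::Status s = db->GetPropertiesOfAllTables(&props);
  if (!s.ok()) return s;

  // 2. TableProperties has no key range, so take smallestkey/largestkey
  //    from the live-file metadata and match the two by file path.
  std::vector<rocksdb::LiveFileMetaData> files;
  db->GetLiveFilesMetaData(&files);

  const uint64_t now = static_cast<uint64_t>(time(nullptr));
  const rocksdb::LiveFileMetaData* picked = nullptr;
  double best_ratio = 0.0;

  for (const auto& f : files) {
    auto it = props.find(f.db_path + f.name);  // name has a leading '/'
    if (it == props.end() || it->second->num_entries == 0) continue;
    const rocksdb::TableProperties& p = *it->second;

    const double ratio =
        static_cast<double>(p.num_deletions) / p.num_entries;
    const uint64_t age = (p.creation_time > 0 && now > p.creation_time)
                             ? now - p.creation_time  // 0 means unknown
                             : 0;

    // Old file with a meaningful deletion rate: compact it right away.
    if (age > kMaxAgeSeconds && ratio >= kMinDeleteRatio) {
      picked = &f;
      break;
    }
    // Otherwise remember the file with the highest deletion rate.
    if (ratio > best_ratio) {
      best_ratio = ratio;
      picked = &f;
    }
  }
  if (picked == nullptr) return rocksdb::Status::OK();  // nothing to do

  // 3. Bound the manual compaction by the picked file's key range,
  //    i.e. the start_key & end_key described above.
  rocksdb::Slice begin(picked->smallestkey);
  rocksdb::Slice end(picked->largestkey);
  return db->CompactRange(rocksdb::CompactRangeOptions(), &begin, &end);
}
```

Because each run touches only one file's key range, the IO of a single pass stays bounded, which is what keeps the periodic, state-machine-driven execution controllable.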
The benefits of doing this are:
- Key status no longer needs to be monitored; compaction runs periodically, driven by a state machine, and the eventual steady state is that the overlap at each level stays small;
- IO remains controllable.
Alternatives considered
Keep the status quo.