Which PikiwiDB functionalities are relevant/related to the feature request?
other
Description
Currently, pika's manual compact function monitors key operations and compacts each key individually once a configured threshold is reached (this only takes effect for the four field-containing data structures: hash, set, zset, and list). However, this design carries extra overhead for key monitoring and can generate many additional compactions. We could therefore simplify the compact logic and add a new compact type: the longest-uncompacted-file compact type.
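For reference, the existing per-key behavior is driven by key-statistics thresholds in pika.conf; a hedged sketch of the relevant knobs (option names recalled from pika's config and shown only to frame what the new compact type would replace):

```
# pika.conf (illustrative; option names/defaults may differ across versions)
# Number of keys whose operation counts are tracked (0 disables tracking).
max-cache-statistic-keys 0
# Once a tracked key accumulates this many operations, pika compacts
# that key's range individually.
small-compaction-threshold 5000
```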
Proposed solution
The benign effects of manual compaction are:
- Removing tombstones and reducing space amplification: clearing the tombstones generated by hash, set, zset, and list;
- Data rearrangement: converting the scattered on-disk layout of hash, set, zset, and list entries into a compact arrangement.
We can learn from the practices in two papers, Lethe: A Tunable Delete-Aware LSM Engine and Constructing and Analyzing the LSM Compaction Design Space:
“A naïve way to choose file(s) is at random or by using a round-robin policy [32, 35]. These data movement policies do not focus on optimizing for any particular performance metric, but help in reducing space amplification. To optimize for read throughput, many production data stores [30, 34] select the “coldest” file(s) in a level once a compaction is triggered. Another common optimization goal is to minimize write amplification. In this policy, files with the least overlap with the target level are marked for compaction [13, 27]. To reduce space amplification, some storage engines choose files with the highest number of tombstones and/or updates [30]. Another delete-aware approach introduces a tombstone-age driven file picking policy that aims to timely persist logical deletes [51].”
I suggest that pika's new compact type manually compact the SSTable that has gone the longest without being compacted. This is effectively an extension of the existing approach of monitoring the number of tombstones.
Use the GetPropertiesOfAllTables API provided by RocksDB to extract the properties of every SSTable and compute each file's age and deletion rate. When a file is too old and has a sufficient deletion rate, set the compaction boundaries to that file's start_key & end_key. If all files are young, select the file with the highest deletion rate and compact it in the same way.
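A minimal sketch of this picking policy, assuming a raw rocksdb::DB* handle (pika wraps RocksDB, so the real integration point differs) and hypothetical thresholds kMaxAgeSeconds and kMinDeleteRatio; since TableProperties does not carry a file's key range, the sketch pairs it with GetLiveFilesMetaData to obtain the start_key/end_key:

```cpp
// Sketch only, not pika's actual implementation. Assumptions: a raw
// rocksdb::DB* handle and hypothetical thresholds kMaxAgeSeconds /
// kMinDeleteRatio.
#include <cstdint>
#include <ctime>
#include <string>
#include <vector>

#include "rocksdb/db.h"
#include "rocksdb/metadata.h"
#include "rocksdb/table_properties.h"

rocksdb::Status CompactOldestDeletedFile(rocksdb::DB* db) {
  const uint64_t kMaxAgeSeconds = 24 * 60 * 60;  // hypothetical "too old"
  const double kMinDeleteRatio = 0.1;            // hypothetical delete rate

  // 1. Pull per-file properties: entry/deletion counts and creation time.
  rocksdb::TablePropertiesCollection props;
  rocksdb::Status s = db->GetPropertiesOfAllTables(&props);
  if (!s.ok()) return s;

  // 2. TableProperties has no key range, so take smallestkey/largestkey
  //    from the live-file metadata and match the two by file path.
  std::vector<rocksdb::LiveFileMetaData> files;
  db->GetLiveFilesMetaData(&files);

  const uint64_t now = static_cast<uint64_t>(time(nullptr));
  const rocksdb::LiveFileMetaData* picked = nullptr;
  double best_ratio = 0.0;

  for (const auto& f : files) {
    auto it = props.find(f.db_path + f.name);  // name has a leading '/'
    if (it == props.end() || it->second->num_entries == 0) continue;
    const rocksdb::TableProperties& p = *it->second;

    const double ratio =
        static_cast<double>(p.num_deletions) / p.num_entries;
    const uint64_t age = (p.creation_time > 0 && now > p.creation_time)
                             ? now - p.creation_time  // 0 means unknown
                             : 0;

    // Old file with a meaningful deletion rate: compact it right away.
    if (age > kMaxAgeSeconds && ratio >= kMinDeleteRatio) {
      picked = &f;
      break;
    }
    // Otherwise remember the file with the highest deletion rate.
    if (ratio > best_ratio) {
      best_ratio = ratio;
      picked = &f;
    }
  }
  if (picked == nullptr) return rocksdb::Status::OK();  // nothing to do

  // 3. Bound the manual compaction by the picked file's key range,
  //    i.e. the start_key & end_key described above.
  rocksdb::Slice begin(picked->smallestkey);
  rocksdb::Slice end(picked->largestkey);
  return db->CompactRange(rocksdb::CompactRangeOptions(), &begin, &end);
}
```

Because each run touches only one file's key range, the IO of a single pass stays bounded, which is what keeps the periodic, state-machine-driven execution controllable.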
The benefits of doing this are:
- Key status no longer needs to be monitored; compaction runs periodically, driven by a state machine, and the eventual steady state is that the overlap at each level stays small;
- IO remains controllable.
Alternatives considered
Keep the status quo.