feat: override __sequence on creating SST to save space and CPU #5252
Signed-off-by: Ruihang Xia <[email protected]>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #5252 +/- ##
==========================================
- Coverage 83.93% 83.69% -0.24%
==========================================
Files 1203 1204 +1
Lines 224150 224613 +463
==========================================
- Hits 188143 187998 -145
- Misses 36007 36615 +608
good for merging SSTs when ingesting them, better if the
We can share the same sequence because we assume that all rows in an SST are always deduplicated in non-append-only mode.
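The reasoning in this comment can be sketched in a few lines of Rust. This is only an illustration under stated assumptions, not the code from this PR: `Row`, `dedup`, and `write_sst` are hypothetical names, and the sketch assumes that a later file only contains sequences greater than those of an earlier file. After dedup, each (key, ts) pair appears at most once per file, so the sequence is only needed to order rows across files, and a single shared value per file is enough.

```rust
use std::collections::BTreeMap;

// Hypothetical row layout for illustration; not the actual mito2 types.
#[derive(Clone, Debug, PartialEq)]
struct Row {
    key: u32,
    ts: i64,
    sequence: u64,
    value: i64,
}

/// Keep only the row with the greatest sequence for each (key, ts) pair.
fn dedup(rows: &[Row]) -> Vec<Row> {
    let mut latest: BTreeMap<(u32, i64), Row> = BTreeMap::new();
    for row in rows {
        let is_newer = latest
            .get(&(row.key, row.ts))
            .map_or(true, |existing| row.sequence > existing.sequence);
        if is_newer {
            latest.insert((row.key, row.ts), row.clone());
        }
    }
    latest.into_values().collect()
}

/// Dedup the rows, then stamp every surviving row with the greatest sequence.
/// Assumption: this mirrors the idea of the PR, not its implementation.
fn write_sst(rows: &[Row]) -> Vec<Row> {
    let mut deduped = dedup(rows);
    let max_sequence = deduped.iter().map(|r| r.sequence).max().unwrap_or(0);
    for row in &mut deduped {
        row.sequence = max_sequence;
    }
    deduped
}
```

Merging an older and a newer file written this way with `dedup` still picks the newer row for a duplicated (key, ts), because the shared sequence of the newer file is greater than that of the older one.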
LGTM
…eptimeTeam#5252)
* override memtable sequence
Signed-off-by: Ruihang Xia <[email protected]>
* override sst sequence
Signed-off-by: Ruihang Xia <[email protected]>
* chore changes per to CR comments
Signed-off-by: Ruihang Xia <[email protected]>
* use correct sequence number
Signed-off-by: Ruihang Xia <[email protected]>
* wrap a method to get max sequence
Signed-off-by: Ruihang Xia <[email protected]>
* fix typo
Signed-off-by: Ruihang Xia <[email protected]>
---------
Signed-off-by: Ruihang Xia <[email protected]>
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
The sequence is a u64 that is allocated with each write request. Sequences differ from row to row and therefore compress poorly, which costs both CPU and space. We can assign the greatest sequence number to the whole file to reduce this overhead while keeping the mechanism working. In the TSBS test scenario with 100M rows, this saves 48MB of disk space. In the metric monitoring scenario, it can reduce file size by up to 50%~60%. We can also observe an (unstable) flush performance improvement.
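To see why a constant sequence column is so much cheaper than a per-row one, here is a minimal sketch that uses plain run-length encoding as a stand-in for the encoder. It is not the encoding GreptimeDB or Parquet actually applies, and `run_length_encode` and `override_with_max_sequence` are hypothetical helper names introduced only for this illustration.

```rust
/// Run-length encode a column of sequence numbers as (value, run_length) pairs.
fn run_length_encode(column: &[u64]) -> Vec<(u64, usize)> {
    let mut runs: Vec<(u64, usize)> = Vec::new();
    for &value in column {
        // Extend the current run when the value repeats.
        if let Some(last) = runs.last_mut() {
            if last.0 == value {
                last.1 += 1;
                continue;
            }
        }
        // Otherwise start a new run.
        runs.push((value, 1));
    }
    runs
}

/// Replace every per-row sequence with the greatest sequence in the file.
/// Assumption: this only models the idea described above, not the PR's code.
fn override_with_max_sequence(column: &[u64]) -> Vec<u64> {
    match column.iter().copied().max() {
        Some(max) => vec![max; column.len()],
        None => Vec::new(),
    }
}
```

With 1000 distinct per-row sequences, `run_length_encode` produces 1000 runs, while the overridden column collapses into a single run, which is the kind of saving the numbers above reflect.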
PR Checklist
Please convert it to a draft if some of the following conditions are not met.