Support to specify a disk quota for intermediate files #446
Comments
by "check point file" you mean those "SST files" in the local backend? |
Yes. I use "intermediate files" instead. |
Seems we can use https://pkg.go.dev/github.com/cockroachdb/pebble#DB.EstimateDiskUsage to fetch the disk usage.
Abstract
Periodically, before a […] This will cause subsequent imports to suffer from range overlapping, which we have to accept as trade-off.
Checkpoint validity
The flushing design must be compatible with checkpoints, that is no data will be lost if we Ctrl+C → resume in the middle of a process. Checkpoints may be earlier than the actual progress, so some data (process) duplication should be acceptable and ignored. Now let's consider the flush process:
Let's consider what happens depending on the place of interruption (I) and the checkpoint actually saved (C):
Case I=3, C<3
Currently, with the Local backend, a checkpoint is flushed only when the entire engine is written, because Flush() is expensive (#326 (comment)). So the end of step 3 is a good point to save the checkpoint. If step 3's checkpoint is not recorded, we will restart from the beginning while the engine still contains some incomplete data. This makes us hit step 2 sooner, and some "future" data will be ingested. But this is still fine, since those duplicated "future" KV pairs are ignored.
Case I=4, C<4
If step 4 is actually completed, all data will have been copied to TiKV. So whether C=1 (restart from scratch) or C=3 (import again), the data should be fine; the import is just slower.
Case I=5, C<5
If step 5 is actually completed, the local data is cleaned up. Starting from C=1 should be fine. Starting from C=3 or C=4 will lead to importing an empty database, which is also fine because the data has already been sent to TiKV.
Considering these, it should be fine to place a checkpoint immediately before flushing, importing and resetting the engine.
Implementation
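As an illustration only (this is not Lightning's actual code), the checkpoint placement argued for above can be sketched as a small simulation. Every name here (`engine`, `importer`, `flushImportReset`, `demo`) is hypothetical, the in-memory `engine` stands in for the local on-disk engine, a map stands in for TiKV, and the size counter is a simplification of whatever disk-usage estimate a real implementation would query. The sketch interrupts an import partway through, resumes from the saved checkpoint, and shows that the rewritten row causes no harm.

```go
package main

import "fmt"

// kv is one encoded key-value pair.
type kv struct{ key, val string }

// engine is a hypothetical stand-in for the local on-disk engine.
// It survives an interruption, just like files on disk would.
type engine struct {
	data map[string]string
	size int // approximate bytes currently held
}

// write stores a pair; rewriting an existing key does not grow the engine twice.
func (e *engine) write(p kv) {
	if old, ok := e.data[p.key]; ok {
		e.size -= len(p.key) + len(old)
	}
	e.data[p.key] = p.val
	e.size += len(p.key) + len(p.val)
}

// importer is a hypothetical driver for the quota-triggered flush cycle.
type importer struct {
	quota      int // soft disk quota in bytes
	eng        *engine
	tikv       map[string]string // stand-in for TiKV; re-ingesting a key is idempotent
	checkpoint int               // index of the first source row to read on resume
}

// flushImportReset saves the checkpoint first, then imports and resets the engine.
func (im *importer) flushImportReset(next int) {
	im.checkpoint = next
	for k, v := range im.eng.data {
		im.tikv[k] = v
	}
	im.eng.data = map[string]string{}
	im.eng.size = 0
}

// run resumes from the saved checkpoint. A non-negative stopAfter simulates
// Ctrl+C after that many rows have been written in this run.
func (im *importer) run(rows []kv, stopAfter int) {
	written := 0
	for i := im.checkpoint; i < len(rows); i++ {
		if stopAfter >= 0 && written == stopAfter {
			return
		}
		im.eng.write(rows[i])
		written++
		// A real implementation might ask the storage engine for its disk usage here.
		if im.eng.size >= im.quota {
			im.flushImportReset(i + 1)
		}
	}
	im.flushImportReset(len(rows))
}

// demo interrupts an import of six 10-byte rows, then resumes it.
func demo() *importer {
	var rows []kv
	for i := 0; i < 6; i++ {
		rows = append(rows, kv{fmt.Sprintf("k%d", i), "v0000000"})
	}
	im := &importer{quota: 25, eng: &engine{data: map[string]string{}}, tikv: map[string]string{}}
	im.run(rows, 4)  // interrupted: checkpoint is 3, row 3 is already in the engine
	im.run(rows, -1) // resume: row 3 is rewritten, which is harmless
	return im
}

func main() {
	im := demo()
	fmt.Println(len(im.tikv), im.checkpoint, im.eng.size)
}
```

In this sketch the quota is soft: the engine can exceed it by at most one row before the flush is triggered. The final call to `flushImportReset` after the resumed run may import an empty engine, which mirrors the "importing an empty database" case discussed above.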
|
Also, I suggest we should maintain an approximate size which can be |
Could you elaborate on how this works? |
Feature Request
Is your feature request related to a problem? Please describe:
Lightning needs a volume to store intermediate files, and the required size is hard to predict, so we have to provision as large a disk as possible for these temporary files. For example, to restore 2 TB of data we have to prepare a 2 TB volume for Lightning. This is a bad experience in the cloud.
Describe the feature you'd like:
We want to be able to specify the volume size, so that the intermediate (checkpoint) files do not exceed this size while Lightning is running.
Describe alternatives you've considered:
none
Teachability, Documentation, Adoption, Optimization: