MetadataCleanUp may make the present versions not available #606
Comments
@YannByron could you explain a bit what's the behavior you expect? Making |
@zsxwing IMO, MetadataCleanup should keep the preceding checkpoint file (if one exists) closest to the smallest non-deleted version, along with the delta log files (JSON) in between. That way we 'clean up as much as possible' while still 'guaranteeing availability within expectations'. After all, users don't intend to clean up Versions 8 and 9 in the example above. |
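To make the proposed retention rule concrete, here is a minimal Python sketch of it. It is not Delta Lake code: `files_to_keep`, its parameters, and the representation of log files as plain version numbers are all assumptions made for illustration. The rule it models is the one described above: keep the latest checkpoint at or before the smallest non-deleted version, plus every JSON commit after that checkpoint.

```python
def files_to_keep(json_versions, checkpoint_versions, min_retained_version):
    """Hypothetical model of the proposed cleanup rule.

    json_versions / checkpoint_versions: version numbers that currently have
    a .json commit file / a .checkpoint.parquet file.
    min_retained_version: the smallest version that should stay available.
    Returns (kept_json_versions, kept_checkpoint_versions).
    """
    # Checkpoints that could serve as a replay base for the retained versions.
    candidates = [c for c in checkpoint_versions if c <= min_retained_version]
    # -1 means "no usable checkpoint": replay has to start from commit 0.
    base = max(candidates) if candidates else -1

    # Keep every commit after the base checkpoint, and the commit of any
    # retained version itself (relevant when base == min_retained_version).
    kept_json = sorted(
        v for v in json_versions if v > base or v >= min_retained_version
    )
    # Keep the base checkpoint and every later one.
    kept_checkpoints = sorted(c for c in checkpoint_versions if c >= base)
    return kept_json, kept_checkpoints


# The example from this issue: commits 0..10, one checkpoint at 10, and
# version 9 is the smallest version that should stay available.
print(files_to_keep(range(11), [10], 9))
# With an additional (assumed) checkpoint at version 5, commits 0..5 can go.
print(files_to_keep(range(11), [5, 10], 9))
```

Under this rule nothing is deleted in the issue's example, because no checkpoint exists at or before version 9. That is the cost of the approach when checkpoints are sparse.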
Rather than keeping those files, creating a checkpoint for the smallest non-deleted version may be the better solution, so that we don't need to adjust |
@vkorukanti, is there any update about this issue? Thanks. |
@zsxwing, @vkorukanti |
MetadataCleanup deletes expired delta log files (JSON and checkpoint.parquet). But if a current version depends on files that are cleaned up, its state can no longer be rebuilt by replaying the commits. For example, suppose we have delta logs from version 0 to version 10 as follows: 000.json ~ 009.json, 010.json, 010.checkpoint.parquet.
MetadataCleanup runs when commit 10 is written. If we assume the logs before version 9 (exclusive) should be cleaned up, the remaining files are: 009.json, 010.json, 010.checkpoint.parquet.
As a result, Version 9 is not available (rebuilding it requires 000.json ~ 008.json, which are gone), and only Version 10 is shown by `desc history`.
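The scenario above can be checked with a small simulation. This is a simplified sketch under stated assumptions, not Delta Lake's actual logic: log files are modeled as sets of version numbers, and the helper `reconstructible_versions` is a hypothetical name. It treats a version as available only if replay can start from a checkpoint at or before it (or from commit 0) and every JSON commit up to that version is still present.

```python
def reconstructible_versions(json_versions, checkpoint_versions):
    """Return the versions whose state can be rebuilt from the given files.

    Hypothetical model: version v is reconstructible if there is a start
    point s <= v (a checkpoint version, or -1 for "replay from commit 0")
    such that every JSON commit in s+1..v still exists.
    """
    jsons = set(json_versions)
    checkpoints = set(checkpoint_versions)
    if not jsons and not checkpoints:
        return []
    latest = max(jsons | checkpoints)

    result = []
    for v in range(latest + 1):
        # Possible replay start points for version v.
        starts = [c for c in checkpoints if c <= v] + [-1]
        if any(all(u in jsons for u in range(s + 1, v + 1)) for s in starts):
            result.append(v)
    return result


# Before cleanup: 000.json ~ 010.json plus 010.checkpoint.parquet.
before = reconstructible_versions(range(11), [10])
# After cleanup: 009.json, 010.json, 010.checkpoint.parquet.
after = reconstructible_versions([9, 10], [10])

print(before)
print(after)
```

In this model every version from 0 to 10 is available before cleanup, while only version 10 survives afterwards, even though 009.json was deliberately retained.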