-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cleaner should now use commit timeline and not include deltacommits #539
Conversation
how did you run into this ? |
Basically, the cleaner was retaining fewer versions for trips table before this. On debugging, realized with deltacommit being considered as part of the number of commits in the cleaner is the problem |
Looks like its touching lot more than just cleaner ?let me spend more time understanding, since its around timeline. |
Yeah, basically I introduced a new method in the timeline for allCommit and changed the existing one to allCommits. So, the changes you see across the files are just as a result of rename of the method. The new method is only used in the cleaner. Hope that helps |
@vinothchandar @bvaradar Please take a pass at this one as well when you get a chance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm
/** | ||
* Get only the completed (no-inflights) commit timeline | ||
*/ | ||
public HoodieTimeline getCompletedCommitTimeline() { | ||
return metaClient.getCommitsTimeline().filterCompletedInstants(); | ||
return metaClient.getCommitTimeline().filterCompletedInstants(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
guess cleaner is still going to call this and you just changed other usages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly.
For MOR tables, there will be 2 commits, delta + commit (ideally for one following compaction). In this scenario, we'd have to double the number of commits to be retained. Ideally, each commit action decides how many versions should be kept, hence making the timeline to read only from "commit" action.