[HUDI-7974] Adding support for empty cleans #12799

Open: wants to merge 2 commits into master

nsivabalan (Contributor) commented Feb 6, 2025

Change Logs

  • Added a configuration for periodically creating empty clean commits when there are no files to clean.
  • Added a new metadata entry to the clean plan, keyed "EarliestCommitToNotArchive". Archival uses it to ensure it never proceeds beyond the configured EarliestCommitToNotArchive value. In rare scenarios where the cleaner and archival are executed separately (for example, the cleaner is disabled while archival still runs), archival could archive instants that the cleaner has not yet cleaned, which can result in duplicates. This additional metadata fixes that.
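A minimal sketch of how archival might honor the "EarliestCommitToNotArchive" bound, assuming instant times are fixed-width yyyyMMddHHmmssSSS strings (so string order matches time order) and assuming that a missing value means nothing is archived. The class and method names are hypothetical; this is not the TimelineArchiverV1/V2 code from this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical sketch: bounds archival by the EarliestCommitToNotArchive value
 * recorded in the latest clean plan. Not Hudi's actual archiver implementation.
 */
public class ArchivalBoundSketch {

  /**
   * Returns the completed instants that archival may remove.
   *
   * @param completedInstants instant times in ascending order
   * @param earliestCommitToNotArchive value read from the latest clean plan,
   *                                   empty if no clean has recorded it yet
   */
  public static List<String> archivableInstants(List<String> completedInstants,
                                                Optional<String> earliestCommitToNotArchive) {
    List<String> result = new ArrayList<>();
    // Assumption: with no recorded bound, archive nothing rather than risk
    // archiving instants whose files the cleaner has not processed yet.
    if (earliestCommitToNotArchive.isEmpty()) {
      return result;
    }
    String bound = earliestCommitToNotArchive.get();
    for (String instant : completedInstants) {
      // Fixed-width timestamps compare chronologically as plain strings.
      if (instant.compareTo(bound) < 0) {
        result.add(instant);
      } else {
        break; // input is sorted, so no later instant can qualify
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> timeline =
        List.of("20250201000000000", "20250203000000000", "20250205000000000");
    System.out.println(archivableInstants(timeline, Optional.of("20250203000000000"))); // [20250201000000000]
    System.out.println(archivableInstants(timeline, Optional.empty()));                 // []
  }
}
```

The bound is exclusive here: the instant named by EarliestCommitToNotArchive itself stays on the active timeline, matching the "not archive" wording of the key.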

Impact

Fixes an issue in append-only tables where archival does not kick in, because archival now depends on information in the cleaner metadata. Without empty cleans, the active timeline of append-only tables kept growing indefinitely.
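A minimal sketch of the decision to write an empty clean, assuming a time-based interval (the actual configuration may be expressed differently) and assuming that a table with no prior clean should get one immediately so archival has metadata to work from. The class, method, and parameter names are hypothetical and not Hudi's API.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

/** Hypothetical sketch of the empty-clean decision; names are illustrative only. */
public class EmptyCleanSketch {

  public static boolean shouldCreateEmptyClean(boolean hasFilesToClean,
                                               Optional<Instant> lastCleanTime,
                                               Instant now,
                                               Duration emptyCleanInterval) {
    if (hasFilesToClean) {
      return false; // a regular clean runs and writes fresh clean metadata anyway
    }
    if (lastCleanTime.isEmpty()) {
      return true; // assumption: no clean metadata yet, so archival has nothing to go on
    }
    // Write an empty clean only once the configured interval has elapsed.
    return Duration.between(lastCleanTime.get(), now).compareTo(emptyCleanInterval) >= 0;
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2025-02-10T03:00:00Z");
    Duration interval = Duration.ofHours(24);
    System.out.println(shouldCreateEmptyClean(false, Optional.of(now.minus(Duration.ofHours(25))), now, interval)); // true
    System.out.println(shouldCreateEmptyClean(false, Optional.of(now.minus(Duration.ofHours(1))), now, interval));  // false
    System.out.println(shouldCreateEmptyClean(true, Optional.empty(), now, interval));                              // false
  }
}
```

Gating on an interval keeps an append-only table from writing a clean commit on every write while still refreshing the clean metadata often enough for archival to advance.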

Pending:
  • Fix TimelineArchiverV2 and add tests.
  • Add tests for TimelineArchiverV1.

Risk level (write none, low, medium or high below)

If medium or high, explain what verification was done to mitigate the risks.

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default values of configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the instructions to make changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

Co-authored-by: Surya Prasanna Kumar Yalla <[email protected]>
Co-authored-by: Timothy Brown <[email protected]>
nsivabalan marked this pull request as draft February 6, 2025 17:53
github-actions bot added the size:L label (PR with lines of changes in (300, 1000]) Feb 6, 2025
@hudi-bot

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build
nsivabalan marked this pull request as ready for review February 10, 2025 03:01
Labels
size:L PR with lines of changes in (300, 1000]
3 participants