API,Core: Support Conditional Commits #6513

Closed
wants to merge 2 commits into from

fqaiser94
Contributor

@fqaiser94 fqaiser94 commented Jan 2, 2023

Context

Adds support for committing changes to an Iceberg table only if a given condition holds at commit time.
Not before the commit.
Not after the commit.
At commit time.

This is useful in scenarios where users need a robust guard against potential concurrent commits. For example, some use cases require maintaining a monotonically increasing watermark in the snapshot properties. Our recently released iceberg-kafka-connect connector does this; however, it can only do so on a best-effort basis because Iceberg does not offer any API for expressing conditional commits. As a result, there is a risk of duplicate file appends. This PR would close that loophole.

For more history, context, and use cases, please see the discussion in #6514.

Incidentally, Delta (the competing table format) offers a similar feature albeit through a much more restricted API: https://github.com/delta-io/delta/blob/master/PROTOCOL.md#transaction-identifiers

API design

We need to introduce a new API to allow users to declare the conditions under which a commit is allowed to proceed or not. There are two main options here:

  1. Add a new void commitIf(List<Validation> validations) method to the PendingUpdate interface.
    1. See latest commit for implementation
    2. This offers a fluent-style API
  2. Add a new void validate(List<Validation> validations) method to the PendingUpdate interface.
    1. See first commit for implementation
    2. I feel this is simpler, but some reviewers raised concerns that the API is not as fluent as option 1
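
To make the intended semantics of option 1 concrete, here is a minimal, self-contained sketch of a conditional commit against a toy in-memory table. Everything in it is an assumption made for illustration: ToyTable, ValidationFailedException, advanceWatermark, and the shape of Validation (a predicate plus a message) are hypothetical and are not the classes in this PR. The only point it tries to show is that the condition is evaluated against the current state in the same atomic step as the write, using the watermark use case from the Context section.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy model of a conditional commit. None of these types are Iceberg classes. */
public class ConditionalCommitSketch {

  /** Hypothetical stand-in for a Validation: a predicate over table state plus a message. */
  static final class Validation {
    final Predicate<Map<String, String>> condition;
    final String message;

    Validation(Predicate<Map<String, String>> condition, String message) {
      this.condition = condition;
      this.message = message;
    }
  }

  /** Hypothetical exception raised when a validation does not hold at commit time. */
  static final class ValidationFailedException extends RuntimeException {
    ValidationFailedException(String message) {
      super(message);
    }
  }

  /** Toy table whose only state is a map of properties. */
  static final class ToyTable {
    private final Map<String, String> properties = new HashMap<>();

    synchronized Map<String, String> properties() {
      return new HashMap<>(properties);
    }

    /** Checks every validation and applies the changes under one lock. */
    synchronized void commitIf(Map<String, String> changes, List<Validation> validations) {
      for (Validation validation : validations) {
        if (!validation.condition.test(properties)) {
          throw new ValidationFailedException(validation.message);
        }
      }
      properties.putAll(changes);
    }
  }

  /** Advances the watermark only if it still equals the value the writer last saw. */
  static boolean advanceWatermark(ToyTable table, String expected, String next) {
    Validation unchanged =
        new Validation(
            props -> expected.equals(props.getOrDefault("watermark", "0")),
            "watermark changed concurrently");
    try {
      table.commitIf(Map.of("watermark", next), List.of(unchanged));
      return true;
    } catch (ValidationFailedException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    ToyTable table = new ToyTable();
    System.out.println(advanceWatermark(table, "0", "10")); // writer that saw the initial state
    System.out.println(advanceWatermark(table, "0", "7")); // writer holding a stale watermark
    System.out.println(table.properties().get("watermark"));
  }
}
```

In this sketch the second writer is rejected instead of silently overwriting the watermark, which is the guard against duplicate appends that a best-effort check performed before the commit cannot give.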

Note

  • Please don't be put off by the size of the PR; 90% of it is purely tests.

@fqaiser94 fqaiser94 changed the title API: Add Commit Condition Check WIP API: Support Conditional Transaction Commits WIP Jan 2, 2023
@fqaiser94 fqaiser94 changed the title API: Support Conditional Transaction Commits WIP API: Support Conditional Commits - WIP Feb 6, 2023
@fqaiser94 fqaiser94 force-pushed the add-commit-condition-check branch 4 times, most recently from 47247aa to 3b28cb9 Compare February 7, 2023 03:32
@github-actions github-actions bot added the build label Feb 7, 2023
@fqaiser94 fqaiser94 marked this pull request as ready for review February 7, 2023 15:27
Contributor Author

@fqaiser94 fqaiser94 left a comment

@rdblue would love to get some feedback from you (or someone from your team) on this PR.

Only about 200 lines have changed in the main (non-test) code; the rest is all test changes.
To make it easier to review, here's what I've done so far:

  • Introduced a new interface ValidatablePendingUpdate
  • Added an abstract class BaseValidatablePendingUpdate that implements the ValidatablePendingUpdate interface
  • Using the above, I've migrated the following interfaces/classes from PendingUpdate to ValidatablePendingUpdate so far:
    • UpdateProperties
      • PropertiesUpdate
    • ExpireSnapshots
      • RemoveSnapshots
    • SnapshotUpdate
      • SnapshotProducer
        • BaseOverwriteFiles
        • BaseReplacePartitions
        • BaseRewriteFiles
        • BaseRewriteManifests
        • BaseRowDelta
        • CherryPickOperation
        • FastAppend
        • MergeAppend
        • MergingSnapshotProducer
        • StreamingDelete
  • Also modified the BaseTransaction class to be able to handle ValidatablePendingUpdates.
    • Note: no changes were needed to the Transaction interface.

I can migrate the rest of the PendingUpdate implementors as well.
So far, I haven't found any PendingUpdate sub-interface that doesn't make sense to migrate to the new ValidatablePendingUpdate interface.
I would appreciate any feedback in the meantime on the current approach.

@nastra nastra self-requested a review February 7, 2023 17:59
@fqaiser94 fqaiser94 force-pushed the add-commit-condition-check branch from 0375f0e to 7aca17e Compare February 21, 2023 01:45
@fqaiser94 fqaiser94 changed the title API: Support Conditional Commits - WIP API: Support Conditional Commits Feb 22, 2023
@jackye1995 jackye1995 self-requested a review February 22, 2023 17:52
@fqaiser94 fqaiser94 force-pushed the add-commit-condition-check branch from 72de00d to 1936950 Compare February 28, 2023 00:41
@stevenzwu
Contributor

@fqaiser94 I added a comment to the issue regarding the motivation use cases: #6514 (comment).

@fqaiser94
Contributor Author

@stevenzwu sorry, I've responded in the issue now, let's continue the conversation there.

@fqaiser94 fqaiser94 changed the title API: Support Conditional Commits API,Core: Support Conditional Commits Mar 2, 2023
@fqaiser94
Contributor Author

All comments have been addressed.
This is now ready for a second round of reviews.

@fqaiser94 fqaiser94 requested review from rdblue and removed request for nastra and jackye1995 March 2, 2023 23:58
@fqaiser94
Contributor Author

fqaiser94 commented Mar 3, 2023

Sorry @nastra @jackye1995, I didn't mean to remove you both as reviewers.
For some reason, the UI won't even let me add you folks back as reviewers 😵‍💫
Please feel free to add yourselves back as reviewers.

@nastra nastra self-requested a review March 3, 2023 07:19
@fqaiser94 fqaiser94 force-pushed the add-commit-condition-check branch from a21af5f to 213800b Compare March 20, 2023 21:44
@fqaiser94 fqaiser94 force-pushed the add-commit-condition-check branch from 213800b to 1b875c5 Compare June 17, 2023 22:24
@fqaiser94
Contributor Author

I took a little break from this PR because reviews were moving slowly and it seemed like this feature wasn't considered a high priority. Since then, I have had or seen a couple of conversations with people who are interested in this feature and affirmed its value, so now seems like a good time to revive this PR.

I've rebased the changes on top of latest master and addressed all of the existing comments. Please take a look :)

@fqaiser94 fqaiser94 requested review from nastra and rdblue June 18, 2023 00:06
@fqaiser94 fqaiser94 force-pushed the add-commit-condition-check branch from 1b875c5 to 03c375b Compare November 7, 2023 22:44
@fqaiser94 fqaiser94 force-pushed the add-commit-condition-check branch from 03c375b to 1407940 Compare March 4, 2024 21:31
@fqaiser94 fqaiser94 force-pushed the add-commit-condition-check branch from 1407940 to 55ad87f Compare June 3, 2024 02:48
Comment on lines +186 to +192
@Override
public void commitIf(List<Validation> validations) {
  commitIfRefUpdatesExist();
  // Add a no-op UpdateProperties to add given validations to transaction
  transaction.updateProperties().commitIf(validations);
  commit();
}
Contributor Author

SnapshotManager is the only PendingUpdate implementation where I have to implement the commitIf method "by hand", i.e. I can't just extend BasePendingUpdate like all the other implementations. This is because SnapshotManager is implemented in terms of Transaction, which means I don't have access to any base TableMetadata to validate directly. Instead, I add a conditional, no-op UpdateProperties to the underlying transaction, which then validates the current table state as part of the Transaction commit process.
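
The no-op trick described above can be sketched with a toy transaction. This is only an illustration under stated assumptions: ToyTransaction, ToyManager, setProperty, and the use of a plain Predicate as the validation type are hypothetical names, not the PR's or Iceberg's classes. The manager has no table state of its own to check, so it registers an empty update whose only job is to carry the validations into the transaction, and the transaction evaluates them when it commits.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy model of attaching validations to a transaction through a no-op update. */
public class NoOpValidationSketch {

  /** Hypothetical transaction that collects updates and validations, then commits atomically. */
  static final class ToyTransaction {
    private final Map<String, String> tableState;
    private final List<Map<String, String>> pendingChanges = new ArrayList<>();
    private final List<Predicate<Map<String, String>>> pendingValidations = new ArrayList<>();

    ToyTransaction(Map<String, String> tableState) {
      this.tableState = tableState;
    }

    void addUpdate(Map<String, String> changes, List<Predicate<Map<String, String>>> validations) {
      pendingChanges.add(changes);
      pendingValidations.addAll(validations);
    }

    /** Validates against the current state, then applies every pending change. */
    void commit() {
      synchronized (tableState) {
        for (Predicate<Map<String, String>> validation : pendingValidations) {
          if (!validation.test(tableState)) {
            throw new IllegalStateException("validation failed at commit time");
          }
        }
        for (Map<String, String> changes : pendingChanges) {
          tableState.putAll(changes);
        }
      }
    }
  }

  /** Hypothetical manager that, like SnapshotManager, works only through a transaction. */
  static final class ToyManager {
    private final ToyTransaction transaction;

    ToyManager(ToyTransaction transaction) {
      this.transaction = transaction;
    }

    void setProperty(String key, String value) {
      transaction.addUpdate(Map.of(key, value), List.of());
    }

    void commitIf(List<Predicate<Map<String, String>>> validations) {
      // No-op update: it changes nothing and exists only to carry the validations.
      transaction.addUpdate(Map.of(), validations);
      transaction.commit();
    }
  }

  public static void main(String[] args) {
    Map<String, String> state = new HashMap<>();
    state.put("owner", "a");
    Predicate<Map<String, String>> ownerIsA = s -> "a".equals(s.get("owner"));

    ToyManager manager = new ToyManager(new ToyTransaction(state));
    manager.setProperty("branch", "b1");
    manager.commitIf(List.of(ownerIsA));
    System.out.println(state);
  }
}
```

If any validation fails, the toy transaction throws before applying a single pending change, so the manager's earlier updates are discarded along with the no-op one.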

@fqaiser94 fqaiser94 force-pushed the add-commit-condition-check branch from 55ad87f to e048626 Compare July 29, 2024 17:38
@fqaiser94 fqaiser94 force-pushed the add-commit-condition-check branch 2 times, most recently from ac2acd9 to 74ebbcd Compare July 29, 2024 19:42

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Aug 29, 2024
@fqaiser94 fqaiser94 force-pushed the add-commit-condition-check branch from 74ebbcd to 024385a Compare September 1, 2024 15:21
@github-actions github-actions bot removed the stale label Sep 2, 2024

github-actions bot commented Oct 4, 2024

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Oct 4, 2024
@fqaiser94 fqaiser94 force-pushed the add-commit-condition-check branch from 024385a to 4baa137 Compare October 9, 2024 14:19
@nastra nastra dismissed their stale review October 9, 2024 16:48

outdated

@nastra nastra removed the stale label Oct 9, 2024

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jan 16, 2025

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Jan 30, 2025