-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API,Core: Support Conditional Commits #6513
Conversation
47247aa
to
3b28cb9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rdblue would love to get some feedback from you (or someone from your team) on this PR.
Only about 200 lines have changed on the main
side, the rest is all just test
side changes.
To make it easier to review, here's what I've done so far:
- Introduced a new
interface ValidatablePendingUpdate
- Added an
abstract class BaseValidatablePendingUpdate
that implements theValidatablePendingUpdate
interface - Using the above, I've migrated the following interfaces/classes from
PendingUpdate
toValidatablePendingUpdate
so far:UpdateProperties
PropertiesUpdate
ExpireSnapshots
RemoveSnapshots
SnapshotUpdate
SnapshotProducer
BaseOverwriteFiles
BaseReplacePartitions
BaseRewriteFiles
BaseRewriteManifests
BaseRowDelta
CherryPickOperation
FastAppend
MergeAppend
MergingSnapshotProducer
StreamingDelete
- Also, modified the
BaseTransaction
class to be able handleValidatablePendingUpdates
.- Note: no changes were needed to the
Transaction
interface.
- Note: no changes were needed to the
I can migrate the rest of the PendingUpdate
implementors as well.
So far, I haven't found any PendingUpdate
interface where it doesn't make sense to migrate it to the new ValidatablePendingInterface
.
I would appreciate any feedback in the meantime on the current approach.
api/src/main/java/org/apache/iceberg/ValidatablePendingUpdate.java
Outdated
Show resolved
Hide resolved
api/src/main/java/org/apache/iceberg/ValidatablePendingUpdate.java
Outdated
Show resolved
Hide resolved
api/src/main/java/org/apache/iceberg/ValidatablePendingUpdate.java
Outdated
Show resolved
Hide resolved
api/src/main/java/org/apache/iceberg/ValidatablePendingUpdate.java
Outdated
Show resolved
Hide resolved
api/src/main/java/org/apache/iceberg/BaseValidatablePendingUpdate.java
Outdated
Show resolved
Hide resolved
0375f0e
to
7aca17e
Compare
api/src/main/java/org/apache/iceberg/BaseValidatablePendingUpdate.java
Outdated
Show resolved
Hide resolved
api/src/main/java/org/apache/iceberg/BaseValidatablePendingUpdate.java
Outdated
Show resolved
Hide resolved
72de00d
to
1936950
Compare
@fqaiser94 I added a comment to the issue regarding the motivation use cases: #6514 (comment). |
@stevenzwu sorry, I've responded in the issue now, let's continue the conversation there. |
All comments have been addressed. |
Sorry @nastra @jackye1995, I didn't mean to remove you both as reviewers. |
a21af5f
to
213800b
Compare
213800b
to
1b875c5
Compare
I took a little break from this PR because reviews were moving a little slowly and it seemed like this feature wasn't considered a high priority. I have since had/seen a couple of conversations with people interested in this feature and affirmed it's value so I'm thinking now might be a good time to try reviving this PR. I've rebased the changes on top of latest master and addressed all of the existing comments. Please take a look :) |
1b875c5
to
03c375b
Compare
03c375b
to
1407940
Compare
1407940
to
55ad87f
Compare
@Override | ||
public void commitIf(List<Validation> validations) { | ||
commitIfRefUpdatesExist(); | ||
// Add a no-op UpdateProperties to add given validations to transaction | ||
transaction.updateProperties().commitIf(validations); | ||
commit(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SnapshotManager
is the only PendingUpdate
implementation where I have to implement the commitIf
method "by hand" i.e. I can't just extend BasePendingUpdate
like all the other implementations. This is because of the way SnapshotManager
is implemented in terms of Transaction
which means I don't have access to any base TableMetadata
to validate
directly. Instead, I add a conditional, no-op UpdateProperties
to the underlying transaction which then validates the current table state as part of the Transaction
commit process.
55ad87f
to
e048626
Compare
ac2acd9
to
74ebbcd
Compare
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions. |
74ebbcd
to
024385a
Compare
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions. |
024385a
to
4baa137
Compare
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions. |
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
Context
Adds support for committing changes to an iceberg table based on whether or not a condition is true at commit time.
Not before the commit.
Not after the commit.
At commit time.
This is useful in scenarios where users need a robust guard against potential concurrent commits. For example, some use cases require maintaining a monotonically increasing watermark in the snapshot properties. Our recently released iceberg-kafka-connect connector does this however it can only do this on a best-effort basis because Iceberg does not offer any API for expressing conditional commits. As a result, there is a risk of duplicate-file-appends there. This PR would enable closing that loophole.
For more history/context/usecases, please see the discussion in #6514.
Incidentally, Delta (the competing table format) offers a similar feature albeit through a much more restricted API: https://github.com/delta-io/delta/blob/master/PROTOCOL.md#transaction-identifiers
API design
We need to introduce a new API to allow users to declare the conditions under which a commit is allowed to proceed or not. There are two main options here:
void commitIf(List<Validation> validations)
method to thePendingUpdate
interface.void validate(List<Validation> validations)
method to thePendingUpdate
interface.Note