Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle partition spec evolutions gracefully #202

Merged
merged 1 commit into from
Mar 26, 2024

Conversation

fqtab
Copy link
Contributor

@fqtab fqtab commented Mar 5, 2024

Currently the connector will get stuck if it ever encounters a batch of commit responses containing files with different partition specs.
This is because the append API currently does not support appending files with different partition specs.
There is an open PR to potentially address this limitation.
In the meantime, we can "hack" around this limitation by appending files with different specs as separate append operations in a single transaction.
It's not ideal but on balance, the trade-off should be worth it; partition spec evolution should not be a frequent event for most use cases.

@fqtab fqtab force-pushed the support_partition_spec_changes branch from 3c2e734 to d4ba143 Compare March 5, 2024 20:46
branch.ifPresent(appendOp::toBranch);
entry.getValue().forEach(appendOp::appendFile);
appendOp.set(COMMIT_ID_SNAPSHOT_PROP, commitState.currentCommitId().toString());
if (Objects.equals(entry.getKey(), maxSpecId)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only include the controlTopicOffsets and vtts properties in the last snapshot to be appended as part of this transaction. That's because we can only really be sure that these values are valid for the last snapshot to be appended.

AppendFiles appendOp = transaction.newAppend();
branch.ifPresent(appendOp::toBranch);
entry.getValue().forEach(appendOp::appendFile);
appendOp.set(COMMIT_ID_SNAPSHOT_PROP, commitState.currentCommitId().toString());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note however that we include the commit-id snapshot property on each snapshot we add as part of this transaction. We don't have to do this but it could be helpful for debugging purposes and it's easy enough to do.

@fqtab fqtab marked this pull request as ready for review March 5, 2024 20:59
@fqtab fqtab changed the title Support inflight partition evolution Handle partition spec evolutions gracefully Mar 5, 2024
@fqtab fqtab force-pushed the support_partition_spec_changes branch from d4ba143 to fa9e2fc Compare March 5, 2024 21:44
@fqtab fqtab marked this pull request as draft March 6, 2024 00:14
@fqtab fqtab force-pushed the support_partition_spec_changes branch 2 times, most recently from 08e91e0 to fc32130 Compare March 6, 2024 00:38
@fqtab fqtab force-pushed the support_partition_spec_changes branch from fc32130 to 38a46fb Compare March 6, 2024 20:45
@fqtab fqtab changed the base branch from main to refactor_channel_test_base March 6, 2024 20:46
@fqtab fqtab marked this pull request as ready for review March 6, 2024 20:57
@bryanck
Copy link
Contributor

bryanck commented Mar 9, 2024

Can you just include a patched version of Iceberg that includes your upstream fix, rather than working around the issue here?

Base automatically changed from refactor_channel_test_base to main March 11, 2024 14:44
@fqtab fqtab force-pushed the support_partition_spec_changes branch from 74f3692 to dbb96ef Compare March 16, 2024 13:18
@fqtab
Copy link
Contributor Author

fqtab commented Mar 16, 2024

Can you just include a patched version of Iceberg that includes your upstream fix, rather than working around the issue here?

That feels more hacky to me, I'd rather not resort to that just yet.
The upstream fix is a relatively fundamental change in iceberg-core, I'm still waiting for more experienced iceberg contributors to confirm it's safe, and it's not clear to me if the community will accept it.
I'd rather not end up in a situation where I have to maintain patches long-term.

Is you have a particular concern with this work around, please let me know! 😃
Some of our users have reported running into this issue recently and I consider this work-around to be an effective short-term solution whilst we hopefully towards a long-term solution in iceberg-core.

@fqtab fqtab merged commit cdd54f3 into main Mar 26, 2024
1 check passed
@fqtab fqtab deleted the support_partition_spec_changes branch March 26, 2024 12:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants