Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for changing row type in Iceberg #15808

Merged
merged 2 commits into from
Jan 25, 2023

Conversation

ebyhr
Copy link
Member

@ebyhr ebyhr commented Jan 23, 2023

Description

Add support for changing fields in row type in Iceberg. The supported operations:

  • Add a new field
  • Update an existing field type
  • Delete an existing field
  • Reorder fields

Release notes

(x) Release notes are required, with the following suggested text if this PR isn't merged in 406:

# Iceberg
* Add support for changing fields in `row` type . ({issue}`15808`)

Additionally, use SELECT instead of VALUES to handle row type correctly.
@ebyhr ebyhr force-pushed the ebi/iceberg-set-data-type-complex branch from 7319d57 to 8d5a7fd Compare January 24, 2023 00:58
@ebyhr ebyhr self-assigned this Jan 24, 2023
@ebyhr ebyhr force-pushed the ebi/iceberg-set-data-type-complex branch from 8d5a7fd to 5ea6aa0 Compare January 24, 2023 09:18
Copy link
Member

@alexjo2144 alexjo2144 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All nitpicky things, looks good to me

Comment on lines 1592 to 1600
if (newType.isPrimitiveType()) {
return schema.updateColumn(name, newType.asPrimitiveType());
}
if (sourceType instanceof StructType sourceRowType && newType instanceof StructType newRowType) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't matter, functionality-wise, but can we make these checks symmetrical so that they both check the source and new type?

Suggested change
if (newType.isPrimitiveType()) {
return schema.updateColumn(name, newType.asPrimitiveType());
}
if (sourceType instanceof StructType sourceRowType && newType instanceof StructType newRowType) {
if (sourceType.isPrimitiveTyple() && newType.isPrimitiveType()) {
return schema.updateColumn(name, newType.asPrimitiveType());
}
if (sourceType instanceof StructType sourceRowType && newType instanceof StructType newRowType) {

if (sourceType instanceof StructType sourceRowType && newType instanceof StructType newRowType) {
// Add, update or delete fields
for (NestedField field : concat(sourceRowType.fields(), newRowType.fields())) {
if (sourceRowType.equals(newRowType)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could move this outside of the loop, right?

.commit();
}
catch (RuntimeException e) {
throw new TrinoException(ICEBERG_COMMIT_ERROR, "Failed to set column type: " + firstNonNull(e.getMessage(), e), e);
}
}

private static UpdateSchema buildUpdateSchema(String name, Type sourceType, Type newType, UpdateSchema schema)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I see that returning the UpdateSchema makes calling commit easier above, but personally find this easier to read.

Suggested change
private static UpdateSchema buildUpdateSchema(String name, Type sourceType, Type newType, UpdateSchema schema)
private static void buildUpdateSchema(String name, Type sourceType, Type newType, UpdateSchema schemaUpdate)

}
if (sourceType instanceof StructType sourceRowType && newType instanceof StructType newRowType) {
// Add, update or delete fields
for (NestedField field : concat(sourceRowType.fields(), newRowType.fields())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will have duplicates for fields in both old and new type. Also doesn't matter functionality wise but would be nice not to have to iterate over them twice.

@ebyhr ebyhr force-pushed the ebi/iceberg-set-data-type-complex branch from 5ea6aa0 to 3096374 Compare January 25, 2023 02:14
@ebyhr
Copy link
Member Author

ebyhr commented Jan 25, 2023

plugin/trino-sqlserver hit #12535

@ebyhr ebyhr requested a review from findepi January 25, 2023 04:20
@ebyhr ebyhr merged commit 7cb8f4d into trinodb:master Jan 25, 2023
@ebyhr ebyhr deleted the ebi/iceberg-set-data-type-complex branch January 25, 2023 12:00
@ebyhr ebyhr added the no-release-notes This pull request does not require release notes entry label Jan 25, 2023
@github-actions github-actions bot added this to the 406 milestone Jan 25, 2023
Comment on lines +83 to +84
// TODO https://github.com/trinodb/trino/issues/15822 The connector returns incorrect NULL when a field in row type doesn't exist in Parquet files
return Optional.of(setup.withNewValueLiteral("NULL"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single-field row types aren't very common.
Let's make sure we have a test for renaming some, but not all fields in a row (eg have a row with two fields).

(We can keep this one too)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sent #15957

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed no-release-notes This pull request does not require release notes entry
Development

Successfully merging this pull request may close these issues.

3 participants