Add support for changing row type in Iceberg #15808
Additionally, use SELECT instead of VALUES to handle row type correctly.
7319d57 to 8d5a7fd
8d5a7fd to 5ea6aa0
All nitpicky things, looks good to me
if (newType.isPrimitiveType()) {
    return schema.updateColumn(name, newType.asPrimitiveType());
}
if (sourceType instanceof StructType sourceRowType && newType instanceof StructType newRowType) {
It doesn't matter, functionality-wise, but can we make these checks symmetrical so that they both check the source and new type?
- if (newType.isPrimitiveType()) {
-     return schema.updateColumn(name, newType.asPrimitiveType());
- }
- if (sourceType instanceof StructType sourceRowType && newType instanceof StructType newRowType) {
+ if (sourceType.isPrimitiveType() && newType.isPrimitiveType()) {
+     return schema.updateColumn(name, newType.asPrimitiveType());
+ }
+ if (sourceType instanceof StructType sourceRowType && newType instanceof StructType newRowType) {
if (sourceType instanceof StructType sourceRowType && newType instanceof StructType newRowType) {
    // Add, update or delete fields
    for (NestedField field : concat(sourceRowType.fields(), newRowType.fields())) {
        if (sourceRowType.equals(newRowType)) {
Could move this outside of the loop, right?
            .commit();
    }
    catch (RuntimeException e) {
        throw new TrinoException(ICEBERG_COMMIT_ERROR, "Failed to set column type: " + firstNonNull(e.getMessage(), e), e);
    }
}

private static UpdateSchema buildUpdateSchema(String name, Type sourceType, Type newType, UpdateSchema schema)
Nit: I see that returning the UpdateSchema makes calling commit easier above, but I personally find this easier to read.
- private static UpdateSchema buildUpdateSchema(String name, Type sourceType, Type newType, UpdateSchema schema)
+ private static void buildUpdateSchema(String name, Type sourceType, Type newType, UpdateSchema schemaUpdate)
}
if (sourceType instanceof StructType sourceRowType && newType instanceof StructType newRowType) {
    // Add, update or delete fields
    for (NestedField field : concat(sourceRowType.fields(), newRowType.fields())) {
This will have duplicates for fields that are in both the old and the new type. It also doesn't matter functionality-wise, but it would be nice not to have to iterate over them twice.
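One way to avoid the duplicates, sketched with plain collections rather than the Iceberg API: collect the field names of both row types into a LinkedHashSet and visit each name once. `FieldChangePlanner`, `planChanges`, and the name-to-type maps are hypothetical stand-ins for StructType and NestedField, not code from this PR, and the returned strings only describe the operation that would be applied to the UpdateSchema.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in: a row type is modeled as a map of field name -> type name.
public class FieldChangePlanner
{
    public static List<String> planChanges(Map<String, String> sourceFields, Map<String, String> newFields)
    {
        // The union keeps first-seen order and holds each shared field name only once
        Set<String> names = new LinkedHashSet<>(sourceFields.keySet());
        names.addAll(newFields.keySet());

        List<String> changes = new ArrayList<>();
        for (String name : names) {
            String sourceType = sourceFields.get(name);
            String newType = newFields.get(name);
            if (sourceType == null) {
                // Present only in the new row type
                changes.add("add " + name + " " + newType);
            }
            else if (newType == null) {
                // Present only in the source row type
                changes.add("delete " + name);
            }
            else if (!sourceType.equals(newType)) {
                // Present in both, with a different type
                changes.add("update " + name + " " + newType);
            }
            // Same type on both sides: nothing to do
        }
        return changes;
    }
}
```

With insertion-ordered maps as input, a field that appears in both row types is handled in a single iteration, and unchanged fields produce no operation at all.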
5ea6aa0 to 3096374
// TODO https://github.com/trinodb/trino/issues/15822 The connector returns incorrect NULL when a field in row type doesn't exist in Parquet files
return Optional.of(setup.withNewValueLiteral("NULL"));
Single-field row types aren't very common.
Let's make sure we have a test for renaming some, but not all, fields in a row (e.g. a row with two fields).
(We can keep this one too.)
Sent #15957
Description
Add support for changing fields in row type in Iceberg. The supported operations:
Release notes
(x) Release notes are required, with the following suggested text if this PR isn't merged in 406: