Add support for changing row type in Iceberg #15808

ebyhr · 2023-01-23T09:47:04Z

Description

Add support for changing the fields of a `row` type in Iceberg. The supported operations are:

* Add a new field
* Update an existing field type
* Delete an existing field
* Reorder fields

Release notes

(x) Release notes are required, with the following suggested text if this PR isn't merged in 406:

# Iceberg
* Add support for changing fields in `row` type. ({issue}`15808`)

Additionally, use SELECT instead of VALUES to handle row type correctly.

alexjo2144

All nitpicky things, looks good to me

alexjo2144 · 2023-01-24T16:06:08Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java

+        if (newType.isPrimitiveType()) {
+            return schema.updateColumn(name, newType.asPrimitiveType());
+        }
+        if (sourceType instanceof StructType sourceRowType && newType instanceof StructType newRowType) {


It doesn't matter, functionality-wise, but can we make these checks symmetrical so that they both check the source and new type?

Suggested change

-        if (newType.isPrimitiveType()) {
-            return schema.updateColumn(name, newType.asPrimitiveType());
-        }
-        if (sourceType instanceof StructType sourceRowType && newType instanceof StructType newRowType) {
+        if (sourceType.isPrimitiveType() && newType.isPrimitiveType()) {
+            return schema.updateColumn(name, newType.asPrimitiveType());
+        }
+        if (sourceType instanceof StructType sourceRowType && newType instanceof StructType newRowType) {

alexjo2144 · 2023-01-24T16:07:54Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java

+        if (sourceType instanceof StructType sourceRowType && newType instanceof StructType newRowType) {
+            // Add, update or delete fields
+            for (NestedField field : concat(sourceRowType.fields(), newRowType.fields())) {
+                if (sourceRowType.equals(newRowType)) {


Could move this outside of the loop, right?

alexjo2144 · 2023-01-24T16:19:01Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java

                    .commit();
        }
        catch (RuntimeException e) {
            throw new TrinoException(ICEBERG_COMMIT_ERROR, "Failed to set column type: " + firstNonNull(e.getMessage(), e), e);
        }
    }

+    private static UpdateSchema buildUpdateSchema(String name, Type sourceType, Type newType, UpdateSchema schema)


Nit: I see that returning the UpdateSchema makes calling commit easier above, but I personally find this easier to read.

Suggested change

-    private static UpdateSchema buildUpdateSchema(String name, Type sourceType, Type newType, UpdateSchema schema)
+    private static void buildUpdateSchema(String name, Type sourceType, Type newType, UpdateSchema schemaUpdate)

alexjo2144 · 2023-01-24T16:25:52Z

plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java

+        }
+        if (sourceType instanceof StructType sourceRowType && newType instanceof StructType newRowType) {
+            // Add, update or delete fields
+            for (NestedField field : concat(sourceRowType.fields(), newRowType.fields())) {


This will contain duplicates for fields that are in both the old and the new type. It doesn't matter functionality-wise either, but it would be nice not to iterate over them twice.
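
The two loop comments above (hoisting the equality check, avoiding duplicate iteration) can be illustrated with a minimal sketch. It uses plain Java collections rather than the Iceberg API, so `RowFieldDiff`, `Field`, and `diff` are hypothetical names, field types are modeled as plain strings, and reordering is not handled. It is not the code that was merged.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the row-field diffing discussed above; it does not use the Iceberg API.
class RowFieldDiff
{
    // Simplified field model (assumption): a name and a type name, not Iceberg's NestedField.
    record Field(String name, String type) {}

    static List<String> diff(List<Field> source, List<Field> target)
    {
        List<String> changes = new ArrayList<>();
        // The equality check depends only on the two row types, so it runs once, before the loop.
        if (source.equals(target)) {
            return changes;
        }
        Map<String, String> sourceTypes = byName(source);
        Map<String, String> targetTypes = byName(target);

        // Union of field names in first-seen order; a name present in both rows appears once.
        Set<String> names = new LinkedHashSet<>(sourceTypes.keySet());
        names.addAll(targetTypes.keySet());

        for (String name : names) {
            String oldType = sourceTypes.get(name);
            String newType = targetTypes.get(name);
            if (oldType == null) {
                changes.add("ADD " + name + " " + newType);
            }
            else if (newType == null) {
                changes.add("DELETE " + name);
            }
            else if (!oldType.equals(newType)) {
                changes.add("UPDATE " + name + " " + oldType + " -> " + newType);
            }
        }
        return changes;
    }

    private static Map<String, String> byName(List<Field> fields)
    {
        Map<String, String> types = new LinkedHashMap<>();
        for (Field field : fields) {
            types.put(field.name(), field.type());
        }
        return types;
    }

    public static void main(String[] args)
    {
        List<Field> source = List.of(new Field("a", "integer"), new Field("b", "varchar"));
        List<Field> target = List.of(new Field("a", "bigint"), new Field("c", "date"));
        System.out.println(diff(source, target));
    }
}
```

Building the union of names in a `LinkedHashSet` visits each field once, whereas concatenating the two field lists visits shared fields twice.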

ebyhr · 2023-01-25T02:55:17Z

plugin/trino-sqlserver hit #12535

findepi · 2023-02-02T15:07:38Z

plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergParquetConnectorTest.java

+                // TODO https://github.com/trinodb/trino/issues/15822 The connector returns incorrect NULL when a field in row type doesn't exist in Parquet files
+                return Optional.of(setup.withNewValueLiteral("NULL"));


Single-field row types aren't very common.
Let's make sure we have a test for renaming some, but not all, fields in a row (e.g. a row with two fields).

(We can keep this one too)

ebyhr · Feb 2, 2023

Sent #15957

cla-bot bot added the cla-signed label Jan 23, 2023

github-actions bot added the tests:hive label Jan 23, 2023

Add more row type test cases for changing column types

5c02c00

Additionally, use SELECT instead of VALUES to handle row type correctly.

ebyhr force-pushed the ebi/iceberg-set-data-type-complex branch from 7319d57 to 8d5a7fd Compare January 24, 2023 00:58

ebyhr self-assigned this Jan 24, 2023

ebyhr force-pushed the ebi/iceberg-set-data-type-complex branch from 8d5a7fd to 5ea6aa0 Compare January 24, 2023 09:18

alexjo2144 approved these changes Jan 24, 2023

Add support for changing row type in Iceberg

3096374

ebyhr force-pushed the ebi/iceberg-set-data-type-complex branch from 5ea6aa0 to 3096374 Compare January 25, 2023 02:14

ebyhr requested a review from findepi January 25, 2023 04:20

findepi approved these changes Jan 25, 2023

ebyhr merged commit 7cb8f4d into trinodb:master Jan 25, 2023

ebyhr deleted the ebi/iceberg-set-data-type-complex branch January 25, 2023 12:00

ebyhr added the no-release-notes label ("This pull request does not require release notes entry") Jan 25, 2023

github-actions bot added this to the 406 milestone Jan 25, 2023

colebow mentioned this pull request Jan 25, 2023

Add Trino 406 release notes #15625

Merged

findepi reviewed Feb 2, 2023
