
Add Iceberg $properties system table #10480

Merged · 1 commit · Jan 17, 2022

Conversation

@Buktoria (Member) commented Jan 5, 2022

Iceberg supports a collection of system tables for accessing metadata. Unlike with Spark, where you have full access to the Iceberg Java API, tooling that depends on Trino has no way of getting the table properties of a given table.

My team has found it useful to have this "Properties" table so that we can inspect table properties via Trino. More specifically, we have a DBT-based modeling framework written in Python that uses Trino as its engine. It relies on specific metadata that we currently store in the table properties of our Iceberg tables. Iceberg's Python API is lacking in features and support, so we didn't want to depend on it yet. These few small changes adding a "Properties" table to Trino were an ideal solution to our predicament.

It's also important to note that the original Iceberg spec does not include such a table in Spark SQL, though I do not see a huge necessity for it there, given that the Java API can be imported into a Spark-based application.
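For illustration, here is a minimal sketch of how Trino-dependent tooling could read these properties once this table exists; the JDBC URL, catalog/schema, and table name are hypothetical:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ReadTableProperties
{
    public static void main(String[] args)
            throws Exception
    {
        // Hypothetical Trino coordinator and Iceberg catalog/schema
        String url = "jdbc:trino://localhost:8080/iceberg/analytics";
        try (Connection connection = DriverManager.getConnection(url, "user", null);
                Statement statement = connection.createStatement();
                // Each row of $properties is a (key VARCHAR, value VARCHAR) pair
                ResultSet rs = statement.executeQuery("SELECT key, value FROM \"events$properties\"")) {
            while (rs.next()) {
                System.out.println(rs.getString("key") + " = " + rs.getString("value"));
            }
        }
    }
}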

cla-bot added the cla-signed label on Jan 5, 2022
@Buktoria changed the title from "added Iceberg Properties tables to access table properties" to "added Iceberg "Properties" table to access table properties" on Jan 5, 2022
@Buktoria force-pushed the iceberg-add-properties-table branch 3 times, most recently from 8edea11 to 50df67b on January 6, 2022 14:40
@Buktoria marked this pull request as ready for review on January 6, 2022 16:14
@findinpath (Contributor):

Please follow the guideline https://github.com/trinodb/trino/blob/master/.github/DEVELOPMENT.md#commits-and-pull-requests regarding the commit message.

{
    assertUpdate("CREATE TABLE test_iceberg_get_table_props (x BIGINT) WITH (format='PARQUET')");
    MaterializedResult result = computeActual("SELECT * FROM \"test_iceberg_get_table_props$properties\"");
    assertEquals(result.getRowCount(), 1);
@findinpath commented Jan 11, 2022:

A more concise way of doing these checks would be:

        assertThat(query("SELECT * FROM \"test_iceberg_get_table_props$properties\""))
                .matches(format("VALUES (VARCHAR 'write.format.default', VARCHAR '%s')", format.name()));

@Test
public void testGetTableProperties()
{
    assertUpdate("CREATE TABLE test_iceberg_get_table_props (x BIGINT) WITH (format='PARQUET')");
Contributor:

Please drop WITH (format='PARQUET').
If you have a look at https://github.com/trinodb/trino/blob/master/plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java#L138 you'll notice that the file format is specified at the connector level as either ORC or PARQUET, depending on the concrete test class in which the test is executed.
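Putting both review suggestions together, the test could look roughly like this (a sketch; format is the connector-level file format field used elsewhere in BaseIcebergConnectorTest):

@Test
public void testGetTableProperties()
{
    // No WITH (format=...): the file format comes from the concrete test class
    assertUpdate("CREATE TABLE test_iceberg_get_table_props (x BIGINT)");
    assertThat(query("SELECT * FROM \"test_iceberg_get_table_props$properties\""))
            .matches(format("VALUES (VARCHAR 'write.format.default', VARCHAR '%s')", format.name()));
    assertUpdate("DROP TABLE test_iceberg_get_table_props");
}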

@findinpath (Contributor) left a comment:

Improve the readability of the changes in BaseIcebergConnectorTest

@Test
public void testGetTableProperties()
{
    assertUpdate("CREATE TABLE test_iceberg_get_table_props (x BIGINT) WITH (format='PARQUET')");
Member:

Add partitioning to show that $properties is not just the contents of the WITH (...) clause.
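A sketch of that suggestion (table and column names illustrative): create the table with partitioning, then check that $properties still returns Iceberg's own property map rather than echoing the WITH clause:

assertUpdate("CREATE TABLE test_iceberg_props_partitioned (x BIGINT, part VARCHAR) " +
        "WITH (partitioning = ARRAY['part'])");
// The partitioning from the WITH clause lives in the partition spec, not in the
// table properties, so $properties should not contain a 'partitioning' entry
assertThat(query("SELECT * FROM \"test_iceberg_props_partitioned$properties\""))
        .matches(format("VALUES (VARCHAR 'write.format.default', VARCHAR '%s')", format.name()));
assertUpdate("DROP TABLE test_iceberg_props_partitioned");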

@findepi (Member) commented Jan 11, 2022:

Please change commit and PR title to Add Iceberg $properties system table.

@findepi (Member) commented Jan 11, 2022:

cc @kokosing for access checks if any.

@@ -238,6 +238,8 @@ public IcebergTableHandle getTableHandle(ConnectorSession session, SchemaTableNa
return Optional.of(new ManifestsTable(systemTableName, table, getSnapshotId(table, name.getSnapshotId())));
case FILES:
return Optional.of(new FilesTable(systemTableName, typeManager, table, getSnapshotId(table, name.getSnapshotId())));
case PROPERTIES:
Member:

Why do we need this table at all? Why not model table properties as table properties (io.trino.spi.connector.ConnectorTableMetadata#properties)?

@Buktoria (Member, Author):

"TableProperties" seems to be an overloaded term in Trino. The ConnectorTableMetadata#properties seems only to have three options for properties, which are required for tables creation (format, partitioning, and location).

Iceberg has a large list of configuration options that can be set on a table, affecting different behaviours for readers / writers / Hive / etc. (Iceberg Configuration). This configuration lives in the Iceberg table properties, which exist in the Iceberg metadata file, the ledger that the catalog (in our case Hive) points to.

properties | A string to string map of table properties. This is used to control settings that affect reading and writing and is not intended to be used for arbitrary metadata. For example, commit.retry.num-retries is used to control the number of commit retries.

These properties can be updated via the updateProperties operation in the Iceberg API (list of API operations).

Given that the ConnectorTableMetadata#properties functionality seems to limit the properties intentionally, it did not seem appropriate to change that behaviour. Instead, I took the approach of creating a "system table" to dig the properties out of the Iceberg metadata file.

In addition, when we met with the Iceberg folks about how we intend to work with Iceberg, they agreed with us putting custom entries in the table properties that are pivotal to data interpretation internally at Shopify. That means the data we put in there is not "arbitrary", but required. Given that fact, we need the table to be unrestricted in the configs it returns.
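For context, a minimal sketch of the system-table approach (not necessarily the exact code in this PR): implement Trino's SystemTable SPI and fill the rows from the Iceberg table's string-to-string property map:

import com.google.common.collect.ImmutableList;
import io.trino.spi.connector.ColumnMetadata;
import io.trino.spi.connector.ConnectorSession;
import io.trino.spi.connector.ConnectorTableMetadata;
import io.trino.spi.connector.ConnectorTransactionHandle;
import io.trino.spi.connector.InMemoryRecordSet;
import io.trino.spi.connector.RecordCursor;
import io.trino.spi.connector.SchemaTableName;
import io.trino.spi.connector.SystemTable;
import io.trino.spi.predicate.TupleDomain;
import org.apache.iceberg.Table;

import static io.trino.spi.type.VarcharType.VARCHAR;
import static java.util.Objects.requireNonNull;

public class PropertiesTable
        implements SystemTable
{
    private final ConnectorTableMetadata tableMetadata;
    private final Table icebergTable;

    public PropertiesTable(SchemaTableName tableName, Table icebergTable)
    {
        this.icebergTable = requireNonNull(icebergTable, "icebergTable is null");
        this.tableMetadata = new ConnectorTableMetadata(
                requireNonNull(tableName, "tableName is null"),
                ImmutableList.of(
                        new ColumnMetadata("key", VARCHAR),
                        new ColumnMetadata("value", VARCHAR)));
    }

    @Override
    public Distribution getDistribution()
    {
        return Distribution.SINGLE_COORDINATOR;
    }

    @Override
    public ConnectorTableMetadata getTableMetadata()
    {
        return tableMetadata;
    }

    @Override
    public RecordCursor cursor(ConnectorTransactionHandle transactionHandle, ConnectorSession session, TupleDomain<Integer> constraint)
    {
        // Pull the key/value pairs straight from the Iceberg metadata file's property map
        InMemoryRecordSet.Builder table = InMemoryRecordSet.builder(tableMetadata);
        icebergTable.properties().forEach((key, value) -> table.addRow(key, value));
        return table.build().cursor();
    }
}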

@kokosing (Member):

for access checks if any.

This PR avoids access checks, which is fine. Note that to get access to t$properties you typically need an explicit grant to that table. Access to just t does not give you access to the system tables for that table.

@findinpath (Contributor):

My team has found it useful to have this "Properties" table so that we can inspect table properties via Trino. More specifically, we have a DBT-based modeling framework written in Python that uses Trino as its engine. It relies on specific metadata that we currently store in the table properties of our Iceberg tables.

@Buktoria is your request somehow related to https://github.com/starburstdata/dbt-trino ? If yes, can you share some details on how you plan to use it with this new feature?

@Buktoria changed the title from "added Iceberg "Properties" table to access table properties" to "Add Iceberg $properties system table" on Jan 14, 2022
@Buktoria force-pushed the iceberg-add-properties-table branch from 50df67b to c27c243 on January 14, 2022 23:12
@Buktoria (Member, Author) commented Jan 14, 2022:

This is the first time I'm hearing about dbt-trino. I need to look into it further.

Our current use case, though, is that our various modeling tools need some information about the dataset to help them interpret the table. We currently have Iceberg tables that represent different kinds of data,
e.g. CDC, schematized_kafka_events, non_schematized_kafka_events, etc.

There is also information about the sensitivity of the table, whether privacy enforcement is currently running on it, and data retention,
e.g.:

  • privacy.sensitive=true/false
  • privacy.enforced=true/false
  • privacy.retention=P30D
  • privacy.retention.field=<field used to purge data by>
  • etc.

Our Spark-based modeling tool uses the Iceberg API to get this information stored in the table properties: catalog.loadTable(...).properties().
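For comparison, a sketch of that Spark-side access path via the Iceberg Java API (catalog wiring, namespace, and property keys from the list above; names hypothetical):

import java.util.Map;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;

public final class TablePropertiesReader
{
    private TablePropertiesReader() {}

    // Reads the privacy-related properties our tools rely on
    public static void printPrivacyProperties(Catalog catalog)
    {
        Table table = catalog.loadTable(TableIdentifier.of("analytics", "events"));
        Map<String, String> properties = table.properties();
        System.out.println("sensitive: " + properties.getOrDefault("privacy.sensitive", "false"));
        System.out.println("retention: " + properties.get("privacy.retention")); // e.g. "P30D"
    }
}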

Our other modeling tool, which right now uses Python + DBT to format SQL queries to run on Trino, also needs access to the metadata stored in the properties. Given that the Python Iceberg API is not fully fleshed out, and in our team's opinion not reliable at the moment, an alternative was to make the same information accessible via Trino through a query. This also had the additional benefit of not adding another dependency.

Thus we landed here, providing direct access to the Iceberg table properties found in the metadata file.

@findepi (Member) commented Jan 15, 2022:

@Buktoria can you please check the build results? It seems some checks failed.

@findinpath (Contributor):

Please squash the commits.

storageFormat.toString(),
specVersion));

assertThat(onTrino().executeQuery(format("SELECT" +
Contributor:

format is not needed

.containsOnly(
row("custom.table-property", "my_custom_value"),
row("write.format.default", storageFormat.name()),
row("owner", "hive"));
Contributor:

onSpark().executeQuery("DROP TABLE " + sparkTableName);

specVersion));

assertThat(onTrino().executeQuery(format("SELECT" +
" key" +
Contributor:

There is no need for extra new lines in the SQL statement.

String sparkTableName = sparkTableName(baseTableName);
String trinoPropertiesTableName = trinoTableName(propertiesTableName);

onSpark().executeQuery(format(
Contributor:

onSpark().executeQuery("DROP TABLE IF EXISTS " + sparkTableName);

A system table by the name of 'properties' has been added, which
provides a queryable table that represents the properties from the
Iceberg metadata file of the table (this is the ledger of the table).
The contents of the properties can change over time.
@Buktoria force-pushed the iceberg-add-properties-table branch from 6917aa6 to 747bcf1 on January 15, 2022 21:45