Add Delta Lake $properties system table #17592

Merged
merged 1 commit into from
Jul 19, 2023

Conversation

jkylling
Contributor

@jkylling jkylling commented May 22, 2023

Description

Adds a $properties system table for Delta Lake, similar to what exists for Iceberg.

The main motivation for adding the $properties system table is to be able to test extra_properties which will be added as part of #17428

Fixes #17294

Release notes

(x) Release notes are required, with the following suggested text:

# Delta Lake 
* Add `$properties` system table which can be queried to inspect Delta Lake table properties. ({issue}`17294`)

@cla-bot cla-bot bot added the cla-signed label May 22, 2023
@jkylling jkylling added the delta-lake Delta Lake connector label May 22, 2023
@jkylling jkylling requested a review from findinpath May 22, 2023 13:29
@ebyhr ebyhr self-requested a review May 22, 2023 23:14
@jkylling jkylling requested a review from ebyhr May 23, 2023 08:58
Member

@ebyhr ebyhr left a comment

@jkylling jkylling force-pushed the delta-lake-properties-table branch 2 times, most recently from 7c36524 to e5b8e68 Compare May 23, 2023 13:20
@jkylling
Contributor Author

The failing product test seems to be unrelated. The test io.trino.tests.product.deltalake.TestDeltaLakeColumnMappingMode.testColumnMappingModeNameAddColumn fails because of

io.trino.tempto.query.QueryExecutionException: java.sql.SQLException: Query failed (#20221125_205454_00169_t7w8x): Error opening Hive split s3://trino-ci-test/databricks-compatibility-test-test_dl_column_mapping_mode_add_column_3cay48eeem/another_varchar=new%20column/part-00000-ca7d4a38-5908-4690-98cd-c627d12f7bda.c000.snappy.parquet (offset=0, length=470): com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 172AEF45AE130D2C; S3 Extended Request ID: 10f54869-4c1f-453c-af65-14792dd45501; Proxy: null), S3 Extended Request ID: 10f54869-4c1f-453c-af65-14792dd45501 (Path: s3://trino-ci-test/databricks-compatibility-test-test_dl_column_mapping_mode_add_column_3cay48eeem/another_varchar=new%20column/part-00000-ca7d4a38-5908-4690-98cd-c627d12f7bda.c000.snappy.parquet)

@jkylling jkylling requested a review from ebyhr May 23, 2023 16:10
@ebyhr
Member

ebyhr commented May 24, 2023

Please check CI failure.

@jkylling jkylling force-pushed the delta-lake-properties-table branch 2 times, most recently from c443e16 to a9112b2 Compare May 24, 2023 10:04
@jkylling jkylling requested a review from ebyhr May 24, 2023 13:26
@jkylling
Contributor Author

@ebyhr CI is green for this PR again. Would you have time for another review?

``$properties`` table
~~~~~~~~~~~~~~~~~~~~~

The ``$properties`` table provides access to general information about Delta
Contributor

I'd recommend keeping things simple and mentioning that the table gives access to the property names and values of the Delta Lake table. See https://trino.io/docs/current/connector/hive.html#metadata-tables

Contributor Author

I've changed the wording slightly to distinguish between table configuration (minReaderVersion, etc.), table features (items of the features field of protocol entries), and table properties (custom properties). Arguably this is an implementation detail of Delta. Open to other ways to phrase this.

@jkylling jkylling force-pushed the delta-lake-properties-table branch from a9112b2 to 8cd28fd Compare June 5, 2023 18:05
@@ -877,6 +877,24 @@ public void testViewReferencingHiveAndDeltaTable(boolean legacyHiveViewTranslati
}
}

@Test(groups = {DELTA_LAKE_DATABRICKS, DELTA_LAKE_OSS, PROFILE_SPECIFIC_TESTS})
@Flaky(issue = DATABRICKS_COMMUNICATION_FAILURE_ISSUE, match = DATABRICKS_COMMUNICATION_FAILURE_MATCH)
public void testDeltaToHivePropertiesRedirect()
Contributor

This functionality already worked before this PR.

We should be testing the other way around

test_redirect_to_delta_properties

Contributor Author

Modified both tests to verify redirection to the Delta table $properties table.

Contributor

@findinpath findinpath left a comment

LGTM % comment

@findinpath
Contributor

spark-sql> alter table tiny.table1 set tblproperties (delta.minReaderVersion = 1, delta.minWriterVersion = 2, someStringProperty = 'someStringValue');
trino:tiny> select * from "table1$properties";
          key           |      value      
------------------------+-----------------
 someStringProperty     | someStringValue 
 delta.minReaderVersion | 1               
 delta.minWriterVersion | 2               
(3 rows)

It is worth showcasing that the $properties table reads property names and values case-sensitively.
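
The case-sensitivity point can be illustrated with a small, self-contained Java sketch. This is not the connector's code: `PropertiesRowsSketch` and `toRows` are hypothetical names, and a plain `LinkedHashMap` stands in for the table's property map. It only shows that keys differing in case, such as `Foo` and `foo`, stay distinct when they are turned into key/value rows.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the Trino implementation.
class PropertiesRowsSketch
{
    // Turns a property map into (key, value) rows, preserving key case and insertion order.
    static List<List<String>> toRows(Map<String, String> properties)
    {
        List<List<String>> rows = new ArrayList<>();
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            rows.add(List.of(entry.getKey(), entry.getValue()));
        }
        return rows;
    }

    public static void main(String[] args)
    {
        // Assumed sample properties; "Foo" and "foo" differ only in case.
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("someStringProperty", "someStringValue");
        properties.put("Foo", "Baz");
        properties.put("foo", "bar");
        for (List<String> row : toRows(properties)) {
            System.out.println(row.get(0) + " | " + row.get(1));
        }
    }
}
```

A product test along these lines would set two properties whose names differ only in case and check that both come back unchanged.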

@jkylling jkylling force-pushed the delta-lake-properties-table branch from 8cd28fd to d20f38a Compare June 7, 2023 18:43
Comment on lines 54 to 57
ImmutableList.<ColumnMetadata>builder()
.add(new ColumnMetadata("key", VARCHAR))
.add(new ColumnMetadata("value", VARCHAR))
.build());
Member

Let's move this list of ColumnMetadata to a constant.
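
A minimal sketch of what this suggestion could look like. To keep the example compilable without the Trino SPI on the classpath, a hypothetical `Column` class stands in for `ColumnMetadata` and the string `"varchar"` stands in for `VARCHAR`; the class and member names are assumptions, not the names used in the PR.

```java
import java.util.List;

// Hypothetical stand-ins; the real code uses Trino's ColumnMetadata and VARCHAR.
class PropertiesTableColumnsSketch
{
    static final class Column
    {
        final String name;
        final String type;

        Column(String name, String type)
        {
            this.name = name;
            this.type = type;
        }
    }

    // Built once and shared, instead of rebuilding the list on every call.
    static final List<Column> COLUMNS = List.of(
            new Column("key", "varchar"),
            new Column("value", "varchar"));

    static List<Column> columns()
    {
        return COLUMNS;
    }

    public static void main(String[] args)
    {
        for (Column column : columns()) {
            System.out.println(column.name + " " + column.type);
        }
    }
}
```

Because `List.of` returns an unmodifiable list, the constant can be handed out directly without defensive copying.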

import java.util.List;

import static io.trino.tempto.assertions.QueryAssert.Row.row;
import static io.trino.tempto.assertions.QueryAssert.assertThat;
Member

This method is deprecated. Please use AssertJ's assertThat.

@jkylling jkylling force-pushed the delta-lake-properties-table branch from b7c19d8 to d29c4e4 Compare June 28, 2023 09:31
@jkylling jkylling force-pushed the delta-lake-properties-table branch from d29c4e4 to 184162c Compare June 28, 2023 10:16
@ebyhr
Member

ebyhr commented Jun 28, 2023

/test-with-secrets sha=184162cedd6b5ca986deca803ee472f5c1d35222

@github-actions

github-actions bot commented Jun 28, 2023

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/5400162270

@ebyhr
Member

ebyhr commented Jun 28, 2023

Please fix CI failure with secrets.

@jkylling
Contributor Author

Here is the matrix of failures to fix:

suite-delta-lake-databricks73

org.apache.spark.sql.AnalysisException: Unknown configuration was specified: delta.minWriterVersion

 java.lang.AssertionError: 
2023-06-28T12:04:09.2752389Z tests               | Expecting actual:
2023-06-28T12:04:09.2752797Z tests               |   [["Foo", "Baz"],
2023-06-28T12:04:09.2753191Z tests               |     ["foo", "bar"],
2023-06-28T12:04:09.2753624Z tests               |     ["delta.minReaderVersion", "1"],
2023-06-28T12:04:09.2754120Z tests               |     ["delta.minWriterVersion", "2"]]
2023-06-28T12:04:09.2754594Z tests               | to contain exactly in any order:
2023-06-28T12:04:09.2755119Z tests               |   [["Type", "EXTERNAL"], ["Foo", "Baz"], ["foo", "bar"]]
2023-06-28T12:04:09.2755571Z tests               | elements not found:
2023-06-28T12:04:09.2755990Z tests               |   [["Type", "EXTERNAL"]]
2023-06-28T12:04:09.2756405Z tests               | and elements not expected:
2023-06-28T12:04:09.2756949Z tests               |   [["delta.minReaderVersion", "1"], ["delta.minWriterVersion", "2"]]

suite-delta-lake-databricks91

java.lang.IllegalArgumentException: requirement failed: delta.minWriterVersion needs to be an integer between [1, 5].

    | java.lang.AssertionError: 
2023-06-28T12:28:38.9760862Z tests               | Expecting actual:
2023-06-28T12:28:38.9761191Z tests               |   [["Foo", "Baz"],
2023-06-28T12:28:38.9761506Z tests               |     ["foo", "bar"],
2023-06-28T12:28:38.9761856Z tests               |     ["delta.minReaderVersion", "1"],
2023-06-28T12:28:38.9762236Z tests               |     ["delta.minWriterVersion", "2"]]
2023-06-28T12:28:38.9762608Z tests               | to contain exactly in any order:
2023-06-28T12:28:38.9762959Z tests               |   [["Type", "EXTERNAL"],
2023-06-28T12:28:38.9763275Z tests               |     ["Foo", "Baz"],
2023-06-28T12:28:38.9763610Z tests               |     ["delta.minWriterVersion", "2"],
2023-06-28T12:28:38.9764090Z tests               |     ["delta.minReaderVersion", "1"],
2023-06-28T12:28:38.9764410Z tests               |     ["foo", "bar"]]
2023-06-28T12:28:38.9764761Z tests               | but could not find the following elements:
2023-06-28T12:28:38.9765122Z tests               |   [["Type", "EXTERNAL"]]

suite-delta-lake-databricks104

java.lang.IllegalArgumentException: requirement failed: delta.minWriterVersion needs to be an integer between [1, 6].

     | Expecting actual:
2023-06-28T12:32:26.3006668Z tests               |   [["Foo", "Baz"],
2023-06-28T12:32:26.3007054Z tests               |     ["foo", "bar"],
2023-06-28T12:32:26.3007494Z tests               |     ["delta.minReaderVersion", "1"],
2023-06-28T12:32:26.3007990Z tests               |     ["delta.minWriterVersion", "2"]]
2023-06-28T12:32:26.3008451Z tests               | to contain exactly in any order:
2023-06-28T12:32:26.3008865Z tests               |   [["Foo", "Baz"],
2023-06-28T12:32:26.3009224Z tests               |     ["Type", "EXTERNAL"],
2023-06-28T12:32:26.3009657Z tests               |     ["delta.minReaderVersion", "1"],
2023-06-28T12:32:26.3010285Z tests               |     ["delta.minWriterVersion", "2"],
2023-06-28T12:32:26.3010705Z tests               |     ["foo", "bar"]]
2023-06-28T12:32:26.3011158Z tests               | but could not find the following elements:
2023-06-28T12:32:26.3011607Z tests               |   [["Type", "EXTERNAL"]]

suite-delta-lake-databricks113

delta.minWriterVersion needs to be an integer between [1, 6].
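
The assertion failure reported for suite-delta-lake-databricks73 can be reproduced with plain Java collections. This is only a sketch of how an "exactly in any order" comparison arrives at its two difference lists; `RowDiffSketch` and `notFoundIn` are hypothetical names, and this is not how AssertJ is implemented internally. The rows are copied from the log above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of an "exactly in any order" comparison.
class RowDiffSketch
{
    // Rows of `wanted` that have no matching occurrence in `present` (duplicates are counted).
    static List<List<String>> notFoundIn(List<List<String>> present, List<List<String>> wanted)
    {
        List<List<String>> remaining = new ArrayList<>(wanted);
        for (List<String> row : present) {
            remaining.remove(row); // drops one equal row, if there is one
        }
        return remaining;
    }

    public static void main(String[] args)
    {
        List<List<String>> actual = List.of(
                List.of("Foo", "Baz"),
                List.of("foo", "bar"),
                List.of("delta.minReaderVersion", "1"),
                List.of("delta.minWriterVersion", "2"));
        List<List<String>> expected = List.of(
                List.of("Type", "EXTERNAL"),
                List.of("Foo", "Baz"),
                List.of("foo", "bar"));

        System.out.println("elements not found: " + notFoundIn(actual, expected));
        System.out.println("elements not expected: " + notFoundIn(expected, actual));
    }
}
```

In every suite listed above, the `Type`/`EXTERNAL` row is missing from the actual result; in the 73 suite the two `delta.*` version rows are additionally reported as unexpected.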

@jkylling
Contributor Author

@ebyhr I'm leaning towards excluding versions 73, 91, 104 and 113 from the new product tests. I'll make this change. We could potentially refactor the tests to test less, or to be version-specific.

@jkylling jkylling force-pushed the delta-lake-properties-table branch from 184162c to b00b09f Compare June 28, 2023 16:34
@jkylling jkylling requested a review from ebyhr June 29, 2023 10:03
@findinpath
Contributor

findinpath commented Jun 30, 2023

@jkylling excluding the tests will likely hide potential issues in Trino.

Update 05.07.2023

I'm leaning towards excluding versions 73, 91, 104 and 113 from the new product tests.

On second thought, the change makes sense. We're actually testing Trino for accuracy, not Databricks.

@findinpath
Contributor

@ebyhr can you please trigger the build with secrets?

@ebyhr
Member

ebyhr commented Jul 5, 2023

/test-with-secrets sha=b00b09f7dc6b2cd0818696bb8f9ad19ea419fe06

@github-actions

github-actions bot commented Jul 6, 2023

The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/5470965065

@findinpath
Contributor

@ebyhr could you please do another check whether this PR is good to land?

@jkylling jkylling force-pushed the delta-lake-properties-table branch from b00b09f to 214d759 Compare July 17, 2023 20:37
@jkylling jkylling requested a review from ebyhr July 17, 2023 20:37
@jkylling jkylling force-pushed the delta-lake-properties-table branch from 214d759 to 9f1b159 Compare July 18, 2023 08:08
@jkylling jkylling requested a review from ebyhr July 18, 2023 08:09
@ebyhr
Member

ebyhr commented Jul 18, 2023

/test-with-secrets sha=9f1b159d2c75485d8dfe81cfe1a895db7c3cef34

@github-actions

github-actions bot commented Jul 18, 2023

The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/5584978296

@jkylling jkylling force-pushed the delta-lake-properties-table branch from 9f1b159 to 3c61b82 Compare July 19, 2023 08:26
@ebyhr ebyhr merged commit 1e188a1 into trinodb:master Jul 19, 2023
@github-actions github-actions bot added this to the 423 milestone Jul 19, 2023
Successfully merging this pull request may close these issues.

Add properties metadata table for Delta Lake tables
3 participants