Add support to redirect table operations from Iceberg to Hive #11356
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
...product-tests/src/main/java/io/trino/tests/product/iceberg/TestIcebergRedirectionToHive.java
Outdated
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TrinoHiveCatalog.java
Outdated
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TrinoHiveCatalog.java
Outdated
9501f3e
to
ebd2fbb
Compare
...t-tests/src/main/java/io/trino/tests/product/iceberg/TestIcebergHiveTablesCompatibility.java
Outdated
0d33751
to
acc1604
Compare
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSessionProperties.java
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TrinoCatalog.java
Outdated
@@ -732,6 +735,67 @@ public void renameMaterializedView(ConnectorSession session, SchemaTableName sou
         return listNamespaces(session);
     }
 
+    @Override
+    public Optional<CatalogSchemaTableName> redirectTable(ConnectorSession session, SchemaTableName tableName)
This doesn't seem like an Iceberg-catalog level method. Can we narrow this down to just the things that depend on the catalog implementation and move the rest up to IcebergMetadata?
> This doesn't seem like an Iceberg-catalog level method.
Initially I created a TableRedirectionHandler abstraction, instantiated at the same time as the TrinoCatalog instance, but I later dropped the idea because the redirection is tightly linked to the metastore implementation (Hive/Glue).
The same boilerplate code used for creating the TrinoCatalog would have to be repeated for redirection handling as well.
On second thought, it probably makes sense to take this operation out of TrinoCatalog, in view of the JDBC / REST catalogs, which will no longer have any Hive-related content in them.
I've modified the code again so that the redirectTable method lives in TrinoHiveCatalog and TrinoGlueCatalog but is not exposed in the TrinoCatalog interface. This keeps the interface free from the concept of table redirection while still providing the functionality for Hive and Glue.
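The arrangement described in this comment could be wired in several ways; the thread does not show how the caller reaches the method. The following is only a minimal sketch of one option, assuming a separate capability interface. `Catalog`, `TableRedirector`, `MetastoreBackedCatalog`, and `PlainCatalog` are hypothetical stand-ins, not the PR's classes, and table names are plain strings for brevity.

```java
import java.util.Optional;

final class RedirectionWiringSketch
{
    // Hypothetical stand-in for the TrinoCatalog interface: it knows nothing about redirection.
    interface Catalog
    {
        String name();
    }

    // Hypothetical capability that only metastore-backed catalogs implement.
    interface TableRedirector
    {
        Optional<String> redirectTable(String schemaTableName);
    }

    // Stand-in for a Hive- or Glue-backed catalog that can redirect.
    static final class MetastoreBackedCatalog
            implements Catalog, TableRedirector
    {
        private final Optional<String> hiveCatalogName;

        MetastoreBackedCatalog(Optional<String> hiveCatalogName)
        {
            this.hiveCatalogName = hiveCatalogName;
        }

        @Override
        public String name()
        {
            return "metastore";
        }

        @Override
        public Optional<String> redirectTable(String schemaTableName)
        {
            // Real code would consult the metastore first; this sketch redirects unconditionally.
            return hiveCatalogName.map(catalog -> catalog + "." + schemaTableName);
        }
    }

    // Stand-in for a JDBC or REST catalog with no Hive-related content.
    static final class PlainCatalog
            implements Catalog
    {
        @Override
        public String name()
        {
            return "plain";
        }
    }

    // Caller side: redirect only when the catalog offers the capability.
    static Optional<String> redirect(Catalog catalog, String schemaTableName)
    {
        if (catalog instanceof TableRedirector) {
            return ((TableRedirector) catalog).redirectTable(schemaTableName);
        }
        return Optional.empty();
    }
}
```

With this shape, a catalog that has no metastore simply never redirects, and the main catalog interface stays unchanged.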
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergConfig.java
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
acc1604
to
6b75248
Compare
6b75248
to
d863ff6
Compare
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
152eafb
to
909f1cc
Compare
-        // test via redirection with just schema filter
+        // test via redirection with just schema filter - consistent with the functionality of the command `SHOW TABLES` command on the Iceberg connector
Note the change in functionality here for the Iceberg connector. Please let me know if this behavior looks incorrect from your perspective.
1058105
to
50e3430
Compare
Rebased on |
@@ -205,7 +206,9 @@ public IcebergMetadata(
     public IcebergTableHandle getTableHandle(ConnectorSession session, SchemaTableName tableName)
     {
         IcebergTableName name = IcebergTableName.from(tableName.getTableName());
-        verify(name.getTableType() == DATA, "Wrong table type: " + name.getTableNameWithType());
+        if (name.getTableType() != DATA) {
+            return null;
Why?
When table redirection to the Hive connector is not enabled and a user queries, through the Iceberg connector, a metadata table of a Hive table, the user receives a table-not-found exception.
From io.trino.tests.product.iceberg.TestIcebergHiveTablesCompatibility#testIcebergSelectFromHiveTable:
assertQueryFailure(() -> onTrino().executeQuery("SELECT * FROM iceberg.default.\"" + tableName + "$data\""))
        .hasMessageMatching("Query failed \\(#\\w+\\):\\Q Not an Iceberg table: default." + tableName);
assertQueryFailure(() -> onTrino().executeQuery("SELECT * FROM iceberg.default.\"" + tableName + "$files\""))
        .hasMessageMatching("Query failed \\(#\\w+\\):\\Q line 1:15: Table 'iceberg.default." + tableName + "$files' does not exist");
Keeping verify in such a case seems "OK" too, wdyt?
Keeping verify would produce, for the statement
onTrino().executeQuery("SELECT * FROM iceberg.default.\"" + hiveTableName + "$files\"")
an error message like the following:
Wrong table type: test_iceberg_select_from_hive_63u5u11q3c70$files
I tend to say that the error message
Table 'iceberg.default." + hiveTableName + "$files' does not exist"
fits better (not ideal, but better).
This requires a code comment.
Just to make it clear from the code perspective: io.trino.plugin.iceberg.IcebergMetadata#getTableHandle is reached by a select from hive_table_name$files because no Iceberg system table is found for hive_table_name (IcebergMetadata#getSystemTable returns Optional.empty() in such cases).
In StatementAnalyzer#visitTable, the logic assumes that if the identifier doesn't correspond to a materialized view or a view, it is certainly a table. For this reason IcebergMetadata#getTableHandle is called with the rather unexpected argument hive_table_name$files.
The right way to go would probably be a refactoring of StatementAnalyzer#visitTable that checks at the beginning whether we're dealing with a redirected table and acts accordingly, but such a change (if it makes sense) would fit better in a different PR.
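To make the trade-off in this thread concrete, here is a small, self-contained sketch of the "return null instead of verify" behavior. It is not the connector code: `TableHandleSketch`, `typeOf`, and the string handle are hypothetical simplifications, and the suffix parsing only assumes that a name without a `$` suffix, or with `$data`, counts as a data table.

```java
import java.util.Locale;

final class TableHandleSketch
{
    enum TableType { DATA, OTHER }

    // Simplified stand-in for IcebergTableName.from(...).getTableType().
    static TableType typeOf(String tableName)
    {
        int separator = tableName.lastIndexOf('$');
        if (separator < 0) {
            return TableType.DATA;
        }
        String suffix = tableName.substring(separator + 1).toLowerCase(Locale.ROOT);
        return suffix.equals("data") ? TableType.DATA : TableType.OTHER;
    }

    // Returns a placeholder handle, or null to signal "no such table" to the caller.
    static String getTableHandle(String tableName)
    {
        if (typeOf(tableName) != TableType.DATA) {
            // Pretend the table does not exist, so the caller can report
            // "Table ... does not exist" rather than fail with "Wrong table type".
            return null;
        }
        return "handle:" + tableName;
    }
}
```

In this sketch, a name such as `hive_table_name$files` yields null, which is what lets the engine produce the "does not exist" message preferred in the discussion.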
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadataFactory.java
Outdated
        if (isIcebergTable(table.get())) {
            // After redirecting, use the original table name, with "$partitions" and similar suffixes
            return targetCatalogName.map(catalog -> new CatalogSchemaTableName(catalog, tableName));
        }
We should redirect non-Iceberg tables rather than Iceberg tables.
I missed this bug because I didn't know I could run the Glue tests locally.
@findepi Once the PR is in good shape, please run it against the CI with the AWS Glue secrets.
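The corrected condition can be illustrated with a short sketch. It assumes a hypothetical in-memory map from table name to table format in place of the metastore lookup, and represents the redirect target as a dotted string; `RedirectDecisionSketch` and its parameters are illustrative names only.

```java
import java.util.Map;
import java.util.Optional;

final class RedirectDecisionSketch
{
    // tableFormats is a hypothetical stand-in for a metastore lookup:
    // it maps "schema.table" to a format label such as "iceberg" or "hive".
    static Optional<String> redirectTable(
            Map<String, String> tableFormats,
            Optional<String> hiveCatalogName,
            String schemaTableName)
    {
        if (hiveCatalogName.isEmpty()) {
            // Redirection is not configured
            return Optional.empty();
        }
        String format = tableFormats.get(schemaTableName);
        if (format == null) {
            // Unknown table: nothing to redirect
            return Optional.empty();
        }
        if (!format.equals("iceberg")) {
            // Only non-Iceberg tables are handed over to the Hive catalog
            return hiveCatalogName.map(catalog -> catalog + "." + schemaTableName);
        }
        // Iceberg tables stay in the Iceberg catalog
        return Optional.empty();
    }
}
```

The inverted check in the reviewed hunk would have sent Iceberg tables to the Hive catalog and kept Hive tables in place, which is the bug the reviewer points out.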
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/glue/TrinoGlueCatalog.java
Outdated
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/hms/TrinoHiveCatalog.java
Outdated
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/glue/TrinoGlueCatalog.java
...t-tests/src/main/java/io/trino/tests/product/iceberg/TestIcebergHiveTablesCompatibility.java
50e3430
to
13196c8
Compare
Rebased on |
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestSharedHiveMetastore.java
Outdated
863c172
to
bc81a1a
Compare
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
        }
        if (!isIcebergTable(table.get())) {
            // After redirecting, use the original table name, with "$partitions" and similar suffixes
            return targetCatalogName.map(catalog -> new CatalogSchemaTableName(catalog, tableName));
Nit: I'd use get instead of map here; it makes clear that an empty catalog name should never show up at this point.
We do have, a few lines above:
Optional<String> targetCatalogName = getHiveCatalogName(session);
if (targetCatalogName.isEmpty()) {
    return Optional.empty();
}
Also note that the method returns an Optional, so I'd have to call .get() and then wrap the result back in an Optional.
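The two styles discussed in this nit can be compared side by side. This is a sketch using only `java.util.Optional`; the class name, the method names, and the dotted-string result are hypothetical and stand in for the `CatalogSchemaTableName` construction in the hunk.

```java
import java.util.Optional;

final class OptionalMapVsGetSketch
{
    // Style kept in the PR: map preserves the Optional wrapper.
    static Optional<String> withMap(Optional<String> targetCatalogName, String tableName)
    {
        return targetCatalogName.map(catalog -> catalog + "." + tableName);
    }

    // Style suggested in the nit: unwrap with get, then wrap the result again.
    static Optional<String> withGet(Optional<String> targetCatalogName, String tableName)
    {
        // get() throws NoSuchElementException when the Optional is empty
        return Optional.of(targetCatalogName.get() + "." + tableName);
    }
}
```

When the catalog name is present, both variants return the same value. They differ only for an empty input: `withMap` returns an empty Optional, while `withGet` throws, which is the "should never happen" signal the reviewer had in mind.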
        Optional<com.amazonaws.services.glue.model.Table> table = getTable(new SchemaTableName(tableNameBase.getSchemaName(), tableNameBase.getTableName()));

        if (table.isEmpty()) {
Now that Glue views are merged, we should probably match what the Hive catalog has for this line: (table.isEmpty() || VIRTUAL_VIEW.name().equals(table.get().getTableType()))
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/hms/TrinoHiveCatalog.java
Outdated
bc81a1a
to
58426ff
Compare
Rebased on top of |
@@ -205,7 +206,10 @@ public IcebergMetadata(
     public IcebergTableHandle getTableHandle(ConnectorSession session, SchemaTableName tableName)
     {
         IcebergTableName name = IcebergTableName.from(tableName.getTableName());
-        verify(name.getTableType() == DATA, "Wrong table type: " + name.getTableNameWithType());
+        if (name.getTableType() != DATA) {
+            // Avoid dealing with non DATA table types
// Pretend the table does not exist to produce better message in case of redirects to Hive
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
fd3366f
to
101291d
Compare
Rebased on top of |
The Iceberg connector can use the `iceberg.hive-catalog-name` configuration property to enable table redirects to Hive tables. When table redirection to the Hive connector is not enabled and a user queries, through the Iceberg connector, a metadata table of a Hive table, the user receives a table-not-found exception.
101291d
to
b2a73ff
Compare
Description
New property introduced for the Iceberg connector:
iceberg.hive-catalog-name
New feature
The changes in this PR mainly affect the Iceberg connector.
In an environment that uses a shared metastore, table redirects can come in handy: they allow Trino to automatically translate a table name like iceberg.default.hive_table_name to hive.default.hive_table_name.
Note that the translation can happen quite transparently when the user connects to a predefined catalog and schema (e.g. iceberg.default) and the table operation looks like: SELECT * FROM hive_table_name.
Related issues, pull requests, and links
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
(x) Release notes entries required with the following suggested text: