Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to redirect table operations from Iceberg to Hive #11356

Merged
merged 1 commit into from
Apr 6, 2022

Conversation

findinpath
Copy link
Contributor

Description

New property introduced for the Iceberg connector: iceberg.hive-catalog-name

Is this change a fix, improvement, new feature, refactoring, or other?

New feature

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

The changes in this PR affect mainly the Iceberg connector.

How would you describe this change to a non-technical end user or system administrator?

In an environment which makes use of a shared metastore it may come in handy to have table redirects to automatically allow Trino to translate a table name like iceberg.default.hive_table_name towards the name hive.default.hive_table_name.

Note that the translation can happen quite transparently when the user connects to a predefined catalog and schema (e.g. : iceberg.default) and the table operation looks like: SELECT * FROM hive_table_name.

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Iceberg
* Add support to redirect table operations from Iceberg to Hive

@findinpath findinpath marked this pull request as draft March 8, 2022 11:28
@findepi findepi requested review from homar and alexjo2144 March 9, 2022 10:10
@findinpath findinpath force-pushed the iceberg-redirect-table-to-hive branch 2 times, most recently from 9501f3e to ebd2fbb Compare March 9, 2022 16:10
@findinpath findinpath marked this pull request as ready for review March 9, 2022 16:13
@findinpath findinpath requested a review from jklamer March 9, 2022 16:13
@findinpath findinpath requested review from phd3 and findepi March 11, 2022 10:26
@findinpath findinpath force-pushed the iceberg-redirect-table-to-hive branch from 0d33751 to acc1604 Compare March 11, 2022 10:27
@@ -732,6 +735,67 @@ public void renameMaterializedView(ConnectorSession session, SchemaTableName sou
return listNamespaces(session);
}

@Override
public Optional<CatalogSchemaTableName> redirectTable(ConnectorSession session, SchemaTableName tableName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem like an Iceberg-catalog level method. Can we narrow this down to just the things that are dependent on the catalog implementation and move the rest up to IcebergMetadata?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem like an Iceberg-catalog level method.

Initially I created a TableRedirectionHandler abstraction which was created at the same time as the TrinoCatalog instance, but I dropped afterwards the idea because the redirection is tightly linked to the metastore implementation (hive/glue).

The same boilerplate code used for creating the TrinoCatalog would need to be done for redirection handling as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a second thought, it probably makes sense to take out this operation from TrinoCatalog in hindsight of JDBC / REST catalogs which will not have anymore Hive related content in them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've modified the code again so that the redirectTable method lives in the TrinoHiveCatalog and TrinoGlueCatalog, but it is not exposed in the TrinoCatalog interface in order to keep the interface free from the concept of table redirection, but still be able to provide this functionality for hive & glue.

@findinpath findinpath force-pushed the iceberg-redirect-table-to-hive branch from acc1604 to 6b75248 Compare March 14, 2022 12:08
@findinpath findinpath requested a review from alexjo2144 March 14, 2022 12:08
@findinpath findinpath force-pushed the iceberg-redirect-table-to-hive branch from 6b75248 to d863ff6 Compare March 14, 2022 14:31
@findinpath findinpath force-pushed the iceberg-redirect-table-to-hive branch 6 times, most recently from 152eafb to 909f1cc Compare March 17, 2022 10:25

// test via redirection with just schema filter
// test via redirection with just schema filter - consistent with the functionality of the command `SHOW TABLES` command on the Iceberg connector
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the change in functionality here for the Iceberg connector. Please provide feedback whether the provided functionality is incorrect from your perspective.

@findinpath findinpath force-pushed the iceberg-redirect-table-to-hive branch from 1058105 to 50e3430 Compare March 31, 2022 08:51
@findinpath
Copy link
Contributor Author

Rebased on master to adapt the redirect test io.trino.tests.product.iceberg.TestIcebergRedirectionToHive#testAlterTableRename

@@ -205,7 +206,9 @@ public IcebergMetadata(
public IcebergTableHandle getTableHandle(ConnectorSession session, SchemaTableName tableName)
{
IcebergTableName name = IcebergTableName.from(tableName.getTableName());
verify(name.getTableType() == DATA, "Wrong table type: " + name.getTableNameWithType());
if (name.getTableType() != DATA) {
return null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the table redirection towards Hive connector is not enabled,
in case of trying to query on the Iceberg connector
a metadata table of a Hive connector table,
the user will receive a table not found exception.

From io.trino.tests.product.iceberg.TestIcebergHiveTablesCompatibility#testIcebergSelectFromHiveTable

        assertQueryFailure(() -> onTrino().executeQuery("SELECT * FROM iceberg.default.\"" + tableName + "$data\""))
                .hasMessageMatching("Query failed \\(#\\w+\\):\\Q Not an Iceberg table: default." + tableName);

        assertQueryFailure(() -> onTrino().executeQuery("SELECT * FROM iceberg.default.\"" + tableName + "$files\""))
                .hasMessageMatching("Query failed \\(#\\w+\\):\\Q line 1:15: Table 'iceberg.default." + tableName + "$files' does not exist");

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keeping verify in such case seems "OK" too, wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping verify would issue for the statement:

onTrino().executeQuery("SELECT * FROM iceberg.default.\"" + hiveTableName + "$files\"")

error messages like the following:

Wrong table type: test_iceberg_select_from_hive_63u5u11q3c70$files

I tend to say that the error message

Table 'iceberg.default." + hiveTableName + "$files' does not exist"

fits better (not ideal, but better).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires code comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make it clear from the code perspective:

io.trino.plugin.iceberg.IcebergMetadata#getTableHandle is reached by a select from hive_table_name$files because there is no system table in Iceberg found for the hive_table_name (IcebergMetadata#getSystemTable returns Optional.empty() in such cases).

In StatementAnalyzer#visitTable the logic of the method assumes that if the identifier doesn't correspond to a MV or to a view, it is certainly a table. For this reason IcebergMetadata#getTableHandle is called with the rather unexpected argument hive_table_name$files.

Probably a refactoring of StatementAnalyzer#visitTable method which verifies in the beginning whether we're dealing with an redirected table and acts accordingly would be the right way to go, but such a change (if it makes sense) would rather fit in a different PR.

Comment on lines 480 to 613
if (isIcebergTable(table.get())) {
// After redirecting, use the original table name, with "$partitions" and similar suffixes
return targetCatalogName.map(catalog -> new CatalogSchemaTableName(catalog, tableName));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should redirect non-Iceberg tables, rather than redirect Iceberg tables

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this bug because I didn't know I can run the Glue tests locally.
@findepi Once the PR is in a good shape, please run it against the CI with AWS GLue secrets.

@findinpath findinpath force-pushed the iceberg-redirect-table-to-hive branch from 50e3430 to 13196c8 Compare April 1, 2022 10:17
@findinpath
Copy link
Contributor Author

Rebased on master to make use of b395286

@findinpath findinpath force-pushed the iceberg-redirect-table-to-hive branch 2 times, most recently from 863c172 to bc81a1a Compare April 1, 2022 10:47
@findinpath findinpath requested a review from findepi April 1, 2022 10:50
}
if (!isIcebergTable(table.get())) {
// After redirecting, use the original table name, with "$partitions" and similar suffixes
return targetCatalogName.map(catalog -> new CatalogSchemaTableName(catalog, tableName));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, I'd use get instead of map here, makes it clear that an empty catalog name shouldn't ever show up at this point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have a few lines above:

        Optional<String> targetCatalogName = getHiveCatalogName(session);
        if (targetCatalogName.isEmpty()) {
            return Optional.empty();
        }

Also note that the method returns an Optional, so I'd have to do .get() and then wrap the result back to Optional.


Optional<com.amazonaws.services.glue.model.Table> table = getTable(new SchemaTableName(tableNameBase.getSchemaName(), tableNameBase.getTableName()));

if (table.isEmpty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that Glue views are merged we should probably match what the Hive catalog has for this line: (table.isEmpty() || VIRTUAL_VIEW.name().equals(table.get().getTableType()))

@findinpath findinpath force-pushed the iceberg-redirect-table-to-hive branch from bc81a1a to 58426ff Compare April 4, 2022 10:18
@findinpath
Copy link
Contributor Author

Rebased on top of master to make use of the changes from #11617

@@ -205,7 +206,10 @@ public IcebergMetadata(
public IcebergTableHandle getTableHandle(ConnectorSession session, SchemaTableName tableName)
{
IcebergTableName name = IcebergTableName.from(tableName.getTableName());
verify(name.getTableType() == DATA, "Wrong table type: " + name.getTableNameWithType());
if (name.getTableType() != DATA) {
// Avoid dealing with non DATA table types
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Pretend the table does not exist to produce better message in case of redirects to Hive

@findinpath findinpath force-pushed the iceberg-redirect-table-to-hive branch 2 times, most recently from fd3366f to 101291d Compare April 5, 2022 12:36
@findinpath
Copy link
Contributor Author

Rebased on top of master to deal with the refactoring of the glue tests.

The Iceberg connector can make use of the `iceberg.hive-catalog-name`
configuration property for enable table redirects towards the Hive tables.

When the table redirection towards Hive connector is not enabled,
in case of trying to query on the Iceberg connector
a metadata table of a Hive connector table,
the user will receive a table not found exception.
@findinpath findinpath force-pushed the iceberg-redirect-table-to-hive branch from 101291d to b2a73ff Compare April 5, 2022 13:34
@findepi findepi merged commit 7562be6 into trinodb:master Apr 6, 2022
@findepi findepi mentioned this pull request Apr 6, 2022
@github-actions github-actions bot added this to the 376 milestone Apr 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants