Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow querying iceberg tables in hive #10441

Merged

Conversation

findinpath
Copy link
Contributor

Hive connector cannot read from Iceberg tables reason
why querying such tables shouldn't be permitted within
the hive connector.
In case of trying to query an Iceberg table from the
hive connector provide a meaningful message to the user
about not supporting such an operation.

In case that the hive users don't want to see at all
the Iceberg tables, the property hive.hide-iceberg-tables
can be set for the hive connector to true.

Fixes #8693

" <\\[\\1]>\n" +
"but found.*");

throw new SkipException("not supported");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the same model also in the testHideDeltaLakeTables() and I am wondering whether it makes sense to do the skipping at the end of the method.
Can anyone explain the reasoning behind this ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the assertion is so we do not forget to "uncomment" the test when we fix the tested code.
But actually why is this test supposed to fail?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems testDisallowQueryingOfIcebergTables is expected to pass, so why the override?

@Config("hive.hide-delta-lake-tables")
@ConfigDescription("Hide Delta Lake tables in table listings")
public MetastoreConfig setHideDeltaLakeTables(boolean hideDeltaLakeTables)
{
this.hideDeltaLakeTables = hideDeltaLakeTables;
return this;
}

@Config("hive.hide-iceberg-tables")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether the strategy of using hive.hide-xxx-tables properties is that good.
Would it make sense to rather offer a generic property to hide any non-hive tables?

@findinpath findinpath force-pushed the disallow-querying-iceberg-tables-in-hive branch from 8b30733 to d679601 Compare December 31, 2021 14:13
@@ -2870,6 +2870,7 @@ public void testHideDeltaLakeTables()
ConnectorSession session = newSession();
HiveIdentity identity = new HiveIdentity(session);
SchemaTableName tableName = temporaryTable("trino_delta_lake_table");
SchemaTableName propertiesSchemaTableName = new SchemaTableName(tableName.getSchemaName(), format("%s$properties", tableName.getTableName()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: propertiesTableName. Or jus tinline.

Copy link
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First and last commit LGTM (can you extract separate PR with those?)

@losipiuk
Copy link
Member

First and last commit LGTM (can you extract separate PR with those?)

Or two separate PRs?

@sajjoseph
Copy link
Contributor

While I understand the need for this change, shouldn't this PR - #10173 - be a better solution for production use?

@findinpath
Copy link
Contributor Author

While I understand the need for this change, shouldn't this PR - #10173 - be a better solution for production use?

@sajjoseph I'd say that the redirect usage is a different use case than what this PR is trying to solve.

@findinpath
Copy link
Contributor Author

First and last commit LGTM (can you extract separate PR with those?)

Or two separate PRs?

I've created a separate PR for disallowing the query of $properties delta lake table in hive here:

#10447

@findinpath
Copy link
Contributor Author

As @sajjoseph (#10441 (comment)) and @findepi (#10441 (comment)) pointed out, the iceberg redirects are a better option to deal with iceberg tables from shared hive metastores.

@findinpath findinpath closed this Jan 5, 2022
@findepi
Copy link
Member

findepi commented Jan 5, 2022

@findinpath i think there is a misunderstanding about my comment #10441 (comment) .
I asked you to drop the hive.hide-iceberg-tables property, not to drop the PR.

We still want to properly reject Iceberg tables when redirects are not enabled, #8693
Same as we do for Delta tables (to be improved in #10447).

@findinpath findinpath reopened this Jan 5, 2022
@findinpath findinpath force-pushed the disallow-querying-iceberg-tables-in-hive branch from d679601 to b87a079 Compare January 10, 2022 16:10
@findinpath findinpath requested review from findepi and losipiuk and removed request for mosabua January 10, 2022 16:10
@@ -436,6 +437,9 @@ public HiveTableHandle getTableHandle(ConnectorSession session, SchemaTableName
if (isDeltaLakeTable(table)) {
throw new TrinoException(HIVE_UNSUPPORTED_FORMAT, format("Cannot query Delta Lake table '%s'", tableName));
}
if (isIcebergTable(table)) {
return null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throw "Cannot query Iceberg table" instead of making the system throw "table not found".

Copy link
Contributor Author

@findinpath findinpath Jan 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this doesn't work well with the concept of redirection recently implemented in hive.
If we throw an exception while trying to get the table handle in Hive, the workflow used for in the analyzer for getting the table handle will stop abruptly before checking for redirections.

See https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/sql/analyzer/StatementAnalyzer.java#L1517 for reference

The current logic first checks whether we're dealing with a materialized view, then with a view and only at the end with a table (that may contain redirections).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we throw an exception while trying to get the table handle in Hive, the workflow used for in the analyzer for getting the table handle will stop abruptly before checking for redirections.

the redirections should be consulted first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current logic first checks whether we're dealing with a materialized view, then with a view and only at the end with a table (that may contain redirections).

An iceberg table is not a materialized view, nor a view, so get[Materialized]View should return Optional.empty for these calls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, when checking whether we're dealing with a materialized view or view there is subsequently made a call towards HiveMetadata#getTableHandle which would end up in throwing an exception instead of returning Optional.empty in case that an exception would be thrown (in a similar fashion as for delta lake tables).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the logic of the StatementAnalyzer is something I'd avoid to do in this case for retrieving first the table redirections.

For this reason, I have opted to partly replicate the logic of HiveMetadata#getTableHandle method in the PartitionsSystemTableProvider which offers me the possibility to do a fine grained handling in case of dealing with an iceberg/delta lake table.

One inconvenient of this approach is that we have duplicated logic in HiveMetadata and in PartitionsSystemTableProvider.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the logic of the StatementAnalyzer is something I'd avoid to do in this case for retrieving first the table redirections.

Agreed, this is out of question.

@Override
public void testDisallowQueryingOfIcebergTables()
{
throw new SkipException("not supported");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why disabled?

add assertThatThrownBy(super:: ...

Copy link
Contributor Author

@findinpath findinpath Jan 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While trying to run the test I receive

2022-01-11T03:30:02.708-0600 INFO Copying resource dir 'spark_bucketed_nation' to /var/folders/1q/y42hmc4s3yl38kp0t0q142_c0000gn/T/TestHiveInMemoryMetastore7495715074983273908


java.lang.IllegalArgumentException: Table directory does not exist

	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:145)
	at io.trino.plugin.hive.metastore.thrift.InMemoryThriftMetastore.createTable(InMemoryThriftMetastore.java:190)
	at io.trino.plugin.hive.metastore.thrift.BridgingHiveMetastore.createTable(BridgingHiveMetastore.java:205)
	at io.trino.plugin.hive.AbstractTestHive.testDisallowQueryingOfIcebergTables(AbstractTestHive.java:2978)
	at io.trino.plugin.hive.TestHiveInMemoryMetastore.testDisallowQueryingOfIcebergTables(TestHiveInMemoryMetastore.java:68)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which directory?
This sounds like a test setup problem. Does it mean TestHiveInMemoryMetastore cannot run a test that uses createTable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the comment to contain the missing directory path.

Note that the similar test testHideDeltaLakeTables is also containing a not supported skip exception.

https://github.com/trinodb/trino/blob/master/plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestHiveInMemoryMetastore.java#L59-L63

@findinpath findinpath force-pushed the disallow-querying-iceberg-tables-in-hive branch 3 times, most recently from 1ec8505 to 20ba90a Compare January 12, 2022 05:57
@findinpath findinpath requested a review from findepi January 12, 2022 05:57
@findinpath findinpath force-pushed the disallow-querying-iceberg-tables-in-hive branch 2 times, most recently from 96bb5b3 to 44f670f Compare January 12, 2022 09:46
@@ -75,7 +76,6 @@ public void testIcebergHiveViewsCompatibility()
.add(row("hive_view_qualified_hive"))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@losipiuk

.add(row("iceberg_table")) // TODO: should this be filtered out?

filtering out the entry from show tables would involve querying the properties metadata of each table.

Can I remove this TODO ?

@findinpath
Copy link
Contributor Author

I've added a separate PR #10577 for extending the redirection awareness on the table tasks because the TestHiveRedirectionToIceberg tests were failing with the modification of HiveMetadata#getTableHandle() method when a Iceberg table is being queried.

The current PR has the tests from TestHiveRedirectionToIceberg currently failing. Please advise what would be the sane way to go further with this PR.

@findinpath findinpath force-pushed the disallow-querying-iceberg-tables-in-hive branch 2 times, most recently from bdafcba to 96b1e51 Compare January 14, 2022 09:24
@findinpath findinpath force-pushed the disallow-querying-iceberg-tables-in-hive branch 2 times, most recently from bccfb68 to 3fb545e Compare January 14, 2022 11:13
Table sourceTable = metadata.getMetastore()
.getTable(new HiveIdentity(session), sourceTableName.getSchemaName(), sourceTableName.getTableName())
.orElse(null);
if (sourceTable == null || isDeltaLakeTable(sourceTable)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disallow querying Delta Lake system tables

This is not the right commit title, since it was disallowed before.

Copy link
Contributor Author

@findinpath findinpath Jan 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought to name it Disallow instead of Deny.
I considered that before it was denied - by throwing the exception and now it is disallowed. Any wording suggestions in this case?

@@ -82,7 +83,7 @@ public PartitionsSystemTableProvider(HivePartitionManager partitionManager, Type
Table sourceTable = metadata.getMetastore()
.getTable(new HiveIdentity(session), sourceTableName.getSchemaName(), sourceTableName.getTableName())
.orElse(null);
if (sourceTable == null || isDeltaLakeTable(sourceTable)) {
if (sourceTable == null || isDeltaLakeTable(sourceTable) || isIcebergTable(sourceTable)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for follow-up, this returns empty when table not found, while PropertiesSystemTableProvider throws in such case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually PropertiesSystemTableProvider returns Optional.empty() as well.
This is covered by the previous test 3fb545e#diff-92c37834248d69878a49876bfa440c9f6afbee04b1363d402a04cf7fddc7bb62R2905-R2913

onTrino().executeQuery("DROP TABLE " + hiveTableName);
assertQueryFailure(() -> onTrino().executeQuery("TABLE " + icebergTableName))
.hasMessageMatching("\\QQuery failed (#\\E\\S+\\Q): line 1:1: Table '" + icebergTableName + "' does not exist");
//TODO restore test assertions after adding redirection awareness to the DropTableTask
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have an issue to link to?

(nit: add space after //)

return Optional.empty();
}
verifyOnline(sourceTableName, Optional.empty(), getProtectMode(sourceTable), sourceTable.getParameters());
HiveTableHandle sourceTableHandle = new HiveTableHandle(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE that compared to HiveMetadata#getTableHandle I didn't include here anymore the check

        // we must not allow system tables due to how permissions are checked in SystemTableAwareAccessControl
        if (getSourceTableNameFromSystemTable(systemTableProviders, tableName).isPresent()) {
            throw new TrinoException(HIVE_INVALID_METADATA, "Unexpected table present in Hive metastore: " + tableName);
        }

Hive connector cannot read from Delta Lake tables
reason why querying system tables for such tables
shouldn't be permitted within the hive connector.

In case of trying to query on the hive connector
the special hive tables:

- $properties
- $partitions
on a Delta Lake table, the user will receive a table
not found exception.
Hive connector cannot read from Iceberg tables
reason why querying such tables shouldn't be
permitted within the hive connector.

In case of trying to query an Iceberg table from the
hive connector (without iceberg redirection enabled)
the user will receive a hive unsupported format exception.

It is no longer possible to create a view in hive
which selects from Iceberg.

In case of trying to query on the hive connector
the special hive tables:

- $properties
- $partitions
on an Iceberg table, the user will receive a table
not found exception.
@findinpath findinpath force-pushed the disallow-querying-iceberg-tables-in-hive branch from 3fb545e to 22c899a Compare January 18, 2022 14:08
@findepi
Copy link
Member

findepi commented Jan 18, 2022

@findinpath
Copy link
Contributor Author

@findepi I changed the assertions in the test class testing/trino-product-tests/src/main/java/io/trino/tests/product/hive/TestHiveRedirectionToIceberg.java so that the tests are green.

@findepi findepi merged commit 12a6b5f into trinodb:master Jan 19, 2022
@github-actions github-actions bot added this to the 369 milestone Jan 19, 2022
This was referenced Jan 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Properly reject Iceberg tables in Hive connector
4 participants