-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disallow querying iceberg tables in hive #10441
Disallow querying iceberg tables in hive #10441
Conversation
" <\\[\\1]>\n" + | ||
"but found.*"); | ||
|
||
throw new SkipException("not supported"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found the same model also in the testHideDeltaLakeTables()
and I am wondering whether it makes sense to do the skipping at the end of the method.
Can anyone explain the reasoning behind this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the assertion is so we do not forget to "uncomment" the test when we fix the tested code.
But actually why is this test supposed to fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems testDisallowQueryingOfIcebergTables
is expected to pass, so why the override?
@Config("hive.hide-delta-lake-tables") | ||
@ConfigDescription("Hide Delta Lake tables in table listings") | ||
public MetastoreConfig setHideDeltaLakeTables(boolean hideDeltaLakeTables) | ||
{ | ||
this.hideDeltaLakeTables = hideDeltaLakeTables; | ||
return this; | ||
} | ||
|
||
@Config("hive.hide-iceberg-tables") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure whether the strategy of using hive.hide-xxx-tables
properties is that good.
Would it make sense to rather offer a generic property to hide any non-hive tables?
8b30733
to
d679601
Compare
@@ -2870,6 +2870,7 @@ public void testHideDeltaLakeTables() | |||
ConnectorSession session = newSession(); | |||
HiveIdentity identity = new HiveIdentity(session); | |||
SchemaTableName tableName = temporaryTable("trino_delta_lake_table"); | |||
SchemaTableName propertiesSchemaTableName = new SchemaTableName(tableName.getSchemaName(), format("%s$properties", tableName.getTableName())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: propertiesTableName
. Or jus tinline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First and last commit LGTM (can you extract separate PR with those?)
Or two separate PRs? |
While I understand the need for this change, shouldn't this PR - #10173 - be a better solution for production use? |
@sajjoseph I'd say that the redirect usage is a different use case than what this PR is trying to solve. |
I've created a separate PR for disallowing the query of |
plugin/trino-hive/src/main/java/io/trino/plugin/hive/PropertiesSystemTableProvider.java
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/PropertiesSystemTableProvider.java
Show resolved
Hide resolved
As @sajjoseph (#10441 (comment)) and @findepi (#10441 (comment)) pointed out, the iceberg redirects are a better option to deal with iceberg tables from shared hive metastores. |
@findinpath i think there is a misunderstanding about my comment #10441 (comment) . We still want to properly reject Iceberg tables when redirects are not enabled, #8693 |
d679601
to
b87a079
Compare
@@ -436,6 +437,9 @@ public HiveTableHandle getTableHandle(ConnectorSession session, SchemaTableName | |||
if (isDeltaLakeTable(table)) { | |||
throw new TrinoException(HIVE_UNSUPPORTED_FORMAT, format("Cannot query Delta Lake table '%s'", tableName)); | |||
} | |||
if (isIcebergTable(table)) { | |||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throw "Cannot query Iceberg table" instead of making the system throw "table not found".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this doesn't work well with the concept of redirection recently implemented in hive.
If we throw an exception while trying to get the table handle in Hive, the workflow used for in the analyzer for getting the table handle will stop abruptly before checking for redirections.
See https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/sql/analyzer/StatementAnalyzer.java#L1517 for reference
The current logic first checks whether we're dealing with a materialized view, then with a view and only at the end with a table (that may contain redirections).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we throw an exception while trying to get the table handle in Hive, the workflow used for in the analyzer for getting the table handle will stop abruptly before checking for redirections.
the redirections should be consulted first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current logic first checks whether we're dealing with a materialized view, then with a view and only at the end with a table (that may contain redirections).
An iceberg table is not a materialized view, nor a view, so get[Materialized]View
should return Optional.empty
for these calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment, when checking whether we're dealing with a materialized view or view there is subsequently made a call towards HiveMetadata#getTableHandle
which would end up in throwing an exception instead of returning Optional.empty
in case that an exception would be thrown (in a similar fashion as for delta lake tables).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the logic of the StatementAnalyzer
is something I'd avoid to do in this case for retrieving first the table redirections.
For this reason, I have opted to partly replicate the logic of HiveMetadata#getTableHandle
method in the PartitionsSystemTableProvider
which offers me the possibility to do a fine grained handling in case of dealing with an iceberg/delta lake table.
One inconvenient of this approach is that we have duplicated logic in HiveMetadata
and in PartitionsSystemTableProvider
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the logic of the
StatementAnalyzer
is something I'd avoid to do in this case for retrieving first the table redirections.
Agreed, this is out of question.
@Override | ||
public void testDisallowQueryingOfIcebergTables() | ||
{ | ||
throw new SkipException("not supported"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why disabled?
add assertThatThrownBy(super::
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While trying to run the test I receive
2022-01-11T03:30:02.708-0600 INFO Copying resource dir 'spark_bucketed_nation' to /var/folders/1q/y42hmc4s3yl38kp0t0q142_c0000gn/T/TestHiveInMemoryMetastore7495715074983273908
java.lang.IllegalArgumentException: Table directory does not exist
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:145)
at io.trino.plugin.hive.metastore.thrift.InMemoryThriftMetastore.createTable(InMemoryThriftMetastore.java:190)
at io.trino.plugin.hive.metastore.thrift.BridgingHiveMetastore.createTable(BridgingHiveMetastore.java:205)
at io.trino.plugin.hive.AbstractTestHive.testDisallowQueryingOfIcebergTables(AbstractTestHive.java:2978)
at io.trino.plugin.hive.TestHiveInMemoryMetastore.testDisallowQueryingOfIcebergTables(TestHiveInMemoryMetastore.java:68)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which directory?
This sounds like a test setup problem. Does it mean TestHiveInMemoryMetastore
cannot run a test that uses createTable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the comment to contain the missing directory path.
Note that the similar test testHideDeltaLakeTables
is also containing a not supported
skip exception.
1ec8505
to
20ba90a
Compare
plugin/trino-hive/src/main/java/io/trino/plugin/hive/PartitionsSystemTableProvider.java
Outdated
Show resolved
Hide resolved
96bb5b3
to
44f670f
Compare
core/trino-main/src/main/java/io/trino/execution/AddColumnTask.java
Outdated
Show resolved
Hide resolved
@@ -75,7 +76,6 @@ public void testIcebergHiveViewsCompatibility() | |||
.add(row("hive_view_qualified_hive")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.add(row("iceberg_table")) // TODO: should this be filtered out?
filtering out the entry from show tables would involve querying the properties metadata of each table.
Can I remove this TODO ?
I've added a separate PR #10577 for extending the redirection awareness on the table tasks because the The current PR has the tests from |
bdafcba
to
96b1e51
Compare
bccfb68
to
3fb545e
Compare
Table sourceTable = metadata.getMetastore() | ||
.getTable(new HiveIdentity(session), sourceTableName.getSchemaName(), sourceTableName.getTableName()) | ||
.orElse(null); | ||
if (sourceTable == null || isDeltaLakeTable(sourceTable)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disallow querying Delta Lake system tables
This is not the right commit title, since it was disallowed before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought to name it Disallow
instead of Deny
.
I considered that before it was denied - by throwing the exception and now it is disallowed. Any wording suggestions in this case?
@@ -82,7 +83,7 @@ public PartitionsSystemTableProvider(HivePartitionManager partitionManager, Type | |||
Table sourceTable = metadata.getMetastore() | |||
.getTable(new HiveIdentity(session), sourceTableName.getSchemaName(), sourceTableName.getTableName()) | |||
.orElse(null); | |||
if (sourceTable == null || isDeltaLakeTable(sourceTable)) { | |||
if (sourceTable == null || isDeltaLakeTable(sourceTable) || isIcebergTable(sourceTable)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for follow-up, this returns empty when table not found, while PropertiesSystemTableProvider throws in such case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually PropertiesSystemTableProvider
returns Optional.empty()
as well.
This is covered by the previous test 3fb545e#diff-92c37834248d69878a49876bfa440c9f6afbee04b1363d402a04cf7fddc7bb62R2905-R2913
onTrino().executeQuery("DROP TABLE " + hiveTableName); | ||
assertQueryFailure(() -> onTrino().executeQuery("TABLE " + icebergTableName)) | ||
.hasMessageMatching("\\QQuery failed (#\\E\\S+\\Q): line 1:1: Table '" + icebergTableName + "' does not exist"); | ||
//TODO restore test assertions after adding redirection awareness to the DropTableTask |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have an issue to link to?
(nit: add space after //
)
return Optional.empty(); | ||
} | ||
verifyOnline(sourceTableName, Optional.empty(), getProtectMode(sourceTable), sourceTable.getParameters()); | ||
HiveTableHandle sourceTableHandle = new HiveTableHandle( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTE that compared to HiveMetadata#getTableHandle
I didn't include here anymore the check
// we must not allow system tables due to how permissions are checked in SystemTableAwareAccessControl
if (getSourceTableNameFromSystemTable(systemTableProviders, tableName).isPresent()) {
throw new TrinoException(HIVE_INVALID_METADATA, "Unexpected table present in Hive metastore: " + tableName);
}
Hive connector cannot read from Delta Lake tables reason why querying system tables for such tables shouldn't be permitted within the hive connector. In case of trying to query on the hive connector the special hive tables: - $properties - $partitions on a Delta Lake table, the user will receive a table not found exception.
Hive connector cannot read from Iceberg tables reason why querying such tables shouldn't be permitted within the hive connector. In case of trying to query an Iceberg table from the hive connector (without iceberg redirection enabled) the user will receive a hive unsupported format exception. It is no longer possible to create a view in hive which selects from Iceberg. In case of trying to query on the hive connector the special hive tables: - $properties - $partitions on an Iceberg table, the user will receive a table not found exception.
3fb545e
to
22c899a
Compare
The force push diff (https://github.com/trinodb/trino/compare/3fb545ed8d4c40042966ec147923987ff79208f3..22c899a87226b5636191acc3e25cfc00be641f1b) is large. |
@findepi I changed the assertions in the test class |
Hive connector cannot read from Iceberg tables reason
why querying such tables shouldn't be permitted within
the hive connector.
In case of trying to query an Iceberg table from the
hive connector provide a meaningful message to the user
about not supporting such an operation.
In case that the hive users don't want to see at all
the Iceberg tables, the property
hive.hide-iceberg-tables
can be set for the
hive
connector totrue
.Fixes #8693