-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delete Iceberg table data on DROP #11062
Conversation
ab705af
to
055da5f
Compare
@mosabua can you please indicate whether we should document explicitly in the documentation of the connector the I had a look over https://docs.starburst.io/latest/connector/starburst-delta-lake.html which implements as well this concept and haven't seen any reference there for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the first commit is a pretty good change, though I'd reword it to clarify that it only affects FileHiveMetastore
. Something like "allow alternate table data locations in FileHiveMetastore."
I didn't look at the second commit in detail, but I think the commit message should note that it's basically the same way HiveMetadata
handles the same information.
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/file/FileHiveMetastore.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/file/FileHiveMetastore.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/file/FileHiveMetastore.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/file/TableMetadata.java
Outdated
Show resolved
Hide resolved
055da5f
to
8a9e89d
Compare
I was just taking a look through the Iceberg API utilities, did you see https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/CatalogUtil.java#L78 ? Looks like it may be a safer way of dropping table files |
@alexjo2144 yes I did see also this utility function and to be honest, I was planning to delegate the deletion logic to this method. However, in the current state of Trino, there are tables being dropped only when they don't have any overridden location properties. So, in the current stage of the Iceberg connector implementation, the deletion of the table data can be still delegated to the Hive Metastore. |
8a9e89d
to
ad06d18
Compare
I think the main advantage to using that utility instead of delegating the delete to Hive is consistency between tables made my Spark and tables made by Trino. If I follow the same steps:
In either Spark or Trino, and then drop the tables from Trino, I would expect it to do the same thing in either case, but that's not true here since we're relying on the managed/external flag being set on table creation and Spark will always set it to be external. Does that make sense? |
@alexjo2144 this goes rather in the direction of interpreting all tables as EXTERNAL (the current status quo before this PR). |
I don't think this is the case. I believe a table is only supposed to be considered external if it has the |
In the Hive connector, yes, but Iceberg does not really have a well defined managed/external definition yet.
That's correct, personally I think this would be a better approach. |
I went ahead and looked through the iceberg's implementation for spark
As it can be seen, Spark does delegate to |
dca7dc4
to
c0007f5
Compare
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/file/FileHiveMetastore.java
Show resolved
Hide resolved
a3a7b31
to
160c5c4
Compare
I think so. Renaming a managed table may cause Hive metastore to do table location path rename on the filesystem. I think it is problematic for iceberg |
e647522
to
01deab7
Compare
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TrinoHiveCatalog.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TrinoHiveCatalog.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/file/FileHiveMetastore.java
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/file/FileHiveMetastore.java
Show resolved
Hide resolved
...sts/src/main/java/io/trino/tests/product/iceberg/TestIcebergSparkDropTableCompatibility.java
Outdated
Show resolved
Hide resolved
...sts/src/main/java/io/trino/tests/product/iceberg/TestIcebergSparkDropTableCompatibility.java
Outdated
Show resolved
Hide resolved
...rino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergTableWithExternalLocation.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TrinoHiveCatalog.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TrinoHiveCatalog.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TrinoHiveCatalog.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TrinoHiveCatalog.java
Outdated
Show resolved
Hide resolved
01deab7
to
0f230c9
Compare
Share DataFileRecord functionality across the Iceberg tests
Provide for dropping tables a consistent functionality with the Hive Metastore Service.
Rebasing the PR on master to make use of the update to iceberg library to the version |
0f230c9
to
8387c84
Compare
8387c84
to
0075653
Compare
Delegate the responsibility of deleting the data files of the table to Iceberg because the data files of the Iceberg table may be located in different locations. The DROP statement deletes the Iceberg table data irrespective of the fact that the `location` parameter has been specified explicitly on the table, or it was implicitly derived from the schema location. The Iceberg table directory is also being removed. In the unlikely case that this directory may be shared with other tables, dropping of any of these tables will delete the directory and will affect therefore also other tables.
0075653
to
2c47087
Compare
Description
Delegate the responsibility of deleting the data files
of the table to Iceberg because the data files
of the Iceberg table may be located in different locations.
The DROP statement deletes the Iceberg table data irrespective
of the fact that the
location
parameter has been specifiedexplicitly on the table, or it was implicitly derived from
the schema location.
It is important to note that the DROP functionality can be currently
performed only for Iceberg tables that do not contain path overrides
(
write.object-storage.path
,write.folder-storage.path
,write.metadata.path
).General information
This is a fix.
Previously the data of the Iceberg table was not removed by HMS because Iceberg tables are marked implicitly as EXTERNAL, reason why the data of the table is not deleted by HMS on DROP.
This change relates solely to the Iceberg connector.
This feature brings the ability to delete the data files while performing a DROP statements.
Related issues, pull requests, and links
Fixes #5616
Related PR: #6063
Depends on PR: #11032
Documentation
( ) No documentation is needed.
(x) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
(x) Release notes entries required with the following suggested text: