-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disallow dropping Hive schemas that contain external files #9740
Disallow dropping Hive schemas that contain external files #9740
Conversation
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveMetadata.java
Outdated
Show resolved
Hide resolved
testing/trino-product-tests/src/main/java/io/trino/tests/product/hive/TestCreateDropSchema.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
514304f
to
2d197c5
Compare
1aa0008
to
e993519
Compare
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/file/FileHiveMetastore.java
Outdated
Show resolved
Hide resolved
private static final SchemaType DATABASE = SchemaType.DATABASE; | ||
private static final SchemaType TABLE = SchemaType.TABLE; | ||
private static final SchemaType PARTITION = SchemaType.PARTITION; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you just static import those (actually I am not sure if you really can) :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I make the enum non-private, I could.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - would make more sense to me :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made it package-private with a comment.
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/file/FileHiveMetastore.java
Outdated
Show resolved
Hide resolved
deleteMetadataDirectory(DATABASE, databaseMetadataDirectory); | ||
try { | ||
if (!metadataFileSystem.delete(getSchemaPath(DATABASE, databaseMetadataDirectory), false)) { | ||
throw new TrinoException(HIVE_METASTORE_ERROR, "Failed to delete database schema file"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: put path in exception message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. A few nits. To be merged after.
fc68747
to
daf9981
Compare
This shouldn't be merged yet, pending offline discussion. |
Per offline discussion, we should unregister such schema, and leave the files on the storage intact. |
daf9981
to
412d29e
Compare
@@ -783,7 +783,27 @@ public void createSchema(ConnectorSession session, String schemaName, Map<String | |||
@Override | |||
public void dropSchema(ConnectorSession session, String schemaName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The more I think about it, the less sure I am that this is the right place to delete the files.
I think it might be better to include a deleteData
parameter for HiveMetastore.dropDatabase
, and pass true when there are no external files. That argument is already present on HiveMetastore.dropTable
and dropPartition
, too, so it wouldn't be entirely out of place.
I guess if we delete the file here, though, HiveMetastore.dropDatabase
should note that it's not supposed to delete non-metadata files.
assertQuerySucceeds(format("CREATE SCHEMA %s WITH (location = '%s')", schemaName, schemaDir)); | ||
assertQuerySucceeds(format("DROP SCHEMA %s", schemaName)); | ||
@Test | ||
public void testDropSchemaWithoutLocation() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not completely sure how this case still works (well, it seemed to work locally; maybe it doesn't actually work). If the location property is not set, HiveMetadata
shouldn't be deleting anything, and the HiveMetastore
implementations shouldn't be deleting anything, either.
(Though I don't believe this test runs with every HiveMetastore
we have, which would probably be good coverage to have.)
Extracted #9902 from here |
FileSystem fs = hdfsEnvironment.getFileSystem(new HdfsContext(session), path); | ||
// If no files in schema directory, delete it | ||
if (!fs.listFiles(path, false).hasNext()) { | ||
fs.delete(path, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be done in SemiTransactionalHiveMetastore
, and only when commit
is invoked
add a test with:
DROP SCHEMA
- rollback
- try to use that schema
} | ||
|
||
@Test | ||
public void testDropSchemaDeletesDirectory() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This depends on file systems.
We should have a copy of that test in io.trino.plugin.hive.AbstractTestHive
, so it's run for supported storages too.
#9902 merged, @jirassimok can you please rebase & squash? |
aab6657
to
1b56cc6
Compare
fb8d1c4
to
85a8348
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First commit LGTM
@electrum this slightly changes the semantics of |
fe78787
to
7b2e2d3
Compare
That update added some logging. |
Still LGTM. Maybe:
And we can merge this one? |
Sounds good. @alexjo2144 also suggested logging a warning instead of throwing an exception if there's an error while trying to delete the files, because the schema has already been dropped at that point. |
7b2e2d3
to
149992b
Compare
In HiveMetadata, delete an empty schema location after dropping it from the metastore. In ThriftHiveMetastore and FileHiveMetastore, do not delete data.
149992b
to
11ba2d2
Compare
Last push just fixed an copy/paste error in the tests that made one of them fail. |
This avoids potential data loss when a schema is created and dropped in a location that already has files.