Disallow dropping Hive schema that contains external files #10146
I understand the missing piece is the logic that would drive the value of the deleteData parameter.
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/glue/GlueHiveMetastore.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/glue/GlueHiveMetastore.java
Outdated
@jirassimok, can you please verify Glue?
811fe45 to 71571a0
@jirassimok, did you look at the red CI?
It doesn't look related.
e773947 to c3e09be
.../trino-hive/src/main/java/io/trino/plugin/hive/metastore/SemiTransactionalHiveMetastore.java
.../trino-hive/src/main/java/io/trino/plugin/hive/metastore/SemiTransactionalHiveMetastore.java
Outdated
Reviewed. Tiny things, plus a question regarding the default behaviour for the case when we fail to list the schema directory.
c3e09be to e3abc7a
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/file/FileHiveMetastore.java
Outdated
@@ -203,6 +203,12 @@ public synchronized void createDatabase(HiveIdentity identity, Database database

Path databaseMetadataDirectory = getDatabaseMetadataDirectory(database.getDatabaseName());
writeSchemaFile(DATABASE, databaseMetadataDirectory, databaseCodec, new DatabaseMetadata(currentVersion, database), false);
try {
    metadataFileSystem.mkdirs(databaseMetadataDirectory);
Why is it needed in the "Don't put database schema files in the database directories" commit?
It's ensuring that the database directory exists when the database is created. It's not strictly necessary, but I think it makes sense.
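A minimal sketch of that idea, using java.nio instead of Hadoop's FileSystem so it runs on its own (an assumption; the class and method names are hypothetical and not part of Trino). Files.createDirectories behaves like mkdir -p, so creating the directory eagerly when the database is created is harmless even if it already exists.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not Trino code: make sure a database's data directory
// exists at creation time, even though no schema file is written inside it.
final class DatabaseDirectorySketch
{
    private DatabaseDirectorySketch() {}

    static Path ensureDatabaseDirectory(Path catalogDirectory, String databaseName)
            throws IOException
    {
        Path databaseDirectory = catalogDirectory.resolve(databaseName);
        // Like `mkdir -p`: creates missing parents and does not fail if the
        // directory already exists.
        return Files.createDirectories(databaseDirectory);
    }
}
```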
@@ -555,6 +562,10 @@ public void dropDatabase(HiveIdentity identity, String databaseName, boolean del
catch (AmazonServiceException e) {
    throw new TrinoException(HIVE_METASTORE_ERROR, e);
}

if (deleteData) {
    location.ifPresent(path -> deleteDir(hdfsContext, hdfsEnvironment, new Path(path), true));
It's an error when deleteData && location.isEmpty(). Either use checkState or add a comment explaining why we're not failing.
This can theoretically occur if database.getLocation() returns empty. Here, that only happens if Glue's API returns null for the database location, which seems unlikely, but that method is not documented as non-null.
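A minimal sketch of the first option from the review (fail fast), in plain Java. Guava's checkState throws IllegalStateException, so the sketch throws that directly. The helper name is hypothetical, and this is not necessarily what the PR ended up doing; the alternative is to skip deletion and document why.

```java
import java.util.Optional;

// Hypothetical helper, not Trino's GlueHiveMetastore code.
final class DropDatabaseGuardSketch
{
    private DropDatabaseGuardSketch() {}

    /** Returns the location to delete, or empty when the caller did not ask for data deletion. */
    static Optional<String> locationToDelete(boolean deleteData, Optional<String> location)
    {
        if (!deleteData) {
            return Optional.empty();
        }
        // Equivalent to checkState(location.isPresent(), ...): asking to delete
        // data for a database without a known location is treated as a bug.
        if (location.isEmpty()) {
            throw new IllegalStateException("deleteData is set, but the database has no location");
        }
        return location;
    }
}
```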
.../trino-hive/src/main/java/io/trino/plugin/hive/metastore/SemiTransactionalHiveMetastore.java
Outdated
testing/trino-product-tests/src/main/java/io/trino/tests/product/hive/TestCreateDropSchema.java
Outdated
testing/trino-product-tests/src/main/java/io/trino/tests/product/hive/TestCreateDropSchema.java
Outdated
testing/trino-product-tests/src/main/java/io/trino/tests/product/hive/TestCreateDropSchema.java
Outdated
testing/trino-product-tests/src/main/java/io/trino/tests/product/hive/TestCreateDropSchema.java
Outdated
.../trino-hive/src/main/java/io/trino/plugin/hive/metastore/SemiTransactionalHiveMetastore.java
Outdated
e3abc7a to 07ed54f
2fd7051 to 6116ac0
Green tests before new push.
6116ac0 to 73c0880
Changes in last update: auto-close the
lgtm
@@ -93,6 +93,7 @@
private boolean immutablePartitions;
private Optional<InsertExistingPartitionsBehavior> insertExistingPartitionsBehavior = Optional.empty();
private boolean createEmptyBucketFiles;
private boolean deleteSchemaLocationsFallback = true;
Why true as the default? You probably talked this through.
It's the current behavior, so it's less likely to cause problems for anyone upgrading. But it's also the less safe behavior, especially for someone who isn't very familiar with Trino. I think we discussed it last week, but maybe we should change it?
What do you think, @findepi?
The point of the change is "do not delete those random files when deleting the schema directory". false feels more appropriate in case we cannot verify whether there are any files in there.
Let's also add a code comment discussing the rationale behind the default we choose.
Switched the default and added a comment.
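A hypothetical plain-Java stand-in for the config field after the switch, showing the kind of rationale comment requested in the review. Trino's HiveConfig binds hive.* properties through its own configuration framework, which this sketch does not reproduce; the class name is illustrative only.

```java
// Hypothetical stand-in, not Trino's HiveConfig.
final class DeleteSchemaFallbackConfigSketch
{
    // Defaults to false. When Trino cannot list the schema location, it cannot rule
    // out that the location holds files Trino did not write, so it does not request
    // deletion. Setting this to true restores the previous behavior of requesting
    // deletion anyway.
    private boolean deleteSchemaLocationsFallback = false;

    boolean isDeleteSchemaLocationsFallback()
    {
        return deleteSchemaLocationsFallback;
    }

    DeleteSchemaFallbackConfigSketch setDeleteSchemaLocationsFallback(boolean deleteSchemaLocationsFallback)
    {
        this.deleteSchemaLocationsFallback = deleteSchemaLocationsFallback;
        return this;
    }
}
```

The fluent setter is only a convenience for chaining in tests; the safe default is the point of the sketch.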
73c0880 to ec3c615
Rebased forward, and avoided including
ec3c615 to 877cc04
Instead of /catalog/database/.trinoSchema, the database schemas in FileHiveMetastore now go in /catalog/.trinoSchema.database.
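A minimal sketch of the two layouts, using java.nio.file.Path with hypothetical helper names. One plausible motivation, assumed here rather than stated in the thread: with the schema file outside the database directory, a check for files in that directory only sees data files, not the metastore's own metadata.

```java
import java.nio.file.Path;

// Hypothetical helpers illustrating the layout change; not Trino code.
final class SchemaFileLayoutSketch
{
    private static final String SCHEMA_FILE_NAME = ".trinoSchema";

    private SchemaFileLayoutSketch() {}

    // Old layout: <catalog>/<database>/.trinoSchema (inside the data directory)
    static Path oldDatabaseSchemaFile(Path catalogDirectory, String databaseName)
    {
        return catalogDirectory.resolve(databaseName).resolve(SCHEMA_FILE_NAME);
    }

    // New layout: <catalog>/.trinoSchema.<database> (a sibling of the data directory)
    static Path newDatabaseSchemaFile(Path catalogDirectory, String databaseName)
    {
        return catalogDirectory.resolve(SCHEMA_FILE_NAME + "." + databaseName);
    }
}
```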
877cc04 to 11055a1
@jirassimok, mind the CI.
In SemiTransactionalHiveMetastore, check for files before dropping the schema. Do not request deletion (via HiveMetastore) if files are visible in the schema location. A new config property, hive.delete-schema-locations-fallback, determines the behavior when Trino can't check the file location. False (the default) will not request deletion in that case.
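A minimal sketch of the decision described in this commit message, assuming a local java.nio path in place of HDFS and a hypothetical helper name. It treats any regular file found under the location (recursively) as "visible files", and any failure to list the location, including a missing directory, as "can't check"; both interpretations are assumptions, not the PR's exact semantics.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical sketch of the check described in the commit message; not Trino's code.
final class DropSchemaDecisionSketch
{
    private DropSchemaDecisionSketch() {}

    /**
     * Returns whether the metastore should be asked to delete the schema location.
     *
     * @param location the schema location
     * @param deleteSchemaLocationsFallback value of hive.delete-schema-locations-fallback
     */
    static boolean shouldDeleteData(Path location, boolean deleteSchemaLocationsFallback)
    {
        try (Stream<Path> entries = Files.walk(location)) {
            // Any visible file (for example, one written outside Trino) blocks
            // deletion, regardless of the fallback setting.
            return entries.noneMatch(Files::isRegularFile);
        }
        catch (IOException | UncheckedIOException e) {
            // The location could not be listed, so Trino cannot tell whether
            // deletion is safe: use the configured fallback.
            return deleteSchemaLocationsFallback;
        }
    }
}
```

With the property left at false, a listing failure results in no deletion request, while files found in the location always block deletion.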
11055a1 to 703098e
All tests have passed; something is just hanging in this one, so it isn't finishing the job.
This will attempt to reimplement the behavior from #9740 in a way that doesn't break for some configurations.