Create external table location for Hive #17920
a7e603c
to
a8a7670
Compare
LGTM. nitpick comments only.
a8a7670
to
12b7e34
Compare
Big thanks, all comments addressed.
I would recommend reading #1277. I don't think we want to allow creating the directory with the external_location table property.
It seems the commit message is split across two lines. Could you please put it on one line?
12b7e34
to
ad55aa3
Compare
We need a way to disable this, for the following reasons:
need config to prevent
For this there is
This change does allow a user to create a table pointing to a non-existent location, though, because the directory would get auto-created. I also agree with Yuya that we shouldn't add this. Hive doesn't allow this, for example, and "Create external table target location, if it's not exists." describes a managed table, not an external table. External tables are supposed to point to a table which already exists but isn't registered. For allowing an arbitrary location for managed tables, see the concerns in the issue and the two PRs linked from that issue that Yuya has shared.
For me this change is more about synchronising behaviour with S3:
As far as I understand, S3 has no directory abstraction. If someone wants to create an external table, they can provide any non-existent path and we don't even check whether it exists, because the path is automatically "created" by S3 (it is just part of the key). I'd like to give the same option for HDFS: if we want an external table, just give Trino a path and we will create it.
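The S3 point above can be illustrated with a toy model. This is only a sketch: `FlatKeyStoreSketch`, `put`, and `prefixExists` are hypothetical names invented for this example and are not part of Trino or any S3 client. It assumes a store that holds nothing but object keys, so a "directory" is visible only while some key starts with that prefix.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of a flat object store: only keys exist, never directories. */
final class FlatKeyStoreSketch {
    private final Set<String> keys = new HashSet<>();

    /** Writing an object needs no prior "mkdir"; the path is just part of the key. */
    void put(String key) {
        keys.add(key);
    }

    /** A "directory" appears to exist only while at least one key starts with its prefix. */
    boolean prefixExists(String prefix) {
        String normalized = prefix.endsWith("/") ? prefix : prefix + "/";
        return keys.stream().anyMatch(key -> key.startsWith(normalized));
    }

    public static void main(String[] args) {
        FlatKeyStoreSketch store = new FlatKeyStoreSketch();
        System.out.println(store.prefixExists("warehouse/orders"));
        store.put("warehouse/orders/part-0000.orc");
        System.out.println(store.prefixExists("warehouse/orders"));
    }
}
```

In this model there is nothing to create before the first write, which is why an existence check on the location is not meaningful there, whereas a hierarchical file system such as HDFS needs the directory to be made explicitly.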
I mean that I could not find code that actually checks that a table exists at the provided external_path; it only checks that the path exists (and only for non-S3 file systems).
Yes, and allowing a managed table to be created with a user-provided location is a good option. As soon as that is implemented and merged, it will make this change unnecessary, but in the meantime this change just synchronises behaviour between S3 and HDFS.
The last time I checked with Hive, creating an external table pointing to a non-existent directory made Hive create an empty directory. #1277 is about setting the location of a managed table, while this PR is about external tables, so I'm not sure it would help us here.
Can we have additional PT coverage that checks the permissions of the directory for external tables? We do have a similar test for hdfs-impersonation.
ded99ad
to
1a8c89a
Compare
So we should introduce something like
I didn't find how to create a directory using TrinoFileSystem, or we should add a method like
@electrum Please correct me if I am wrong: this definition is derived from Hive, right? So if Hive supports creating an empty directory when it doesn't exist (for an external table), shouldn't we support the same? Today we do support inserting data into a new partition based on
Even in this PR we hide it behind a flag that is disabled by default.
fffebe7
to
89246ff
Compare
Discussed offline with @electrum, now external location will be created if
89246ff
to
5442e70
Compare
a99dad3
to
39aa317
Compare
/test-with-secrets sha=39aa31743ed6cd83b0fcd785e22c43f450783ceb
The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/5961189431
39aa317
to
883ae60
Compare
/test-with-secrets sha=883ae60c59bafb1f4d7dfb6dafb0ad8db2e1c602
The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/5961868678
There were a few flaky failures; I re-ran them and the pipeline is now green.
Thanks for working on this.
@@ -36,6 +36,7 @@ public List<SuiteTestRun> getTestRuns(EnvironmentConfig config)
        return ImmutableList.of(
                testOnEnvironment(EnvMultinode.class)
                        .withGroups("configured_features", "hdfs_no_impersonation")
                        .withExcludedTests("io.trino.tests.product.TestImpersonation.testExternalLocationTableCreationSuccess")
why this exclusion?
To be honest, I don't remember any specific reason; I need to run and check this test.
understood. please recover this info & capture it as a code comment.
Got it. This test requires hive.non-managed-table-writes-enabled, and I didn't want to enable it for the whole EnvMultinode environment, because that could break other tests.
Description
Create the directory structure if it does not exist.
The external location will be created if the writesToNonManagedTablesEnabled flag is set. Pure Hive behaves the same way, as mentioned in #17920 (comment).
Additional context and related issues
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:
With this change, the external location is created for Hive tables if hive.non-managed-table-writes-enabled is set;
otherwise an exception is raised, as before.
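
The behaviour described in the release note can be sketched roughly as follows. This is not the code from HiveMetadata: `ExternalLocationSketch` and `ensureExternalLocation` are hypothetical names, the local file system (java.nio) stands in for HDFS, and the boolean parameter is assumed to mirror hive.non-managed-table-writes-enabled.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical illustration of flag-gated creation of an external table location. */
final class ExternalLocationSketch {
    private ExternalLocationSketch() {}

    /**
     * Ensures the external location exists as a directory.
     * The flag is assumed to mirror hive.non-managed-table-writes-enabled.
     */
    static void ensureExternalLocation(Path location, boolean writesToNonManagedTablesEnabled)
            throws IOException {
        if (Files.isDirectory(location)) {
            return; // location already exists, nothing to create
        }
        if (Files.exists(location)) {
            throw new IllegalArgumentException("External location is not a directory: " + location);
        }
        if (!writesToNonManagedTablesEnabled) {
            // Previous behaviour: a missing location is rejected
            throw new IllegalArgumentException("External location does not exist: " + location);
        }
        // New behaviour: create the missing directory structure, including parents
        Files.createDirectories(location);
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("external-location-sketch");
        ensureExternalLocation(base.resolve("db/orders"), true);
        System.out.println(Files.isDirectory(base.resolve("db/orders")));
    }
}
```

Keeping the creation behind the flag means the default stays as it was before: with the flag unset, a missing location is still rejected.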