Disallow dropping Iceberg schema that contains external files #9767
The tests previously failed because Tempto thought the tests were using HDP2 (from config-default), but the Spark environments used HDP3 instead, which uses a different port, so the [...] To fix this, I've made the Spark environments use the same Tempto configuration as [...]
Shouldn't be merged in its current shape; see #9740 (comment).
Fixed the tests; this should work properly now.
...src/main/java/io/trino/tests/product/launcher/env/environment/EnvSinglenodeSparkIceberg.java
builder.configureContainer(TESTS, dockerContainer -> {
    dockerContainer.withCopyFileToContainer(
            forHostPath(dockerFiles.getDockerFilesHostPath("conf/tempto/tempto-configuration-for-hive3.yaml")),
            CONTAINER_TEMPTO_PROFILE_CONFIG);
CONTAINER_TEMPTO_PROFILE_CONFIG is deprecated. What's the current way to accomplish this?
It lists EnvironmentContainers.configureTempto as its replacement, but that method doesn't work in this case (it only works when the Tempto configuration has a specific name and is in the environment's config directory).
Quite a few things around the environment configuration need to be refactored, and I avoided changing it as much as possible.
(This is also copied verbatim from EnvSinglenodeHdp3, so if we change it here, we should probably also change it there.)
So CONTAINER_TEMPTO_PROFILE_CONFIG is deprecated, but we cannot use the replacement?
Was it deprecated too early? cc @kokosing
The replacement works in most cases, but there are a few where it doesn't. To address that, we should probably add an overload like configureTempto(Environment.Builder, String) to specify which file to use rather than taking a file from a ResourceProvider.
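A rough sketch of what such an overload might look like. FakeBuilder and FakeContainer are hypothetical stand-ins for Environment.Builder and the Docker container type, and the container-side path is a placeholder, not the real value of CONTAINER_TEMPTO_PROFILE_CONFIG. Only the shape of the overload is the point here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-ins: none of these are the launcher's real types.
public final class TemptoConfigSketch
{
    // Placeholder value; the real constant's value is not shown in this thread.
    public static final String CONTAINER_TEMPTO_PROFILE_CONFIG = "/hypothetical/tempto-profile.yaml";

    /** Stand-in for a container; it only records which files would be copied. */
    public static final class FakeContainer
    {
        public final List<String> copies = new ArrayList<>();

        public void copyFileToContainer(String hostPath, String containerPath)
        {
            copies.add(hostPath + " -> " + containerPath);
        }
    }

    /** Stand-in for Environment.Builder with a single "tests" container. */
    public static final class FakeBuilder
    {
        public final FakeContainer tests = new FakeContainer();

        public void configureContainer(String name, Consumer<FakeContainer> customizer)
        {
            if (name.equals("tests")) {
                customizer.accept(tests);
            }
        }
    }

    /** Proposed overload: the caller names the Tempto file explicitly. */
    public static void configureTempto(FakeBuilder builder, String temptoConfigHostPath)
    {
        builder.configureContainer("tests", container ->
                container.copyFileToContainer(temptoConfigHostPath, CONTAINER_TEMPTO_PROFILE_CONFIG));
    }
}
```

With something like this, an environment could call configureTempto(builder, "conf/tempto/tempto-configuration-for-hive3.yaml") instead of touching the deprecated constant directly.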
...g/trino-product-tests/src/main/java/io/trino/tests/product/iceberg/TestCreateDropSchema.java
private void useIceberg()
{
    onTrino().executeQuery("USE iceberg.default");
I'd use fully qualified names instead, but whatever
I had that at first, but then if I (or any future developer) forgot to qualify the names anywhere, the tests might pass without actually using Iceberg.
It also makes the variable declarations in the tests a little less nice, because you need the unqualified schema name for the default schema location.
> I had that at first, but then if I (or any future developer) forgot to qualify the names anywhere, the tests might pass without actually using Iceberg.
We should remove the default catalog/schema from the Tempto configuration here:
jdbc_url: jdbc:trino://${databases.presto.host}:${databases.presto.port}/hive/${databases.hive.schema}
@jirassimok can you work on that, separately?
Please add an escape hatch like #10067.
Updated to be based on #10146. Only the last two commits are part of this PR.
    metadataFileSystem.mkdirs(databaseMetadataDirectory);
}
catch (IOException e) {
    throw new TrinoException(HIVE_METASTORE_ERROR, "Could not write database", e);
Could you please add the database information in the exception?
That would go in #10146, but this was just copied from the other "could not write" errors in the class, which also don't give more detailed messages.
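For illustration only, including the database name in the message could look roughly like this. To keep the snippet self-contained, UncheckedIOException stands in for TrinoException(HIVE_METASTORE_ERROR, ...) and java.nio.file.Files stands in for the metadataFileSystem call; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; not the actual metastore code.
public final class WriteDatabaseErrorSketch
{
    /** Builds the error message with the database name appended. */
    public static String couldNotWriteMessage(String databaseName)
    {
        return "Could not write database: " + databaseName;
    }

    /** Creates the metadata directory, reporting which database failed. */
    public static void createMetadataDirectory(String databaseName, Path databaseMetadataDirectory)
    {
        try {
            Files.createDirectories(databaseMetadataDirectory);
        }
        catch (IOException e) {
            // Stand-in for: throw new TrinoException(HIVE_METASTORE_ERROR, message, e)
            throw new UncheckedIOException(couldNotWriteMessage(databaseName), e);
        }
    }
}
```

If this were adopted, the other "could not write" errors in the class would presumably want the same treatment for consistency.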
LGTM
Instead of /catalog/database/.trinoSchema, the database schemas in FileHiveMetastore now go in /catalog/.trinoSchema.database.
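To make the layout change concrete, here is a small sketch of the two path computations. The class, method, and constant names are hypothetical, and java.nio.file.Path is used in place of the Hadoop Path type; only the two path shapes come from the description above.

```java
import java.nio.file.Path;

// Hypothetical sketch of the old and new schema-file locations.
public final class SchemaFileLocationSketch
{
    public static final String SCHEMA_FILE_PREFIX = ".trinoSchema";

    /** Old layout: /catalog/database/.trinoSchema (inside the database directory). */
    public static Path oldSchemaFile(Path catalogDirectory, String databaseName)
    {
        return catalogDirectory.resolve(databaseName).resolve(SCHEMA_FILE_PREFIX);
    }

    /** New layout: /catalog/.trinoSchema.database (a sibling of the database directory). */
    public static Path newSchemaFile(Path catalogDirectory, String databaseName)
    {
        return catalogDirectory.resolve(SCHEMA_FILE_PREFIX + "." + databaseName);
    }
}
```

With the new layout the schema file no longer sits inside the database directory, which presumably keeps Trino's own metadata from counting as a file in the schema location when checking whether it is empty.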
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TrinoHiveCatalog.java
        .map(Path::new);

// If we see files in the schema location, don't delete it.
// If we see no files or can't see the location at all, use fallback.
> If we see no files
Then we delete the directory.
> or can't see the location at all, use fallback.
Only in this latter case do we use the fallback.
Did you mean "when location is not set"?
If we see no files, delete. If we can't see the location, or if getDatabase didn't get the location (it always should in practice, even when using the default location), use the fallback. I'll update the comment.
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.fail;

public class TestCreateDropSchema
How should we test this logic with respect to upcoming Iceberg Glue support? (#10151)
No change requested in this PR, but still worth discussing.
cc @jirassimok @losipiuk @jackye1995 @phd3
In SemiTransactionalHiveMetastore, check for files before dropping the schema. Do not request deletion (via HiveMetastore) if files are visible in the schema location. A new config property, hive.delete-schema-locations-fallback, determines the behavior when Trino can't check the file location. False (the default) will not request deletion in that case.
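The decision described above can be sketched as follows. This is a simplified illustration, not the code in SemiTransactionalHiveMetastore: it uses java.nio.file instead of the Hadoop FileSystem API, treats any directory entry (including a subdirectory) as a file, and the class and method names are hypothetical. The deleteFallback parameter stands in for hive.delete-schema-locations-fallback.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical sketch of the "check for files before dropping" decision.
public final class DropSchemaDecisionSketch
{
    /**
     * Returns true when deletion of the schema location should be requested.
     *
     * @param location the schema location, or empty if none was reported
     * @param deleteFallback stand-in for hive.delete-schema-locations-fallback (default false)
     */
    public static boolean shouldDeleteLocation(Optional<Path> location, boolean deleteFallback)
    {
        if (!location.isPresent()) {
            // No location to inspect: fall back to the configured behavior.
            return deleteFallback;
        }
        try (Stream<Path> entries = Files.list(location.get())) {
            // Entries visible: keep the location. Nothing visible: request deletion.
            return !entries.findAny().isPresent();
        }
        catch (IOException e) {
            // The location could not be checked: fall back to the configured behavior.
            return deleteFallback;
        }
    }
}
```

Note that the fallback only matters in the two "cannot check" branches; when the location is readable, the result depends solely on whether anything is visible in it.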
Merged as ccee7b6, thanks.
Based on #10146
Adds same logic for Iceberg