Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow dropping Iceberg schema that contains external files #9767

Conversation

jirassimok
Copy link
Member

@jirassimok jirassimok commented Oct 26, 2021

Based on #10146
Adds same logic for Iceberg

@cla-bot cla-bot bot added the cla-signed label Oct 26, 2021
@jirassimok jirassimok force-pushed the dont-drop-iceberg-schemas-with-files branch 3 times, most recently from 1153b1a to c16e96a Compare October 28, 2021 20:41
@jirassimok
Copy link
Member Author

jirassimok commented Oct 28, 2021

The tests previously failed because Tempto thought the tests were using HDP2 (from config-default), but the Spark environments used HDP3 instead, which uses a different port, so the HdfsClient was unable to operate correctly.

To fix this, I've made the Spark environments use the same Tempto configuration as singlenode-hdp3.

@jirassimok jirassimok force-pushed the dont-drop-iceberg-schemas-with-files branch 5 times, most recently from d6c81ff to 3b2c26c Compare October 28, 2021 21:19
@jirassimok jirassimok force-pushed the dont-drop-iceberg-schemas-with-files branch from 3b2c26c to 1f2ae81 Compare October 29, 2021 21:22
@findepi
Copy link
Member

findepi commented Oct 30, 2021

Shouldn't be merged in current shape -- #9740 (comment)

@jirassimok jirassimok force-pushed the dont-drop-iceberg-schemas-with-files branch from 1f2ae81 to e427f53 Compare November 1, 2021 13:23
@jirassimok jirassimok force-pushed the dont-drop-iceberg-schemas-with-files branch from e427f53 to 5e53606 Compare November 17, 2021 23:26
@jirassimok jirassimok marked this pull request as ready for review November 22, 2021 14:04
@jirassimok jirassimok force-pushed the dont-drop-iceberg-schemas-with-files branch 4 times, most recently from b36273e to 2e7df7f Compare November 24, 2021 17:55
@jirassimok
Copy link
Member Author

Fixed the tests; this should work properly now.

builder.configureContainer(TESTS, dockerContainer -> {
dockerContainer.withCopyFileToContainer(
forHostPath(dockerFiles.getDockerFilesHostPath("conf/tempto/tempto-configuration-for-hive3.yaml")),
CONTAINER_TEMPTO_PROFILE_CONFIG);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CONTAINER_TEMPTO_PROFILE_CONFIG is deprecated. what's the current way to accomplish this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It lists EnvironmentContainers.configureTempto as its replacement, but that method doesn't work in this case (because that only works when the tempto configuration has a specific name and is in the environment's config directory).

Quite a few things around the environment configuration need to be refactored, and I avoided changing it as much as possible.

(This is also copied verbatim from EnvSinglenodeHdp3, so if we change it here, we should probably also change it there.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so CONTAINER_TEMPTO_PROFILE_CONFIG is deprecated but we cannot use the replacement?
was it deprecated too early? cc @kokosing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The replacement works in most cases, but there are a few where it doesn't. To address that, we should probably add an overload like configureTempto(Environment.Builder, String) to specify which file to use rather than taking a file from a ResourceProvider.


private void useIceberg()
{
onTrino().executeQuery("USE iceberg.default");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use fully qualified names instead, but whatever

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had that at first, but then if I (or any future developer) forgot to qualify the names anywhere, the tests might pass without actually using Iceberg.

It also makes the variable declarations in the tests a little less nice, because you need the unqualified schema name for the default schema location.

Copy link
Member

@findepi findepi Nov 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had that at first, but then if I (or any future developer) forgot to qualify the names anywhere, the tests might pass without actually using Iceberg.

we should remove default catalog/schema from tempto configuration here

jdbc_url: jdbc:trino://${databases.presto.host}:${databases.presto.port}/hive/${databases.hive.schema}

@jirassimok can you work on that, separately?

@jirassimok jirassimok force-pushed the dont-drop-iceberg-schemas-with-files branch 2 times, most recently from 4a883f2 to ebe31db Compare November 24, 2021 20:07
@findepi
Copy link
Member

findepi commented Nov 25, 2021

please add an escape hatch like #10067

@jirassimok jirassimok force-pushed the dont-drop-iceberg-schemas-with-files branch from ebe31db to bae9114 Compare December 14, 2021 15:50
@jirassimok
Copy link
Member Author

Updated to be based on #10146. Only the last two commits are part of this PR.

metadataFileSystem.mkdirs(databaseMetadataDirectory);
}
catch (IOException e) {
throw new TrinoException(HIVE_METASTORE_ERROR, "Could not write database", e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add the database information in the exception?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would go in #10146, but this was just copied from the other "could not write" errors in the class, which also don't give more detailed messages.

@jirassimok jirassimok force-pushed the dont-drop-iceberg-schemas-with-files branch from db6f0bc to d5970bb Compare December 15, 2021 19:24
Copy link
Member

@homar homar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jirassimok jirassimok force-pushed the dont-drop-iceberg-schemas-with-files branch from d5970bb to 497fa82 Compare December 16, 2021 21:46
@jirassimok jirassimok force-pushed the dont-drop-iceberg-schemas-with-files branch from 497fa82 to 6199026 Compare December 16, 2021 22:59
Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two last commits LGTM, rest is #10146, but not the latest version, so #10146 needs to be merged and this one rebased.

builder.configureContainer(TESTS, dockerContainer -> {
dockerContainer.withCopyFileToContainer(
forHostPath(dockerFiles.getDockerFilesHostPath("conf/tempto/tempto-configuration-for-hive3.yaml")),
CONTAINER_TEMPTO_PROFILE_CONFIG);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so CONTAINER_TEMPTO_PROFILE_CONFIG is deprecated but we cannot use the replacement?
was it deprecated too early? cc @kokosing

.map(Path::new);

// If we see files in the schema location, don't delete it.
// If we see no files or can't see the location at all, use fallback.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we see no files

then we delete the directory

or can't see the location at all, use fallback.

only on this latter case we use fallback

did you mean "when location is not set"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we see no files, delete. If we can't see or if getDatabase didn't get the location (it always should in practice, even when using default location) use fallback. I'll update the comment.

import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.fail;

public class TestCreateDropSchema
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should we test this logic with respect to upcoming Iceberg Glue support? (#10151)
No change requested in this PR, but still worth discussing.
cc @jirassimok @losipiuk @jackye1995 @phd3

@findepi findepi changed the title Disallow dropping Iceberg schemas that contain external files Disallow dropping Iceberg schema that contains external files Dec 17, 2021
In SemiTransactionalHiveMetastore, check for files before dropping the
schema. Do not request deletion (via HiveMetastore) if files are
visible in the schema location.

A new config property, hive.delete-schema-locations-fallback,
determines the behavior when Trino can't check the file location.
False (the default) will not request deletion in that case.
@jirassimok jirassimok force-pushed the dont-drop-iceberg-schemas-with-files branch from 6199026 to d65ec83 Compare December 17, 2021 15:08
@jirassimok jirassimok force-pushed the dont-drop-iceberg-schemas-with-files branch from d65ec83 to c666720 Compare December 17, 2021 19:07
@findepi
Copy link
Member

findepi commented Dec 20, 2021

merged as ccee7b6, thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

5 participants