Add handling for root bucket table location in sync_partition_metadata #20090

Merged

@okayhooni okayhooni commented Dec 13, 2023

Description

  • Currently, the sync_partition_metadata() procedure cannot handle a table located at the root path of a bucket, due to a bug in the listedDirectoryName() logic.
  • I know that mapping a table to the root bucket path is not best practice, but we dump some AWS logs (such as CloudFront logs) into the root path of a dedicated log bucket.
  • The same strategy is already in use in trino/lib/trino-filesystem-s3/src/main/java/io/trino/filesystem/s3/S3FileSystem.java
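
The failure described above can be sketched with plain strings. This is a simplified stand-in, not the procedure's actual code: the class and method names are hypothetical, and it assumes that the path component of a bucket-root location is the empty string and that listed child directories end with "/".

```java
// Hypothetical, simplified stand-in for the prefix handling discussed in this PR.
// Plain strings replace Trino's Location type (an assumption for illustration).
final class RootPrefixSketch
{
    private RootPrefixSketch() {}

    // Behavior before the change: always append "/" when it is missing.
    static String prefixBefore(String directoryPath)
    {
        return directoryPath.endsWith("/") ? directoryPath : directoryPath + "/";
    }

    // Behavior after the change: leave the empty (bucket root) path untouched.
    static String prefixAfter(String directoryPath)
    {
        if (!directoryPath.isEmpty() && !directoryPath.endsWith("/")) {
            return directoryPath + "/";
        }
        return directoryPath;
    }

    // Strips the prefix and the trailing "/" from a listed child path.
    static String childName(String prefix, String listedPath)
    {
        if (!listedPath.startsWith(prefix)) {
            throw new IllegalStateException(
                    "path [" + listedPath + "] is not under prefix [" + prefix + "]");
        }
        return listedPath.substring(prefix.length(), listedPath.length() - 1);
    }

    public static void main(String[] args)
    {
        // Table under a sub-path: both variants agree.
        System.out.println(childName(prefixBefore("logs"), "logs/pkey=1/"));
        System.out.println(childName(prefixAfter("logs"), "logs/pkey=1/"));

        // Table at the bucket root: the directory path is empty.
        System.out.println(childName(prefixAfter(""), "pkey=1/"));
        try {
            childName(prefixBefore(""), "pkey=1/");
        }
        catch (IllegalStateException e) {
            System.out.println("before the change: " + e.getMessage());
        }
    }
}
```

With an empty directory path, the old logic turns the prefix into "/", which no child path such as "pkey=1/" starts with, so the partition directory name cannot be derived.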

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.

cla-bot bot commented Dec 13, 2023

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@github-actions github-actions bot added tests:hive hive Hive connector labels Dec 13, 2023
@okayhooni okayhooni changed the title from "Fix bug in listedDirectoryName to handle table located in the root path of s3 bucket" to "Fix bug in sync_partition_metadata() to handle table located in the root path of s3 bucket" Dec 13, 2023
@@ -191,7 +191,7 @@ private static Set<Location> listDirectories(TrinoFileSystem fileSystem, Locatio
     private static String listedDirectoryName(Location directory, Location location)
     {
         String prefix = directory.path();
-        if (!prefix.endsWith("/")) {
+        if (!prefix.endsWith("/") && !prefix.equals("")) {
Contributor

Can we have some test coverage for this change?

Contributor Author

@okayhooni okayhooni Dec 20, 2023

Thank you for the comment!

I cannot find any existing test suite for the built-in procedures in the Hive plugin where I could add a test case for this hotfix.

Did you mean creating a new test module for this procedure and writing all the test cases for it?
(Is it okay to maintain the other Hive procedures without test suites?)

Contributor

We have so-called product tests that can be used to check the correctness of the sync_partition_metadata procedure:

https://github.com/trinodb/trino/blob/33dd20a8c104d358f5b6d38e8a0405f8b0ced944/testing/trino-product-tests/src/main/java/io/trino/tests/product/hive/BaseTestSyncPartitionMetadata.java

You can run the HDFS-related product test this way:

testing/bin/ptl test run --environment multinode -- -t io.trino.tests.product.hive.TestHdfsSyncPartitionMetadata

However, to test your changes you can use the HiveMinioDataLake utility.
See for example:

public void testPartitionedTableExternalLocationOnTopOfTheBucket()
{
    String topBucketName = "test-hive-partitioned-top-of-the-bucket-" + randomNameSuffix();
    hiveMinioDataLake.getMinio().createBucket(topBucketName);
    String tableName = "test_external_location_top_of_the_bucket_" + randomNameSuffix();
    assertUpdate(format(
            "CREATE TABLE %s (" +
                    "  a_varchar varchar, " +
                    "  pkey integer) " +
                    "WITH (" +
                    "  external_location='%s'," +
                    "  partitioned_by=ARRAY['pkey'])",
            tableName,
            format("s3://%s/", topBucketName)));
    assertUpdate("INSERT INTO " + tableName + " VALUES ('a', 1) , ('b', 1), ('c', 2), ('d', 2)", 4);
    assertQuery("SELECT * FROM " + tableName, "VALUES ('a', 1), ('b',1), ('c', 2), ('d', 2)");
    assertUpdate("DELETE FROM " + tableName + " where pkey = 2");
    assertQuery("SELECT * FROM " + tableName, "VALUES ('a', 1), ('b',1)");
    assertUpdate("DROP TABLE " + tableName);
}

Note that you can use io.trino.plugin.hive.containers.HiveMinioDataLake#copyResources to copy files from the test resources, which makes it easier to simulate your use case.

Contributor Author

Thank you for sharing and explaining those!

I will add a small test case there!

Contributor Author

I added a small test case following your guidance :)

Contributor

Good job!

@okayhooni okayhooni requested a review from findinpath December 20, 2023 14:05
@@ -191,7 +191,7 @@ private static Set<Location> listDirectories(TrinoFileSystem fileSystem, Locatio
     private static String listedDirectoryName(Location directory, Location location)
     {
         String prefix = directory.path();
Contributor

Update the description of your PR to point out that you're using the same strategy as in S3FileSystem:

if (!key.isEmpty() && !key.endsWith("/")) {
    key += "/";
}
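
Why this guard matters for prefix-based listing can be sketched as follows. The class, the in-memory key list, and the method names are hypothetical and only simulate a prefix filter; the sketch assumes object keys carry no leading "/".

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical simulation of prefix-based listing over object keys.
final class ListingPrefixSketch
{
    private ListingPrefixSketch() {}

    // Same guard as the snippet above: append "/" only to a non-empty key.
    static String listingPrefix(String key)
    {
        if (!key.isEmpty() && !key.endsWith("/")) {
            key += "/";
        }
        return key;
    }

    // Returns the keys that start with the given prefix.
    static List<String> list(List<String> keys, String prefix)
    {
        return keys.stream()
                .filter(key -> key.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        List<String> keys = List.of("pkey=1/a.orc", "pkey=10/b.orc");

        // Bucket root: the empty prefix matches every key.
        System.out.println(list(keys, listingPrefix("")));

        // A bare "/" prefix matches nothing, because keys have no leading slash.
        System.out.println(list(keys, "/"));

        // The trailing slash keeps "pkey=1" from also matching "pkey=10/...".
        System.out.println(list(keys, listingPrefix("pkey=1")));
    }
}
```

The trailing slash is still needed for non-empty keys so that a directory does not match siblings sharing its name as a prefix; only the empty key must be left alone.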

Contributor Author

I hadn't noticed that the same strategy was already in use in another module!

Thanks, I will update the description!

@@ -191,7 +191,7 @@ private static Set<Location> listDirectories(TrinoFileSystem fileSystem, Locatio
     private static String listedDirectoryName(Location directory, Location location)
     {
         String prefix = directory.path();
-        if (!prefix.endsWith("/")) {
+        if (!prefix.endsWith("/") && !prefix.equals("")) {
Contributor

Suggested change
-        if (!prefix.endsWith("/") && !prefix.equals("")) {
+        if (!prefix.isEmpty() && !prefix.endsWith("/")) {

@@ -191,7 +191,7 @@ private static Set<Location> listDirectories(TrinoFileSystem fileSystem, Locatio
     private static String listedDirectoryName(Location directory, Location location)
     {
         String prefix = directory.path();
-        if (!prefix.endsWith("/")) {
Contributor

Fix bug in sync_partition_metadata() to handle table located in the root path of s3 bucket -> Add handling for root bucket table location in sync_partition_metadata

Commit title 80 characters. Aim for conciseness.
https://github.com/trinodb/trino/blob/master/.github/DEVELOPMENT.md#format-git-commit-messages

Contributor Author

Thanks! I am done!

Contributor

Squash the commits into one pls and use a concise commit title.
You can take inspiration from my previous message.

Contributor Author

I squashed commits with a message, same as the title of this PR, following your guide! Thanks!

@okayhooni okayhooni changed the title from "Fix bug in sync_partition_metadata() to handle table located in the root path of s3 bucket" to "Add handling for root bucket table location in sync_partition_metadata" Dec 21, 2023
Contributor

@findinpath findinpath left a comment

LGTM % commit title

@findinpath findinpath requested a review from electrum December 21, 2023 09:32
@findinpath
Contributor

@okayhooni i see that the verification/cla-signed is failing.
Pls be so kind to follow https://github.com/trinodb/trino/blob/master/.github/CONTRIBUTING.md

@okayhooni okayhooni force-pushed the hotfix/hive_sync_partition_metadata_proc branch from c5085e8 to 1aa1198 Compare December 25, 2023 09:16
@okayhooni
Contributor Author

@okayhooni i see that the verification/cla-signed is failing. Pls be so kind to follow https://github.com/trinodb/trino/blob/master/.github/CONTRIBUTING.md

@findinpath
I just filled out and submitted the CLA to [email protected]!
Thank you for guiding me! Happy Christmas :)

@findinpath findinpath requested a review from ebyhr January 7, 2024 05:06
@ebyhr
Member

ebyhr commented Jan 12, 2024

@cla-bot check

@cla-bot cla-bot bot added the cla-signed label Jan 12, 2024
cla-bot bot commented Jan 12, 2024

The cla-bot has been summoned, and re-checked this pull request!

@ebyhr ebyhr force-pushed the hotfix/hive_sync_partition_metadata_proc branch from 1aa1198 to 7f6367f Compare January 12, 2024 07:51
@ebyhr ebyhr merged commit ea890c7 into trinodb:master Jan 14, 2024
56 checks passed
@github-actions github-actions bot added this to the 437 milestone Jan 14, 2024
Labels
cla-signed hive Hive connector
3 participants