Use UUID instead of parent directory name in computing the file prefix #18183

findinpath (Contributor) commented Jul 7, 2023

Description

In a table with multiple partition columns, two files in different partitions can have the same file name and the same value for the least significant partition column. Because the temporary file name prefix is computed from the parent directory name and the file name, both files would then get the same temporary file name prefix.
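The collision can be reproduced in isolation. The following is a minimal sketch, not the connector code: it uses `java.nio.file.Path` as a stand-in for Trino's `Location`, and the class name, the `tempName` helper, and the warehouse paths are hypothetical.

```java
import java.nio.file.Path;

class TempPrefixCollision {
    // Hypothetical helper mirroring the naming scheme
    // ".tmp-sort.<parent directory name>.<file name>".
    static String tempName(Path file) {
        return ".tmp-sort.%s.%s".formatted(file.getParent().getFileName(), file.getFileName());
    }

    public static void main(String[] args) {
        // Same file name and same least significant partition value (b=2),
        // but a different value for the first partition column (a).
        Path first = Path.of("/warehouse/t/a=1/b=2/data_0");
        Path second = Path.of("/warehouse/t/a=9/b=2/data_0");

        System.out.println(tempName(first));
        System.out.println(tempName(second));
        // Both names are identical, so the two writers would share a temp file prefix.
        System.out.println(tempName(first).equals(tempName(second)));
    }
}
```

Only the last directory component enters the name, so any difference in the more significant partition columns is lost.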

Discovered while working on #18178

Additional context and related issues

Release notes

(x) This is not user-visible or docs only and no release notes are required.

@@ -587,7 +587,7 @@ public HiveWriter createWriter(Page partitionColumns, int position, OptionalInt
             if (sortedWritingTempStagingPathEnabled) {
                 String stagingPath = sortedWritingTempStagingPath.replace("${USER}", session.getIdentity().getUser());
                 Location tempPrefix = setSchemeToFileIfAbsent(Location.of(stagingPath));
-                tempFilePath = tempPrefix.appendPath(".tmp-sort.%s.%s".formatted(path.parentDirectory().fileName(), path.fileName()));
+                tempFilePath = tempPrefix.appendPath(".tmp-sort.%s.%s".formatted(UUID.randomUUID().toString(), path.fileName()));
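The effect of the proposed line can be sketched outside the connector. This is a minimal illustration under assumptions: the class name and the `tempName` helper are hypothetical, and only the name-building expression is taken from the diff.

```java
import java.util.UUID;

class UniqueTempName {
    // Hypothetical helper: builds ".tmp-sort.<random UUID>.<file name>",
    // the same pattern as the added line in the diff.
    static String tempName(String fileName) {
        return ".tmp-sort.%s.%s".formatted(UUID.randomUUID(), fileName);
    }

    public static void main(String[] args) {
        // Two writers producing a file with the same name now get different prefixes,
        // regardless of which partition directory the target file belongs to.
        String first = tempName("data_0");
        String second = tempName("data_0");
        System.out.println(first);
        System.out.println(second);
        System.out.println(first.equals(second));
    }
}
```

A random UUID makes the name independent of the target path, at the cost of no longer being able to tell from the temp file name which partition it belongs to.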
Member

I meant changing the else branch, which uses parentDirectory().appendPath, i.e.

tempFilePath = path.parentDirectory().appendPath(".tmp-sort." + path.fileName());

Perhaps this one needs to be changed as well. However, since path.parentDirectory().fileName() is not well defined, we should enhance the BaseS3AndGlueMetastoreTest tests with sorted tables to guard against potential breakages (covering unpartitioned tables only would be enough). cc @pajaks

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Jan 12, 2024
mosabua (Member) commented Jan 12, 2024

@findepi and @findinpath I assume you are going to continue work on this PR.

@github-actions github-actions bot removed the stale label Jan 15, 2024
github-actions bot commented Feb 5, 2024

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Feb 5, 2024
Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions bot closed this Feb 29, 2024
@findinpath findinpath reopened this Mar 1, 2024
@github-actions github-actions bot removed the stale label Mar 4, 2024
@github-actions github-actions bot added the stale label Mar 26, 2024
mosabua (Member) commented Mar 27, 2024

@findinpath ping?

@github-actions github-actions bot removed the stale label Mar 28, 2024
@github-actions github-actions bot added the stale label May 10, 2024
@mosabua mosabua added the stale-ignore label and removed the stale label Jun 3, 2024
@findinpath findinpath closed this Oct 4, 2024
Labels
cla-signed; hive (Hive connector); stale-ignore (use this label on PRs that should be ignored by the stale bot so they are not flagged or closed)
3 participants