-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use UUID instead of parent directory name in computing the file prefix #18183
Use UUID instead of parent directory name in computing the file prefix #18183
Conversation
In a table with multiple partition columns, it may happen that there exists a file with the same name and the same value for the least significant partition column which would lead to having the same temporary file name prefix.
@@ -587,7 +587,7 @@ public HiveWriter createWriter(Page partitionColumns, int position, OptionalInt | |||
if (sortedWritingTempStagingPathEnabled) { | |||
String stagingPath = sortedWritingTempStagingPath.replace("${USER}", session.getIdentity().getUser()); | |||
Location tempPrefix = setSchemeToFileIfAbsent(Location.of(stagingPath)); | |||
tempFilePath = tempPrefix.appendPath(".tmp-sort.%s.%s".formatted(path.parentDirectory().fileName(), path.fileName())); | |||
tempFilePath = tempPrefix.appendPath(".tmp-sort.%s.%s".formatted(UUID.randomUUID().toString(), path.fileName())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i mean to change the else
branch that uses parentDirectory().appendPath
ie
trino/plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveWriterFactory.java
Line 593 in 85beedf
tempFilePath = path.parentDirectory().appendPath(".tmp-sort." + path.fileName()); |
perhaps this one needs to be changed as well, however, since path.parentDirectory().fileName()
is not well defined .... we should enhance BaseS3AndGlueMetastoreTest
tests with sorted tables to guard against potential breakages cc @pajaks (can be for unpartitioned tables only)
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
@findepi and @findinpath I assume you are going to continue work on this PR. |
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time. |
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
@findinpath ping? |
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
Description
In a table with multiple partition columns, it may happen that there exists a file with the same name and the same value for the least significant partition column which would lead to having the same temporary file name prefix.
Discovered while working on #18178
Additional context and related issues
Release notes
(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: