-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix weird case when s3
/s3Cluster
return incomplete result or exception
#71947
Conversation
This is an automated comment for commit 5aeeec0 with description of existing statuses. It's updated for the latest CI running ✅ Click here to open a full report in a separate page Successful checks
|
Backport? |
7b6223b
to
1c682f1
Compare
ok, will do. first i need to read the code and understand what will break |
117329c
to
cfb5a04
Compare
cfb5a04
to
a82ab36
Compare
SET max_threads = 1; | ||
SET s3_truncate_on_insert = 1; | ||
|
||
INSERT INTO FUNCTION s3(s3_conn, filename='dir1/03271_s3_table_function_asterisk_glob/', format=Parquet) SELECT 0 as num; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#72007 would break this test by no longer including paths with empty file names when using a trailing wildcard (/*
). Is that ok or is this behavior needed for a specific use case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't have any use case in mind, it just seems to be the correct behaviour. because s3 is not a fs, object key is a single string that might end on a '/'. initially i was asked by Pete why query with /dir/*
pattern didn't work and it turned out that he had an object with name dir/
, we got it in the List
output, found that its name is empty and also throw away all other objects. so i decided to try to fix both problems - don't give up when we see an "empty file name" and read from such objects too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was solving the same original problem with #72007. :-)
don't give up when we see an "empty file name" and read from such objects too
Makes sense to read from it when there is data in it, but shouldn't we rather skip it if it's empty?
Because Amazon creates such objects when you create a directory in the S3 console and it doesn't really make sense to try to read from such objects (from here):
When you create a folder in Amazon S3, S3 creates a 0-byte object with a key that's set to the folder name that you provided. For example, if you create a folder named photos in your bucket, the Amazon S3 console creates a 0-byte object with the key photos/. The console creates this object to support the idea of folders.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx for this reference - i was wondering who creates these empty objects, seemed hard to believe that people really do it manually.
s3
function returns incomplete results3
function returns incomplete result or exception
s3
function returns incomplete result or exceptions3
/s3Cluster
return incomplete result or exception
Hello! Will the backport be made in the LTS/stable versions (from 24.3) ? |
Cherry pick #71947 to 24.10: Fix weird case when `s3`/`s3Cluster` return incomplete result or exception
Backport #71947 to 24.10: Fix weird case when `s3`/`s3Cluster` return incomplete result or exception
Hello! Is it possible to backport in 24.3 - 24.9 also ? They were closed |
Hi! Unfortunately not - there were merge conflicts. |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fixed case when
s3
/s3Cluster
functions could return incomplete result or throw an exception. It involved using glob pattern in s3 uri (likepattern/*
) and an empty object should exist with the keypattern/
(such objects automatically created by S3 Console). Also default value for settings3_skip_empty_files
changed fromfalse
totrue
by default.CI Settings (Only check the boxes if you know what you are doing):