Failure when Hive partition value contains + and hive.recursive-directories is enabled #18149
Comments
I tried to reproduce this using
I did not check with real S3.
Tested with the latest
Okay, thank you for checking.
@JulianGoede please add the full stack trace of the issue.
I don't think the value is changed by the Hive connector (the examples above show that it was faithfully preserved), so it's whatever
Closing for now. @JulianGoede please reopen with new information.
Hi again, I just retried the queries from @findinpath (now on Trino v421), but it still threw an exception. Here again is the set of queries:
trino:temp> CREATE TABLE localhive.temp.plus_error (
-> x varchar,
-> code varchar
-> )
-> WITH (
-> external_location = 's3a://{bucket}/tmp/trino_temp_schema_stage/plus_error',
-> format = 'ORC',
-> partitioned_by = ARRAY['code']
-> )
-> ;
CREATE TABLE
trino:temp> insert into plus_error values ('foo', 'foo+bar');
INSERT: 1 row
Query 20230707_083158_00079_72z5g, FINISHED, 2 nodes
http://trino.atv-stage.svc.k8s.local/ui/query.html?20230707_083158_00079_72z5g
Splits: 38 total, 38 done (100.00%)
CPU Time: 0.0s total, 0 rows/s, 0B/s, 25% active
Per Node: 0.0 parallelism, 0 rows/s, 0B/s
Parallelism: 0.1
Peak Memory: 2.52KB
0.37 [0 rows, 0B] [0 rows/s, 0B/s]
trino:temp> select * from plus_error;
Query 20230707_083207_00080_72z5g failed: path s3a://{bucket}/tmp/trino_temp_schema_stage/plus_error/code=foo bar/20230707_083158_00079_72z5g_bfe2bfc8-c64e-49a9-9755-fbe0d2e06d39 does not start with prefix s3a://{bucket}/tmp/trino_temp_schema_stage/plus_error/code=foo+bar
io.trino.spi.TrinoException: path s3a://{bucket}/tmp/trino_temp_schema_stage/plus_error/code=foo bar/20230707_083158_00079_72z5g_bfe2bfc8-c64e-49a9-9755-fbe0d2e06d39 does not start with prefix s3a://{bucket}/tmp/trino_temp_schema_stage/plus_error/code=foo+bar
at io.trino.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:318)
at io.trino.plugin.hive.util.ResumableTasks$1.run(ResumableTasks.java:38)
at io.trino.$gen.Trino_421____20230707_080001_2.run(Unknown Source)
at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:79)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.IllegalArgumentException: path s3a://{bucket}/tmp/trino_temp_schema_stage/plus_error/code=foo bar/20230707_083158_00079_72z5g_bfe2bfc8-c64e-49a9-9755-fbe0d2e06d39 does not start with prefix s3a://{bucket}/tmp/trino_temp_schema_stage/plus_error/code=foo+bar
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:445)
at io.trino.plugin.hive.fs.HiveFileIterator.isHiddenOrWithinHiddenParentDirectory(HiveFileIterator.java:127)
at io.trino.plugin.hive.fs.HiveFileIterator.computeNext(HiveFileIterator.java:83)
at io.trino.plugin.hive.fs.HiveFileIterator.computeNext(HiveFileIterator.java:39)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140)
at java.base/java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1855)
at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:292)
at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:298)
at java.base/java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
at io.trino.plugin.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:405)
at io.trino.plugin.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:311)
... 6 more
When I look into S3, the object was correctly written with the + in its key:
aws s3 ls {bucket}/tmp/trino_temp_schema_stage/plus_error/
PRE code=foo+bar/
@JulianGoede thanks for providing more info. Especially the stacktrace is useful, since now I see it's related to
@findinpath the exception is because we have some check in trino/plugin/trino-hive/src/main/java/io/trino/plugin/hive/fs/HiveFileIterator.java (line 126 in 58f9a52).
I don't know why it's there, and there is no comment explaining it, so intuitively we should be good just removing it.
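For readers following along, here is a minimal Java sketch of the failure mode seen in the stack trace. It is not Trino's actual code: the class name PlusPrefixSketch, the helper names, and the shortened bucket path are hypothetical. It also assumes, without confirmation from this thread, that the space comes from form-style URL decoding somewhere in the listing path; java.net.URLDecoder is used only because it is a well-known decoder that maps + to a space.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class PlusPrefixSketch {
    // Form-style (application/x-www-form-urlencoded) decoding treats '+' as a space.
    // Assumption: something comparable happens to the listed path; the thread does not say where.
    static String decodeFormStyle(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    // Hypothetical stand-in for a "path must start with prefix" precondition.
    static String relativeTo(String prefix, String path) {
        if (!path.startsWith(prefix)) {
            throw new IllegalArgumentException(
                    "path " + path + " does not start with prefix " + prefix);
        }
        return path.substring(prefix.length());
    }

    public static void main(String[] args) {
        // Partition directory as stored in the metastore, with a literal '+'.
        String partitionDir = "s3a://bucket/plus_error/code=foo+bar";
        // File path as it might come back from a listing that decoded '+' to ' '.
        String listedFile = decodeFormStyle(partitionDir) + "/datafile";
        try {
            relativeTo(partitionDir, listedFile);
        } catch (IllegalArgumentException e) {
            // Same shape of message as in the reported stack trace.
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is only that a strict string-prefix comparison fails as soon as the two sides disagree on how + is represented, even though both refer to the same object.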
Marking this as
#18167 might fix this.
After upgrading from Trino 417 to 420, querying tables with URL-encoded partition values results in a java.lang.IllegalArgumentException in trino-hive/src/main/java/io/trino/plugin/hive/fs/HiveFileIterator.java:
path <eradicated>/foo/part=Erw. 40%252B/20230706_124818_00867_wqv8v_06d14113-abfa-4298-8746-816dc7818928 does not start with prefix <eradicated>/foo/part=Erw.+40%252B
Here is a minimal setup to reproduce this error (hive-connector with s3 storage):