Query 20240609_161836_02951_zpmtj failed: Error opening Iceberg split /path/data/file.parquet (offset=0, length=1660): Cannot invoke "org.apache.parquet.schema.Type$ID.intValue()" because the return value of "org.apache.parquet.schema.PrimitiveType.getId()" is null
io.trino.spi.TrinoException: Error opening Iceberg split /mnt/shavast01_datalake/iceberg/temp/_managedtmp/6p7pl0cwmpdw/iceberg_caches/iceberg-temp-ghjkloyvmfircvttdsyufowgjqtmgzwh/data/file.parquet (offset=0, length=1660): Cannot invoke "org.apache.parquet.schema.Type$ID.intValue()" because the return value of "org.apache.parquet.schema.PrimitiveType.getId()" is null
at io.trino.plugin.iceberg.IcebergPageSourceProvider.createParquetPageSource(IcebergPageSourceProvider.java:1132)
at io.trino.plugin.iceberg.IcebergPageSourceProvider.createDataPageSource(IcebergPageSourceProvider.java:633)
at io.trino.plugin.iceberg.IcebergPageSourceProvider.createPageSource(IcebergPageSourceProvider.java:373)
at io.trino.plugin.iceberg.IcebergPageSourceProvider.createPageSource(IcebergPageSourceProvider.java:265)
at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:48)
at io.trino.split.PageSourceManager.createPageSource(PageSourceManager.java:61)
at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:264)
at io.trino.operator.Driver.processInternal(Driver.java:403)
at io.trino.operator.Driver.lambda$process$8(Driver.java:306)
at io.trino.operator.Driver.tryWithLock(Driver.java:709)
at io.trino.operator.Driver.process(Driver.java:298)
at io.trino.operator.Driver.processForDuration(Driver.java:269)
at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:890)
at io.trino.execution.executor.dedicated.SplitProcessor.run(SplitProcessor.java:77)
at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.lambda$run$0(TaskEntry.java:191)
at io.trino.$gen.Trino_448____20240607_201000_2.run(Unknown Source)
at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.run(TaskEntry.java:192)
at io.trino.execution.executor.scheduler.FairScheduler.runTask(FairScheduler.java:168)
at io.trino.execution.executor.scheduler.FairScheduler.lambda$submit$0(FairScheduler.java:155)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.parquet.schema.Type$ID.intValue()" because the return value of "org.apache.parquet.schema.PrimitiveType.getId()" is null
at io.trino.plugin.iceberg.IcebergPageSourceProvider.lambda$getParquetTupleDomain$35(IcebergPageSourceProvider.java:1504)
at com.google.common.collect.CollectCollectors.lambda$toImmutableMap$7(CollectCollectors.java:195)
at java.base/java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1787)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:556)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:546)
at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:265)
at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:702)
at io.trino.plugin.iceberg.IcebergPageSourceProvider.getParquetTupleDomain(IcebergPageSourceProvider.java:1504)
at io.trino.plugin.iceberg.IcebergPageSourceProvider.createParquetPageSource(IcebergPageSourceProvider.java:1017)
... 25 more
Context
When `id` for the type is missing from the Parquet file, this can cause a query failure on the call at trino/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSourceProvider.java, line 937 in 7190f20.
Relevant stack trace
Slack discussion
https://trinodb.slack.com/archives/CJ6UC075E/p1717777005901119
Technical notes
The issue reported here is likely related to #19066
Skip predicate pushdown when any descriptor in the Parquet schema is missing its id value.
For the corresponding PR, add an integration test to guard against further regressions (either through the Hive migrate procedure or potentially through https://iceberg.apache.org/docs/latest/spark-procedures/#add_files).
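A minimal sketch of the proposed guard, to make the idea concrete. It does not use the real Trino or Parquet classes: `FieldIdGuard`, `Column`, and `indexByFieldId` are hypothetical stand-ins, with `Column.fieldId()` modelling a Parquet field id that may be null. The point is only that the id-keyed map is built when every column carries an id, and that the caller gets an empty result (and would skip pushdown) otherwise, instead of hitting the NullPointerException.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in types for illustration; this is not Trino or Parquet API.
final class FieldIdGuard
{
    // Models a Parquet primitive column whose field id may be absent (null),
    // for example in files written outside Iceberg and registered later.
    record Column(String name, Integer fieldId) {}

    private FieldIdGuard() {}

    // Builds a field-id -> column-name index.
    // Returns Optional.empty() if any column lacks an id, which the caller
    // would treat as "do not attempt predicate pushdown for this file".
    static Optional<Map<Integer, String>> indexByFieldId(List<Column> columns)
    {
        Map<Integer, String> index = new LinkedHashMap<>();
        for (Column column : columns) {
            if (column.fieldId() == null) {
                return Optional.empty();
            }
            index.put(column.fieldId(), column.name());
        }
        return Optional.of(index);
    }

    public static void main(String[] args)
    {
        List<Column> withIds = List.of(new Column("id", 1), new Column("name", 2));
        List<Column> missingId = List.of(new Column("id", 1), new Column("name", null));

        // Pushdown is only considered when the index could be built.
        System.out.println("with ids, pushdown possible: " + indexByFieldId(withIds).isPresent());
        System.out.println("missing id, pushdown possible: " + indexByFieldId(missingId).isPresent());
    }
}
```

Returning an empty `Optional` rather than a partial map is a design choice in this sketch: it forces the caller to decide explicitly what to do for files without ids, and skipping pushdown is one safe fallback because the file is still read and filtered normally.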