INSERT, DELETE, UPDATE, MERGE query fails when merging into Iceberg table with non-lowercase partitioning column #16622

Closed
arunb2w opened this issue Mar 19, 2023 · 2 comments · Fixed by #16713
Assignees
Labels
bug Something isn't working

Comments

arunb2w commented Mar 19, 2023

I get an internal error (a NullPointerException) when running a MERGE statement in Trino with the Iceberg connector and the Glue catalog against a partitioned table. Against a non-partitioned table it works fine.
Trino version: 403
Connector: Iceberg
Stack trace:

java.lang.NullPointerException: undefined
	at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:889)
	at com.google.common.collect.ImmutableList$Builder.add(ImmutableList.java:813)
	at java.base/java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
	at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
	at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1845)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
	at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitMerge(StatementAnalyzer.java:3372)
	at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitMerge(StatementAnalyzer.java:468)
	at io.trino.sql.tree.Merge.accept(Merge.java:100)
	at io.trino.sql.tree.AstVisitor.process(AstVisitor.java:27)
	at io.trino.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:485)
	at io.trino.sql.analyzer.StatementAnalyzer.analyze(StatementAnalyzer.java:447)
	at io.trino.sql.analyzer.Analyzer.analyze(Analyzer.java:79)
	at io.trino.sql.analyzer.Analyzer.analyze(Analyzer.java:71)
	at io.trino.execution.SqlQueryExecution.analyze(SqlQueryExecution.java:267)
	at io.trino.execution.SqlQueryExecution.<init>(SqlQueryExecution.java:204)
	at io.trino.execution.SqlQueryExecution$SqlQueryExecutionFactory.createQueryExecution(SqlQueryExecution.java:856)
	at io.trino.dispatcher.LocalDispatchQueryFactory.lambda$createDispatchQuery$0(LocalDispatchQueryFactory.java:138)
	at io.trino.$gen.Trino_403_amzn_0____20230313_135431_2.call(Unknown Source)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)

The same issue persists after upgrading to version 410.
On further analysis, the root cause appears to be the case of the partitioning column name: the NPE is thrown when the column used for partitioning has an upper-case name, and it goes away when the table is created with a lower-case partitioning column.
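
The stack trace shows a null-rejecting list builder failing inside a stream collect in visitMerge. One way such a null can arise is a case-sensitive name lookup that misses. The sketch below is a hypothetical model of that failure class in plain Python, not Trino's actual code: the function names, the lower-cased index, and the column list are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def add_non_null(items: List[int], value: Optional[int]) -> None:
    """Append like a null-rejecting list builder: refuse missing values."""
    if value is None:
        raise ValueError("null element")
    items.append(value)


def partition_column_indexes(table_columns: List[str],
                             partition_columns: List[str]) -> List[int]:
    """Resolve partition column names to positions in the table."""
    # Assumption: the engine indexes columns by lower-cased name.
    index_by_name: Dict[str, int] = {
        name.lower(): i for i, name in enumerate(table_columns)
    }
    indexes: List[int] = []
    for name in partition_columns:
        # Case-sensitive lookup: "PARTN_COLUMN" misses the key "partn_column",
        # so get() returns None and the builder rejects it.
        add_non_null(indexes, index_by_name.get(name))
    return indexes
```

In this model, passing the partition column as "partn_column" resolves normally, while passing "PARTN_COLUMN" raises, which mirrors the reported lower-case versus upper-case behaviour.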

Steps to reproduce:

  1. Load a TPCH dataset into a Spark DataFrame; it will be written out as an Iceberg table.
  2. Change the case of the column you want to partition by: df = df.withColumn("PARTN_COLUMN", col("partn_column")). No transformation is applied; only the column name changes to upper case.
  3. Create the Iceberg table partitioned by this upper-case column: df.writeTo("maintbl").using("iceberg").partitionedBy("PARTN_COLUMN").createOrReplace()
  4. Run a MERGE query that uses the partitioning column in the join condition:
    merge into maintbl t using join_tbl s on (t.PARTN_COLUMN = s.join_column) when matched then update ...

This throws the NPE shown above, whereas the same steps with a lower-case partitioning column name work fine.
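
The steps above can be sketched as a small PySpark script. This is a minimal sketch under assumptions, not the reporter's exact code: it assumes a Spark session already configured with an Iceberg catalog, replaces the TPCH data with a toy two-column dataset, and the `payload` column and both function names are hypothetical. The UPDATE clause is left as a parameter because the report elides it.

```python
from typing import List, Tuple

TABLE = "maintbl"                   # table name used in the report
PARTITION_COLUMN = "PARTN_COLUMN"   # upper-case partitioning column from the report


def create_partitioned_table(spark, rows: List[Tuple[int, str]]) -> None:
    """Steps 1-3: write an Iceberg table partitioned by an upper-case column."""
    # Imported lazily so merge_statement() below is usable without Spark installed.
    from pyspark.sql.functions import col

    # Assumption: a toy two-column dataset stands in for the TPCH data.
    df = spark.createDataFrame(rows, ["partn_column", "payload"])
    # Same rename as step 2: only the case of the column name changes.
    df = df.withColumn(PARTITION_COLUMN, col("partn_column"))
    df.writeTo(TABLE).using("iceberg").partitionedBy(col(PARTITION_COLUMN)).createOrReplace()


def merge_statement(update_clause: str) -> str:
    """Step 4: the MERGE to run in Trino; the caller supplies the UPDATE clause."""
    return (
        f"MERGE INTO {TABLE} t USING join_tbl s "
        f"ON (t.{PARTITION_COLUMN} = s.join_column) "
        f"WHEN MATCHED THEN UPDATE {update_clause}"
    )
```

For example, `merge_statement("SET payload = s.payload")` returns the statement to paste into a Trino client once `create_partitioned_table` has run and a `join_tbl` table exists.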

@findepi findepi changed the title Partitioned table merge issue using iceberg connector MERGE query fails when merging into Iceberg table with non-lowercase partitioning column Mar 20, 2023
@findepi findepi added the bug Something isn't working label Mar 20, 2023
Member

findepi commented Mar 20, 2023

cc @djsstarburst

@ebyhr ebyhr self-assigned this Mar 24, 2023
@ebyhr ebyhr changed the title MERGE query fails when merging into Iceberg table with non-lowercase partitioning column INSERT, DELETE, UPDATE, MERGE query fails when merging into Iceberg table with non-lowercase partitioning column Mar 24, 2023
Member

ebyhr commented Mar 24, 2023

I confirmed INSERT, DELETE and UPDATE queries also fail. Going to send a PR.
