[MATERIALIZED VIEW]: java.lang.IllegalArgumentException: Duplicate key [tpcds1t.b] found #10229

Closed
zhengxingmao opened this issue Dec 8, 2021 · 4 comments · Fixed by #10570
Labels
bug Something isn't working

Comments

@zhengxingmao

zhengxingmao commented Dec 8, 2021

  I am using the Trino 365 release with the Iceberg connector. Selecting from a materialized view fails with a "Duplicate key" error;
  the details are below.
  From a quick check, the root cause is this statement at line 1069 of IcebergMetadata.java:
   Map<String, String> tableToSnapshotIdMap = Splitter.on(',').withKeyValueSeparator('=').split(dependsOnTables);
  The same table is referenced more than once in the view definition (b in the error below), and the map cannot contain the same key twice.
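
  The failure mode can be illustrated without Trino or Guava. The following is a minimal JDK-only sketch, not the actual library code: `parseStrict` is a hypothetical stand-in that rejects repeated keys in the same way the stack trace below shows, and the property value with its snapshot IDs (101, 102, 103) is made up for illustration. The `schema.table=snapshotId` layout is an assumption inferred from the key in the error message.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// JDK-only stand-in for a strict "key=value,key=value" parser; this is NOT Guava's code.
class DuplicateKeyDemo {
    // Parses the property and rejects repeated keys, mirroring the behavior seen in the stack trace.
    static Map<String, String> parseStrict(String property) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : property.split(",")) {
            int separator = entry.indexOf('=');
            if (separator < 0) {
                throw new IllegalArgumentException("Entry [" + entry + "] is not a valid key=value pair");
            }
            String key = entry.substring(0, separator);
            String value = entry.substring(separator + 1);
            if (result.containsKey(key)) {
                throw new IllegalArgumentException("Duplicate key [" + key + "] found.");
            }
            result.put(key, value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical value: b and c repeat because both UNION ALL branches reference them.
        String dependsOnTables = "tpcds1t.a=101,tpcds1t.b=102,tpcds1t.c=103,tpcds1t.b=102,tpcds1t.c=103";
        try {
            parseStrict(dependsOnTables);
        } catch (IllegalArgumentException e) {
            // The first repeated key (tpcds1t.b) triggers the failure.
            System.out.println(e.getMessage());
        }
    }
}
```

  A view that references each table only once would produce a property with unique keys and parse without error, which is presumably why simpler views are not affected.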

   create table a (a1 int ,b1 int ,c1 int) ;
   create table b (a1 int ,b1 int ,c1 int) ;
   create table c (a1 int ,b1 int ,c1 int) ;

   insert into a values (1,1,1),(2,2,2),(3,3,3);
   insert into b values (1,1,1),(2,2,2),(3,3,3);
   insert into c values (1,1,1),(2,2,2),(3,3,3);

   create materialized view tv with  (partitioning = ARRAY['bucket(a1,64)']) as (
   select a.a1,b.b1,c.c1 from a,b,c where a.a1=b.b1 and a.c1=c.c1 
   union all 
   select c.a1,b.b1,c.c1 from b,c where c.c1=b.b1) ;

   refresh materialized view tv;
   select * from tv limit 10;
    
    **Here is the error stack:**
    **java.lang.IllegalArgumentException: Duplicate key [tpcds1t.b] found.**
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:219)
at com.google.common.base.Splitter$MapSplitter.split(Splitter.java:524)
at io.trino.plugin.iceberg.IcebergMetadata.getMaterializedViewToken(IcebergMetadata.java:1069)
at io.trino.plugin.iceberg.IcebergMetadata.getMaterializedViewFreshness(IcebergMetadata.java:1026)
at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorMetadata.getMaterializedViewFreshness(ClassLoaderSafeConnectorMetadata.java:972)
at io.trino.metadata.MetadataManager.getMaterializedViewFreshness(MetadataManager.java:1596)
at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitTable(StatementAnalyzer.java:1499)
at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitTable(StatementAnalyzer.java:377)
at io.trino.sql.tree.Table.accept(Table.java:60)
at io.trino.sql.tree.AstVisitor.process(AstVisitor.java:27)
at io.trino.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:394)
at io.trino.sql.analyzer.StatementAnalyzer$Visitor.analyzeFrom(StatementAnalyzer.java:3377)
at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:2147)
at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:377)
at io.trino.sql.tree.QuerySpecification.accept(QuerySpecification.java:185)
at io.trino.sql.tree.AstVisitor.process(AstVisitor.java:27)
at io.trino.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:394)
at io.trino.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:402)
at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:1365)
at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:377)
at io.trino.sql.tree.Query.accept(Query.java:107)
at io.trino.sql.tree.AstVisitor.process(AstVisitor.java:27)
at io.trino.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:394)
at io.trino.sql.analyzer.StatementAnalyzer.analyze(StatementAnalyzer.java:357)
at io.trino.sql.analyzer.Analyzer.analyze(Analyzer.java:91)
at io.trino.sql.analyzer.Analyzer.analyze(Analyzer.java:83)
at io.trino.execution.SqlQueryExecution.analyze(SqlQueryExecution.java:259)
at io.trino.execution.SqlQueryExecution.<init>(SqlQueryExecution.java:187)
at io.trino.execution.SqlQueryExecution$SqlQueryExecutionFactory.createQueryExecution(SqlQueryExecution.java:796)
at io.trino.dispatcher.LocalDispatchQueryFactory.lambda$createDispatchQuery$0(LocalDispatchQueryFactory.java:132)
at io.trino.$gen.Trino_365_4_gec2e959_dirty____20211208_020140_2.call(Unknown Source)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
@findepi
Member

findepi commented Dec 8, 2021

cc @anjalinorwood

@sopel39 added the bug label Dec 8, 2021
@zhengxingmao
Author

I'm not sure why the table names are collected at this point. I think there are two ways to solve this problem:

  1. Remove duplicates from this string
    or
  2. Replace the map data structure with ArrayListMultimap

@anjalinorwood
Member

Removing duplicates would be the right solution. The property is set here: https://github.com/trinodb/trino/blob/master/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java#L964

This property tracks the snapshot IDs of all tables present in the materialized view definition, captured at the time of the last refresh of the materialized view. This helps establish the freshness of the materialized view.
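
To make the role of the property concrete, here is a rough sketch of how recorded snapshot IDs could be compared with current ones. It is not the code in `IcebergMetadata`; the class name `FreshnessSketch`, the method `isFresh`, and the sample IDs are all assumptions for illustration, and the real check may apply additional rules.

```java
import java.util.Map;

class FreshnessSketch {
    // Treats the view as fresh only if every recorded source table still has
    // the snapshot ID that was captured at the last refresh.
    static boolean isFresh(Map<String, Long> recordedAtRefresh, Map<String, Long> currentSnapshots) {
        for (Map.Entry<String, Long> entry : recordedAtRefresh.entrySet()) {
            Long current = currentSnapshots.get(entry.getKey());
            if (current == null || !current.equals(entry.getValue())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical snapshot IDs recorded at refresh time.
        Map<String, Long> recorded = Map.of("tpcds1t.a", 101L, "tpcds1t.b", 102L);
        System.out.println(isFresh(recorded, Map.of("tpcds1t.a", 101L, "tpcds1t.b", 102L))); // true
        System.out.println(isFresh(recorded, Map.of("tpcds1t.a", 101L, "tpcds1t.b", 205L))); // false
    }
}
```

Since the comparison is keyed by table name, one entry per table is enough, so a repeated entry for the same table adds no information.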

@berndlunghamer
Contributor

berndlunghamer commented Dec 22, 2021

We stumbled across this as well. It happens very often in our analytical queries, which can reference the same source table multiple times.

It seems we would simply need to apply an additional distinct() to the source table stream before serializing the table list.

However, I am not sure whether the IcebergTableHandles passed to finishRefreshMaterializedView() are really equal(); a distinct on schema and table name only would be sufficient. Is that assumption correct?

Would you be interested in a pull request for this one?
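
To make the proposal concrete, here is a rough sketch of deduplicating on schema and table name before serializing, so that it does not depend on the handles being equal(). `SourceTable` is a hypothetical stand-in for the handle, `serialize` is not an existing Trino method, and the `schema.table=snapshotId` output format is an assumption based on the key shown in the error message.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class DependencyDedupSketch {
    // Hypothetical stand-in for a table handle; only these three fields matter here.
    static final class SourceTable {
        final String schema;
        final String table;
        final long snapshotId;

        SourceTable(String schema, String table, long snapshotId) {
            this.schema = schema;
            this.table = table;
            this.snapshotId = snapshotId;
        }
    }

    // Builds "schema.table=snapshotId,..." with at most one entry per schema.table.
    static String serialize(List<SourceTable> sources) {
        Map<String, Long> byName = new LinkedHashMap<>();
        for (SourceTable source : sources) {
            // putIfAbsent keeps the first snapshot ID seen for each schema.table.
            byName.putIfAbsent(source.schema + "." + source.table, source.snapshotId);
        }
        return byName.entrySet().stream()
                .map(entry -> entry.getKey() + "=" + entry.getValue())
                .collect(Collectors.joining(","));
    }
}
```

Keeping the first snapshot ID is itself an assumption: it presumes that every reference to a table within one refresh sees the same snapshot. If that does not hold, the choice of which ID to record would need a separate decision.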

5 participants