Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql (fix): Resolve UNNEST earlier to prevent cyclic references #3312

Merged
merged 2 commits into from
Jan 8, 2024

Conversation

takezoe
Copy link
Member

@takezoe takezoe commented Dec 19, 2023

Case 1

SELECT id, t.name FROM A CROSS JOIN UNNEST (name) AS t (name)

This SQL is valid but fails to resolve with the following error:

wvlet.airframe.sql.SQLError: [SyntaxError] line 1:45 name is ambiguous:
- *name:string <- A.name
- *name:?

Because the output attribute of AliasedRelation (t (name)) is included in the source attributes when resolve UNNEST (name) here:

case r: Relation =>
r.transformUpExpressions { case x: Expression => resolveExpression(context, x, r.inputAttributes) }
}

  • r is CrossJoin
  • r.inputAttributes is left.outputAttributes + right.outputAttributes
  • right is AliasedRelation(Unnest)

I think output attributes of AliasedRelation shouldn't be in the scope when Unnest is resolved but I didn't come up with an easy fix for this issue.

Case 2

SELECT id, t.name FROM A CROSS JOIN UNNEST (A.name) AS t (name)

This SQL is also valid but fails to resolve with the following error:

Found unresolved expressions in:
[sql]
SELECT id, t.name FROM A CROSS JOIN UNNEST (A.name) AS t (name)
[plan]
[Project]: (id:long, name:string, name:string) => (id:long, name:?)
  - *id:long <- A.id
  - UnresolvedAttribute(t.name)
  [CrossJoin]: (id:long, name:string, name:string) => (id:long, name:string, name:string)
    - NaturalJoin
    [TableScan] default.A:  => (id:long, name:string)
    [AliasedRelation]:  => (name:string)
      - ResolvedIdentifier(Id(t))
      [Unnest]:  => (name:string)
        - *A.name:string <- A.name

Maybe this is because the resolved UNNEST column has a qualifier. This case might not be UNNEST specific.

@takezoe takezoe force-pushed the sql-unnest-test-cases branch from c6abedf to 52528d3 Compare December 19, 2023 23:42
@takezoe takezoe force-pushed the sql-unnest-test-cases branch from 52528d3 to 359a608 Compare December 19, 2023 23:44
@xerial
Copy link
Member

xerial commented Dec 19, 2023

I also don't have a clear idea of how we should represent Unnest. Need to check how UNNEST is handled in Trino or SparkSQL.

@takezoe
Copy link
Member Author

takezoe commented Jan 7, 2024

  • Spark doesn't seem to support UNNEST (It has EXPLODE function instead)
  • Trino seems to create an analysis for UNNEST first. This prevents cyclic references in UNNEST expression

Copy link

codecov bot commented Jan 7, 2024

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (09287a3) 82.71% compared to head (8a82fd8) 82.60%.
Report is 7 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3312      +/-   ##
==========================================
- Coverage   82.71%   82.60%   -0.11%     
==========================================
  Files         355      355              
  Lines       14836    14861      +25     
  Branches     2510     2537      +27     
==========================================
+ Hits        12271    12276       +5     
- Misses       2565     2585      +20     
Files Coverage Δ
...ala/wvlet/airframe/sql/analyzer/TypeResolver.scala 93.33% <87.50%> (-0.31%) ⬇️

... and 4 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 09287a3...8a82fd8. Read the comment docs.

Comment on lines +395 to +396
case SingleColumn(a: Attribute, qualifier, tableAlias, _) if a.resolved =>
a.withQualifier(qualifier).withTableAlias(tableAlias)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following case cannot be resolved without this fix:

select t.id from (select id from A) as t(id)

}

private def resolveUnnest(u: Unnest, j: Join, context: AnalyzerContext): Unnest = {
val resolvedAttributes = j.outputAttributes.filter(_.resolved)
Copy link
Member Author

@takezoe takezoe Jan 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excluding unresolved attributes is a workaround to prevent cyclic references easily.

@takezoe takezoe changed the title airframe-sql: Add test cases for UNNEST airframe-sql: Resolve UNNEST earlier Jan 7, 2024
@takezoe takezoe changed the title airframe-sql: Resolve UNNEST earlier airframe-sql: Resolve UNNEST earlier to prevent cyclic references Jan 7, 2024
@xerial
Copy link
Member

xerial commented Jan 8, 2024

Ok. Thanks for the explanation. I understand the fix

@xerial xerial merged commit e76a7b1 into wvlet:main Jan 8, 2024
17 checks passed
@xerial xerial changed the title airframe-sql: Resolve UNNEST earlier to prevent cyclic references sql (fix): Resolve UNNEST earlier to prevent cyclic references Jan 8, 2024
@xerial xerial added the bug label Jan 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants