Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TECH DEBT]: Handle the precedence of the INTERSECT set operation in the grammar(s) rather than in code #1255

Closed
2 tasks done
asnare opened this issue Nov 28, 2024 · 0 comments · Fixed by #1288
Closed
2 tasks done
Assignees
Labels
antlr Changes to any of the ANTLR g4 grammar files. feat/ir everything related to abstract syntax trees sql/snowflake sql/tsql tech debt design flaws and other cascading effects

Comments

@asnare
Copy link
Contributor

asnare commented Nov 28, 2024

Current Behavior

As part of #1227 it became clear that our grammars for set operations treat INTERSECT, UNION and EXCEPT as equivalent. However it turns out that INTERSECT has higher precedence than the others. Although this can be dealt with in code during conversion to the IR, it would be better if this was reflected by the grammar (which also simplifies the conversion to IR).

Expected Behavior

The grammars should treat INTERSECT as having higher precedence than UNION and EXCEPT.

The rules of precedence for set operations are:

  1. Parentheses
  2. INTERSECT
  3. EXCEPT and UNION ordered from left-to-right.

Given that INTERSECT is commutative, ordering is unimportant.

Needs to be fixed in:

References:

Relevant grammar rules:

setOperators
// TODO: Handle INTERSECT precedence in the grammar; it has higher precedence than EXCEPT and UNION ALL.
// Reference: https://docs.snowflake.com/en/sql-reference/operators-query#:~:text=precedence
: (UNION ALL? | EXCEPT | MINUS_ | INTERSECT) selectStatement //EXCEPT and MINUS have same SQL meaning
;

sqlUnion
// TODO: Handle INTERSECT precedence in the grammar; it has higher precedence than EXCEPT and UNION ALL.
// Reference: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/set-operators-except-and-intersect-transact-sql?view=sql-server-ver16#:~:text=following%20precedence
: (UNION ALL? | EXCEPT | INTERSECT) (querySpecification | (LPAREN queryExpression RPAREN))
;

Changing the grammar will mean refactoring this code:

@tailrec
private[this] def buildIntersectOperations(
lhs: ir.LogicalPlan,
remainingContexts: Seq[TSqlParser.SqlUnionContext]): ir.LogicalPlan = {
/*
* The sqlUnion* rule needs to be handled carefully because the grammar does not completely implement the
* precedence rules for set operations:
* 1. Brackets.
* 2. INTERSECT
* 3. UNION and EXCEPT, from left to right.
*
* Specifically the grammar treats INTERSECT as having the same precedence as UNION and EXCEPT, so we need to
* handle that specially. (Brackets and left-right precedence is handled by the grammar.)
*/
// Start by finding the contexts up to the first INTERSECT op (if any), and gather them into a plan covering the LHS
// of the INTERSECT op.
val (beforeFirstIntersect, fromFirstIntersect) = remainingContexts.span(_.INTERSECT() == null)
val gatheredLhs = beforeFirstIntersect.foldLeft(lhs)(buildSetOperation)
// If there is no INTERSECT operation, nothing further to do.
if (fromFirstIntersect.isEmpty) {
gatheredLhs
} else {
// Before proceeding, we need to account for subsequent INTERSECT operations: to preserve left-to-right
// precedence we can only immediately handle up to the next INTERSECT operation, if any.
val (beforeNextIntersect, fromNextIntersect) = fromFirstIntersect.tail.span(_.INTERSECT() == null)
val thisLogicalPlan = thisSetOperation(fromFirstIntersect.head)
val gatheredRhs = beforeNextIntersect.foldLeft(thisLogicalPlan)(buildSetOperation)
val thisOp = ir.SetOperation(
gatheredLhs,
gatheredRhs,
ir.IntersectSetOp,
is_all = false,
by_name = false,
allow_missing_columns = false)
buildIntersectOperations(thisOp, fromNextIntersect)
}
}
private[this] def thisSetOperation(ctx: TSqlParser.SqlUnionContext): ir.LogicalPlan = {
val rhsContext = ctx match {
case qs if qs.querySpecification() != null => qs.querySpecification()
case qe if qe.queryExpression() != null => qe.queryExpression()
}
rhsContext.accept(this)
}
private[this] def buildSetOperation(lhs: ir.LogicalPlan, ctx: TSqlParser.SqlUnionContext): ir.LogicalPlan = {
val setOp = ctx match {
case u if u.UNION() != null => ir.UnionSetOp
case e if e.EXCEPT() != null => ir.ExceptSetOp
}
val isAll = ctx.ALL() != null
val rhs = thisSetOperation(ctx)
ir.SetOperation(lhs, rhs, setOp, is_all = isAll, by_name = false, allow_missing_columns = false)
}

@asnare asnare added feat/ir everything related to abstract syntax trees tech debt design flaws and other cascading effects antlr Changes to any of the ANTLR g4 grammar files. labels Nov 28, 2024
@asnare asnare self-assigned this Dec 2, 2024
github-merge-queue bot pushed a commit that referenced this issue Dec 4, 2024
)

This PR updates the TSQL grammar and accompanying IR processing so that
the precedence of `INTERSECT` is handled by the grammar instead of
during transportation to our IR.

Relates #1255.
@asnare asnare added bug Something isn't working tech debt design flaws and other cascading effects and removed tech debt design flaws and other cascading effects bug Something isn't working labels Dec 5, 2024
github-merge-queue bot pushed a commit that referenced this issue Dec 9, 2024
…rations (#1288)

This PR updates the Snowflake processing so that the precedence of
`INTERSECT` is handled correctly. At present the set operations
(`UNION`, `EXCEPT` and `INTERSECT`) are all handled left-to-right with
equal precedence. However this is incorrect, Snowflake semantics
are[^1], from highest to lowest:
1. Parenthesis
2. `INTERSECT`
3. `UNION [ALL]` and `MINUS`/`EXCEPT`

At the same level operators are processed from left to right.

Resolves #1255.

Tests coverage:
 - Unit tests.
 - Transpiler tests.
 - Functional tests.

Incidental changes include factoring out some of the tests that we run
against TSQL so that the same tests can run against Snowflake.

[^1]: Reference:
https://docs.snowflake.com/en/sql-reference/operators-query
sundarshankar89 pushed a commit to sundarshankar89/remorph that referenced this issue Jan 2, 2025
…rations (databrickslabs#1288)

This PR updates the Snowflake processing so that the precedence of
`INTERSECT` is handled correctly. At present the set operations
(`UNION`, `EXCEPT` and `INTERSECT`) are all handled left-to-right with
equal precedence. However this is incorrect, Snowflake semantics
are[^1], from highest to lowest:
1. Parenthesis
2. `INTERSECT`
3. `UNION [ALL]` and `MINUS`/`EXCEPT`

At the same level operators are processed from left to right.

Resolves databrickslabs#1255.

Tests coverage:
 - Unit tests.
 - Transpiler tests.
 - Functional tests.

Incidental changes include factoring out some of the tests that we run
against TSQL so that the same tests can run against Snowflake.

[^1]: Reference:
https://docs.snowflake.com/en/sql-reference/operators-query
sundarshankar89 pushed a commit to sundarshankar89/remorph that referenced this issue Jan 3, 2025
…rations (databrickslabs#1288)

This PR updates the Snowflake processing so that the precedence of
`INTERSECT` is handled correctly. At present the set operations
(`UNION`, `EXCEPT` and `INTERSECT`) are all handled left-to-right with
equal precedence. However this is incorrect, Snowflake semantics
are[^1], from highest to lowest:
1. Parenthesis
2. `INTERSECT`
3. `UNION [ALL]` and `MINUS`/`EXCEPT`

At the same level operators are processed from left to right.

Resolves databrickslabs#1255.

Tests coverage:
 - Unit tests.
 - Transpiler tests.
 - Functional tests.

Incidental changes include factoring out some of the tests that we run
against TSQL so that the same tests can run against Snowflake.

[^1]: Reference:
https://docs.snowflake.com/en/sql-reference/operators-query
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
antlr Changes to any of the ANTLR g4 grammar files. feat/ir everything related to abstract syntax trees sql/snowflake sql/tsql tech debt design flaws and other cascading effects
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant