Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TECH DEBT]: Fix ORDER BY precedence in TSQL grammar #1256

Closed
asnare opened this issue Nov 28, 2024 · 2 comments · Fixed by #1349
Closed

[TECH DEBT]: Fix ORDER BY precedence in TSQL grammar #1256

asnare opened this issue Nov 28, 2024 · 2 comments · Fixed by #1349
Assignees
Labels
antlr Changes to any of the ANTLR g4 grammar files. bug Something isn't working sql/tsql

Comments

@asnare
Copy link
Contributor

asnare commented Nov 28, 2024

Current Behavior

The TSQL grammar currently allows for ORDER BY expressions between set operations (INTERSECT, EXCEPT, UNION) and associates them with the intermediate expression.

Expected Behavior

For TSQL the ORDER BY clause is applied after all the set operations: it should not be attached to the immediately preceding expression. That is:

SELECT 10
UNION
SELECT 20
ORDER BY 1

is equivalent to:

(SELECT 10
 UNION
 SELECT 20)
ORDER BY 1

and not this as currently is the case:

(SELECT 10)
UNION
(SELECT 20
 ORDER BY 1)

Reference: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/set-operators-union-transact-sql?view=sql-server-ver16#c-using-union-of-two-select-statements-with-order-by

Further to this, the following is erroneous:

SELECT 10
ORDER BY 1
UNION
SELECT 10
ORDER BY 2

The documentation refers to this as incorrect which is a little ambiguous: it's open to read that rather than being an error it could be accepted but pointless. Testing shows that it is treated as an error:

select 10
order by 1
union
select 20
order by 1 desc

yields

Failed to execute query. Error: Incorrect syntax near the keyword 'union'.

Functional Test

This functional test should pass if ORDER BY has been implemented correctly.

--
-- ORDER BY appears at the end of a sequence of set operations, and applies to the result of the set operation.
--
-- tsql sql:

SELECT 1
UNION
SELECT 2

ORDER BY 1 DESC; -- Applies to the complete UNION, not just the final expression.

-- databricks sql:
((SELECT 1)
 UNION
 (SELECT 2))
ORDER BY 1 DESC;
@asnare asnare added bug Something isn't working antlr Changes to any of the ANTLR g4 grammar files. labels Nov 28, 2024
@asnare
Copy link
Contributor Author

asnare commented Nov 28, 2024

Some playing around to explore some of the ORDER BY behaviour…

(SELECT 10 ORDER BY 1) UNION SELECT 20

Failed to execute query. Error: Incorrect syntax near the keyword 'UNION'.

Let's try a subquery:

SELECT * FROM (SELECT 10 ORDER BY 1) UNION SELECT 20

Failed to execute query. Error: The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.

Drop in a TOP and it's still not happy, we're back to the original error:

SELECT * FROM (SELECT TOP(1) 10 ORDER BY 1) UNION SELECT 20

Failed to execute query. Error: Incorrect syntax near the keyword 'UNION'.

Extract to a CTE though and it's okay:

WITH x AS (SELECT TOP(1) 10 AS n ORDER BY 1)
SELECT * FROM x UNION SELECT 20
n
10
20

@jimidle
Copy link
Contributor

jimidle commented Nov 28, 2024

Cool. Good research, which will help me rejig the grammar to support teh correct precedence (there basically isn't any right now!).

Just one point to make in case it isn't me that changes the grammar:

We decided very early on to assume that the SQL given to us is syntactically and semantically correct, as we are a transpiler - garbage in, garbage out. So, we don't need to report contextual/semantic errors like the TSQL engine will. Hence we don't try to write grammar rules that tortuously enforces syntax. One should not write grammars like this anyway, we write a grammar that accepts anything that looks like it could be valid, then apply a semantic pass to throw out errors like 'not valid in views' - which we won't do in the transpiler of course as the input code should already be verified accurate.

@asnare asnare self-assigned this Dec 11, 2024
github-merge-queue bot pushed a commit that referenced this issue Dec 13, 2024
This PR updates where we handle `ORDER BY` clauses in TSql: these should
be applied after set operations to the complete sequence, not to each
individual query within a sequence of set operations.

References:

-
https://learn.microsoft.com/en-us/sql/t-sql/queries/select-transact-sql?view=sql-server-ver16#syntax
-
https://learn.microsoft.com/en-us/sql/t-sql/language-elements/set-operators-union-transact-sql?view=sql-server-ver16#c-using-union-of-two-select-statements-with-order-by

Tested:

 - [x] Updated unit tests
 - [x] Transpiler tests
 - [X] Functional tests

Blocked by #1350.
Resolves #1256.
vil1 pushed a commit that referenced this issue Dec 13, 2024
This PR updates where we handle `ORDER BY` clauses in TSql: these should
be applied after set operations to the complete sequence, not to each
individual query within a sequence of set operations.

References:

-
https://learn.microsoft.com/en-us/sql/t-sql/queries/select-transact-sql?view=sql-server-ver16#syntax
-
https://learn.microsoft.com/en-us/sql/t-sql/language-elements/set-operators-union-transact-sql?view=sql-server-ver16#c-using-union-of-two-select-statements-with-order-by

Tested:

 - [x] Updated unit tests
 - [x] Transpiler tests
 - [X] Functional tests

Blocked by #1350.
Resolves #1256.
sundarshankar89 pushed a commit to sundarshankar89/remorph that referenced this issue Jan 2, 2025
This PR updates where we handle `ORDER BY` clauses in TSql: these should
be applied after set operations to the complete sequence, not to each
individual query within a sequence of set operations.

References:

-
https://learn.microsoft.com/en-us/sql/t-sql/queries/select-transact-sql?view=sql-server-ver16#syntax
-
https://learn.microsoft.com/en-us/sql/t-sql/language-elements/set-operators-union-transact-sql?view=sql-server-ver16#c-using-union-of-two-select-statements-with-order-by

Tested:

 - [x] Updated unit tests
 - [x] Transpiler tests
 - [X] Functional tests

Blocked by databrickslabs#1350.
Resolves databrickslabs#1256.
sundarshankar89 pushed a commit to sundarshankar89/remorph that referenced this issue Jan 3, 2025
This PR updates where we handle `ORDER BY` clauses in TSql: these should
be applied after set operations to the complete sequence, not to each
individual query within a sequence of set operations.

References:

-
https://learn.microsoft.com/en-us/sql/t-sql/queries/select-transact-sql?view=sql-server-ver16#syntax
-
https://learn.microsoft.com/en-us/sql/t-sql/language-elements/set-operators-union-transact-sql?view=sql-server-ver16#c-using-union-of-two-select-statements-with-order-by

Tested:

 - [x] Updated unit tests
 - [x] Transpiler tests
 - [X] Functional tests

Blocked by databrickslabs#1350.
Resolves databrickslabs#1256.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
antlr Changes to any of the ANTLR g4 grammar files. bug Something isn't working sql/tsql
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants