Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TSQL: Adds tests and support for SELECT OPTION(...) generation #755

Merged
merged 1 commit into from
Aug 2, 2024

Conversation

jimidle
Copy link
Contributor

@jimidle jimidle commented Aug 1, 2024

Here, we add code generation support for TSQL SELECT ... OPTION(...) clause.

Any query hints supplied with a SELECT statement are now generated as comments in the output code, in case they may be useful in assessing query performance after remorph transpilation. As in the following sample:

SELECT * FROM t
            OPTION (
            MAXRECURSION 10,
            SOMETHING ON,
            SOMETHINGELSE OFF,
            SOMEOTHER AUTO,
            SOMEstrOpt = 'STRINGOPTION')

Transpiles to:

/*
   The following statement was originally given the following OPTIONS:

    Expression options:

     MAXRECURSION = 10

    String options:

     SOMESTROPT = 'STRINGOPTION'

    Boolean options:

     SOMETHING ON
     SOMETHINGELSE OFF

    Auto options:

     SOMEOTHER AUTO


 */
SELECT * FROM t

@jimidle jimidle requested review from nfx and vil1 August 1, 2024 15:22
Copy link

github-actions bot commented Aug 1, 2024

Coverage tests results

401 tests  ±0   109 ✅ ±0   4s ⏱️ ±0s
  2 suites ±0     0 💤 ±0 
  2 files   ±0   292 ❌ ±0 

For more details on these failures, see this check.

Results for commit 46007e2. ± Comparison against base commit 1663d47.

@jimidle jimidle marked this pull request as ready for review August 1, 2024 15:26
@jimidle jimidle requested a review from a team as a code owner August 1, 2024 15:26
Copy link
Contributor

@vil1 vil1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@nfx nfx merged commit 6b2f237 into main Aug 2, 2024
7 checks passed
@nfx nfx deleted the feature/genselect branch August 2, 2024 12:02
nfx added a commit that referenced this pull request Aug 2, 2024
* Aggregate Queries Reconciliation ([#740](#740)). This release introduces several changes to enhance the functionality of the project, including the implementation of Aggregate Queries Reconciliation, addressing issue [#503](#503). A new property, `aggregates`, has been added to the base class of the query builder module to support aggregate queries reconciliation. A `generate_final_reconcile_aggregate_output` function has been added to generate the final reconcile output for aggregate queries. A new SQL file creates a table called `aggregate_details` to store details about aggregate reconciles, and a new column, `operation_name`, has been added to the `main` table in the `installation` reconciliation query. Additionally, new classes and methods have been introduced for handling aggregate queries and their reconciliation, and new SQL tables and columns have been created for storing and managing rules for aggregating data in the context of query reconciliation. Unit tests have been added to ensure the proper functioning of aggregate queries reconciliation and reconcile aggregate data in the context of missing records.
* Generate GROUP BY / PIVOT ([#747](#747)). The LogicalPlanGenerator class in the remorph library has been updated to support generating GROUP BY and PIVOT clauses for SQL queries. A new private method, "aggregate", has been added to handle two types of aggregates: GroupBy and Pivot. For GroupBy, it generates a GROUP BY clause with specified grouping expressions. For Pivot, it generates a PIVOT clause where the specified column is used as the pivot column and the specified values are used as the pivot values, compatible with Spark SQL. If the aggregate type is unsupported, a TranspileException is thrown. Additionally, new test cases have been introduced for the LogicalPlanGenerator class in the com.databricks.labs.remorph.generators.sql package to support testing the transpilation of Aggregate expressions with GROUP BY and PIVOT clauses, ensuring proper handling and transpilation of these expressions.
* Implement error strategy for Snowflake parsing and use error strategy for all parser instances ([#760](#760)). In this release, we have developed an error strategy specifically for Snowflake parsing that translates raw token names and parser rules into more user-friendly SQL error messages. This strategy is applied consistently across all parser instances, ensuring a unified error handling experience. Additionally, we have refined the DBL_DOLLAR rule in the SnowflakeLexer grammar to handle escaped dollar signs correctly. These updates improve the accuracy and readability of error messages for SQL authors, regardless of the parsing tool or transpiler used. Furthermore, we have updated the TSQL parsing error strategy to match the new Snowflake error strategy implementation, providing a consistent error handling experience across dialects.
* Incremental improvement to error messages - article selection ([#711](#711)). In this release, we have implemented an incremental improvement to the error messages generated during T-SQL code parsing. This change introduces a new private method, `articleFor`, which determines whether to use `a` or `an` in the generated messages based on the first letter of the following word. The `generateMessage` method has been updated to use this new method when constructing the initial error message and subsequent messages when there are multiple expected tokens. This improvement ensures consistent use of articles `a` or `an` in the error messages, enhancing their readability for software engineers working with T-SQL code.
* TSQL: Adds tests and support for SELECT OPTION(...) generation ([#755](#755)). In this release, we have added support for generating code for the TSQL `SELECT ... OPTION(...)` clause in the codebase. This new feature includes the ability to transpile any query hints supplied with a SELECT statement as comments in the output code, allowing for easier assessment of query performance after transpilation. The OPTION clause is now generated as comments, including MAXRECURSION, string options, boolean options, and auto options. Additionally, we have added new tests and updated the TSqlAstBuilderSpec test class with new and updated test cases to cover the new functionality. The implementation is focused on generating code for the OPTION clause, and does not affect the actual execution of the query. The changes are limited to the ExpressionGenerator class and its associated methods, and the TSqlRelationBuilder class, without affecting other parts of the codebase.
* TSQL: IR implementation of MERGE ([#719](#719)). The open-source library has been updated to include a complete implementation of the TSQL MERGE statement's IR (Intermediate Representation), bringing it in line with Spark SQL. The `LogicalPlanGenerator` class now includes a `generateMerge` method, which generates the SQL code for the MERGE statement, taking a `MergeIntoTable` object containing the target and source tables, merge condition, and merge actions as input. The `MergeIntoTable` class has been added as a case class to represent the logical plan of the MERGE INTO command and extends the `Modification` trait. The `LogicalPlanGenerator` class also includes a new `generateWithOptions` method, which generates SQL code for the WITH OPTIONS clause, taking a `WithOptions` object containing the input and options as children. Additionally, the `TSqlRelationBuilder` class has been updated to handle the MERGE statement's parsing, introducing new methods and updating existing ones, such as `visitMerge`. The `TSqlToDatabricksTranspiler` class has been updated to include support for the TSQL MERGE statement, and the `ExpressionGenerator` class has new tests for options, columns, and arithmetic expressions. A new optimization rule, `TrapInsertDefaultsAction`, has been added to handle the behavior of the DEFAULT keyword during INSERT statements. The commit also includes test cases for the `MergeIntoTable` logical operator and the T-SQL merge statement in the `TSqlAstBuilderSpec`.
@nfx nfx mentioned this pull request Aug 2, 2024
nfx added a commit that referenced this pull request Aug 2, 2024
* Aggregate Queries Reconciliation
([#740](#740)). This
release introduces several changes to enhance the functionality of the
project, including the implementation of Aggregate Queries
Reconciliation, addressing issue
[#503](#503). A new
property, `aggregates`, has been added to the base class of the query
builder module to support aggregate queries reconciliation. A
`generate_final_reconcile_aggregate_output` function has been added to
generate the final reconcile output for aggregate queries. A new SQL
file creates a table called `aggregate_details` to store details about
aggregate reconciles, and a new column, `operation_name`, has been added
to the `main` table in the `installation` reconciliation query.
Additionally, new classes and methods have been introduced for handling
aggregate queries and their reconciliation, and new SQL tables and
columns have been created for storing and managing rules for aggregating
data in the context of query reconciliation. Unit tests have been added
to ensure the proper functioning of aggregate queries reconciliation and
reconcile aggregate data in the context of missing records.
* Generate GROUP BY / PIVOT
([#747](#747)). The
LogicalPlanGenerator class in the remorph library has been updated to
support generating GROUP BY and PIVOT clauses for SQL queries. A new
private method, "aggregate", has been added to handle two types of
aggregates: GroupBy and Pivot. For GroupBy, it generates a GROUP BY
clause with specified grouping expressions. For Pivot, it generates a
PIVOT clause where the specified column is used as the pivot column and
the specified values are used as the pivot values, compatible with Spark
SQL. If the aggregate type is unsupported, a TranspileException is
thrown. Additionally, new test cases have been introduced for the
LogicalPlanGenerator class in the
com.databricks.labs.remorph.generators.sql package to support testing
the transpilation of Aggregate expressions with GROUP BY and PIVOT
clauses, ensuring proper handling and transpilation of these
expressions.
* Implement error strategy for Snowflake parsing and use error strategy
for all parser instances
([#760](#760)). In this
release, we have developed an error strategy specifically for Snowflake
parsing that translates raw token names and parser rules into more
user-friendly SQL error messages. This strategy is applied consistently
across all parser instances, ensuring a unified error handling
experience. Additionally, we have refined the DBL_DOLLAR rule in the
SnowflakeLexer grammar to handle escaped dollar signs correctly. These
updates improve the accuracy and readability of error messages for SQL
authors, regardless of the parsing tool or transpiler used. Furthermore,
we have updated the TSQL parsing error strategy to match the new
Snowflake error strategy implementation, providing a consistent error
handling experience across dialects.
* Incremental improvement to error messages - article selection
([#711](#711)). In this
release, we have implemented an incremental improvement to the error
messages generated during T-SQL code parsing. This change introduces a
new private method, `articleFor`, which determines whether to use `a` or
`an` in the generated messages based on the first letter of the
following word. The `generateMessage` method has been updated to use
this new method when constructing the initial error message and
subsequent messages when there are multiple expected tokens. This
improvement ensures consistent use of articles `a` or `an` in the error
messages, enhancing their readability for software engineers working
with T-SQL code.
* TSQL: Adds tests and support for SELECT OPTION(...) generation
([#755](#755)). In this
release, we have added support for generating code for the TSQL `SELECT
... OPTION(...)` clause in the codebase. This new feature includes the
ability to transpile any query hints supplied with a SELECT statement as
comments in the output code, allowing for easier assessment of query
performance after transpilation. The OPTION clause is now generated as
comments, including MAXRECURSION, string options, boolean options, and
auto options. Additionally, we have added new tests and updated the
TSqlAstBuilderSpec test class with new and updated test cases to cover
the new functionality. The implementation is focused on generating code
for the OPTION clause, and does not affect the actual execution of the
query. The changes are limited to the ExpressionGenerator class and its
associated methods, and the TSqlRelationBuilder class, without affecting
other parts of the codebase.
* TSQL: IR implementation of MERGE
([#719](#719)). The
open-source library has been updated to include a complete
implementation of the TSQL MERGE statement's IR (Intermediate
Representation), bringing it in line with Spark SQL. The
`LogicalPlanGenerator` class now includes a `generateMerge` method,
which generates the SQL code for the MERGE statement, taking a
`MergeIntoTable` object containing the target and source tables, merge
condition, and merge actions as input. The `MergeIntoTable` class has
been added as a case class to represent the logical plan of the MERGE
INTO command and extends the `Modification` trait. The
`LogicalPlanGenerator` class also includes a new `generateWithOptions`
method, which generates SQL code for the WITH OPTIONS clause, taking a
`WithOptions` object containing the input and options as children.
Additionally, the `TSqlRelationBuilder` class has been updated to handle
the MERGE statement's parsing, introducing new methods and updating
existing ones, such as `visitMerge`. The `TSqlToDatabricksTranspiler`
class has been updated to include support for the TSQL MERGE statement,
and the `ExpressionGenerator` class has new tests for options, columns,
and arithmetic expressions. A new optimization rule,
`TrapInsertDefaultsAction`, has been added to handle the behavior of the
DEFAULT keyword during INSERT statements. The commit also includes test
cases for the `MergeIntoTable` logical operator and the T-SQL merge
statement in the `TSqlAstBuilderSpec`.
sundarshankar89 pushed a commit to sundarshankar89/remorph that referenced this pull request Jan 2, 2025
* Aggregate Queries Reconciliation
([databrickslabs#740](databrickslabs#740)). This
release introduces several changes to enhance the functionality of the
project, including the implementation of Aggregate Queries
Reconciliation, addressing issue
[databrickslabs#503](databrickslabs#503). A new
property, `aggregates`, has been added to the base class of the query
builder module to support aggregate queries reconciliation. A
`generate_final_reconcile_aggregate_output` function has been added to
generate the final reconcile output for aggregate queries. A new SQL
file creates a table called `aggregate_details` to store details about
aggregate reconciles, and a new column, `operation_name`, has been added
to the `main` table in the `installation` reconciliation query.
Additionally, new classes and methods have been introduced for handling
aggregate queries and their reconciliation, and new SQL tables and
columns have been created for storing and managing rules for aggregating
data in the context of query reconciliation. Unit tests have been added
to ensure the proper functioning of aggregate queries reconciliation and
reconcile aggregate data in the context of missing records.
* Generate GROUP BY / PIVOT
([databrickslabs#747](databrickslabs#747)). The
LogicalPlanGenerator class in the remorph library has been updated to
support generating GROUP BY and PIVOT clauses for SQL queries. A new
private method, "aggregate", has been added to handle two types of
aggregates: GroupBy and Pivot. For GroupBy, it generates a GROUP BY
clause with specified grouping expressions. For Pivot, it generates a
PIVOT clause where the specified column is used as the pivot column and
the specified values are used as the pivot values, compatible with Spark
SQL. If the aggregate type is unsupported, a TranspileException is
thrown. Additionally, new test cases have been introduced for the
LogicalPlanGenerator class in the
com.databricks.labs.remorph.generators.sql package to support testing
the transpilation of Aggregate expressions with GROUP BY and PIVOT
clauses, ensuring proper handling and transpilation of these
expressions.
* Implement error strategy for Snowflake parsing and use error strategy
for all parser instances
([databrickslabs#760](databrickslabs#760)). In this
release, we have developed an error strategy specifically for Snowflake
parsing that translates raw token names and parser rules into more
user-friendly SQL error messages. This strategy is applied consistently
across all parser instances, ensuring a unified error handling
experience. Additionally, we have refined the DBL_DOLLAR rule in the
SnowflakeLexer grammar to handle escaped dollar signs correctly. These
updates improve the accuracy and readability of error messages for SQL
authors, regardless of the parsing tool or transpiler used. Furthermore,
we have updated the TSQL parsing error strategy to match the new
Snowflake error strategy implementation, providing a consistent error
handling experience across dialects.
* Incremental improvement to error messages - article selection
([databrickslabs#711](databrickslabs#711)). In this
release, we have implemented an incremental improvement to the error
messages generated during T-SQL code parsing. This change introduces a
new private method, `articleFor`, which determines whether to use `a` or
`an` in the generated messages based on the first letter of the
following word. The `generateMessage` method has been updated to use
this new method when constructing the initial error message and
subsequent messages when there are multiple expected tokens. This
improvement ensures consistent use of articles `a` or `an` in the error
messages, enhancing their readability for software engineers working
with T-SQL code.
* TSQL: Adds tests and support for SELECT OPTION(...) generation
([databrickslabs#755](databrickslabs#755)). In this
release, we have added support for generating code for the TSQL `SELECT
... OPTION(...)` clause in the codebase. This new feature includes the
ability to transpile any query hints supplied with a SELECT statement as
comments in the output code, allowing for easier assessment of query
performance after transpilation. The OPTION clause is now generated as
comments, including MAXRECURSION, string options, boolean options, and
auto options. Additionally, we have added new tests and updated the
TSqlAstBuilderSpec test class with new and updated test cases to cover
the new functionality. The implementation is focused on generating code
for the OPTION clause, and does not affect the actual execution of the
query. The changes are limited to the ExpressionGenerator class and its
associated methods, and the TSqlRelationBuilder class, without affecting
other parts of the codebase.
* TSQL: IR implementation of MERGE
([databrickslabs#719](databrickslabs#719)). The
open-source library has been updated to include a complete
implementation of the TSQL MERGE statement's IR (Intermediate
Representation), bringing it in line with Spark SQL. The
`LogicalPlanGenerator` class now includes a `generateMerge` method,
which generates the SQL code for the MERGE statement, taking a
`MergeIntoTable` object containing the target and source tables, merge
condition, and merge actions as input. The `MergeIntoTable` class has
been added as a case class to represent the logical plan of the MERGE
INTO command and extends the `Modification` trait. The
`LogicalPlanGenerator` class also includes a new `generateWithOptions`
method, which generates SQL code for the WITH OPTIONS clause, taking a
`WithOptions` object containing the input and options as children.
Additionally, the `TSqlRelationBuilder` class has been updated to handle
the MERGE statement's parsing, introducing new methods and updating
existing ones, such as `visitMerge`. The `TSqlToDatabricksTranspiler`
class has been updated to include support for the TSQL MERGE statement,
and the `ExpressionGenerator` class has new tests for options, columns,
and arithmetic expressions. A new optimization rule,
`TrapInsertDefaultsAction`, has been added to handle the behavior of the
DEFAULT keyword during INSERT statements. The commit also includes test
cases for the `MergeIntoTable` logical operator and the T-SQL merge
statement in the `TSqlAstBuilderSpec`.
sundarshankar89 pushed a commit to sundarshankar89/remorph that referenced this pull request Jan 3, 2025
* Aggregate Queries Reconciliation
([databrickslabs#740](databrickslabs#740)). This
release introduces several changes to enhance the functionality of the
project, including the implementation of Aggregate Queries
Reconciliation, addressing issue
[databrickslabs#503](databrickslabs#503). A new
property, `aggregates`, has been added to the base class of the query
builder module to support aggregate queries reconciliation. A
`generate_final_reconcile_aggregate_output` function has been added to
generate the final reconcile output for aggregate queries. A new SQL
file creates a table called `aggregate_details` to store details about
aggregate reconciles, and a new column, `operation_name`, has been added
to the `main` table in the `installation` reconciliation query.
Additionally, new classes and methods have been introduced for handling
aggregate queries and their reconciliation, and new SQL tables and
columns have been created for storing and managing rules for aggregating
data in the context of query reconciliation. Unit tests have been added
to ensure the proper functioning of aggregate queries reconciliation and
reconcile aggregate data in the context of missing records.
* Generate GROUP BY / PIVOT
([databrickslabs#747](databrickslabs#747)). The
LogicalPlanGenerator class in the remorph library has been updated to
support generating GROUP BY and PIVOT clauses for SQL queries. A new
private method, "aggregate", has been added to handle two types of
aggregates: GroupBy and Pivot. For GroupBy, it generates a GROUP BY
clause with specified grouping expressions. For Pivot, it generates a
PIVOT clause where the specified column is used as the pivot column and
the specified values are used as the pivot values, compatible with Spark
SQL. If the aggregate type is unsupported, a TranspileException is
thrown. Additionally, new test cases have been introduced for the
LogicalPlanGenerator class in the
com.databricks.labs.remorph.generators.sql package to support testing
the transpilation of Aggregate expressions with GROUP BY and PIVOT
clauses, ensuring proper handling and transpilation of these
expressions.
* Implement error strategy for Snowflake parsing and use error strategy
for all parser instances
([databrickslabs#760](databrickslabs#760)). In this
release, we have developed an error strategy specifically for Snowflake
parsing that translates raw token names and parser rules into more
user-friendly SQL error messages. This strategy is applied consistently
across all parser instances, ensuring a unified error handling
experience. Additionally, we have refined the DBL_DOLLAR rule in the
SnowflakeLexer grammar to handle escaped dollar signs correctly. These
updates improve the accuracy and readability of error messages for SQL
authors, regardless of the parsing tool or transpiler used. Furthermore,
we have updated the TSQL parsing error strategy to match the new
Snowflake error strategy implementation, providing a consistent error
handling experience across dialects.
* Incremental improvement to error messages - article selection
([databrickslabs#711](databrickslabs#711)). In this
release, we have implemented an incremental improvement to the error
messages generated during T-SQL code parsing. This change introduces a
new private method, `articleFor`, which determines whether to use `a` or
`an` in the generated messages based on the first letter of the
following word. The `generateMessage` method has been updated to use
this new method when constructing the initial error message and
subsequent messages when there are multiple expected tokens. This
improvement ensures consistent use of articles `a` or `an` in the error
messages, enhancing their readability for software engineers working
with T-SQL code.
* TSQL: Adds tests and support for SELECT OPTION(...) generation
([databrickslabs#755](databrickslabs#755)). In this
release, we have added support for generating code for the TSQL `SELECT
... OPTION(...)` clause in the codebase. This new feature includes the
ability to transpile any query hints supplied with a SELECT statement as
comments in the output code, allowing for easier assessment of query
performance after transpilation. The OPTION clause is now generated as
comments, including MAXRECURSION, string options, boolean options, and
auto options. Additionally, we have added new tests and updated the
TSqlAstBuilderSpec test class with new and updated test cases to cover
the new functionality. The implementation is focused on generating code
for the OPTION clause, and does not affect the actual execution of the
query. The changes are limited to the ExpressionGenerator class and its
associated methods, and the TSqlRelationBuilder class, without affecting
other parts of the codebase.
* TSQL: IR implementation of MERGE
([databrickslabs#719](databrickslabs#719)). The
open-source library has been updated to include a complete
implementation of the TSQL MERGE statement's IR (Intermediate
Representation), bringing it in line with Spark SQL. The
`LogicalPlanGenerator` class now includes a `generateMerge` method,
which generates the SQL code for the MERGE statement, taking a
`MergeIntoTable` object containing the target and source tables, merge
condition, and merge actions as input. The `MergeIntoTable` class has
been added as a case class to represent the logical plan of the MERGE
INTO command and extends the `Modification` trait. The
`LogicalPlanGenerator` class also includes a new `generateWithOptions`
method, which generates SQL code for the WITH OPTIONS clause, taking a
`WithOptions` object containing the input and options as children.
Additionally, the `TSqlRelationBuilder` class has been updated to handle
the MERGE statement's parsing, introducing new methods and updating
existing ones, such as `visitMerge`. The `TSqlToDatabricksTranspiler`
class has been updated to include support for the TSQL MERGE statement,
and the `ExpressionGenerator` class has new tests for options, columns,
and arithmetic expressions. A new optimization rule,
`TrapInsertDefaultsAction`, has been added to handle the behavior of the
DEFAULT keyword during INSERT statements. The commit also includes test
cases for the `MergeIntoTable` logical operator and the T-SQL merge
statement in the `TSqlAstBuilderSpec`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants