
[FEATURE]: Comparison of aggregate metrics in source vs target for specific columns #503

Closed
1 task done
ajalisatgi opened this issue Jun 28, 2024 · 0 comments · Fixed by #740
Labels
feat/recon making sure that remorphed query produces the same results as original

ajalisatgi commented Jun 28, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Category of feature request

Reconcile

Problem statement

Remorph's recon module cannot be configured to compare aggregate summary metrics for specific columns based on user input.

Proposed Solution

Give users the ability to define configuration-driven options that compare summary metrics for specific columns in the source vs. the target.

Additional Context

Example:
Comparison between a Snowflake table (snftable.T1) and a Databricks table (dltable.T1):
query1 --> SELECT SUM(colA), SUM(colB) FROM snftable.T1 GROUP BY colC
query2 --> SELECT SUM(colA), SUM(colB) FROM dltable.T1 GROUP BY colC
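A configuration-driven setup for the pair of queries above could be sketched as follows. All names here (`AggregateRule`, `build_aggregate_query`) are hypothetical illustrations of the request, not Remorph's actual API:

```python
from dataclasses import dataclass


@dataclass
class AggregateRule:
    """Hypothetical user-supplied rule: which aggregate to apply,
    to which columns, grouped by which key."""
    agg: str        # e.g. "sum", "min", "max", "count"
    columns: list   # columns to aggregate
    group_by: list  # grouping columns


def build_aggregate_query(table: str, rule: AggregateRule) -> str:
    """Render one side (source or target) of the comparison query."""
    select_list = ", ".join(
        f"{rule.agg}({c}) AS {rule.agg}_{c}" for c in rule.columns
    )
    group_list = ", ".join(rule.group_by)
    return f"SELECT {group_list}, {select_list} FROM {table} GROUP BY {group_list}"


rule = AggregateRule(agg="sum", columns=["colA", "colB"], group_by=["colC"])
src_sql = build_aggregate_query("snftable.T1", rule)
tgt_sql = build_aggregate_query("dltable.T1", rule)
```

The same rule drives both sides, so source and target are guaranteed to compute the same metrics over the same grouping.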

@ajalisatgi ajalisatgi added the enhancement New feature or request label Jun 28, 2024
@vijaypavann-db vijaypavann-db self-assigned this Jul 8, 2024
@nfx nfx added feat/recon making sure that remorphed query produces the same results as original and removed enhancement New feature or request labels Jul 19, 2024
@nfx nfx closed this as completed in #740 Aug 2, 2024
@nfx nfx closed this as completed in de431bc Aug 2, 2024
nfx added a commit that referenced this issue Aug 2, 2024
* Aggregate Queries Reconciliation ([#740](#740)). This release introduces several changes to enhance the functionality of the project, including the implementation of Aggregate Queries Reconciliation, addressing issue [#503](#503). A new property, `aggregates`, has been added to the base class of the query builder module to support aggregate queries reconciliation. A `generate_final_reconcile_aggregate_output` function has been added to generate the final reconcile output for aggregate queries. A new SQL file creates a table called `aggregate_details` to store details about aggregate reconciles, and a new column, `operation_name`, has been added to the `main` table in the `installation` reconciliation query. Additionally, new classes and methods have been introduced for handling aggregate queries and their reconciliation, and new SQL tables and columns have been created for storing and managing rules for aggregating data in the context of query reconciliation. Unit tests have been added to ensure the proper functioning of aggregate queries reconciliation and reconcile aggregate data in the context of missing records.
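The reconcile step this entry describes, matching per-group aggregates and flagging missing records, can be sketched roughly like this. The function name and result shape are illustrative, not the actual `aggregate_details` schema:

```python
def reconcile_aggregates(source_rows: dict, target_rows: dict) -> dict:
    """Compare {group_key: aggregate_value} maps from source and target.

    Returns mismatched values plus groups present on only one side.
    """
    shared = source_rows.keys() & target_rows.keys()
    mismatched = {
        k: (source_rows[k], target_rows[k])
        for k in shared
        if source_rows[k] != target_rows[k]
    }
    return {
        "mismatched": mismatched,
        "missing_in_target": sorted(source_rows.keys() - target_rows.keys()),
        "missing_in_source": sorted(target_rows.keys() - source_rows.keys()),
    }


source = {"c1": 100, "c2": 250, "c3": 75}   # e.g. SUM(colA) per colC, source side
target = {"c1": 100, "c2": 240, "c4": 10}   # the same aggregate, target side
report = reconcile_aggregates(source, target)
```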
* Generate GROUP BY / PIVOT ([#747](#747)). The LogicalPlanGenerator class in the remorph library has been updated to support generating GROUP BY and PIVOT clauses for SQL queries. A new private method, "aggregate", has been added to handle two types of aggregates: GroupBy and Pivot. For GroupBy, it generates a GROUP BY clause with specified grouping expressions. For Pivot, it generates a PIVOT clause where the specified column is used as the pivot column and the specified values are used as the pivot values, compatible with Spark SQL. If the aggregate type is unsupported, a TranspileException is thrown. Additionally, new test cases have been introduced for the LogicalPlanGenerator class in the com.databricks.labs.remorph.generators.sql package to support testing the transpilation of Aggregate expressions with GROUP BY and PIVOT clauses, ensuring proper handling and transpilation of these expressions.
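As a rough illustration of the two aggregate shapes the generator distinguishes (a Python sketch, not the Scala `LogicalPlanGenerator` API; the function name is invented):

```python
def generate_aggregate(kind, exprs, pivot_column=None, pivot_values=None):
    """Render either a GROUP BY clause or a Spark-style PIVOT clause."""
    if kind == "group_by":
        # GroupBy: emit the grouping expressions directly.
        return "GROUP BY " + ", ".join(exprs)
    if kind == "pivot":
        # Pivot: aggregate expressions, pivot column, and pivot values.
        values = ", ".join(pivot_values)
        return f"PIVOT ({', '.join(exprs)} FOR {pivot_column} IN ({values}))"
    # Mirrors the TranspileException thrown for unsupported aggregate types.
    raise ValueError(f"unsupported aggregate type: {kind}")
```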
* Implement error strategy for Snowflake parsing and use error strategy for all parser instances ([#760](#760)). In this release, we have developed an error strategy specifically for Snowflake parsing that translates raw token names and parser rules into more user-friendly SQL error messages. This strategy is applied consistently across all parser instances, ensuring a unified error handling experience. Additionally, we have refined the DBL_DOLLAR rule in the SnowflakeLexer grammar to handle escaped dollar signs correctly. These updates improve the accuracy and readability of error messages for SQL authors, regardless of the parsing tool or transpiler used. Furthermore, we have updated the TSQL parsing error strategy to match the new Snowflake error strategy implementation, providing a consistent error handling experience across dialects.
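The kind of translation described, raw token names into readable SQL terms, might look like the following sketch. The mapping and function are invented for illustration and do not reflect the actual error-strategy implementation:

```python
# Hypothetical table translating raw lexer token names into SQL-author-friendly terms.
FRIENDLY_NAMES = {
    "ID": "an identifier",
    "STRING": "a string literal",
    "LPAREN": "'('",
    "SEMI": "';'",
}


def friendly_error(offending: str, expected_tokens: list) -> str:
    """Build a readable parse-error message from raw token names."""
    expected = " or ".join(FRIENDLY_NAMES.get(t, t) for t in expected_tokens)
    return f"unexpected '{offending}'; expected {expected}"
```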
* Incremental improvement to error messages - article selection ([#711](#711)). In this release, we have implemented an incremental improvement to the error messages generated during T-SQL code parsing. This change introduces a new private method, `articleFor`, which determines whether to use `a` or `an` in the generated messages based on the first letter of the following word. The `generateMessage` method has been updated to use this new method when constructing the initial error message and subsequent messages when there are multiple expected tokens. This improvement ensures consistent use of articles `a` or `an` in the error messages, enhancing their readability for software engineers working with T-SQL code.
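The article-selection logic described is simple enough to sketch directly (a Python rendering of the idea behind the Scala `articleFor`, not the actual method):

```python
def article_for(word: str) -> str:
    """Choose 'a' or 'an' based on the first letter of the following word."""
    return "an" if word[:1].lower() in "aeiou" else "a"


def expected_message(tokens: list) -> str:
    """List expected tokens, each with a grammatically correct article."""
    return ", ".join(f"{article_for(t)} {t}" for t in tokens)
```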
* TSQL: Adds tests and support for SELECT OPTION(...) generation ([#755](#755)). In this release, we have added support for generating code for the TSQL `SELECT ... OPTION(...)` clause in the codebase. This new feature includes the ability to transpile any query hints supplied with a SELECT statement as comments in the output code, allowing for easier assessment of query performance after transpilation. The OPTION clause is now generated as comments, including MAXRECURSION, string options, boolean options, and auto options. Additionally, we have added new tests and updated the TSqlAstBuilderSpec test class with new and updated test cases to cover the new functionality. The implementation is focused on generating code for the OPTION clause, and does not affect the actual execution of the query. The changes are limited to the ExpressionGenerator class and its associated methods, and the TSqlRelationBuilder class, without affecting other parts of the codebase.
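Emitting the hints as comments, so they survive transpilation without affecting execution, can be sketched like this (the function is illustrative, not the `ExpressionGenerator` API):

```python
def transpile_option_hints(select_sql: str, hints: list) -> str:
    """Append TSQL OPTION(...) hints as trailing SQL comments.

    The hints are preserved for later performance assessment but have
    no effect on how the transpiled query executes.
    """
    if not hints:
        return select_sql
    comments = "\n".join(f"-- OPTION: {h}" for h in hints)
    return f"{select_sql}\n{comments}"
```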
* TSQL: IR implementation of MERGE ([#719](#719)). The open-source library has been updated to include a complete implementation of the TSQL MERGE statement's IR (Intermediate Representation), bringing it in line with Spark SQL. The `LogicalPlanGenerator` class now includes a `generateMerge` method, which generates the SQL code for the MERGE statement, taking a `MergeIntoTable` object containing the target and source tables, merge condition, and merge actions as input. The `MergeIntoTable` class has been added as a case class to represent the logical plan of the MERGE INTO command and extends the `Modification` trait. The `LogicalPlanGenerator` class also includes a new `generateWithOptions` method, which generates SQL code for the WITH OPTIONS clause, taking a `WithOptions` object containing the input and options as children. Additionally, the `TSqlRelationBuilder` class has been updated to handle the MERGE statement's parsing, introducing new methods and updating existing ones, such as `visitMerge`. The `TSqlToDatabricksTranspiler` class has been updated to include support for the TSQL MERGE statement, and the `ExpressionGenerator` class has new tests for options, columns, and arithmetic expressions. A new optimization rule, `TrapInsertDefaultsAction`, has been added to handle the behavior of the DEFAULT keyword during INSERT statements. The commit also includes test cases for the `MergeIntoTable` logical operator and the T-SQL merge statement in the `TSqlAstBuilderSpec`.
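The shape of the generated statement, target, source, condition, and matched/not-matched actions, can be sketched as follows (a simplified illustration of what `generateMerge` produces, not its actual signature):

```python
def generate_merge(target: str, source: str, condition: str,
                   matched_action: str, not_matched_action: str) -> str:
    """Render a Spark-SQL-style MERGE statement from its logical parts."""
    return (
        f"MERGE INTO {target} USING {source} ON {condition} "
        f"WHEN MATCHED THEN {matched_action} "
        f"WHEN NOT MATCHED THEN {not_matched_action}"
    )


sql = generate_merge(
    "t", "s", "t.id = s.id",
    "UPDATE SET t.v = s.v",
    "INSERT (id, v) VALUES (s.id, s.v)",
)
```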
@nfx nfx mentioned this issue Aug 2, 2024