Release v0.3.1 #542

Closed
wants to merge 1 commit into from

sundarshankar89
Collaborator


* Added LEFT and RIGHT JOIN syntax in Snowflake ANTLR grammar ([#526](#526)). This release fixes an issue in the Snowflake ANTLR grammar where the keywords `LEFT` and `RIGHT` were incorrectly accepted as identifiers in join clauses. Both are hard keywords in Snowflake SQL and must be escaped when used as column names. The fix removes `LEFT` from the list of non-reserved words and adds `RIGHT` to this list in `SnowflakeParser.g4`. The commit also handles unescaped uses of `LEFT` or `RIGHT` as column names by examining the lookahead in a member function call, rather than with semantic predicates, which are generally discouraged. The tests for translating queries with `LEFT JOIN` and `RIGHT JOIN` have been updated to match the new grammar, and test cases previously marked `TODO` no longer use the `ignore` keyword, so they are now expected to pass.
* Fixed Snowflake Acceptance Testcases Failures ([#531](#531)). This release fixes failing acceptance test cases for Snowflake and Databricks SQL. The SQL queries in the test files now include window specifications with `ORDER BY` clauses, which resolves the failures in the test cases for `DENSE_RANK()`, `LEAD()`, `NTILE()`, `RANK()`, and `ROW_NUMBER()`. The `MONTH_NAME` function has been replaced with its Snowflake equivalent, `MONTHNAME`, and `DATE_FORMAT` has been replaced with `TO_DATE` in the Snowflake and Databricks SQL queries so that date handling is consistent across both dialects.
* Added invalid null constraint and FQN ([#517](#517)). This commit addresses issues [#516](#516) and [#517](#517) by improving the `read_data` function in `databricks.py` and updating the installation queries for the `reconcile` package. `read_data` now checks whether `catalog` is `None` before prepending it to the table name. In the installation queries, the main table's FQN definition no longer carries the invalid null constraint, so the `catalog` field under the `source_table` and `target_table` STRUCT types now accepts null values. These changes were co-authored by Ganesh Dogiparthi and SundarShankar89.
* Support translation of TSQL IGNORE NULLS clause in windowing functions ([#511](#511)). This change adds support for translating the TSQL `IGNORE NULLS` and `RESPECT NULLS` clauses in windowing functions to equivalent Databricks SQL. In TSQL, some windowing functions, such as `LEAD` and `LAG`, accept `IGNORE NULLS` or `RESPECT NULLS` to control how null values are treated. In Databricks SQL, the equivalent functions take an optional trailing boolean parameter that indicates whether nulls should be ignored. The translation appends this boolean argument when `IGNORE NULLS` is specified; `RESPECT NULLS` is the default and needs no extra argument. The changes are confined to `TSqlExpressionBuilder.scala` and are covered by the TSQL function parser tests.
* TSQL: Implement translation of INSERT statement ([#515](#515)). The TSQL `INSERT` statement is now fully implemented, including all target options, optional clauses, and Common Table Expressions (CTEs). `TSqlParser.g4` has new and modified rules to support `INSERT` and its elements, and the TSQL parser has new classes, such as `Output`, `InsertIntoTable`, `DerivedRows`, `DefaultValues`, and `LocalVarTable`, to represent `INSERT` statements and CTEs. `TSqlAstBuilder` now supports `INSERT` and other DML clauses, and `TSqlExpressionBuilder` has new methods that build expressions for output elements in a DML list and handle the optional `AS` clause that aliases an output expression. The `TSqlAstBuilder` tests include several `INSERT` statements with their expected ASTs, covering inserts into regular tables, local variable tables, and tables with hints, as well as multi-row inserts and default values.
* TSQL: Simplifies named table tableSource, implements columnAlias list ([#512](#512)). This change revises the `tableSource` and `tableSourceItem` rules in the TSqlParser ANTLR grammar and adds new classes for table hints and column aliases in `TSqlExpressionBuilder` and `TSqlRelationBuilder`. The new `tableSourceItem` rule simplifies named-table handling, reducing grammar and parser complexity. The new `withTableHints` rule parses and collects table hints into a new Relation, `TableWithHints`, for better handling of table hints in the Catalyst optimizer. The `columnAliasList` rule implements column aliases in the IR.
* TSQL: Support generic FOR options ([#525](#525)). TSQL option clauses can now use the keyword `FOR` without escaping. The ANTLR rule for generic options in `TSqlParser.g4` has been expanded to parse options that contain `FOR` as a keyword. This resolves issue [#525](#525), so statements like `SELECT * FROM t FOR XML RAW OPTION (OPTIMIZE FOR UNKNOWN)` now parse correctly even with the unescaped `FOR`. The `OptionBuilder` class handles T-SQL options containing `FOR` by eliding the keyword, with special handling for a few options such as `OPTIMIZE FOR UNKNOWN`, which is now parsed as `OPTIMIZE` with an optional `UNKNOWN` identifier. (Co-authored by Valentin Kasas.)
* Use Oracle library only if the recon source is Oracle ([#532](#532)). This release introduces a `ReconcileConfig` configuration object in the `deployment.py` module of the `databricks/labs/remorph/helpers` package. It passes reconciliation configuration to the `Deployment` class through a new `recon_config` parameter of `__init__`. The `_job_recon_task` method now adds the Oracle library to the `libraries` list only when the reconciliation source is Oracle. Two new test fixtures, `oracle_recon_config` and `snowflake_recon_config`, provide different `ReconcileConfig` configurations. The `test_deploy_job` and `test_deploy_job_with_valid_state` tests now accept these fixtures and pass them to the `JobDeployer` constructor, and `test_deploy_job_in_gcp` now sets the `is_gcp` attribute of the `Workspace` object to `True`. As a result, the reconciliation job installs only the database library that matches the configured source.
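
The lookahead check from the LEFT and RIGHT JOIN entry ([#526](#526)) can be illustrated with a small token-level sketch. This is not the member function in `SnowflakeParser.g4`: `starts_join` and the whitespace tokenization are hypothetical, and the sketch only assumes that `LEFT` or `RIGHT` immediately followed by `JOIN` or `OUTER` opens a join, while any other follower means the word is a column name.

```python
from typing import List

# Hypothetical sketch: not the SnowflakeParser.g4 member function.
JOIN_SIDES = {"LEFT", "RIGHT"}
JOIN_FOLLOWERS = {"JOIN", "OUTER"}


def starts_join(tokens: List[str], i: int) -> bool:
    """Return True if tokens[i] is LEFT/RIGHT opening a join clause.

    Peeks one token ahead: LEFT or RIGHT followed by JOIN or OUTER is a join
    keyword; any other follower means the token is used as a column name.
    """
    if tokens[i].upper() not in JOIN_SIDES:
        return False
    following = tokens[i + 1].upper() if i + 1 < len(tokens) else ""
    return following in JOIN_FOLLOWERS


tokens = "SELECT left FROM a LEFT JOIN b".split()
print(starts_join(tokens, 1))  # False: "left" is followed by FROM
print(starts_join(tokens, 4))  # True: "LEFT" is followed by JOIN
```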
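
The `catalog` check in `read_data` ([#517](#517)) amounts to building the table name only from the parts that exist. A minimal sketch, assuming a hypothetical `build_table_name` helper rather than the actual code in `databricks.py`:

```python
from typing import List, Optional


def build_table_name(catalog: Optional[str], schema: str, table: str) -> str:
    """Join name parts with dots, skipping a missing (None or empty) catalog.

    Hypothetical helper illustrating the check described for read_data.
    """
    parts: List[str] = [schema, table]
    if catalog:  # only prepend the catalog when one is actually set
        parts.insert(0, catalog)
    return ".".join(parts)


print(build_table_name(None, "sales", "orders"))    # sales.orders
print(build_table_name("main", "sales", "orders"))  # main.sales.orders
```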
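
For the `IGNORE NULLS` entry ([#511](#511)), the rule described there (append a trailing boolean when `IGNORE NULLS` is present, add nothing for `RESPECT NULLS`) can be sketched on plain SQL strings. `translate_window_call` is a hypothetical helper, not the code in `TSqlExpressionBuilder.scala`, and its output follows the release note's description rather than a verified list of Databricks function signatures.

```python
from typing import List, Optional


def translate_window_call(
    func: str, args: List[str], null_treatment: Optional[str] = None
) -> str:
    """Render a TSQL LEAD/LAG-style call as Databricks SQL text (hypothetical).

    IGNORE NULLS appends a trailing boolean `true` argument; RESPECT NULLS,
    like omitting the clause, is the default and adds nothing.
    """
    rendered = list(args)  # copy so the caller's list is untouched
    if null_treatment is not None and null_treatment.upper() == "IGNORE NULLS":
        rendered.append("true")
    return f"{func.upper()}({', '.join(rendered)})"


print(translate_window_call("lead", ["price", "1", "0"], "IGNORE NULLS"))
# LEAD(price, 1, 0, true)
print(translate_window_call("lag", ["price"], "RESPECT NULLS"))
# LAG(price)
```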
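
The `FOR` elision described in [#525](#525) can be approximated with a toy parser over a pre-split token list. `parse_generic_option` is hypothetical and much simpler than `OptionBuilder`: it drops every `FOR` token and returns the option name plus an optional identifier, which is enough for `OPTIMIZE FOR UNKNOWN` but not for options with parenthesized arguments.

```python
from typing import List, Optional, Tuple


def parse_generic_option(tokens: List[str]) -> Tuple[str, Optional[str]]:
    """Toy parse of a T-SQL query option such as OPTIMIZE FOR UNKNOWN.

    Every FOR token is elided; the first remaining word is the option name and
    the second, if present, is an optional identifier. Further words are ignored.
    """
    words = [t for t in tokens if t.upper() != "FOR"]
    if not words:
        raise ValueError("empty option")
    name = words[0].upper()
    value = words[1].upper() if len(words) > 1 else None
    return name, value


print(parse_generic_option("OPTIMIZE FOR UNKNOWN".split()))  # ('OPTIMIZE', 'UNKNOWN')
print(parse_generic_option(["RECOMPILE"]))                    # ('RECOMPILE', None)
```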
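
The conditional library selection from [#532](#532) can be sketched as follows. `ReconConfigSketch`, its `data_source` field, `recon_task_libraries`, and the `ORACLE_LIBRARY` and `remorph-wheel` strings are all hypothetical placeholders; the real `ReconcileConfig` and `_job_recon_task` may be shaped differently.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical placeholder, not a real library coordinate.
ORACLE_LIBRARY = "oracle-jdbc-driver"


@dataclass
class ReconConfigSketch:
    """Hypothetical stand-in for ReconcileConfig with only the field used here."""

    data_source: str


def recon_task_libraries(config: ReconConfigSketch, base: List[str]) -> List[str]:
    """Return the job libraries, adding the Oracle driver only for Oracle sources."""
    libraries = list(base)  # copy so the shared base list is not mutated
    if config.data_source.lower() == "oracle":
        libraries.append(ORACLE_LIBRARY)
    return libraries


base = ["remorph-wheel"]  # hypothetical base library
print(recon_task_libraries(ReconConfigSketch("oracle"), base))
# ['remorph-wheel', 'oracle-jdbc-driver']
print(recon_task_libraries(ReconConfigSketch("snowflake"), base))
# ['remorph-wheel']
```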
sundarshankar89 requested a review from a team as a code owner on July 10, 2024 05:11
Contributor

ganeshdogiparthi-db left a comment


LGTM


Coverage tests results

| Tests | Passed | Skipped | Failed | Suites | Files | Duration |
|---|---|---|---|---|---|---|
| 394 ±0 | 367 ✅ ±0 | 0 💤 ±0 | 27 ❌ ±0 | 2 ±0 | 2 ±0 | 4s ⏱️ -1s |

For more details on these failures, see this check.

Results for commit 72eaa87. ± Comparison against base commit 884dd5a.

@sundarshankar89
Collaborator Author

Closed as there are CI failures.

nfx deleted the prepare/0.3.1 branch on July 29, 2024 19:15