Release v0.3.1 #542
Closed
Conversation
* Added LEFT and RIGHT JOIN syntax in Snowflake ANTLR grammar ([#526](#526)). This release addresses an issue in the Snowflake ANTLR grammar where the keywords `LEFT` and `RIGHT` were incorrectly allowed as identifiers in join statements. These are hard keywords in Snowflake SQL and must be escaped when used as column names. The fix removes `LEFT` from the list of non-reserved words and adds `RIGHT` to that list in the SnowflakeParser.g4 file. Additionally, the commit introduces a more precise examination of the lookahead through a member function call when `LEFT` or `RIGHT` is used as an unescaped column name, avoiding semantic predicates, which are generally discouraged. The tests for translating queries with LEFT JOIN and RIGHT JOIN have been updated to reflect these grammar changes, and the `ignore` keyword has been removed from test cases previously marked as `TODO`, indicating that those tests are now expected to pass.
* Fixed Snowflake Acceptance Testcases Failures ([#531](#531)). This release improves the acceptance test cases for various SQL functionality in Snowflake and Databricks. The SQL queries in the test files now incorporate a window specification and ORDER BY clause, resolving failures in test cases for the DENSE_RANK(), LEAD(), NTILE(), RANK(), and row-number functions. The MySQL-specific MONTH_NAME function has been replaced with the Snowflake equivalent MONTHNAME, and the DATE_FORMAT function has been replaced with TO_DATE in the Snowflake and Databricks SQL queries for consistent and accurate date formatting. These changes improve the accuracy, reliability, and cross-platform compatibility of the acceptance test suite.
* Added invalid null constraint and FQN ([#517](#517)).
This commit addresses issues [#516](#516) and [#517](#517) by improving the `read_data` function in the `databricks.py` file and updating the installation queries for the `reconcile` package. The `read_data` function now handles cases where the `catalog` variable is None by checking for its existence before concatenating it into the table name. The main table's FQN in the installation queries has been modified to remove the invalid null constraint and allow nullable values for the `catalog` field under the `source_table` and `target_table` STRUCT types. These changes, co-authored by Ganesh Dogiparthi and SundarShankar89, improve the flexibility and accuracy of the library.
* Support translation of TSQL IGNORE NULLS clause in windowing functions ([#511](#511)). This change adds support for translating the TSQL `IGNORE NULLS` and `RESPECT NULLS` clauses in windowing functions to their equivalent functionality in Databricks SQL. In TSQL, windowing functions such as LEAD and LAG allow `IGNORE NULLS` or `RESPECT NULLS` to be specified, which influences their behavior. In Databricks SQL, the equivalent functions take an optional trailing boolean parameter indicating whether nulls should be ignored. This update appends that boolean option to the Databricks windowing functions when the `IGNORE NULLS` clause is specified, with `RESPECT NULLS` as the default case. The changes are implemented in the TSqlExpressionBuilder.scala file and tested via the TSQL function parser test file, with no modifications to other parts of the codebase.
* TSQL: Implement translation of INSERT statement ([#515](#515)). In this release, the TSQL INSERT statement has been fully implemented, including all target options, optional clauses, and Common Table Expressions (CTEs).
The TSqlParser.g4 file has been updated with new rules and modifications to support the INSERT statement and its elements. The TSQL parser gains new classes, such as Output, InsertIntoTable, DerivedRows, DefaultValues, and LocalVarTable, to handle INSERT statements and CTEs, and the TSqlAstBuilder class has been updated to support the INSERT statement and other DML clauses. New methods in the TSqlExpressionBuilder class build expressions for output elements in a DML list and handle the optional AS clause for aliasing the output expression. The TSqlAstBuilder test file includes several examples of INSERT statements with their corresponding ASTs, covering scenarios such as inserting values into regular tables, local variable tables, and tables with hints, as well as inserting multiple rows and using default values. These changes bring enhanced capabilities for parsing and translating TSQL INSERT statements and other DML clauses.
* TSQL: Simplifies named table tableSource, implements columnAlias list ([#512](#512)). This diff enhances the TSqlParser's ANTLR grammar, specifically the tableSource and tableSourceItem rules, and adds new classes for table hints and column aliases in the TSqlExpressionBuilder and TSqlRelationBuilder. The new tableSourceItem rule streamlines named table handling, reducing grammar and parser complexity for improved maintainability. The new withTableHints rule parses and collects table hints, enabling the creation of a new Relation, TableWithHints, for better handling of table hints in the Catalyst optimizer. The columnAliasList rule accurately implements column aliases in the IR, improving parser accuracy and consistency. These changes focus on improving TSqlParser robustness, maintainability, and functionality in handling T-SQL queries.
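The `IGNORE NULLS` translation described above can be illustrated with a minimal sketch. This is not the project's actual Scala implementation (which lives in TSqlExpressionBuilder.scala); it is a hypothetical Python rendering of the rule that a trailing boolean is appended only when `IGNORE NULLS` is specified, with `RESPECT NULLS` as the default:

```python
def translate_window_function(func_name, args, null_treatment=None):
    """Render a TSQL windowing function call (e.g. LEAD, LAG) for
    Databricks SQL. When the TSQL source specified IGNORE NULLS,
    append the optional trailing boolean parameter; RESPECT NULLS
    is the default and produces no extra argument.

    Hypothetical helper, not the project's real API.
    """
    rendered = list(args)
    if null_treatment == "IGNORE NULLS":
        rendered.append("TRUE")  # trailing ignore-nulls flag
    return f"{func_name}({', '.join(rendered)})"


# TSQL: LEAD(salary, 1) IGNORE NULLS OVER (...)
print(translate_window_function("LEAD", ["salary", "1"], "IGNORE NULLS"))
# TSQL: LEAD(salary, 1) RESPECT NULLS OVER (...) -- default, unchanged
print(translate_window_function("LEAD", ["salary", "1"]))
```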
* TSQL: Support generic FOR options ([#525](#525)). This release implements support for using the keyword `FOR` in TSQL option clauses without requiring escaping. The ANTLR rule for parsing generic options in the TSqlParser.g4 file has been expanded to correctly parse options containing `FOR` as a keyword. This resolves issue [#525](#525) and ensures that TSQL statements like `SELECT * FROM t FOR XML RAW OPTION (OPTIMIZE FOR UNKNOWN)` are parsed correctly, even with unescaped use of `FOR`. The `OptionBuilder` class now specifically handles T-SQL options containing the `FOR` keyword, eliding it and managing a few particular options such as `OPTIMIZE FOR UNKNOWN`, which is now parsed as `OPTIMIZE` with an optional `UNKNOWN` identifier. These updates improve the consistency and accuracy of parsing T-SQL options. (Co-authored by Valentin Kasas.)
* Use Oracle library only if the recon source is Oracle ([#532](#532)). This release introduces a new configuration object, `ReconcileConfig`, in the `deployment.py` module of the `databricks/labs/remorph/helpers` package. It passes reconciliation configuration data to the `Deployment` class via a new `recon_config` parameter in the `__init__` method. The `_job_recon_task` method now includes the Oracle library in the `libraries` list only if the reconciliation source is Oracle. Two new fixtures, `oracle_recon_config` and `snowflake_recon_config`, cover different `ReconcileConfig` configurations in the test suite; the `test_deploy_job` and `test_deploy_job_with_valid_state` tests accept these fixtures as arguments and pass them to the `JobDeployer` constructor, and the `test_deploy_job_in_gcp` test now sets the `is_gcp` attribute of the `Workspace` object to `True`.
These changes ensure the appropriate database library is used based on the provided configuration, improving the efficiency and reliability of the reconciliation job.
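The `catalog`-handling fix described for `read_data` amounts to omitting the catalog segment from the fully qualified name when it is None. A minimal sketch, assuming a hypothetical `table_fqn` helper (the real code lives inside `read_data` in `databricks.py`):

```python
def table_fqn(catalog, schema, table):
    """Build a fully qualified table name, checking that the
    catalog exists before concatenating it, so a None catalog
    yields "schema.table" rather than "None.schema.table".

    Hypothetical helper mirroring the read_data fix.
    """
    parts = [p for p in (catalog, schema, table) if p is not None]
    return ".".join(parts)


print(table_fqn(None, "main_schema", "recon_main"))            # no catalog configured
print(table_fqn("hive_metastore", "main_schema", "recon_main"))  # catalog present
```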
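The conditional Oracle library logic can be sketched as follows. The `ReconcileConfig` name comes from the changelog; the `data_source` field, `recon_task_libraries` function, and Maven coordinates are illustrative assumptions, not the project's actual `_job_recon_task` implementation:

```python
from dataclasses import dataclass


@dataclass
class ReconcileConfig:
    """Minimal stand-in for the reconciliation config; the real
    class likely carries more fields."""
    data_source: str  # e.g. "oracle" or "snowflake"


# Hypothetical Maven coordinates for the Oracle JDBC driver.
ORACLE_JDBC = "com.oracle.database.jdbc:ojdbc8:23.4.0.24.05"


def recon_task_libraries(config: ReconcileConfig) -> list:
    """Build the job-task libraries list, attaching the Oracle
    JDBC library only when the recon source is Oracle."""
    libraries = []
    if config.data_source == "oracle":
        libraries.append({"maven": {"coordinates": ORACLE_JDBC}})
    return libraries


print(recon_task_libraries(ReconcileConfig(data_source="oracle")))
print(recon_task_libraries(ReconcileConfig(data_source="snowflake")))
```

Gating the dependency this way keeps non-Oracle reconciliation jobs from pulling in a driver they never use.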
ganeshdogiparthi-db approved these changes on Jul 10, 2024.
LGTM
Coverage test results: 394 tests ±0, 367 ✅ ±0, 4s ⏱️ -1s. For more details on these failures, see this check. Results for commit 72eaa87; ± comparison against base commit 884dd5a.
Closed as there are CI failures.