
Bump sqlglot from 25.1.0 to 25.5.1 #534

Merged
merged 3 commits into main from dependabot/pip/sqlglot-25.5.1 on Jul 17, 2024
Conversation


@dependabot dependabot bot commented on behalf of github Jul 8, 2024

Bumps sqlglot from 25.1.0 to 25.5.1.

Changelog

Sourced from sqlglot's changelog.

[v25.5.1] - 2024-07-05

🐛 Bug Fixes

[v25.5.0] - 2024-07-04

💥 BREAKING CHANGES

✨ New Features

🐛 Bug Fixes

♻️ Refactors

[v25.4.1] - 2024-06-29

🐛 Bug Fixes

[v25.4.0] - 2024-06-28

💥 BREAKING CHANGES

... (truncated)

Commits
  • 2bdde22 Fix(optimizer): only qualify coalesced USING columns if they exist in table s...
  • e449e3d docs: update API docs, CHANGELOG.md for v25.5.0 [skip ci]
  • 912bc84 feat(spark, databricks): Support view schema binding options (#3739)
  • c790c3b Fix(tsql): parse rhs of x::varchar(max) into a type (#3737)
  • 820d664 Feat(presto): wrap md5 string arguments in to_utf8 (#3732)
  • fb066a6 fix(oracle)!: Decouple NVL() from COALESCE() (#3734)
  • e5a53aa feat(snowflake): Support for FROM CHANGES (#3731)
  • 1e07c4d feat(presto, trino): Configurable transpilation of Snowflake VARIANT (#3725)
  • 8335ba1 Fix(clickhouse)!: preserve EXTRACT(date_part FROM datetime) calls (#3729)
  • f4a2872 Fix(clickhouse): switch off table alias columns generation (#3727)
  • Additional commits viewable in compare view
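
The commits above are easiest to sanity-check by round-tripping queries through the new version. A minimal smoke test is sketched below; the queries are illustrative, and the exact generated SQL depends on the installed sqlglot version.

```python
# Minimal smoke test for the sqlglot upgrade; the queries are illustrative
# and the exact generated SQL depends on the installed sqlglot version.
import sqlglot

# Per the fix(oracle) commit above, NVL() is no longer coupled to COALESCE(),
# so the NVL call should no longer be silently normalized away.
print(sqlglot.transpile("SELECT NVL(a, b) FROM t", read="oracle", write="databricks")[0])

# EXTRACT(date_part FROM datetime) calls are preserved for ClickHouse.
print(sqlglot.transpile("SELECT EXTRACT(year FROM ts) FROM t", read="clickhouse", write="clickhouse")[0])

# The right-hand side of x::varchar(max) now parses into a proper type in TSQL.
ast = sqlglot.parse_one("SELECT x::varchar(max) FROM t", read="tsql")
print(ast.sql(dialect="databricks"))
```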

Most Recent Ignore Conditions Applied to This Pull Request
  • sqlglot: [>= 24.a, < 25]

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot dependabot bot requested a review from a team as a code owner July 8, 2024 09:41
@dependabot dependabot bot added the dependencies and python labels Jul 8, 2024
@dependabot dependabot bot requested a review from ravit-db July 8, 2024 09:41
@dependabot dependabot bot requested a review from jimidle July 8, 2024 09:41
@sundarshankar89 sundarshankar89 added the good first issue label Jul 9, 2024
Bumps [sqlglot](https://github.com/tobymao/sqlglot) from 25.1.0 to 25.5.1.
- [Changelog](https://github.com/tobymao/sqlglot/blob/main/CHANGELOG.md)
- [Commits](https://github.com/tobymao/sqlglot/compare/v25.1.0...v25.5.1)

---
updated-dependencies:
- dependency-name: sqlglot
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
@dependabot dependabot bot force-pushed the dependabot/pip/sqlglot-25.5.1 branch from 84ae90d to 467d780 on July 16, 2024 14:29

github-actions bot commented Jul 17, 2024

Coverage tests results

398 tests  ±0   382 ✅ ±0   4s ⏱️ ±0s
  2 suites ±0     0 💤 ±0 
  2 files   ±0    16 ❌ ±0 

For more details on these failures, see this check.

Results for commit 3502381. ± Comparison against base commit a2e9d40.

♻️ This comment has been updated with latest results.

@nfx nfx merged commit 5b18af2 into main Jul 17, 2024
6 checks passed
@nfx nfx deleted the dependabot/pip/sqlglot-25.5.1 branch July 17, 2024 11:35
nfx added a commit that referenced this pull request Jul 30, 2024
* Added TSql transpiler ([#734](#734)). In this release, we have developed a new open-source library feature that enhances the transpilation of T-SQL code to Databricks-compatible code. The new TSqlToDatabricksTranspiler class has been added, which extends the Transpiler abstract class and defines the transpile method. This method converts T-SQL code to Databricks-compatible code by creating a lexer, token stream, parser, and parsed tree from the input string using TSqlLexer, CommonTokenStream, and tSqlFile. The parsed tree is then passed to the TSqlAstBuilder's visit method to generate a logical plan, which is optimized using an optimizer object with rules such as PullLimitUpwards and TopPercentToLimitSubquery. The optimized plan is then passed to the LogicalPlanGenerator's generate method to produce the final transpiled code. Additionally, a new class, IsTranspiledFromTSqlQueryRunner, has been added to the QueryRunner object to transpile T-SQL queries into the format expected by the Databricks query runner using the new TSqlToDatabricksTranspiler. The AcceptanceTestRunner class in the com.databricks.labs.remorph.coverage package has been updated to replace TSqlAstBuilder with IsTranspiledFromTSqlQueryRunner in the TSqlAcceptanceSuite class, indicating a change in the code responsible for handling TSql queries. This new feature aims to provide a smooth and efficient way to convert T-SQL code to Databricks-compatible code for further processing and execution. A schematic sketch of this lexer-to-generator pipeline appears after this list.
* Added the missing info for the reconciliation documentation ([#520](#520)). In this release, we have made significant improvements to the reconciliation feature of our open-source library. We have added a configuration folder and provided a template for creating configuration files for specific table sources. The config files will contain necessary configurations for table-specific reconciliation. Additionally, we have included a note in the transformation section detailing the usage of user-defined functions (UDFs) in transformation expressions, with an example UDF called `sort_array_input()` provided. The reconcile configuration sample documentation has also been added, with a JSON configuration for reconciling tables using various operations like drop, join, transformation, threshold, filter, and JDBC ReaderOptions. The commit also includes examples of source and target tables, data overviews, and reconciliation configurations for various scenarios, such as basic config and column mapping, user transformations, explicit select, explicit drop, filters, and thresholds comparison, among others. These changes aim to make it easier for users to set up and execute the reconciliation process for specific table sources and provide clear and concise information about using UDFs in transformation expressions for reconciliation. This commit is co-authored by Vijay Pavan Nissankararao and SundarShankar89.
* Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 ([#555](#555)). In this pull request, the sigstore/gh-action-sigstore-python dependency is being updated from version 2.1.1 to 3.0.0. This new version includes several changes and improvements, such as the addition of recursive globbing with ** to the inputs and the removal of certain settings like fulcio-url, rekor-url, ctfe, and rekor-root-pubkey. The signature, certificate, and bundle output settings have also been removed. Furthermore, the way inputs are parsed has been changed, and they are now made optional under certain conditions. The default suffix has been updated to .sigstore.json. The 3.0.0 version also resolves various deprecations present in sigstore-python's 2.x series and supports CI runners that use PEP 668 to constrain global package prefixes.
* Bump sqlglot from 25.1.0 to 25.5.1 ([#534](#534)). In the latest release, the `sqlglot` package has been updated from version 25.1.0 to 25.5.1, which includes bug fixes, breaking changes, new features, and refactors for parsing, analyzing, and rewriting SQL queries. The new version introduces optimizations for coalesced USING columns, preserves EXTRACT(date_part FROM datetime) calls, decouples NVL() from COALESCE(), and supports FROM CHANGES in Snowflake. It also provides configurable transpilation of Snowflake VARIANT, and supports view schema binding options for Spark and Databricks. The update addresses several issues, such as preferring timestamp with time zone over timestamptz, switching off table alias column generation, and parsing the right-hand side of x::varchar(max) into a type. Additionally, the update cleans up CurrentTimestamp generation logic for Teradata. The `sqlglot` dependency has also been updated in the 'experimental.py' file of the 'databricks/labs/remorph/snow' module, along with the addition of a new private method `_parse_json` to the `Generator` class for parsing JSON data. Software engineers should review the changes and update their code accordingly; Dependabot will resolve conflicts with existing code automatically as long as the pull request is not altered manually.
* Bump sqlglot from 25.5.1 to 25.6.1 ([#585](#585)). In this release, the `sqlglot` dependency is updated from version 25.5.1 to 25.6.1 in the 'pyproject.toml' file. This update includes bug fixes, breaking changes, new features, and improvements. Breaking changes consist of updates to the QUALIFY clause in queries and the canonicalization of struct and array inline constructor. New features include support for ORDER BY ALL, FROM ROWS FROM (...), RPAD & LPAD functions, and exp.TimestampAdd. Bug fixes address issues related to the QUALIFY clause in queries, expansion of SELECT * REPLACE, RENAME, transpiling UDFs from Databricks, and more. The pull request also includes a detailed changelog, commit history, instructions for triggering Dependabot actions, and commands for reference, with the exception of the compatibility score for the new version, which is not taken into account in this pull request.
* Feature/reconcile table mismatch threshold ([#550](#550)). This commit enhances the reconciliation process in the open-source library with several new features, addressing issue [#504](#504). A new `get_record_count` method is added to the `Reconcile` class, providing record count data for source and target tables, facilitating comprehensive analysis of table mismatches. A `CountQueryBuilder` class is introduced to build record count queries for different layers and SQL dialects, ensuring consistency in data processing. The `Thresholds` class is refactored into `ColumnThresholds` and `TableThresholds`, allowing for more granular control over comparisons and customizable threshold settings. New methods `_is_mismatch_within_threshold_limits` and `_insert_into_metrics_table` are added to the `recon_capture.py` file, improving fine-grained control over the reconciliation process and preventing false positives. Additionally, new classes, methods, and data structures have been implemented in the `execute` module to handle reconciliation queries and data more efficiently. These improvements contribute to a more accurate and robust reconciliation system.
* Feature: introduce core transpiler ([#715](#715)). A new core transpiler, `SnowflakeToDatabricksTranspiler`, has been introduced to convert Snowflake queries into Databricks SQL, streamlining integration and compatibility between the two systems. This transpiler is integrated into the coverage test suites for thorough testing, and is used to convert various types of logical plans, handling cases such as `Batch`, `WithCTE`, `Project`, `NamedTable`, `Filter`, and `Join`. The `SnowflakeToDatabricksTranspiler` class tokenizes input Snowflake query strings, initializes a `SnowflakeParser` instance, parses the input Snowflake query, generates a logical plan, and applies the `LogicalPlanGenerator` to the logical plan to generate the equivalent Databricks SQL query. Additionally, the `SnowflakeAstBuilder` class has been updated to alter the way `Batch` logical plans are built and improve overall functionality of the transpiler.
* Fixed LEFT and RIGHT JOIN syntax in Snowflake ANTLR grammar ([#526](#526)). A fix has been implemented to address issues with the Snowflake ANTLR grammar related to the proper parsing of LEFT and RIGHT JOIN statements. Previously, the keywords LEFT and RIGHT were incorrectly allowed as identifiers, but they are hard keywords that must be escaped to be used as column names. This change updates the grammar to escape these keywords in JOIN statements, improving the overall parsing of queries that include LEFT and RIGHT JOINs. Additionally, semantic predicates have been suggested to handle cases where LEFT or RIGHT are used as column names without escaping, although this is not yet implemented. To ensure the correctness of the updated grammar, new tests have been added to the SnowflakeAstBuilderSpec for LEFT and RIGHT JOINs, which check that the Abstract Syntax Tree (AST) is built correctly for these queries.
* Fixed Snowflake Acceptance Testcases Failures ([#531](#531)). In this release, updates have been made to the acceptance testcases for various SQL functions in the open-source library. The DENSE RANK function's testcase has been updated with a window specification and ORDER BY clause in both Snowflake and Databricks SQL syntaxes, ensuring accurate test results. The LAG function's testcase now includes a PARTITION BY and ORDER BY clause, as well as the NULLS LAST keyword in Databricks SQL, for improved accuracy and consistency. The SQL queries in the Snowflake testcase for the `last_value` function have been updated with a window specification, ORDER BY clause, and NULLS LAST directive for Databricks SQL. Test case failures in the Snowflake acceptance testsuite have been addressed with updates to the LEAD function, MONTH_NAME to MONTHNAME renaming, and DATE_FORMAT to TO_DATE conversion, improving reliability and consistency. The ntile function's testcase has been updated with PARTITION BY and ORDER BY clauses, and the NULLS LAST keyword has been added to the Databricks SQL query. The SQL query for null-safe equality comparison has been updated with a conditional expression compatible with Snowflake. The ranking function's testcase has been improved with the appropriate partition and order by clauses, and the NULLS LAST keyword has been added to the Databricks SQL query, enhancing accuracy and consistency. Lastly, updates have been made to the ROW_NUMBER function's testcase, ensuring accurate and consistent row numbering for both Snowflake and Databricks.
* Fixed TSQL transpiler ([#735](#735)). In this release, we have implemented a fix for the TSQL transpiler, addressing the issue [#7](#7). This enhancement allows the library to accurately convert TSQL code into an equivalent format that is compatible with other databases. The fix resolves reported bugs related to incorrect syntax interpretation, thereby improving the overall reliability and functionality of the transpiler. Software engineers and developers relying on TSQL compatibility for cross-database operations will benefit from this improvement. We encourage users to test and provide feedback on this updated feature.
* Fixed `SELECT TOP X PERCENT` IR translation for TSQL ([#733](#733)). In this release, we have made several enhancements to the open-source library to improve compatibility with T-SQL and Catalyst. We have added a new dependency, `pprint_${scala.binary.version}` version 0.8.1 from the `com.lihaoyi` group, to provide advanced pretty-printing functionality for Scala. We have also fixed the translation of the TSQL `SELECT TOP X PERCENT` feature in the parser for intermediate expressions, addressing the difference in syntax between TSQL and SQL for limiting the number of rows returned by a query. Additionally, we have modified the implementation of the `WITH` clause and added a new expression for the `SELECT TOP` clause in T-SQL, improving the compatibility of the codebase with T-SQL and aligning it with Catalyst. We also introduced a new abstract class `Rule` and a case class `Rules` in the `com.databricks.labs.remorph.parsers.intermediate` package to fix the `SELECT TOP X PERCENT` IR translation for TSQL by adding new rules. Furthermore, we have added a new Scala file, `subqueries.scala`, containing abstract class `SubqueryExpression` and two case classes that extend it, and made changes to the `trees.scala` file to improve the tree string representation for better readability and consistency in the codebase. These changes aim to improve the overall functionality of the library and make it easier for new users to understand and adopt the project.
* Fixed invalid null constraint and FQN ([#517](#517)). In this change, we addressed issues [#516](#516) and [#517](#517), which involved resolving an invalid null constraint and correcting a fully qualified name (FQN) in our open-source library. The `read_data` function in the `databricks.py` file was updated to improve null constraint handling and ensure the FQN is valid. Previously, the catalog was always appended to the table name, potentially resulting in an invalid FQN and null constraint issues. Now, the code checks if the catalog exists before appending it to the table name, and if not provided, the schema and table name are concatenated directly. Additionally, we removed the NOT NULL constraint from the catalog field in the source_table and target_table structs in the main SQL file, allowing null values for this field. These changes maintain backward compatibility and enhance the overall functionality and robustness of the project, ensuring accurate query results and avoiding potential errors.
* Generate SQL for arithmetic operators ([#726](#726)). In this release, we have introduced a new private method `arithmetic` to the `ExpressionGenerator` class that generates SQL for arithmetic operations, including unary minus, unary plus, multiplication, division, modulo, addition, and subtraction. This improves the readability and maintainability of the code by separating concerns and making the functionality more explicit. Additionally, we have introduced a new trait named `Arithmetic` to group arithmetic expressions together, which enables easier manipulation and identification of arithmetic expressions in the code. A new test suite has also been added for arithmetic operations in the `ExpressionGenerator` class, which improves test coverage and ensures the correct SQL is generated for these operations. These changes provide a welcome addition for developers looking to extend the functionality of the `ExpressionGenerator` class for arithmetic operations.
* Generate SQL for bitwise operators. The ExpressionGenerator class in the remorph project has been updated to support generating SQL for bitwise operators (OR, AND, XOR, NOT) through the addition of a new private method `bitwise` that converts bitwise operations to equivalent SQL expressions. The `expression` method has also been updated to utilize the new `bitwise` method for any input ir.Expression that is a bitwise operation. To facilitate this change, a new trait called `Bitwise` and updated case classes for bitwise operations, including `BitwiseNot`, `BitwiseAnd`, `BitwiseOr`, and `BitwiseXor`, have been implemented. The updated case classes extend the new `Bitwise` trait and include the `dataType` override method to return the data type of the left expression. A new test case in ExpressionGeneratorTest for bitwise operators has been added to validate the functionality, and the `expression` method in ExpressionGenerator now utilizes the GeneratorContext() instead of new GeneratorContext(). These changes enable the ExpressionGenerator to generate SQL code for bitwise operations, expanding its capabilities.
* Generate `.. LIKE ..` ([#723](#723)). In this commit, the `ExpressionGenerator` class has been enhanced with a new method, `like`, which generates SQL `LIKE` expressions. The method takes a `GeneratorContext` object and an `ir.Like` object as arguments, and returns a string representation of the `LIKE` expression. It uses the `expression` method to generate the left and right sides of the `LIKE` operator, and also handles the optional escape character. Additionally, the `timestampLiteral` and `dateLiteral` methods have been updated to take an `ir.Literal` object and better handle `NULL` values. A new test case has also been added for the `ExpressionGenerator` class, which checks the `like` function and includes examples for basic usage and usage with an escape character. This commit improves the functionality of the `ExpressionGenerator` class, allowing it to handle `LIKE` expressions and better handle `NULL` values for timestamps and dates.
* Generate `DISTINCT`, `*` ([#739](#739)). In this release, we've enhanced the ExpressionGenerator class to support generating `DISTINCT` and `*` (star) expressions in SQL queries. Previously, the class did not handle these cases, resulting in incomplete or incorrect SQL queries. With the introduction of the `distinct` method, the class can now generate the `DISTINCT` keyword followed by the expression to be applied to, and the `star` method produces the `*` symbol, optionally followed by the name of the object (table or subquery) to which it applies. These improvements make the ExpressionGenerator class more robust and compatible with various SQL dialects, resulting in more accurate query outcomes. We've also added new test cases for the `ExpressionGenerator` class to ensure that `DISTINCT` and `*` expressions are generated correctly. Additionally, support for generating SQL `DISTINCT` and `*` (wildcard) has been added to the transpilation of Logical Plans to SQL, specifically in the `ir.Project` class. This ensures that the correct SQL `SELECT * FROM table` syntax is generated when a wildcard is used in the expression list. These enhancements significantly improve the functionality and compatibility of our open-source library.
* Generate `LIMIT` ([#732](#732)). A new method has been added to generate a SQL LIMIT clause for a logical plan in the data transformation tool. A new `case` branch has been implemented in the `generate` method of the `LogicalPlanGenerator` class to handle the `ir.Limit` case, which generates the SQL LIMIT clause. If a percentage limit is specified, the tool will throw an exception as it is not currently supported. The `generate` method of the `ExpressionGenerator` class has been replaced with a new `generate` method in the `ir.Project`, `ir.Filter`, and new `ir.Limit` cases to ensure consistent expression generation. A new case class `Limit` has been added to the `com.databricks.labs.remorph.parsers.intermediate.relations` package, which extends the `UnaryNode` class and has four parameters: `input`, `limit`, `is_percentage`, and `with_ties`. This new class enables limiting the number of rows returned by a query, with the ability to specify a percentage of rows or include ties in the result set. Additionally, a new test case has been added to the `LogicalPlanGeneratorTest` class to verify the transpilation of a `Limit` node to its SQL equivalent, ensuring that the `Limit` node is correctly handled during transpilation.
* Generate `OFFSET` SQL clauses ([#736](#736)). The Remorph project's latest update introduces a new OFFSET clause generation feature for SQL queries in the LogicalPlanGenerator class. This change adds support for skipping a specified number of rows before returning results in a SQL query, enhancing the library's query generation capabilities. The implementation includes a new case in the match statement of the generate function to handle ir.Offset nodes, creating a string representation of the OFFSET clause using the provided offset expression. Additionally, the commit includes a new test case in the LogicalPlanGeneratorTest class to validate the OFFSET clause generation, ensuring that the LogicalPlanGenerator can translate Offset AST nodes into corresponding SQL statements. Overall, this update enables the generation of more comprehensive SQL queries with OFFSET support, providing software engineers with greater flexibility for pagination and other data processing tasks.
* Generate `ORDER BY` SQL clauses ([#737](#737)). This commit introduces new classes and enumerations for sort direction and null ordering, as well as an updated SortOrder case class, enabling the generation of ORDER BY SQL clauses. The LogicalPlanGenerator and SnowflakeExpressionBuilder classes have been modified to utilize these changes, allowing for more flexible and customizable sorting and null ordering when generating SQL queries. Additionally, the TSqlRelationBuilderSpec test suite has been updated to reflect these changes, and new test cases have been added to ensure the correct transpilation of ORDER BY clauses. Overall, these improvements enhance the Remorph project's capability to parse and generate SQL expressions with various sorting scenarios, providing a more robust and maintainable codebase.
* Generate `UNION` / `EXCEPT` / `INTERSECT` ([#731](#731)). In this release, we have introduced support for the `UNION`, `EXCEPT`, and `INTERSECT` set operations in our data processing system's generator. A new `unknown` method has been added to the `Generator` trait to return a `TranspileException` when encountering unsupported operations, allowing for better error handling and more informative error messages. The `LogicalPlanGenerator` class in the remorph project has been extended to support generating `UNION`, `EXCEPT`, and `INTERSECT` SQL operations with the addition of a new parameter, `explicitDistinct`, to enable explicit specification of `DISTINCT` for these set operations. A new test suite has been added to the `LogicalPlanGenerator` to test the generation of these operations using the `SetOperation` class, which now has four possible set operations: `UnionSetOp`, `IntersectSetOp`, `ExceptSetOp`, and `UnspecifiedSetOp`. With these changes, our system can handle a wider range of input and provide meaningful error messages for unsupported operations, making it more versatile in handling complex SQL queries.
* Generate `VALUES` SQL clauses ([#738](#738)). The latest commit introduces a new feature to generate `VALUES` SQL clauses in the context of the logical plan generator. A new case branch has been implemented in the `generate` method to manage `ir.Values` expressions, converting input data (lists of lists of expressions) into a string representation compatible with `VALUES` clauses. The existing functionality remains unchanged. Additionally, a new test case has been added for the `LogicalPlanGenerator` class, which checks the correct transpilation to `VALUES` SQL clauses. This test case ensures that the `ir.Values` method, which takes a sequence of sequences of literals, generates the corresponding `VALUES` SQL clause, specifically checking the input `Seq(Seq(ir.Literal(1), ir.Literal(2)), Seq(ir.Literal(3), ir.Literal(4)))` against the SQL clause `"VALUES (1,2), (3,4)"`. This change enables testing the functionality of generating `VALUES` SQL clauses using the `LogicalPlanGenerator` class.
* Generate predicate expressions. This commit introduces the generation of predicate expressions as part of the SQL ExpressionGenerator in the `com.databricks.labs.remorph.generators.sql` package, enabling the creation of more complex SQL expressions. The changes include the addition of a new private method, `predicate(ctx: GeneratorContext, expr: Expression)`, to handle predicate expressions, and the introduction of two new predicate expression types, LessThan and LessThanOrEqual, for comparing the relative ordering of two expressions. Existing predicate expression types have been updated with consistent naming. Additionally, the commit incorporates improvements to the handling of comparison operators in the SnowflakeExpressionBuilder and TSqlExpressionBuilder classes, addressing bugs and ensuring precise predicate expression generation. The `ParserTestCommon` trait has also been updated to reorder certain operators in a logical plan, maintaining consistent comparison results in tests. New test cases have been added to several test suites to ensure the correct interpretation and generation of predicate expressions involving different data types and search conditions. Overall, these enhancements provide more fine-grained comparison of expressions, enable more nuanced condition checking, and improve the robustness and accuracy of the SQL expression generation process.
* Merge remote-tracking branch 'origin/main'. In this update, the `ExpressionGenerator` class in the `com.databricks.labs.remorph.generators.sql` package has been enhanced with two new private methods: `dateLiteral` and `timestampLiteral`. These methods are designed to generate the SQL literal representation of `DateType` and `TimestampType` expressions, respectively. The introduction of these methods addresses the previous limitations of formatting date and timestamp values directly, which lacked extensibility and required duplicated code for handling null values. By extracting the formatting logic into separate methods, this commit significantly improves code maintainability and reusability, enhancing the overall readability and understandability of the `ExpressionGenerator` class for developers. The `dateLiteral` method handles `DateType` values by formatting them using the `dateFormat` `SimpleDateFormat` instance, returning `NULL` if the value is missing. Likewise, the `timestampLiteral` method formats `TimestampType` values using the `timeFormat` `SimpleDateFormat` instance, returning `NULL` if the value is missing. These methods will enable developers to grasp the code's functionality more easily and make future enhancements to the class.
* Modified dataclass for table threshold and added documentation ([#714](#714)). A series of modifications has been implemented to enhance the threshold configuration and validation for table reconciliation in the open-source library. The `TableThresholds` dataclass has been updated to accept a string for the `model` attribute, replacing the previously used `TableThresholdModel` Enum. Additionally, a new `validate_threshold_model` method has been added to `TableThresholds` to ensure proper validation of the `model` attribute. A new exception class, `InvalidModelForTableThreshold`, has been introduced to handle invalid settings. Column-specific thresholds can now be set using the `ColumnThresholds` configuration option. The `recon_capture.py` and `recon_config.py` files have been updated accordingly, and the documentation has been revised to clarify these changes. These improvements offer greater flexibility and control for users configuring thresholds while also refining validation and error handling. A minimal sketch of these threshold dataclasses appears after this list.
* Support CTAS in TSQL Grammar and add more comparison operators ([#545](#545)). In this release, we have added support for the CTAS (CREATE TABLE AS) statement in the TSQL (T-SQL) grammar, as well as introduced new comparison operators: !=, !<, and !>. The CTAS statement allows for the creation of a new table by selecting from an existing table or query, potentially improving code readability and performance. The new comparison operators provide alternative ways of expressing inequalities, increasing flexibility for developers. The keyword `REPLICATE` has also been added for creating a full copy of a database or availability group. These changes enhance the overall functionality of the TSQL grammar and improve the user's ability to express various operations in TSQL. The CTAS statement is implemented as a new rule, and the new comparison operators are added as methods in the TSqlExpressionBuilder class. These changes provide increased capability and flexibility for TSQL parsing and query handling. The new methods are not adding any new external dependencies, and the project remains self-contained. The additions have been tested with the TSqlExpressionBuilderSpec test suite, ensuring the functionality and compatibility of the TSQL parser.
* Support translation of TSQL IGNORE NULLS clause in windowing functions ([#511](#511)). The latest change introduces support for translating the TSQL IGNORE NULLS and RESPECT NULLS clauses in windowing functions to their equivalents in Databricks SQL. In TSQL, these clauses appear after the function name and before the OVER clause, affecting how the functions handle null values. Databricks SQL represents this functionality with an optional trailing boolean parameter for specific windowing functions. With this update, when the IGNORE NULLS clause is specified in TSQL, a boolean option is appended to the corresponding Databricks SQL windowing functions, with RESPECT NULLS as the default. This enhancement is facilitated by a new private method, `buildNullIgnore`, which adds the boolean parameter to the original expression when IGNORE NULLS is specified in the OVER clause. The alteration is exemplified in new test examples for the TSqlFunctionSpec, which include testing the LEAD function with and without the IGNORE NULLS clause, and updates to the translation of functions with non-standard syntax.
* TSQL: Implement TSQL UPDATE/DELETE statements ([#540](#540)). In this release, we have added support for TSQL UPDATE and DELETE statements in all syntactical forms, including UDF column transformations, in the TSqlParser.g4 file. The implementation of both statements is done in a single file and they share many common clauses. We have also introduced two new case classes, UpdateTable and MergeTables, in the extensions.scala file to implement the TSQL UPDATE and DELETE statements, respectively. Additionally, we have added new methods to handle various clauses and elements associated with these statements in the TSqlErrorStrategy class. A series of tests have been included to ensure the correct translation of various UPDATE and DELETE queries to their respective Abstract Syntax Trees (ASTs). These changes bring TSQL UPDATE and DELETE statement functionality to the project and allow for their use in a variety of contexts, providing developers with more flexibility and control when working with TSQL UPDATE and DELETE statements in the parser.
* TSQL: Implement translation of INSERT statement ([#515](#515)). In this release, we have implemented the TSQL INSERT statement in its entirety, including all target options, optional clauses, and Common Table Expressions (CTEs) in our open-source library. The change includes updates to the TSqlParser.g4 file to support the INSERT statement's various clauses, such as TOP, INTO, WITH TABLE HINTS, outputClause, and optionClause. We have also added new case classes to the TSQL AST to handle various aspects of the INSERT statement, including LocalVarTable, Output, InsertIntoTable, DerivedRows, DefaultValues, and Default. The TSqlExpressionBuilder and TSqlRelationBuilder classes have been updated to support the new INSERT statement, including handling output column lists, aliases, and JSON clauses. We have added specification tests to TSqlAstBuilderSpec.scala to demonstrate the various ways that the INSERT statement can be written and to ensure the correct translation of TSQL INSERT statements into the remorph project.
* TSQL: Remove the SIGN fragment from numerical tokens ([#547](#547)). In this release, we have made changes to the TSQL expression builder in the remorph project that affect how negative and positive numerical literals are parsed. Negative literals, such as -2, will now be parsed as UMinus(Literal(2)) instead of Literal(-2), and positive literals, such as +1, will be parsed as UPlus(Literal(1)) instead of Literal(1). This change was made to address issue [#546](#546), but it is not an ideal solution as it may cause inconvenience in downstream processes. The affected numerical tokens include INT, HEX, FLOAT, REAL, and MONEY, which have been simplified by removing the SIGN fragment. We have updated the buildPrimitive method to handle INT, REAL, and FLOAT token types and ensure that numerical tokens are parsed correctly. We intend to keep issue [#546](#546) open for further exploration of a better solution. The tests have been updated to reflect these changes.
* TSQL: Simplifies named table tableSource, implements columnAlias list ([#512](#512)). This change introduces significant updates to the TSqlParser's grammar for tableSource, simplifying and consolidating rules related to table aliases and column alias lists. A new Relation called TableWithHints has been added to collect and process table hints, some of which have direct counterparts in the Catalyst optimizer or can be used as comments for migration purposes. The TSQLExpressionBuilder and TSqlRelationBuilder classes have been modified to handle table hints and column aliases, and the TSqlAstBuilderSpec test suite has been updated to include new tests for table hints in T-SQL SELECT statements. These changes aim to improve parsing, handling, and optimization of table sources, table hints, and column aliases in TSQL queries.
* TSQL: Support generic FOR options ([#525](#525)). In this release, we have added support for parsing T-SQL (Transact-SQL) options that contain the keyword `FOR` using the standard syntax `[FOR]`. This change is necessary as `FOR` cannot be used directly as an identifier without escaping, as it would otherwise be seen as a table alias in a `SELECT` statement. The ANTLR rule for parsing generic options has been expanded to handle these special cases correctly, allowing for the proper parsing of options such as `OPTIMIZE FOR UNKNOWN` in a `SELECT` statement. Additionally, a new case has been added to the OptionBuilder class to handle the `FOR` keyword and elide it, converting `OPTIMIZE FOR UNKNOWN` to `OPTIMIZE` with an id of 'UNKNOWN'. This ensures the proper handling of options containing `FOR` and avoids any conflicts with the `FOR` clause in T-SQL statements. This change was implemented by Valentin Kasas and involves adding a new alternative to the `genericOption` rule in the TSqlParser.g4 file, but no new methods have been added.
* Updated Dialect Variable Name ([#535](#535)). In this release, the `source` variable name in the `QueryBuilder` class, which refers to the `Dialect` instance, has been updated to `engine` to accurately reflect its meaning as referring to either `Source` or `Target`. This change includes updating the usage of `source` to `engine` in the `build_query`, `_get_with_clause`, and `build_threshold_query` methods, as well as removing unnecessary parentheses in a list. These changes improve the code's clarity, accuracy, and readability, while maintaining the overall functionality of the affected methods.
* Use Oracle library only if the recon source is Oracle ([#532](#532)). In this release, we have added a new `ReconcileConfig` configuration object and a `SourceType` enumeration in the `databricks.labs.remorph.config` and `databricks.labs.remorph.reconcile.constants` modules, respectively. These objects are introduced to determine whether to include the Oracle JDBC driver library in a reconciliation job's task libraries. The `deploy_job` method and the `_job_recon_task` method have been updated to use the `_recon_config` attribute to decide whether to include the Oracle JDBC driver library. Additionally, the `_deploy_reconcile_job` method in the `install.py` file has been modified to include a new parameter called `reconcile`, which is passed as an argument from the `_config` object. This change enhances the flexibility and customization of the reconcile job deployment. Furthermore, new fixtures `oracle_recon_config` and `snowflake_reconcile_config` have been introduced for `ReconcileConfig` objects with Oracle and Snowflake specific configurations, respectively. These fixtures are used in the test functions for deploying jobs, ensuring that the tests are more focused and better reflect the actual behavior of the code. A minimal sketch of this conditional library selection appears after this list.
* [chore] Make singletons for the relevant `DataType` instances ([#705](#705)). This commit introduces case objects for various data types, such as NullType, StringType, and others, effectively making them singletons. This change simplifies the creation of data type instances and ensures that each type has a single instance throughout the application. The new case objects are utilized in building SQL expressions, affecting functions such as Cast and TRY_CAST in the TSqlExpressionBuilder class. Additionally, the test file TSqlExpressionBuilderSpec.scala has been updated to include the new case objects and handle errors by returning null. The data types tested include integer types, decimal types, date and time types, string types, binary types, and JSON. The primary goal of this change is to improve the management and identification of data types in the codebase, as well as to enhance code readability and maintainability.
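
The TSql transpiler entry above walks through a lexer-to-generator pipeline. The Python below is a schematic sketch of that flow (parse, build a logical plan, apply optimizer rules, generate); every name is an illustrative stand-in for the Scala classes (TSqlLexer, TSqlAstBuilder, LogicalPlanGenerator), not the project's actual API.

```python
# Schematic sketch of the transpile pipeline: parse -> build logical plan ->
# apply optimizer rules -> generate SQL. All names are illustrative stand-ins.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class LogicalPlan:
    node: str
    children: list["LogicalPlan"] = field(default_factory=list)

def parse(sql: str) -> LogicalPlan:
    # The real code builds a lexer, token stream, parser, and parse tree,
    # then visits the tree with an AST builder; faked here.
    return LogicalPlan("Project", [LogicalPlan("Limit")])

def optimize(plan: LogicalPlan, rules: list[Callable[[LogicalPlan], LogicalPlan]]) -> LogicalPlan:
    # Rules such as PullLimitUpwards and TopPercentToLimitSubquery are
    # plan-to-plan rewrites applied in sequence.
    for rule in rules:
        plan = rule(plan)
    return plan

def generate(plan: LogicalPlan) -> str:
    return f"-- Databricks SQL generated from a {plan.node} plan"

def transpile(sql: str) -> str:
    return generate(optimize(parse(sql), rules=[lambda p: p]))

print(transpile("SELECT TOP 10 PERCENT * FROM t"))
```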
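
The threshold entries above ([#550](#550), [#714](#714)) describe `ColumnThresholds` and `TableThresholds` dataclasses where `model` is a plain string validated against known values. A minimal sketch of that shape, with assumed field names and an assumed set of valid models:

```python
# Assumed shapes, not the project's exact dataclasses: table-level thresholds
# where `model` is a validated string rather than an Enum.
from dataclasses import dataclass

_VALID_TABLE_THRESHOLD_MODELS = {"mismatch"}  # assumed set of supported models

class InvalidModelForTableThreshold(Exception):
    """Raised when a TableThresholds model name is not supported."""

@dataclass
class ColumnThresholds:
    column_name: str
    lower_bound: str
    upper_bound: str
    type: str

@dataclass
class TableThresholds:
    lower_bound: str
    upper_bound: str
    model: str  # plain string instead of an Enum, validated below

    def __post_init__(self):
        self.validate_threshold_model()

    def validate_threshold_model(self):
        if self.model not in _VALID_TABLE_THRESHOLD_MODELS:
            raise InvalidModelForTableThreshold(f"unsupported model: {self.model!r}")

# Usage: a valid model passes, an unknown one raises.
TableThresholds(lower_bound="0%", upper_bound="5%", model="mismatch")
```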
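
The Oracle entry above ([#532](#532)) gates an extra task library on the reconcile source type. A minimal sketch of that pattern follows; the config shape, library coordinates, and version are assumptions, not the project's actual values.

```python
# A sketch (assumed names and shapes, not the project's actual API) of
# including an extra task library only for a given reconciliation source.
from dataclasses import dataclass

# Hypothetical Maven coordinates for the Oracle JDBC driver.
ORACLE_JDBC_COORDINATES = "com.oracle.database.jdbc:ojdbc8:23.4.0.24.05"

@dataclass
class ReconcileConfig:
    data_source: str  # e.g. "oracle" or "snowflake"

def recon_task_libraries(config: ReconcileConfig) -> list[dict]:
    """Include the Oracle JDBC driver only when the recon source is Oracle."""
    libraries: list[dict] = [{"pypi": {"package": "databricks-labs-remorph"}}]  # assumed base library
    if config.data_source == "oracle":
        libraries.append({"maven": {"coordinates": ORACLE_JDBC_COORDINATES}})
    return libraries

print(recon_task_libraries(ReconcileConfig(data_source="oracle")))
print(recon_task_libraries(ReconcileConfig(data_source="snowflake")))
```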

Dependency updates:

 * Bump sqlglot from 25.1.0 to 25.5.1 ([#534](#534)).
 * Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 ([#555](#555)).
 * Bump sqlglot from 25.5.1 to 25.6.1 ([#585](#585)).
@nfx nfx mentioned this pull request Jul 30, 2024
nfx added a commit that referenced this pull request Jul 30, 2024
* Added TSql transpiler
([#734](#734)). In this
release, we have developed a new open-source library feature that
enhances the transpilation of T-SQL code to Databricks-compatible code.
The new TSqlToDatabricksTranspiler class has been added, which extends
the Transpiler abstract class and defines the transpile method. This
method converts T-SQL code to Databricks-compatible code by creating a
lexer, token stream, parser, and parsed tree from the input string using
TSqlLexer, CommonTokenStream, and tSqlFile. The parsed tree is then
passed to the TSqlAstBuilder's visit method to generate a logical plan,
which is optimized using an optimizer object with rules such as
PullLimitUpwards and TopPercentToLimitSubquery. The optimized plan is
then passed to the LogicalPlanGenerator's generate method to generate
the final transpiled code. Additionally, a new class,
IsTranspiledFromTSqlQueryRunner, has been added to the QueryRunner
object to transpile T-SQL queries into the format expected by the
Databricks query runner using the new TSqlToDatabricksTranspiler. The
AcceptanceTestRunner class in the com.databricks.labs.remorph.coverage
package has been updated to replace TSqlAstBuilder with
IsTranspiledFromTSqlQueryRunner in the TSqlAcceptanceSuite class,
indicating a change in the code responsible for handling TSql queries.
This new feature aims to provide a smooth and efficient way to convert
T-SQL code to Databricks-compatible code for further processing and
execution.
* Added the missing info for the reconciliation documentation
([#520](#520)). In this
release, we have made significant improvements to the reconciliation
feature of our open-source library. We have added a configuration folder
and provided a template for creating configuration files for specific
table sources. The config files will contain necessary configurations
for table-specific reconciliation. Additionally, we have included a note
in the transformation section detailing the usage of user-defined
functions (UDFs) in transformation expressions, with an example UDF
called `sort_array_input()` provided. The reconcile configuration sample
documentation has also been added, with a JSON configuration for
reconciling tables using various operations like drop, join,
transformation, threshold, filter, and JDBC ReaderOptions. The commit
also includes examples of source and target tables, data overviews, and
reconciliation configurations for various scenarios, such as basic
config and column mapping, user transformations, explicit select,
explicit drop, filters, and thresholds comparison, among others. These
changes aim to make it easier for users to set up and execute the
reconciliation process for specific table sources and provide clear and
concise information about using UDFs in transformation expressions for
reconciliation. This commit is co-authored by Vijay Pavan Nissankararao
and SundarShankar89.
* Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0
([#555](#555)). In this
pull request, the sigstore/gh-action-sigstore-python dependency is being
updated from version 2.1.1 to 3.0.0. This new version includes several
changes and improvements, such as the addition of recursive globbing
with ** to the inputs and the removal of certain settings like
fulcio-url, rekor-url, ctfe, and rekor-root-pubkey. The signature,
certificate, and bundle output settings have also been removed.
Furthermore, the way inputs are parsed has been changed, and they are
now made optional under certain conditions. The default suffix has been
updated to .sigstore.json. The 3.0.0 version also resolves various
deprecations present in sigstore-python's 2.x series and supports CI
runners that use PEP 668 to constrain global package prefixes.
* Bump sqlglot from 25.1.0 to 25.5.1
([#534](#534)). In the
latest release, the `sqlglot` package has been updated from version
25.1.0 to 25.5.1, which includes bug fixes, breaking changes, new
features, and refactors for parsing, analyzing, and rewriting SQL
queries. The new version introduces optimizations for coalesced USING
columns, preserves EXTRACT(date_part FROM datetime) calls, decouples
NVL() from COALESCE(), and supports FROM CHANGES in Snowflake. It also
provides configurable transpilation of Snowflake VARIANT, and supports
view schema binding options for Spark and Databricks. The update
addresses several issues, such as the use of timestamp with time zone
over timestamptz, switch off table alias columns generation, and parse
rhs of x::varchar(max) into a type. Additionally, the update cleans up
CurrentTimestamp generation logic for Teradata. The `sqlglot` dependency
has also been updated in the 'experimental.py' file of the
'databricks/labs/remorph/snow' module, along with the addition of a new
private method `_parse_json` to the `Generator` class for parsing JSON
data. Software engineers should review the changes and update their code
accordingly, as conflicts with existing code will be resolved
automatically by Dependabot, as long as the pull request is not altered
manually.
* Bump sqlglot from 25.5.1 to 25.6.1
([#585](#585)). In this
release, the `sqlglot` dependency is updated from version 25.5.1 to
25.6.1 in the 'pyproject.toml' file. This update includes bug fixes,
breaking changes, new features, and improvements. Breaking changes
consist of updates to the QUALIFY clause in queries and the
canonicalization of struct and array inline constructor. New features
include support for ORDER BY ALL, FROM ROWS FROM (...), RPAD & LPAD
functions, and exp.TimestampAdd. Bug fixes address issues related to the
QUALIFY clause in queries, expansion of SELECT * REPLACE, RENAME,
transpiling UDFs from Databricks, and more. The pull request also
includes a detailed changelog, commit history, instructions for
triggering Dependabot actions, and commands for reference, with the
exception of the compatibility score for the new version, which is not
taken into account in this pull request.
* Feature/reconcile table mismatch threshold
([#550](#550)). This
commit enhances the reconciliation process in the open-source library
with several new features, addressing issue
[#504](#504). A new
`get_record_count` method is added to the `Reconcile` class, providing
record count data for source and target tables, facilitating
comprehensive analysis of table mismatches. A `CountQueryBuilder` class
is introduced to build record count queries for different layers and SQL
dialects, ensuring consistency in data processing. The `Thresholds`
class is refactored into `ColumnThresholds` and `TableThresholds`,
allowing for more granular control over comparisons and customizable
threshold settings. New methods `_is_mismatch_within_threshold_limits`
and `_insert_into_metrics_table` are added to the `recon_capture.py`
file, improving fine-grained control over the reconciliation process and
preventing false positives. Additionally, new classes, methods, and data
structures have been implemented in the `execute` module to handle
reconciliation queries and data more efficiently. These improvements
contribute to a more accurate and robust reconciliation system.
* Feature: introduce core transpiler
([#715](#715)). A new
core transpiler, `SnowflakeToDatabricksTranspiler`, has been introduced
to convert Snowflake queries into Databricks SQL, streamlining
integration and compatibility between the two systems. This transpiler
is integrated into the coverage test suites for thorough testing, and is
used to convert various types of logical plans, handling cases such as
`Batch`, `WithCTE`, `Project`, `NamedTable`, `Filter`, and `Join`. The
`SnowflakeToDatabricksTranspiler` class tokenizes input Snowflake query
strings, initializes a `SnowflakeParser` instance, parses the input
Snowflake query, generates a logical plan, and applies the
`LogicalPlanGenerator` to the logical plan to generate the equivalent
Databricks SQL query. Additionally, the `SnowflakeAstBuilder` class has
been updated to alter the way `Batch` logical plans are built and
improve overall functionality of the transpiler.
* Fixed LEFT and RIGHT JOIN syntax in Snowflake ANTLR grammar
([#526](#526)). A fix
has been implemented to address issues with the Snowflake ANTLR grammar
related to the proper parsing of LEFT and RIGHT JOIN statements.
Previously, the keywords LEFT and RIGHT were incorrectly allowed as
identifiers, but they are hard keywords that must be escaped to be used
as column names. This change updates the grammar to escape these
keywords in JOIN statements, improving the overall parsing of queries
that include LEFT and RIGHT JOINs. Additionally, semantic predicates
have been suggested to handle cases where LEFT or RIGHT are used as
column names without escaping, although this is not yet implemented. To
ensure the correctness of the updated grammar, new tests have been added
to the SnowflakeAstBuilderSpec for LEFT and RIGHT JOINs, which check
that the Abstract Syntax Tree (AST) is built correctly for these
queries.
* Fixed Snowflake Acceptance Testcases Failures
([#531](#531)). In this
release, updates have been made to the acceptance testcases for various
SQL functions in the open-source library. The DENSE RANK function's
testcase has been updated with a window specification and ORDER BY
clause in both Snowflake and Databricks SQL syntaxes, ensuring accurate
test results. The LAG function's testcase now includes a PARTITION BY
and ORDER BY clause, as well as the NULLS LAST keyword in Databricks
SQL, for improved accuracy and consistency. The SQL queries in the
Snowflake testcase for the `last_value` function have been updated with
a window specification, ORDER BY clause, and NULLS LAST directive for
Databricks SQL. Test case failures in the Snowflake acceptance testsuite
have been addressed with updates to the LEAD function, MONTH_NAME to
MONTHNAME renaming, and DATE_FORMAT to TO_DATE conversion, improving
reliability and consistency. The ntile function's testcase has been
updated with PARTITION BY and ORDER BY clauses, and the NULLS LAST
keyword has been added to the Databricks SQL query. The SQL query for
null-safe equality comparison has been updated with a conditional
expression compatible with Snowflake. The ranking function's testcase
has been improved with the appropriate partition and order by clauses,
and the NULLS LAST keyword has been added to the Databricks SQL query,
enhancing accuracy and consistency. Lastly, updates have been made to
the ROW_NUMBER function's testcase, ensuring accurate and consistent row
numbering for both Snowflake and Databricks.
* Fixed TSQL transpiler
([#735](#735)). In this
release, we have implemented a fix for the TSQL transpiler, addressing
the issue [#7](#7). This
enhancement allows the library to accurately convert TSQL code into an
equivalent format that is compatible with other databases. The fix
resolves reported bugs related to incorrect syntax interpretation,
thereby improving the overall reliability and functionality of the
transpiler. Software engineers and developers relying on TSQL
compatibility for cross-database operations will benefit from this
improvement. We encourage users to test and provide feedback on this
updated feature.
* Fixed `SELECT TOP X PERCENT` IR translation for TSQL
([#733](#733)). In this
release, we have made several enhancements to the open-source library to
improve compatibility with T-SQL and Catalyst. We have added a new
dependency, `pprint_${scala.binary.version}` version 0.8.1 from the
`com.lihaoyi` group, to provide advanced pretty-printing functionality
for Scala. We have also fixed the translation of the TSQL `SELECT TOP X
PERCENT` feature in the parser for intermediate expressions, addressing
the difference in syntax between TSQL and SQL for limiting the number of
rows returned by a query. Additionally, we have modified the
implementation of the `WITH` clause and added a new expression for the
`SELECT TOP` clause in T-SQL, improving the compatibility of the
codebase with T-SQL and aligning it with Catalyst. We also introduced a
new abstract class `Rule` and a case class `Rules` in the
`com.databricks.labs.remorph.parsers.intermediate` package to fix the
`SELECT TOP X PERCENT` IR translation for TSQL by adding new rules.
Furthermore, we have added a new Scala file, `subqueries.scala`,
containing abstract class `SubqueryExpression` and two case classes that
extend it, and made changes to the `trees.scala` file to improve the
tree string representation for better readability and consistency in the
codebase. These changes aim to improve the overall functionality of the
library and make it easier for new users to understand and adopt the
project.
* Fixed invalid null constraint and FQN
([#517](#517)). In this
change, we addressed issues
[#516](#516) and
[#517](#517), which
involved resolving an invalid null constraint and correcting a fully
qualified name (FQN) in our open-source library. The `read_data`
function in the `databricks.py` file was updated to improve null
constraint handling and ensure the FQN is valid. Previously, the catalog
was always appended to the table name, potentially resulting in an
invalid FQN and null constraint issues. Now, the code checks if the
catalog exists before appending it to the table name, and if not
provided, the schema and table name are concatenated directly.
Additionally, we removed the NOT NULL constraint from the catalog field
in the source_table and target_table structs in the main SQL file,
allowing null values for this field. These changes maintain backward
compatibility and enhance the overall functionality and robustness of
the project, ensuring accurate query results and avoiding potential
errors.
* Generate SQL for arithmetic operators
([#726](#726)). In this
release, we have introduced a new private method `arithmetic` to the
`ExpressionGenerator` class that generates SQL for arithmetic
operations, including unary minus, unary plus, multiplication, division,
modulo, addition, and subtraction. This improves the readability and
maintainability of the code by separating concerns and making the
functionality more explicit. Additionally, we have introduced a new
trait named `Arithmetic` to group arithmetic expressions together, which
enables easier manipulation and identification of arithmetic expressions
in the code. A new test suite has also been added for arithmetic
operations in the `ExpressionGenerator` class, which improves test
coverage and ensures the correct SQL is generated for these operations.
These changes provide a welcome addition for developers looking to
extend the functionality of the `ExpressionGenerator` class for
arithmetic operations.
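
  A self-contained sketch of the approach, using a simplified
  expression ADT in place of the project's `ir` nodes:

  ```scala
  // Simplified IR: each arithmetic node renders to SQL text; binary
  // operations are parenthesised so operator precedence is preserved.
  sealed trait Expr
  final case class Lit(value: Int) extends Expr
  final case class UMinus(child: Expr) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr
  final case class Multiply(left: Expr, right: Expr) extends Expr

  def arithmetic(e: Expr): String = e match {
    case Lit(v)         => v.toString
    case UMinus(c)      => s"-${arithmetic(c)}"
    case Add(l, r)      => s"(${arithmetic(l)} + ${arithmetic(r)})"
    case Multiply(l, r) => s"(${arithmetic(l)} * ${arithmetic(r)})"
  }

  // arithmetic(Multiply(Lit(2), Add(Lit(3), UMinus(Lit(4))))) == "(2 * (3 + -4))"
  ```
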
* Generate SQL for bitwise operators. The ExpressionGenerator class in
the remorph project has been updated to support generating SQL for
bitwise operators (OR, AND, XOR, NOT) through the addition of a new
private method `bitwise` that converts bitwise operations to equivalent
SQL expressions. The `expression` method has also been updated to
utilize the new `bitwise` method for any input ir.Expression that is a
bitwise operation. To facilitate this change, a new trait called
`Bitwise` and updated case classes for bitwise operations, including
`BitwiseNot`, `BitwiseAnd`, `BitwiseOr`, and `BitwiseXor`, have been
implemented. The updated case classes extend the new `Bitwise` trait and
include the `dataType` override method to return the data type of the
left expression. A new test case in ExpressionGeneratorTest for bitwise
operators has been added to validate the functionality, and the
`expression` method in ExpressionGenerator now uses
`GeneratorContext()` instead of `new GeneratorContext()`. These changes
enable the ExpressionGenerator to generate SQL code for bitwise
operations, expanding its capabilities.
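
  The trait-grouping idea in miniature, with heavily simplified types
  (only the shape follows this note):

  ```scala
  sealed trait DataType
  case object IntegerType extends DataType

  sealed trait Expression { def dataType: DataType }
  final case class Literal(value: Int) extends Expression { val dataType: DataType = IntegerType }

  // Grouping trait: every bitwise node reports its left operand's type.
  sealed trait Bitwise extends Expression {
    def left: Expression
    override def dataType: DataType = left.dataType
  }
  final case class BitwiseAnd(left: Expression, right: Expression) extends Bitwise
  final case class BitwiseOr(left: Expression, right: Expression) extends Bitwise

  // BitwiseAnd(Literal(6), Literal(3)).dataType == IntegerType
  ```
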
* Generate `.. LIKE ..`
([#723](#723)). In this
commit, the `ExpressionGenerator` class has been enhanced with a new
method, `like`, which generates SQL `LIKE` expressions. The method takes
a `GeneratorContext` object and an `ir.Like` object as arguments, and
returns a string representation of the `LIKE` expression. It uses the
`expression` method to generate the left and right sides of the `LIKE`
operator, and also handles the optional escape character. Additionally,
the `timestampLiteral` and `dateLiteral` methods have been updated to
take an `ir.Literal` object and better handle `NULL` values. A new test
case has also been added for the `ExpressionGenerator` class, which
checks the `like` function and includes examples for basic usage and
usage with an escape character. This commit improves the functionality
of the `ExpressionGenerator` class, allowing it to handle `LIKE`
expressions and better handle `NULL` values for timestamps and dates.
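
  In sketch form, with the generator context elided and both operands
  assumed to be already rendered to SQL strings:

  ```scala
  // Render a LIKE expression, emitting an ESCAPE clause only when an
  // escape character was supplied.
  def like(subject: String, pattern: String, escape: Option[Char] = None): String =
    escape match {
      case Some(c) => s"$subject LIKE $pattern ESCAPE '$c'"
      case None    => s"$subject LIKE $pattern"
    }

  // like("name", "'J%'")              == "name LIKE 'J%'"
  // like("code", "'10!%'", Some('!')) == "code LIKE '10!%' ESCAPE '!'"
  ```
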
* Generate `DISTINCT`, `*`
([#739](#739)). In this
release, we've enhanced the ExpressionGenerator class to support
generating `DISTINCT` and `*` (star) expressions in SQL queries.
Previously, the class did not handle these cases, resulting in
incomplete or incorrect SQL queries. With the introduction of the
`distinct` method, the class can now generate the `DISTINCT` keyword
followed by the expression it applies to, and the `star` method
produces the `*` symbol, optionally followed by the name of the object
(table or subquery) to which it applies. These improvements make the
ExpressionGenerator class more robust and compatible with various SQL
dialects, resulting in more accurate query outcomes. We've also added
new test cases for the `ExpressionGenerator` class to ensure that
`DISTINCT` and `*` expressions are generated correctly. Additionally,
support for generating SQL `DISTINCT` and `*` (wildcard) has been added
to the transpilation of Logical Plans to SQL, specifically in the
`ir.Project` class. This ensures that the correct SQL `SELECT * FROM
table` syntax is generated when a wildcard is used in the expression
list. These enhancements significantly improve the functionality and
compatibility of our open-source library.
* Generate `LIMIT`
([#732](#732)). A new
method has been added to generate a SQL LIMIT clause for a logical plan
in the data transformation tool. A new `case` branch has been
implemented in the `generate` method of the `LogicalPlanGenerator` class
to handle the `ir.Limit` case, which generates the SQL LIMIT clause. If
a percentage limit is specified, the tool will throw an exception as it
is not currently supported. The `generate` method of the
`ExpressionGenerator` class has been replaced with a new `generate`
method in the `ir.Project`, `ir.Filter`, and new `ir.Limit` cases to
ensure consistent expression generation. A new case class `Limit` has
been added to the
`com.databricks.labs.remorph.parsers.intermediate.relations` package,
which extends the `UnaryNode` class and has four parameters: `input`,
`limit`, `is_percentage`, and `with_ties`. This new class enables
limiting the number of rows returned by a query, with the ability to
specify a percentage of rows or include ties in the result set.
Additionally, a new test case has been added to the
`LogicalPlanGeneratorTest` class to verify the transpilation of a
`Limit` node to its SQL equivalent, ensuring that the `Limit` node is
correctly handled during transpilation.
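
  A simplified, self-contained version of the new node and its guard
  (the parameter names follow this note; the rest is illustrative):

  ```scala
  // Mirrors the described Limit node: a percentage limit is rejected
  // for now, while a row-count limit renders into a LIMIT clause.
  final case class Limit(input: String, limit: Int, is_percentage: Boolean, with_ties: Boolean)

  def generateLimit(node: Limit): String =
    if (node.is_percentage)
      throw new UnsupportedOperationException("percentage LIMIT is not supported")
    else s"${node.input} LIMIT ${node.limit}"

  // generateLimit(Limit("SELECT * FROM t", 10, is_percentage = false, with_ties = false))
  //   == "SELECT * FROM t LIMIT 10"
  ```
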
* Generate `OFFSET` SQL clauses
([#736](#736)). The
Remorph project's latest update introduces a new OFFSET clause
generation feature for SQL queries in the LogicalPlanGenerator class.
This change adds support for skipping a specified number of rows before
returning results in a SQL query, enhancing the library's query
generation capabilities. The implementation includes a new case in the
match statement of the generate function to handle ir.Offset nodes,
creating a string representation of the OFFSET clause using the provided
offset expression. Additionally, the commit includes a new test case in
the LogicalPlanGeneratorTest class to validate the OFFSET clause
generation, ensuring that the LogicalPlanGenerator can translate Offset
AST nodes into corresponding SQL statements. Overall, this update
enables the generation of more comprehensive SQL queries with OFFSET
support, providing software engineers with greater flexibility for
pagination and other data processing tasks.
* Generate `ORDER BY` SQL clauses
([#737](#737)). This
commit introduces new classes and enumerations for sort direction and
null ordering, as well as an updated SortOrder case class, enabling the
generation of ORDER BY SQL clauses. The LogicalPlanGenerator and
SnowflakeExpressionBuilder classes have been modified to utilize these
changes, allowing for more flexible and customizable sorting and null
ordering when generating SQL queries. Additionally, the
TSqlRelationBuilderSpec test suite has been updated to reflect these
changes, and new test cases have been added to ensure the correct
transpilation of ORDER BY clauses. Overall, these improvements enhance
the Remorph project's capability to parse and generate SQL expressions
with various sorting scenarios, providing a more robust and maintainable
codebase.
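
  The shape of these additions, sketched with simplified types and an
  assumed rendering:

  ```scala
  sealed trait SortDirection
  case object Ascending extends SortDirection
  case object Descending extends SortDirection

  sealed trait NullOrdering
  case object NullsFirst extends NullOrdering
  case object NullsLast extends NullOrdering

  final case class SortOrder(expr: String, direction: SortDirection, nulls: NullOrdering)

  // Render each sort order as "<expr> ASC|DESC NULLS FIRST|LAST".
  def orderBy(orders: Seq[SortOrder]): String =
    orders
      .map { o =>
        val dir = if (o.direction == Ascending) "ASC" else "DESC"
        val nul = if (o.nulls == NullsFirst) "NULLS FIRST" else "NULLS LAST"
        s"${o.expr} $dir $nul"
      }
      .mkString("ORDER BY ", ", ", "")

  // orderBy(Seq(SortOrder("age", Descending, NullsLast))) == "ORDER BY age DESC NULLS LAST"
  ```
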
* Generate `UNION` / `EXCEPT` / `INTERSECT`
([#731](#731)). In this
release, we have introduced support for the `UNION`, `EXCEPT`, and
`INTERSECT` set operations in our data processing system's generator. A
new `unknown` method has been added to the `Generator` trait to return a
`TranspileException` when encountering unsupported operations, allowing
for better error handling and more informative error messages. The
`LogicalPlanGenerator` class in the remorph project has been extended to
support generating `UNION`, `EXCEPT`, and `INTERSECT` SQL operations
with the addition of a new parameter, `explicitDistinct`, to enable
explicit specification of `DISTINCT` for these set operations. A new
test suite has been added to the `LogicalPlanGenerator` to test the
generation of these operations using the `SetOperation` class, which now
has four possible set operations: `UnionSetOp`, `IntersectSetOp`,
`ExceptSetOp`, and `UnspecifiedSetOp`. With these changes, our system
can handle a wider range of input and provide meaningful error messages
for unsupported operations, making it more versatile in handling complex
SQL queries.
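
  Sketched with the four variants named above and the
  `explicitDistinct` flag (rendering details are assumptions):

  ```scala
  sealed trait SetOp
  case object UnionSetOp extends SetOp
  case object IntersectSetOp extends SetOp
  case object ExceptSetOp extends SetOp
  case object UnspecifiedSetOp extends SetOp

  def setOperation(left: String, right: String, op: SetOp, explicitDistinct: Boolean): String = {
    val keyword = op match {
      case UnionSetOp       => "UNION"
      case IntersectSetOp   => "INTERSECT"
      case ExceptSetOp      => "EXCEPT"
      // Unsupported operations surface as an error, per the note.
      case UnspecifiedSetOp => throw new IllegalArgumentException("unspecified set operation")
    }
    val distinct = if (explicitDistinct) " DISTINCT" else ""
    s"$left $keyword$distinct $right"
  }

  // setOperation("SELECT a FROM t1", "SELECT a FROM t2", UnionSetOp, explicitDistinct = true)
  //   == "SELECT a FROM t1 UNION DISTINCT SELECT a FROM t2"
  ```
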
* Generate `VALUES` SQL clauses
([#738](#738)). The
latest commit introduces a new feature to generate `VALUES` SQL clauses
in the context of the logical plan generator. A new case branch has been
implemented in the `generate` method to manage `ir.Values` expressions,
converting input data (lists of lists of expressions) into a string
representation compatible with `VALUES` clauses. The existing
functionality remains unchanged. Additionally, a new test case has been
added for the `LogicalPlanGenerator` class, which checks the correct
transpilation to `VALUES` SQL clauses. This test case ensures that the
`ir.Values` method, which takes a sequence of sequences of literals,
generates the corresponding `VALUES` SQL clause, specifically checking
the input `Seq(Seq(ir.Literal(1), ir.Literal(2)), Seq(ir.Literal(3),
ir.Literal(4)))` against the SQL clause `"VALUES (1,2), (3,4)"`. This
change enables testing the functionality of generating `VALUES` SQL
clauses using the `LogicalPlanGenerator` class.
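
  The transformation in miniature, using plain integers in place of
  `ir.Literal` nodes:

  ```scala
  // Rows of literals become a comma-separated list of parenthesised tuples.
  def values(rows: Seq[Seq[Int]]): String =
    rows.map(_.mkString("(", ",", ")")).mkString("VALUES ", ", ", "")

  // values(Seq(Seq(1, 2), Seq(3, 4))) == "VALUES (1,2), (3,4)"
  ```
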
* Generate predicate expressions. This commit introduces the generation
of predicate expressions as part of the SQL ExpressionGenerator in the
`com.databricks.labs.remorph.generators.sql` package, enabling the
creation of more complex SQL expressions. The changes include the
addition of a new private method, `predicate(ctx: GeneratorContext,
expr: Expression)`, to handle predicate expressions, and the
introduction of two new predicate expression types, LessThan and
LessThanOrEqual, for comparing the relative ordering of two expressions.
Existing predicate expression types have been updated with consistent
naming. Additionally, the commit incorporates improvements to the
handling of comparison operators in the SnowflakeExpressionBuilder and
TSqlExpressionBuilder classes, addressing bugs and ensuring precise
predicate expression generation. The `ParserTestCommon` trait has also
been updated to reorder certain operators in a logical plan, maintaining
consistent comparison results in tests. New test cases have been added
to several test suites to ensure the correct interpretation and
generation of predicate expressions involving different data types and
search conditions. Overall, these enhancements provide more fine-grained
comparison of expressions, enable more nuanced condition checking, and
improve the robustness and accuracy of the SQL expression generation
process.
* Merge remote-tracking branch 'origin/main'. In this update, the
`ExpressionGenerator` class in the
`com.databricks.labs.remorph.generators.sql` package has been enhanced
with two new private methods: `dateLiteral` and `timestampLiteral`.
These methods are designed to generate the SQL literal representation of
`DateType` and `TimestampType` expressions, respectively. The
introduction of these methods addresses the previous limitations of
formatting date and timestamp values directly, which lacked
extensibility and required duplicated code for handling null values. By
extracting the formatting logic into separate methods, this commit
significantly improves code maintainability and reusability, enhancing
the overall readability and understandability of the
`ExpressionGenerator` class for developers. The `dateLiteral` method
handles `DateType` values by formatting them using the `dateFormat`
`SimpleDateFormat` instance, returning `NULL` if the value is missing.
Likewise, the `timestampLiteral` method formats `TimestampType` values
using the `timeFormat` `SimpleDateFormat` instance, returning `NULL` if
the value is missing. These methods will enable developers to grasp the
code's functionality more easily and make future enhancements to the
class.
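
  The extracted helpers, approximated below; the format patterns and
  the `Option[Date]` signature are assumptions:

  ```scala
  import java.text.SimpleDateFormat
  import java.util.Date

  // Null-safe literal formatting: a missing value renders as NULL
  // instead of failing inside SimpleDateFormat.
  val dateFormat = new SimpleDateFormat("yyyy-MM-dd")
  val timeFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")

  def dateLiteral(value: Option[Date]): String =
    value.map(d => s"DATE '${dateFormat.format(d)}'").getOrElse("NULL")

  def timestampLiteral(value: Option[Date]): String =
    value.map(d => s"TIMESTAMP '${timeFormat.format(d)}'").getOrElse("NULL")
  ```
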
* Modified dataclass for table threshold and added documentation
([#714](#714)). A series
of modifications have been implemented to enhance the threshold
configuration and validation for table reconciliation in the open-source
library. The `TableThresholds` dataclass has been updated to accept a
string for the `model` attribute, replacing the previously used
`TableThresholdModel` Enum. Additionally, a new
`validate_threshold_model` method has been added to `TableThresholds` to
ensure proper validation of the `model` attribute. A new exception
class, `InvalidModelForTableThreshold`, has been introduced to handle
invalid settings. Column-specific thresholds can now be set using the
`ColumnThresholds` configuration option. The `recon_capture.py` and
`recon_config.py` files have been updated accordingly, and the
documentation has been revised to clarify these changes. These
improvements offer greater flexibility and control for users configuring
thresholds while also refining validation and error handling.
* Support CTAS in TSQL Grammar and add more comparison operators
([#545](#545)). In this
release, we have added support for the CTAS (CREATE TABLE AS) statement
in the TSQL (T-SQL) grammar, as well as introduced new comparison
operators: !=, !<, and !>. The CTAS statement allows for the creation of
a new table by selecting from an existing table or query, potentially
improving code readability and performance. The new comparison operators
provide alternative ways of expressing inequalities, increasing
flexibility for developers. The keyword `REPLICATE` has also been added
for creating a full copy of a database or availability group. These
changes enhance the overall functionality of the TSQL grammar and
improve the user's ability to express various operations in TSQL. The
CTAS statement is implemented as a new rule, and the new comparison
operators are added as methods in the TSqlExpressionBuilder class. These
changes provide increased capability and flexibility for TSQL parsing
and query handling. The new methods are not adding any new external
dependencies, and the project remains self-contained. The additions have
been tested with the TSqlExpressionBuilderSpec test suite, ensuring the
functionality and compatibility of the TSQL parser.
* Support translation of TSQL IGNORE NULLS clause in windowing functions
([#511](#511)). The
latest change introduces support for translating the TSQL IGNORE NULLS
and RESPECT NULLS clauses in windowing functions to their equivalents in
Databricks SQL. In TSQL, these clauses appear after the function name
and before the OVER clause, affecting how the functions handle null
values. Databricks SQL represents this functionality with an optional
trailing boolean parameter for specific windowing functions. With this
update, when the IGNORE NULLS clause is specified in TSQL, a boolean
option is appended to the corresponding Databricks SQL windowing
functions, with RESPECT NULLS as the default. This enhancement is
facilitated by a new private method, `buildNullIgnore`, which adds the
boolean parameter to the original expression when IGNORE NULLS is
specified in the OVER clause. The alteration is exemplified in new test
examples for the TSqlFunctionSpec, which include testing the LEAD
function with and without the IGNORE NULLS clause, and updates to the
translation of functions with non-standard syntax.
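
  Roughly, the appended-flag idea (function shape and argument layout
  are simplified assumptions):

  ```scala
  // When the T-SQL source used IGNORE NULLS, append a trailing boolean
  // argument to the Databricks SQL window function call; RESPECT NULLS
  // is the default and adds nothing.
  def buildNullIgnore(function: String, args: Seq[String], ignoreNulls: Boolean): String = {
    val finalArgs = if (ignoreNulls) args :+ "TRUE" else args
    s"$function(${finalArgs.mkString(", ")})"
  }

  // buildNullIgnore("LEAD", Seq("salary", "1"), ignoreNulls = true)  == "LEAD(salary, 1, TRUE)"
  // buildNullIgnore("LEAD", Seq("salary", "1"), ignoreNulls = false) == "LEAD(salary, 1)"
  ```
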
* TSQL: Implement TSQL UPDATE/DELETE statements
([#540](#540)). In this
release, we have added support for TSQL UPDATE and DELETE statements in
all syntactical forms, including UDF column transformations, in the
TSqlParser.g4 file. The implementation of both statements is done in a
single file and they share many common clauses. We have also introduced
two new case classes, UpdateTable and MergeTables, in the
extensions.scala file to implement the TSQL UPDATE and DELETE
statements, respectively. Additionally, we have added new methods to
handle various clauses and elements associated with these statements in
the TSqlErrorStrategy class. A series of tests have been included to
ensure the correct translation of various UPDATE and DELETE queries to
their respective Abstract Syntax Trees (ASTs). These changes bring TSQL
UPDATE and DELETE statement functionality to the project and allow for
their use in a variety of contexts, providing developers with more
flexibility and control when working with TSQL UPDATE and DELETE
statements in the parser.
* TSQL: Implement translation of INSERT statement
([#515](#515)). In this
release, we have implemented the TSQL INSERT statement in its entirety,
including all target options, optional clauses, and Common Table
Expressions (CTEs) in our open-source library. The change includes
updates to the TSqlParser.g4 file to support the INSERT statement's
various clauses, such as TOP, INTO, WITH TABLE HINTS, outputClause, and
optionClause. We have also added new case classes to the TSQL AST to
handle various aspects of the INSERT statement, including LocalVarTable,
Output, InsertIntoTable, DerivedRows, DefaultValues, and Default. The
TSqlExpressionBuilder and TSqlRelationBuilder classes have been updated
to support the new INSERT statement, including handling output column
lists, aliases, and JSON clauses. We have added specification tests to
TSqlAstBuilderSpec.scala to demonstrate the various ways that the INSERT
statement can be written and to ensure the correct translation of TSQL
INSERT statements into the remorph project.
* TSQL: Remove the SIGN fragment from numerical tokens
([#547](#547)). In this
release, we have made changes to the TSQL expression builder in the
remorph project that affect how negative and positive numerical literals
are parsed. Negative literals, such as -2, will now be parsed as
UMinus(Literal(2)) instead of Literal(-2), and positive literals, such
as +1, will be parsed as UPlus(Literal(1)) instead of Literal(1). This
change was made to address issue
[#546](#546), but it is
not an ideal solution as it may cause inconvenience in downstream
processes. The affected numerical tokens include INT, HEX, FLOAT, REAL,
and MONEY, which have been simplified by removing the SIGN fragment. We
have updated the buildPrimitive method to handle INT, REAL, and FLOAT
token types and ensure that numerical tokens are parsed correctly. We
intend to keep issue
[#546](#546) open for
further exploration of a better solution. The tests have been updated to
reflect these changes.
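
  What the new parse shape looks like, in miniature:

  ```scala
  sealed trait Expr
  final case class Literal(value: Int) extends Expr
  final case class UMinus(child: Expr) extends Expr
  final case class UPlus(child: Expr) extends Expr

  // The lexer no longer folds the sign into the numeric token, so the
  // sign surfaces as a unary node wrapping an unsigned literal.
  def parseSigned(text: String): Expr =
    if (text.startsWith("-")) UMinus(Literal(text.drop(1).toInt))
    else if (text.startsWith("+")) UPlus(Literal(text.drop(1).toInt))
    else Literal(text.toInt)

  // parseSigned("-2") == UMinus(Literal(2))
  // parseSigned("+1") == UPlus(Literal(1))
  ```
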
* TSQL: Simplifies named table tableSource, implements columnAlias list
([#512](#512)). This
change introduces significant updates to the TSqlParser's grammar for
tableSource, simplifying and consolidating rules related to table
aliases and column alias lists. A new Relation called TableWithHints has
been added to collect and process table hints, some of which have direct
counterparts in the Catalyst optimizer or can be used as comments for
migration purposes. The TSqlExpressionBuilder and TSqlRelationBuilder
classes have been modified to handle table hints and column aliases, and
the TSqlAstBuilderSpec test suite has been updated to include new tests
for table hints in T-SQL SELECT statements. These changes aim to improve
parsing, handling, and optimization of table sources, table hints, and
column aliases in TSQL queries.
* TSQL: Support generic FOR options
([#525](#525)). In this
release, we have added support for parsing T-SQL (Transact-SQL) options
that contain the keyword `FOR` using the standard syntax `[FOR]`. This
change is necessary as `FOR` cannot be used directly as an identifier
without escaping, as it would otherwise be seen as a table alias in a
`SELECT` statement. The ANTLR rule for parsing generic options has been
expanded to handle these special cases correctly, allowing for the
proper parsing of options such as `OPTIMIZE FOR UNKNOWN` in a `SELECT`
statement. Additionally, a new case has been added to the OptionBuilder
class to handle the `FOR` keyword and elide it, converting `OPTIMIZE FOR
UNKNOWN` to `OPTIMIZE` with an id of 'UNKNOWN'. This ensures the proper
handling of options containing `FOR` and avoids any conflicts with the
`FOR` clause in T-SQL statements. This change was implemented by
Valentin Kasas and involves adding a new alternative to the
`genericOption` rule in the TSqlParser.g4 file, but no new methods have
been added.
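
  A hypothetical sketch of the elision step over pre-tokenised option
  text (the option model here is invented for illustration):

  ```scala
  final case class GenericOption(name: String, id: Option[String])

  // FOR is dropped so that `OPTIMIZE FOR UNKNOWN` becomes the OPTIMIZE
  // option with an id of UNKNOWN, rather than being mistaken for an alias.
  def buildOption(tokens: Seq[String]): GenericOption = tokens match {
    case Seq(name, "FOR", id) => GenericOption(name, Some(id))
    case Seq(name)            => GenericOption(name, None)
    case other                => GenericOption(other.mkString(" "), None)
  }

  // buildOption(Seq("OPTIMIZE", "FOR", "UNKNOWN")) == GenericOption("OPTIMIZE", Some("UNKNOWN"))
  ```
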
* Updated Dialect Variable Name
([#535](#535)). In this
release, the `source` variable name in the `QueryBuilder` class, which
refers to the `Dialect` instance, has been updated to `engine` to
accurately reflect its meaning as referring to either `Source` or
`Target`. This change includes updating the usage of `source` to
`engine` in the `build_query`, `_get_with_clause`, and
`build_threshold_query` methods, as well as removing unnecessary
parentheses in a list. These changes improve the code's clarity,
accuracy, and readability, while maintaining the overall functionality
of the affected methods.
* Use Oracle library only if the recon source is Oracle
([#532](#532)). In this
release, we have added a new `ReconcileConfig` configuration object and
a `SourceType` enumeration in the `databricks.labs.remorph.config` and
`databricks.labs.remorph.reconcile.constants` modules, respectively.
These objects are introduced to determine whether to include the Oracle
JDBC driver library in a reconciliation job's task libraries. The
`deploy_job` method and the `_job_recon_task` method have been updated
to use the `_recon_config` attribute to decide whether to include the
Oracle JDBC driver library. Additionally, the `_deploy_reconcile_job`
method in the `install.py` file has been modified to include a new
parameter called `reconcile`, which is passed as an argument from the
`_config` object. This change enhances the flexibility and customization
of the reconcile job deployment. Furthermore, new fixtures
`oracle_recon_config` and `snowflake_reconcile_config` have been
introduced for `ReconcileConfig` objects with Oracle and Snowflake
specific configurations, respectively. These fixtures are used in the
test functions for deploying jobs, ensuring that the tests are more
focused and better reflect the actual behavior of the code.
* [chore] Make singletons for the relevant `DataType` instances
([#705](#705)). This
commit introduces case objects for various data types, such as NullType,
StringType, and others, effectively making them singletons. This change
simplifies the creation of data type instances and ensures that each
type has a single instance throughout the application. The new case
objects are utilized in building SQL expressions, affecting functions
such as Cast and TRY_CAST in the TSqlExpressionBuilder class.
Additionally, the test file TSqlExpressionBuilderSpec.scala has been
updated to include the new case objects and handle errors by returning
null. The data types tested include integer types, decimal types, date
and time types, string types, binary types, and JSON. The primary goal
of this change is to improve the management and identification of data
types in the codebase, as well as to enhance code readability and
maintainability.
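
  The singleton pattern in question, with an illustrative rendering
  function (the SQL spellings are assumptions):

  ```scala
  // Case objects give each data type exactly one instance, so types
  // can be compared with == and pattern-matched without allocation.
  sealed trait DataType
  case object NullType extends DataType
  case object StringType extends DataType
  case object IntegerType extends DataType

  def render(dt: DataType): String = dt match {
    case NullType    => "VOID"   // illustrative SQL spellings, not the
    case StringType  => "STRING" // project's actual output
    case IntegerType => "INT"
  }
  ```
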

Dependency updates:

* Bump sqlglot from 25.1.0 to 25.5.1
([#534](#534)).
* Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0
([#555](#555)).
* Bump sqlglot from 25.5.1 to 25.6.1
([#585](#585)).
sundarshankar89 added a commit to sundarshankar89/remorph that referenced this pull request Jan 2, 2025
Bumps [sqlglot](https://github.com/tobymao/sqlglot) from 25.1.0 to
25.5.1.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/tobymao/sqlglot/blob/main/CHANGELOG.md">sqlglot's
changelog</a>.</em></p>
<blockquote>
<h2>[v25.5.1] - 2024-07-05</h2>
<h3>:bug: Bug Fixes</h3>
<ul>
<li><a
href="https://github.com/tobymao/sqlglot/commit/2bdde2221b8017791ce4cc619abb2706464ca408"><code>2bdde22</code></a>
- <strong>optimizer</strong>: only qualify coalesced USING columns if
they exist in table schemas <em>(PR <a
href="https://redirect.github.com/tobymao/sqlglot/pull/3740">#3740</a>
by <a
href="https://github.com/georgesittas"><code>@​georgesittas</code></a>)</em></li>
</ul>
<h2>[v25.5.0] - 2024-07-04</h2>
<h3>:boom: BREAKING CHANGES</h3>
<ul>
<li>
<p>due to <a
href="https://github.com/tobymao/sqlglot/commit/8335ba10e60c7c63881d7559a6f1fada11b0e55d"><code>8335ba1</code></a>
- preserve EXTRACT(date_part FROM datetime) calls <em>(PR <a
href="https://redirect.github.com/tobymao/sqlglot/pull/3729">#3729</a>
by <a
href="https://github.com/georgesittas"><code>@​georgesittas</code></a>)</em>:</p>
<p>preserve EXTRACT(date_part FROM datetime) calls (<a
href="https://redirect.github.com/tobymao/sqlglot/issues/3729">#3729</a>)</p>
</li>
<li>
<p>due to <a
href="https://github.com/tobymao/sqlglot/commit/fb066a6167e1f887bd8c1a1369d063fe70f36a8a"><code>fb066a6</code></a>
- Decouple NVL() from COALESCE() <em>(PR <a
href="https://redirect.github.com/tobymao/sqlglot/pull/3734">#3734</a>
by <a
href="https://github.com/VaggelisD"><code>@​VaggelisD</code></a>)</em>:</p>
<p>Decouple NVL() from COALESCE() (<a
href="https://redirect.github.com/tobymao/sqlglot/issues/3734">#3734</a>)</p>
</li>
</ul>
<h3>:sparkles: New Features</h3>
<ul>
<li><a
href="https://github.com/tobymao/sqlglot/commit/0c032992fac622ebaee114cd9f6e405be1820054"><code>0c03299</code></a>
- <strong>teradata</strong>: random lower upper closes <a
href="https://redirect.github.com/tobymao/sqlglot/pull/3721">#3721</a>
<em>(commit by <a
href="https://github.com/tobymao"><code>@​tobymao</code></a>)</em></li>
<li><a
href="https://github.com/tobymao/sqlglot/commit/37b6e2d806f6da1338c75803919c602f8705acac"><code>37b6e2d</code></a>
- <strong>snowflake</strong>: add support for VECTOR(type, size) <em>(PR
<a
href="https://redirect.github.com/tobymao/sqlglot/pull/3724">#3724</a>
by <a
href="https://github.com/georgesittas"><code>@​georgesittas</code></a>)</em></li>
<li><a
href="https://github.com/tobymao/sqlglot/commit/1e07c4d29a43192fb57c120f3b9c1c2fa27d0fa6"><code>1e07c4d</code></a>
- <strong>presto, trino</strong>: Configurable transpilation of
Snowflake VARIANT <em>(PR <a
href="https://redirect.github.com/tobymao/sqlglot/pull/3725">#3725</a>
by <a
href="https://github.com/VaggelisD"><code>@​VaggelisD</code></a>)</em>
<ul>
<li>:arrow_lower_right: <em>addresses issue <a
href="https://redirect.github.com/tobymao/sqlglot/issues/3713">#3713</a>
opened by <a
href="https://github.com/Leonti"><code>@​Leonti</code></a></em></li>
</ul>
</li>
<li><a
href="https://github.com/tobymao/sqlglot/commit/e5a53aaa015806574cd3c4bbe46b5788e960903e"><code>e5a53aa</code></a>
- <strong>snowflake</strong>: Support for FROM CHANGES <em>(PR <a
href="https://redirect.github.com/tobymao/sqlglot/pull/3731">#3731</a>
by <a
href="https://github.com/VaggelisD"><code>@​VaggelisD</code></a>)</em>
<ul>
<li>:arrow_lower_right: <em>addresses issue <a
href="https://redirect.github.com/tobymao/sqlglot/issues/3730">#3730</a>
opened by <a
href="https://github.com/achicoine-coveo"><code>@​achicoine-coveo</code></a></em></li>
</ul>
</li>
<li><a
href="https://github.com/tobymao/sqlglot/commit/820d66430bb23bff88d0057b22842d313e1431c5"><code>820d664</code></a>
- <strong>presto</strong>: wrap md5 string arguments in to_utf8 <em>(PR
<a
href="https://redirect.github.com/tobymao/sqlglot/pull/3732">#3732</a>
by <a
href="https://github.com/georgesittas"><code>@​georgesittas</code></a>)</em>
<ul>
<li>:arrow_lower_right: <em>addresses issue <a
href="https://redirect.github.com/TobikoData/sqlmesh/issues/2855">#2855</a>
opened by <a
href="https://github.com/MikeWallis42"><code>@​MikeWallis42</code></a></em></li>
</ul>
</li>
<li><a
href="https://github.com/tobymao/sqlglot/commit/912bc84791008ecce545cfbd3b0c9d4362131eb3"><code>912bc84</code></a>
- <strong>spark, databricks</strong>: Support view schema binding
options <em>(PR <a
href="https://redirect.github.com/tobymao/sqlglot/pull/3739">#3739</a>
by <a
href="https://github.com/VaggelisD"><code>@​VaggelisD</code></a>)</em>
<ul>
<li>:arrow_lower_right: <em>addresses issue <a
href="https://redirect.github.com/tobymao/sqlglot/issues/3738">#3738</a>
opened by <a
href="https://github.com/aersam"><code>@​aersam</code></a></em></li>
</ul>
</li>
</ul>
<h3>:bug: Bug Fixes</h3>
<ul>
<li><a
href="https://github.com/tobymao/sqlglot/commit/3454f861f22f680f6b8c18cca466154d3b9fe8d1"><code>3454f86</code></a>
- <strong>teradata</strong>: use timestamp with time zone over
timestamptz <em>(PR <a
href="https://redirect.github.com/tobymao/sqlglot/pull/3723">#3723</a>
by <a href="https://github.com/mtagle"><code>@​mtagle</code></a>)</em>
<ul>
<li>:arrow_lower_right: <em>fixes issue <a
href="https://redirect.github.com/tobymao/sqlglot/issues/3722">#3722</a>
opened by <a
href="https://github.com/mtagle"><code>@​mtagle</code></a></em></li>
</ul>
</li>
<li><a
href="https://github.com/tobymao/sqlglot/commit/f4a28721fd33edb3178c1d99746209dadfbba487"><code>f4a2872</code></a>
- <strong>clickhouse</strong>: switch off table alias columns generation
<em>(PR <a
href="https://redirect.github.com/tobymao/sqlglot/pull/3727">#3727</a>
by <a
href="https://github.com/georgesittas"><code>@​georgesittas</code></a>)</em>
<ul>
<li>:arrow_lower_right: <em>fixes issue <a
href="https://redirect.github.com/tobymao/sqlglot/issues/3726">#3726</a>
opened by <a
href="https://github.com/cpcloud"><code>@​cpcloud</code></a></em></li>
</ul>
</li>
<li><a
href="https://github.com/tobymao/sqlglot/commit/8335ba10e60c7c63881d7559a6f1fada11b0e55d"><code>8335ba1</code></a>
- <strong>clickhouse</strong>: preserve EXTRACT(date_part FROM datetime)
calls <em>(PR <a
href="https://redirect.github.com/tobymao/sqlglot/pull/3729">#3729</a>
by <a
href="https://github.com/georgesittas"><code>@​georgesittas</code></a>)</em>
<ul>
<li>:arrow_lower_right: <em>fixes issue <a
href="https://redirect.github.com/tobymao/sqlglot/issues/3728">#3728</a>
opened by <a
href="https://github.com/cpcloud"><code>@​cpcloud</code></a></em></li>
</ul>
</li>
<li><a
href="https://github.com/tobymao/sqlglot/commit/fb066a6167e1f887bd8c1a1369d063fe70f36a8a"><code>fb066a6</code></a>
- <strong>oracle</strong>: Decouple NVL() from COALESCE() <em>(PR <a
href="https://redirect.github.com/tobymao/sqlglot/pull/3734">#3734</a>
by <a
href="https://github.com/VaggelisD"><code>@​VaggelisD</code></a>)</em>
<ul>
<li>:arrow_lower_right: <em>fixes issue <a
href="https://redirect.github.com/tobymao/sqlglot/issues/3733">#3733</a>
opened by <a
href="https://github.com/Hal-H2Apps"><code>@​Hal-H2Apps</code></a></em></li>
</ul>
</li>
<li><a
href="https://github.com/tobymao/sqlglot/commit/c790c3b1fa274d7b0faf9f75e7dbc62bc4f55c67"><code>c790c3b</code></a>
- <strong>tsql</strong>: parse rhs of x::varchar(max) into a type
<em>(PR <a
href="https://redirect.github.com/tobymao/sqlglot/pull/3737">#3737</a>
by <a
href="https://github.com/georgesittas"><code>@​georgesittas</code></a>)</em></li>
</ul>
<h3>:recycle: Refactors</h3>
<ul>
<li><a
href="https://github.com/tobymao/sqlglot/commit/84416d207a2e397aba12a4138fcbd1fab382c22d"><code>84416d2</code></a>
- <strong>teradata</strong>: clean up CurrentTimestamp generation logic
<em>(commit by <a
href="https://github.com/georgesittas"><code>@​georgesittas</code></a>)</em></li>
</ul>
<h2>[v25.4.1] - 2024-06-29</h2>
<h3>:bug: Bug Fixes</h3>
<ul>
<li><a
href="https://github.com/tobymao/sqlglot/commit/6bf9853fd0b26d5a4e93e37447c3b275cd108872"><code>6bf9853</code></a>
- <strong>tsql</strong>: cast shorthand closes <a
href="https://redirect.github.com/tobymao/sqlglot/pull/3760">#3760</a>
<em>(PR <a
href="https://redirect.github.com/tobymao/sqlglot/pull/3720">#3720</a>
by <a
href="https://github.com/tobymao"><code>@​tobymao</code></a>)</em></li>
</ul>
<h2>[v25.4.0] - 2024-06-28</h2>
<h3>:boom: BREAKING CHANGES</h3>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/tobymao/sqlglot/commit/2bdde2221b8017791ce4cc619abb2706464ca408"><code>2bdde22</code></a>
Fix(optimizer): only qualify coalesced USING columns if they exist in
table s...</li>
<li><a
href="https://github.com/tobymao/sqlglot/commit/e449e3d50e2d8eb6cef87b8ebcd172329a6b79e9"><code>e449e3d</code></a>
docs: update API docs, CHANGELOG.md for v25.5.0 [skip ci]</li>
<li><a
href="https://github.com/tobymao/sqlglot/commit/912bc84791008ecce545cfbd3b0c9d4362131eb3"><code>912bc84</code></a>
feat(spark, databricks): Support view schema binding options (<a
href="https://redirect.github.com/tobymao/sqlglot/issues/3739">#3739</a>)</li>
<li><a
href="https://github.com/tobymao/sqlglot/commit/c790c3b1fa274d7b0faf9f75e7dbc62bc4f55c67"><code>c790c3b</code></a>
Fix(tsql): parse rhs of x::varchar(max) into a type (<a
href="https://redirect.github.com/tobymao/sqlglot/issues/3737">#3737</a>)</li>
<li><a
href="https://github.com/tobymao/sqlglot/commit/820d66430bb23bff88d0057b22842d313e1431c5"><code>820d664</code></a>
Feat(presto): wrap md5 string arguments in to_utf8 (<a
href="https://redirect.github.com/tobymao/sqlglot/issues/3732">#3732</a>)</li>
<li><a
href="https://github.com/tobymao/sqlglot/commit/fb066a6167e1f887bd8c1a1369d063fe70f36a8a"><code>fb066a6</code></a>
fix(oracle)!: Decouple NVL() from COALESCE() (<a
href="https://redirect.github.com/tobymao/sqlglot/issues/3734">#3734</a>)</li>
<li><a
href="https://github.com/tobymao/sqlglot/commit/e5a53aaa015806574cd3c4bbe46b5788e960903e"><code>e5a53aa</code></a>
feat(snowflake): Support for FROM CHANGES (<a
href="https://redirect.github.com/tobymao/sqlglot/issues/3731">#3731</a>)</li>
<li><a
href="https://github.com/tobymao/sqlglot/commit/1e07c4d29a43192fb57c120f3b9c1c2fa27d0fa6"><code>1e07c4d</code></a>
feat(presto, trino): Configurable transpilation of Snowflake VARIANT (<a
href="https://redirect.github.com/tobymao/sqlglot/issues/3725">#3725</a>)</li>
<li><a
href="https://github.com/tobymao/sqlglot/commit/8335ba10e60c7c63881d7559a6f1fada11b0e55d"><code>8335ba1</code></a>
Fix(clickhouse)!: preserve EXTRACT(date_part FROM datetime) calls (<a
href="https://redirect.github.com/tobymao/sqlglot/issues/3729">#3729</a>)</li>
<li><a
href="https://github.com/tobymao/sqlglot/commit/f4a28721fd33edb3178c1d99746209dadfbba487"><code>f4a2872</code></a>
Fix(clickhouse): switch off table alias columns generation (<a
href="https://redirect.github.com/tobymao/sqlglot/issues/3727">#3727</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/tobymao/sqlglot/compare/v25.1.0...v25.5.1">compare
view</a></li>
</ul>
</details>
<br />

<details>
<summary>Most Recent Ignore Conditions Applied to This Pull
Request</summary>

| Dependency Name | Ignore Conditions |
| --- | --- |
| sqlglot | [>= 24.a, < 25] |
</details>


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=sqlglot&package-manager=pip&previous-version=25.1.0&new-version=25.5.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: sundar.shankar <[email protected]>
sundarshankar89 pushed a commit to sundarshankar89/remorph that referenced this pull request Jan 2, 2025
* Added TSql transpiler
([databrickslabs#734](databrickslabs#734)). In this
release, we have developed a new open-source library feature that
enhances the transpilation of T-SQL code to Databricks-compatible code.
The new TSqlToDatabricksTranspiler class has been added, which extends
the Transpiler abstract class and defines the transpile method. This
method converts T-SQL code to Databricks-compatible code by creating a
lexer, token stream, parser, and parsed tree from the input string using
TSqlLexer, CommonTokenStream, and tSqlFile. The parsed tree is then
passed to the TSqlAstBuilder's visit method to generate a logical plan,
which is optimized using an optimizer object with rules such as
PullLimitUpwards and TopPercentToLimitSubquery. The optimized plan is
then passed to the LogicalPlanGenerator's generate method to generate
the final transpiled code. Additionally, a new class,
IsTranspiledFromTSqlQueryRunner, has been added to the QueryRunner
object to transpile T-SQL queries into the format expected by the
Databricks query runner using the new TSqlToDatabricksTranspiler. The
AcceptanceTestRunner class in the com.databricks.labs.remorph.coverage
package has been updated to replace TSqlAstBuilder with
IsTranspiledFromTSqlQueryRunner in the TSqlAcceptanceSuite class,
indicating a change in the code responsible for handling TSql queries.
This new feature aims to provide a smooth and efficient way to convert
T-SQL code to Databricks-compatible code for further processing and
execution.
* Added the missing info for the reconciliation documentation
([databrickslabs#520](databrickslabs#520)). In this
release, we have made significant improvements to the reconciliation
feature of our open-source library. We have added a configuration folder
and provided a template for creating configuration files for specific
table sources. The config files will contain necessary configurations
for table-specific reconciliation. Additionally, we have included a note
in the transformation section detailing the usage of user-defined
functions (UDFs) in transformation expressions, with an example UDF
called `sort_array_input()` provided. The reconcile configuration sample
documentation has also been added, with a JSON configuration for
reconciling tables using various operations like drop, join,
transformation, threshold, filter, and JDBC ReaderOptions. The commit
also includes examples of source and target tables, data overviews, and
reconciliation configurations for various scenarios, such as basic
config and column mapping, user transformations, explicit select,
explicit drop, filters, and thresholds comparison, among others. These
changes aim to make it easier for users to set up and execute the
reconciliation process for specific table sources and provide clear and
concise information about using UDFs in transformation expressions for
reconciliation. This commit is co-authored by Vijay Pavan Nissankararao
and SundarShankar89.
* Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0
([databrickslabs#555](databrickslabs#555)). In this
pull request, the sigstore/gh-action-sigstore-python dependency is being
updated from version 2.1.1 to 3.0.0. This new version includes several
changes and improvements, such as the addition of recursive globbing
with ** to the inputs and the removal of certain settings like
fulcio-url, rekor-url, ctfe, and rekor-root-pubkey. The signature,
certificate, and bundle output settings have also been removed.
Furthermore, the way inputs are parsed has been changed, and they are
now made optional under certain conditions. The default suffix has been
updated to .sigstore.json. The 3.0.0 version also resolves various
deprecations present in sigstore-python's 2.x series and supports CI
runners that use PEP 668 to constrain global package prefixes.
* Bump sqlglot from 25.1.0 to 25.5.1
([databrickslabs#534](databrickslabs#534)). In the
latest release, the `sqlglot` package has been updated from version
25.1.0 to 25.5.1, which includes bug fixes, breaking changes, new
features, and refactors for parsing, analyzing, and rewriting SQL
queries. The new version introduces optimizations for coalesced USING
columns, preserves EXTRACT(date_part FROM datetime) calls, decouples
NVL() from COALESCE(), and supports FROM CHANGES in Snowflake. It also
provides configurable transpilation of Snowflake VARIANT, and supports
view schema binding options for Spark and Databricks. The update
addresses several issues, such as the use of timestamp with time zone
over timestamptz, switch off table alias columns generation, and parse
rhs of x::varchar(max) into a type. Additionally, the update cleans up
CurrentTimestamp generation logic for Teradata. The `sqlglot` dependency
has also been updated in the 'experimental.py' file of the
'databricks/labs/remorph/snow' module, along with the addition of a new
private method `_parse_json` to the `Generator` class for parsing JSON
data. Software engineers should review the changes and update their code
accordingly, as conflicts with existing code will be resolved
automatically by Dependabot, as long as the pull request is not altered
manually.
* Bump sqlglot from 25.5.1 to 25.6.1
([databrickslabs#585](databrickslabs#585)). In this
release, the `sqlglot` dependency is updated from version 25.5.1 to
25.6.1 in the 'pyproject.toml' file. This update includes bug fixes,
breaking changes, new features, and improvements. Breaking changes
consist of updates to the QUALIFY clause in queries and the
canonicalization of struct and array inline constructor. New features
include support for ORDER BY ALL, FROM ROWS FROM (...), RPAD & LPAD
functions, and exp.TimestampAdd. Bug fixes address issues related to the
QUALIFY clause in queries, expansion of SELECT * REPLACE, RENAME,
transpiling UDFs from Databricks, and more. The pull request also
includes a detailed changelog, commit history, instructions for
triggering Dependabot actions, and commands for reference, with the
exception of the compatibility score for the new version, which is not
taken into account in this pull request.
* Feature/reconcile table mismatch threshold
([databrickslabs#550](databrickslabs#550)). This
commit enhances the reconciliation process in the open-source library
with several new features, addressing issue
[databrickslabs#504](databrickslabs#504). A new
`get_record_count` method is added to the `Reconcile` class, providing
record count data for source and target tables, facilitating
comprehensive analysis of table mismatches. A `CountQueryBuilder` class
is introduced to build record count queries for different layers and SQL
dialects, ensuring consistency in data processing. The `Thresholds`
class is refactored into `ColumnThresholds` and `TableThresholds`,
allowing for more granular control over comparisons and customizable
threshold settings. New methods `_is_mismatch_within_threshold_limits`
and `_insert_into_metrics_table` are added to the `recon_capture.py`
file, improving fine-grained control over the reconciliation process and
preventing false positives. Additionally, new classes, methods, and data
structures have been implemented in the `execute` module to handle
reconciliation queries and data more efficiently. These improvements
contribute to a more accurate and robust reconciliation system.
* Feature: introduce core transpiler
([databrickslabs#715](databrickslabs#715)). A new
core transpiler, `SnowflakeToDatabricksTranspiler`, has been introduced
to convert Snowflake queries into Databricks SQL, streamlining
integration and compatibility between the two systems. This transpiler
is integrated into the coverage test suites for thorough testing, and is
used to convert various types of logical plans, handling cases such as
`Batch`, `WithCTE`, `Project`, `NamedTable`, `Filter`, and `Join`. The
`SnowflakeToDatabricksTranspiler` class tokenizes input Snowflake query
strings, initializes a `SnowflakeParser` instance, parses the input
Snowflake query, generates a logical plan, and applies the
`LogicalPlanGenerator` to the logical plan to generate the equivalent
Databricks SQL query. Additionally, the `SnowflakeAstBuilder` class has
been updated to alter the way `Batch` logical plans are built and
improve overall functionality of the transpiler.
* Fixed LEFT and RIGHT JOIN syntax in Snowflake ANTLR grammar
([databrickslabs#526](databrickslabs#526)). A fix
has been implemented to address issues with the Snowflake ANTLR grammar
related to the proper parsing of LEFT and RIGHT JOIN statements.
Previously, the keywords LEFT and RIGHT were incorrectly allowed as
identifiers, but they are hard keywords that must be escaped to be used
as column names. This change updates the grammar to escape these
keywords in JOIN statements, improving the overall parsing of queries
that include LEFT and RIGHT JOINs. Additionally, semantic predicates
have been suggested to handle cases where LEFT or RIGHT are used as
column names without escaping, although this is not yet implemented. To
ensure the correctness of the updated grammar, new tests have been added
to the SnowflakeAstBuilderSpec for LEFT and RIGHT JOINs, which check
that the Abstract Syntax Tree (AST) is built correctly for these
queries.
* Fixed Snowflake Acceptance Testcases Failures
([databrickslabs#531](databrickslabs#531)). In this
release, updates have been made to the acceptance testcases for various
SQL functions in the open-source library. The DENSE RANK function's
testcase has been updated with a window specification and ORDER BY
clause in both Snowflake and Databricks SQL syntaxes, ensuring accurate
test results. The LAG function's testcase now includes a PARTITION BY
and ORDER BY clause, as well as the NULLS LAST keyword in Databricks
SQL, for improved accuracy and consistency. The SQL queries in the
Snowflake testcase for the `last_value` function have been updated with
a window specification, ORDER BY clause, and NULLS LAST directive for
Databricks SQL. Test case failures in the Snowflake acceptance testsuite
have been addressed with updates to the LEAD function, MONTH_NAME to
MONTHNAME renaming, and DATE_FORMAT to TO_DATE conversion, improving
reliability and consistency. The ntile function's testcase has been
updated with PARTITION BY and ORDER BY clauses, and the NULLS LAST
keyword has been added to the Databricks SQL query. The SQL query for
null-safe equality comparison has been updated with a conditional
expression compatible with Snowflake. The ranking function's testcase
has been improved with the appropriate partition and order by clauses,
and the NULLS LAST keyword has been added to the Databricks SQL query,
enhancing accuracy and consistency. Lastly, updates have been made to
the ROW_NUMBER function's testcase, ensuring accurate and consistent row
numbering for both Snowflake and Databricks.
* Fixed TSQL transpiler
([databrickslabs#735](databrickslabs#735)). In this
release, we have implemented a fix for the TSQL transpiler, addressing
the issue [databrickslabs#7](databrickslabs#7). This
enhancement allows the library to accurately convert TSQL code into an
equivalent format that is compatible with other databases. The fix
resolves reported bugs related to incorrect syntax interpretation,
thereby improving the overall reliability and functionality of the
transpiler. Software engineers and developers relying on TSQL
compatibility for cross-database operations will benefit from this
improvement. We encourage users to test and provide feedback on this
updated feature.
* Fixed `SELECT TOP X PERCENT` IR translation for TSQL
([databrickslabs#733](databrickslabs#733)). In this
release, we have made several enhancements to the open-source library to
improve compatibility with T-SQL and Catalyst. We have added a new
dependency, `pprint_${scala.binary.version}` version 0.8.1 from the
`com.lihaoyi` group, to provide advanced pretty-printing functionality
for Scala. We have also fixed the translation of the TSQL `SELECT TOP X
PERCENT` feature in the parser for intermediate expressions, addressing
the difference in syntax between TSQL and SQL for limiting the number of
rows returned by a query. Additionally, we have modified the
implementation of the `WITH` clause and added a new expression for the
`SELECT TOP` clause in T-SQL, improving the compatibility of the
codebase with T-SQL and aligning it with Catalyst. We also introduced a
new abstract class `Rule` and a case class `Rules` in the
`com.databricks.labs.remorph.parsers.intermediate` package to fix the
`SELECT TOP X PERCENT` IR translation for TSQL by adding new rules.
Furthermore, we have added a new Scala file, `subqueries.scala`,
containing abstract class `SubqueryExpression` and two case classes that
extend it, and made changes to the `trees.scala` file to improve the
tree string representation for better readability and consistency in the
codebase. These changes aim to improve the overall functionality of the
library and make it easier for new users to understand and adopt the
project.
* Fixed invalid null constraint and FQN
([databrickslabs#517](databrickslabs#517)). In this
change, we addressed issues
[databrickslabs#516](databrickslabs#516) and
[databrickslabs#517](databrickslabs#517), which
involved resolving an invalid null constraint and correcting a fully
qualified name (FQN) in our open-source library. The `read_data`
function in the `databricks.py` file was updated to improve null
constraint handling and ensure the FQN is valid. Previously, the catalog
was always appended to the table name, potentially resulting in an
invalid FQN and null constraint issues. Now, the code checks if the
catalog exists before appending it to the table name, and if not
provided, the schema and table name are concatenated directly.
Additionally, we removed the NOT NULL constraint from the catalog field
in the source_table and target_table structs in the main SQL file,
allowing null values for this field. These changes maintain backward
compatibility and enhance the overall functionality and robustness of
the project, ensuring accurate query results and avoiding potential
errors.
* Generate SQL for arithmetic operators
([databrickslabs#726](databrickslabs#726)). In this
release, we have introduced a new private method `arithmetic` to the
`ExpressionGenerator` class that generates SQL for arithmetic
operations, including unary minus, unary plus, multiplication, division,
modulo, addition, and subtraction. This improves the readability and
maintainability of the code by separating concerns and making the
functionality more explicit. Additionally, we have introduced a new
trait named `Arithmetic` to group arithmetic expressions together, which
enables easier manipulation and identification of arithmetic expressions
in the code. A new test suite has also been added for arithmetic
operations in the `ExpressionGenerator` class, which improves test
coverage and ensures the correct SQL is generated for these operations.
These changes provide a welcome addition for developers looking to
extend the functionality of the `ExpressionGenerator` class for
arithmetic operations.
* Generate SQL for bitwise operators. The ExpressionGenerator class in
the remorph project has been updated to support generating SQL for
bitwise operators (OR, AND, XOR, NOT) through the addition of a new
private method `bitwise` that converts bitwise operations to equivalent
SQL expressions. The `expression` method has also been updated to
utilize the new `bitwise` method for any input ir.Expression that is a
bitwise operation. To facilitate this change, a new trait called
`Bitwise` and updated case classes for bitwise operations, including
`BitwiseNot`, `BitwiseAnd`, `BitwiseOr`, and `BitwiseXor`, have been
implemented. The updated case classes extend the new `Bitwise` trait and
include the `dataType` override method to return the data type of the
left expression. A new test case in ExpressionGeneratorTest for bitwise
operators has been added to validate the functionality, and the
`expression` method in ExpressionGenerator now utilizes the
GeneratorContext() instead of new GeneratorContext(). These changes
enable the ExpressionGenerator to generate SQL code for bitwise
operations, expanding its capabilities.
* Generate `.. LIKE ..`
([databrickslabs#723](databrickslabs#723)). In this
commit, the `ExpressionGenerator` class has been enhanced with a new
method, `like`, which generates SQL `LIKE` expressions. The method takes
a `GeneratorContext` object and an `ir.Like` object as arguments, and
returns a string representation of the `LIKE` expression. It uses the
`expression` method to generate the left and right sides of the `LIKE`
operator, and also handles the optional escape character. Additionally,
the `timestampLiteral` and `dateLiteral` methods have been updated to
take an `ir.Literal` object and better handle `NULL` values. A new test
case has also been added for the `ExpressionGenerator` class, which
checks the `like` function and includes examples for basic usage and
usage with an escape character. This commit improves the functionality
of the `ExpressionGenerator` class, allowing it to handle `LIKE`
expressions and better handle `NULL` values for timestamps and dates.
* Generate `DISTINCT`, `*`
([databrickslabs#739](databrickslabs#739)). In this
release, we've enhanced the ExpressionGenerator class to support
generating `DISTINCT` and `*` (star) expressions in SQL queries.
Previously, the class did not handle these cases, resulting in
incomplete or incorrect SQL queries. With the introduction of the
`distinct` method, the class can now generate the `DISTINCT` keyword
followed by the expression to be applied to, and the `star` method
produces the `*` symbol, optionally followed by the name of the object
(table or subquery) to which it applies. These improvements make the
ExpressionGenerator class more robust and compatible with various SQL
dialects, resulting in more accurate query outcomes. We've also added
new test cases for the `ExpressionGenerator` class to ensure that
`DISTINCT` and `*` expressions are generated correctly. Additionally,
support for generating SQL `DISTINCT` and `*` (wildcard) has been added
to the transpilation of Logical Plans to SQL, specifically in the
`ir.Project` class. This ensures that the correct SQL `SELECT * FROM
table` syntax is generated when a wildcard is used in the expression
list. These enhancements significantly improve the functionality and
compatibility of our open-source library.
* Generate `LIMIT`
([databrickslabs#732](databrickslabs#732)). A new
method has been added to generate a SQL LIMIT clause for a logical plan
in the data transformation tool. A new `case` branch has been
implemented in the `generate` method of the `LogicalPlanGenerator` class
to handle the `ir.Limit` case, which generates the SQL LIMIT clause. If
a percentage limit is specified, the tool will throw an exception as it
is not currently supported. The `generate` method of the
`ExpressionGenerator` class has been replaced with a new `generate`
method in the `ir.Project`, `ir.Filter`, and new `ir.Limit` cases to
ensure consistent expression generation. A new case class `Limit` has
been added to the
`com.databricks.labs.remorph.parsers.intermediate.relations` package,
which extends the `UnaryNode` class and has four parameters: `input`,
`limit`, `is_percentage`, and `with_ties`. This new class enables
limiting the number of rows returned by a query, with the ability to
specify a percentage of rows or include ties in the result set.
Additionally, a new test case has been added to the
`LogicalPlanGeneratorTest` class to verify the transpilation of a
`Limit` node to its SQL equivalent, ensuring that the `Limit` node is
correctly handled during transpilation.
* Generate `OFFSET` SQL clauses
([databrickslabs#736](databrickslabs#736)). The
Remorph project's latest update introduces a new OFFSET clause
generation feature for SQL queries in the LogicalPlanGenerator class.
This change adds support for skipping a specified number of rows before
returning results in a SQL query, enhancing the library's query
generation capabilities. The implementation includes a new case in the
match statement of the generate function to handle ir.Offset nodes,
creating a string representation of the OFFSET clause using the provide
offset expression. Additionally, the commit includes a new test case in
the LogicalPlanGeneratorTest class to validate the OFFSET clause
generation, ensuring that the LogicalPlanGenerator can translate Offset
AST nodes into corresponding SQL statements. Overall, this update
enables the generation of more comprehensive SQL queries with OFFSET
support, providing software engineers with greater flexibility for
pagination and other data processing tasks.
* Generate `ORDER BY` SQL clauses
([databrickslabs#737](databrickslabs#737)). This
commit introduces new classes and enumerations for sort direction and
null ordering, as well as an updated SortOrder case class, enabling the
generation of ORDER BY SQL clauses. The LogicalPlanGenerator and
SnowflakeExpressionBuilder classes have been modified to utilize these
changes, allowing for more flexible and customizable sorting and null
ordering when generating SQL queries. Additionally, the
TSqlRelationBuilderSpec test suite has been updated to reflect these
changes, and new test cases have been added to ensure the correct
transpilation of ORDER BY clauses. Overall, these improvements enhance
the Remorph project's capability to parse and generate SQL expressions
with various sorting scenarios, providing a more robust and maintainable
codebase. (A sketch of the sort-direction and null-ordering
enumerations appears after this list.)
* Generate `UNION` / `EXCEPT` / `INTERSECT`
([databrickslabs#731](databrickslabs#731)). In this
release, we have introduced support for the `UNION`, `EXCEPT`, and
`INTERSECT` set operations in our data processing system's generator. A
new `unknown` method has been added to the `Generator` trait to return a
`TranspileException` when encountering unsupported operations, allowing
for better error handling and more informative error messages. The
`LogicalPlanGenerator` class in the remorph project has been extended to
support generating `UNION`, `EXCEPT`, and `INTERSECT` SQL operations
with the addition of a new parameter, `explicitDistinct`, to enable
explicit specification of `DISTINCT` for these set operations. A new
test suite has been added to the `LogicalPlanGenerator` to test the
generation of these operations using the `SetOperation` class, which now
has four possible set operations: `UnionSetOp`, `IntersectSetOp`,
`ExceptSetOp`, and `UnspecifiedSetOp`. With these changes, our system
can handle a wider range of input and provide meaningful error messages
for unsupported operations, making it more versatile in handling complex
SQL queries. (A sketch of set-operation generation with
`explicitDistinct` appears after this list.)
* Generate `VALUES` SQL clauses
([databrickslabs#738](databrickslabs#738)). The
latest commit introduces a new feature to generate `VALUES` SQL clauses
in the context of the logical plan generator. A new case branch has been
implemented in the `generate` method to manage `ir.Values` expressions,
converting input data (lists of lists of expressions) into a string
representation compatible with `VALUES` clauses. The existing
functionality remains unchanged. Additionally, a new test case has been
added for the `LogicalPlanGenerator` class, which checks the correct
transpilation to `VALUES` SQL clauses. This test case ensures that the
`ir.Values` method, which takes a sequence of sequences of literals,
generates the corresponding `VALUES` SQL clause, specifically checking
the input `Seq(Seq(ir.Literal(1), ir.Literal(2)), Seq(ir.Literal(3),
ir.Literal(4)))` against the SQL clause `"VALUES (1,2), (3,4)"`. This
change enables testing the functionality of generating `VALUES` SQL
clauses using the `LogicalPlanGenerator` class. (A minimal `VALUES`
rendering sketch appears after this list.)
* Generate predicate expressions. This commit introduces the generation
of predicate expressions as part of the SQL ExpressionGenerator in the
`com.databricks.labs.remorph.generators.sql` package, enabling the
creation of more complex SQL expressions. The changes include the
addition of a new private method, `predicate(ctx: GeneratorContext,
expr: Expression)`, to handle predicate expressions, and the
introduction of two new predicate expression types, LessThan and
LessThanOrEqual, for comparing the relative ordering of two expressions.
Existing predicate expression types have been updated with consistent
naming. Additionally, the commit incorporates improvements to the
handling of comparison operators in the SnowflakeExpressionBuilder and
TSqlExpressionBuilder classes, addressing bugs and ensuring precise
predicate expression generation. The `ParserTestCommon` trait has also
been updated to reorder certain operators in a logical plan, maintaining
consistent comparison results in tests. New test cases have been added
to several test suites to ensure the correct interpretation and
generation of predicate expressions involving different data types and
search conditions. Overall, these enhancements provide more fine-grained
comparison of expressions, enable more nuanced condition checking, and
improve the robustness and accuracy of the SQL expression generation
process. (A sketch of predicate rendering appears after this list.)
* Merge remote-tracking branch 'origin/main'. In this update, the
`ExpressionGenerator` class in the
`com.databricks.labs.remorph.generators.sql` package has been enhanced
with two new private methods: `dateLiteral` and `timestampLiteral`.
These methods are designed to generate the SQL literal representation of
`DateType` and `TimestampType` expressions, respectively. The
introduction of these methods addresses the previous limitations of
formatting date and timestamp values directly, which lacked
extensibility and required duplicated code for handling null values. By
extracting the formatting logic into separate methods, this commit
significantly improves code maintainability and reusability, enhancing
the overall readability and understandability of the
`ExpressionGenerator` class for developers. The `dateLiteral` method
handles `DateType` values by formatting them using the `dateFormat`
`SimpleDateFormat` instance, returning `NULL` if the value is missing.
Likewise, the `timestampLiteral` method formats `TimestampType` values
using the `timeFormat` `SimpleDateFormat` instance, returning `NULL` if
the value is missing. These methods make the class easier to follow and
to extend. (A sketch of this null-aware literal formatting appears
after this list.)
* Modified dataclass for table threshold and added documentation
([databrickslabs#714](databrickslabs#714)). A series
of modifications have been implemented to enhance the threshold
configuration and validation for table reconciliation in the open-source
library. The `TableThresholds` dataclass has been updated to accept a
string for the `model` attribute, replacing the previously used
`TableThresholdModel` Enum. Additionally, a new
`validate_threshold_model` method has been added to `TableThresholds` to
ensure proper validation of the `model` attribute. A new exception
class, `InvalidModelForTableThreshold`, has been introduced to handle
invalid settings. Column-specific thresholds can now be set using the
`ColumnThresholds` configuration option. The `recon_capture.py` and
`recon_config.py` files have been updated accordingly, and the
documentation has been revised to clarify these changes. These
improvements offer greater flexibility and control for users configuring
thresholds while also refining validation and error handling.
* Support CTAS in TSQL Grammar and add more comparison operators
([databrickslabs#545](databrickslabs#545)). In this
release, we have added support for the CTAS (CREATE TABLE AS SELECT) statement
in the TSQL (T-SQL) grammar, as well as introduced new comparison
operators: !=, !<, and !>. The CTAS statement allows for the creation of
a new table by selecting from an existing table or query, potentially
improving code readability and performance. The new comparison operators
provide alternative ways of expressing inequalities, increasing
flexibility for developers. The keyword `REPLICATE` has also been added
for creating a full copy of a database or availability group. These
changes enhance the overall functionality of the TSQL grammar and
improve the user's ability to express various operations in TSQL. The
CTAS statement is implemented as a new rule, and the new comparison
operators are added as methods in the TSqlExpressionBuilder class. These
changes provide increased capability and flexibility for TSQL parsing
and query handling. The new methods are not adding any new external
dependencies, and the project remains self-contained. The additions have
been tested with the TSqlExpressionBuilderSpec test suite, ensuring the
functionality and compatibility of the TSQL parser.
* Support translation of TSQL IGNORE NULLS clause in windowing functions
([databrickslabs#511](databrickslabs#511)). The
latest change introduces support for translating the TSQL IGNORE NULLS
and RESPECT NULLS clauses in windowing functions to their equivalents in
Databricks SQL. In TSQL, these clauses appear after the function name
and before the OVER clause, affecting how the functions handle null
values. Databricks SQL represents this functionality with an optional
trailing boolean parameter for specific windowing functions. With this
update, when the IGNORE NULLS clause is specified in TSQL, a boolean
option is appended to the corresponding Databricks SQL windowing
functions, with RESPECT NULLS as the default. This enhancement is
facilitated by a new private method, `buildNullIgnore`, which adds the
boolean parameter to the original expression when IGNORE NULLS is
specified in the OVER clause. The alteration is exemplified in new test
examples for the TSqlFunctionSpec, which include testing the LEAD
function with and without the IGNORE NULLS clause, and updates to the
translation of functions with non-standard syntax. (A sketch of the
parameter appending appears after this list.)
* TSQL: Implement TSQL UPDATE/DELETE statements
([databrickslabs#540](databrickslabs#540)). In this
release, we have added support for TSQL UPDATE and DELETE statements in
all syntactical forms, including UDF column transformations, in the
TSqlParser.g4 file. The implementation of both statements is done in a
single file and they share many common clauses. We have also introduced
two new case classes, UpdateTable and MergeTables, in the
extensions.scala file to implement the TSQL UPDATE and DELETE
statements, respectively. Additionally, we have added new methods to
handle various clauses and elements associated with these statements in
the TSqlErrorStrategy class. A series of tests have been included to
ensure the correct translation of various UPDATE and DELETE queries to
their respective Abstract Syntax Trees (ASTs). These changes bring TSQL
UPDATE and DELETE statement functionality to the project and allow for
their use in a variety of contexts, providing developers with more
flexibility and control when working with TSQL UPDATE and DELETE
statements in the parser.
* TSQL: Implement translation of INSERT statement
([databrickslabs#515](databrickslabs#515)). In this
release, we have implemented the TSQL INSERT statement in its entirety,
including all target options, optional clauses, and Common Table
Expressions (CTEs) in our open-source library. The change includes
updates to the TSqlParser.g4 file to support the INSERT statement's
various clauses, such as TOP, INTO, WITH TABLE HINTS, outputClause, and
optionClause. We have also added new case classes to the TSQL AST to
handle various aspects of the INSERT statement, including LocalVarTable,
Output, InsertIntoTable, DerivedRows, DefaultValues, and Default. The
TSqlExpressionBuilder and TSqlRelationBuilder classes have been updated
to support the new INSERT statement, including handling output column
lists, aliases, and JSON clauses. We have added specification tests to
TSqlAstBuilderSpec.scala to demonstrate the various ways that the INSERT
statement can be written and to ensure the correct translation of TSQL
INSERT statements into the remorph project.
* TSQL: Remove the SIGN fragment from numerical tokens
([databrickslabs#547](databrickslabs#547)). In this
release, we have made changes to the TSQL expression builder in the
remorph project that affect how negative and positive numerical literals
are parsed. Negative literals, such as -2, will now be parsed as
UMinus(Literal(2)) instead of Literal(-2), and positive literals, such
as +1, will be parsed as UPlus(Literal(1)) instead of Literal(1). This
change was made to address issue
[databrickslabs#546](databrickslabs#546), but it is
not an ideal solution as it may cause inconvenience in downstream
processes. The affected numerical tokens include INT, HEX, FLOAT, REAL,
and MONEY, which have been simplified by removing the SIGN fragment. We
have updated the buildPrimitive method to handle INT, REAL, and FLOAT
token types and ensure that numerical tokens are parsed correctly. We
intend to keep issue
[databrickslabs#546](databrickslabs#546) open for
further exploration of a better solution. The tests have been updated to
reflect these changes. (A sketch of the resulting sign handling appears
after this list.)
* TSQL: Simplifies named table tableSource, implements columnAlias list
([databrickslabs#512](databrickslabs#512)). This
change introduces significant updates to the TSqlParser's grammar for
tableSource, simplifying and consolidating rules related to table
aliases and column alias lists. A new Relation called TableWithHints has
been added to collect and process table hints, some of which have direct
counterparts in the Catalyst optimizer or can be used as comments for
migration purposes. The TSqlExpressionBuilder and TSqlRelationBuilder
classes have been modified to handle table hints and column aliases, and
the TSqlAstBuilderSpec test suite has been updated to include new tests
for table hints in T-SQL SELECT statements. These changes aim to improve
parsing, handling, and optimization of table sources, table hints, and
column aliases in TSQL queries.
* TSQL: Support generic FOR options
([databrickslabs#525](databrickslabs#525)). In this
release, we have added support for parsing T-SQL (Transact-SQL) options
that contain the keyword `FOR` using the standard syntax `[FOR]`. This
change is necessary as `FOR` cannot be used directly as an identifier
without escaping, as it would otherwise be seen as a table alias in a
`SELECT` statement. The ANTLR rule for parsing generic options has been
expanded to handle these special cases correctly, allowing for the
proper parsing of options such as `OPTIMIZE FOR UNKNOWN` in a `SELECT`
statement. Additionally, a new case has been added to the OptionBuilder
class to handle the `FOR` keyword and elide it, converting `OPTIMIZE FOR
UNKNOWN` to `OPTIMIZE` with an id of 'UNKNOWN'. This ensures the proper
handling of options containing `FOR` and avoids any conflicts with the
`FOR` clause in T-SQL statements. This change was implemented by
Valentin Kasas and involves adding a new alternative to the
`genericOption` rule in the TSqlParser.g4 file, but no new methods have
been added.
* Updated Dialect Variable Name
([databrickslabs#535](databrickslabs#535)). In this
release, the `source` variable name in the `QueryBuilder` class, which
refers to the `Dialect` instance, has been updated to `engine` to
accurately reflect its meaning as referring to either `Source` or
`Target`. This change includes updating the usage of `source` to
`engine` in the `build_query`, `_get_with_clause`, and
`build_threshold_query` methods, as well as removing unnecessary
parentheses in a list. These changes improve the code's clarity,
accuracy, and readability, while maintaining the overall functionality
of the affected methods.
* Use Oracle library only if the recon source is Oracle
([databrickslabs#532](databrickslabs#532)). In this
release, we have added a new `ReconcileConfig` configuration object and
a `SourceType` enumeration in the `databricks.labs.remorph.config` and
`databricks.labs.remorph.reconcile.constants` modules, respectively.
These objects are introduced to determine whether to include the Oracle
JDBC driver library in a reconciliation job's task libraries. The
`deploy_job` method and the `_job_recon_task` method have been updated
to use the `_recon_config` attribute to decide whether to include the
Oracle JDBC driver library. Additionally, the `_deploy_reconcile_job`
method in the `install.py` file has been modified to include a new
parameter called `reconcile`, which is passed as an argument from the
`_config` object. This change enhances the flexibility and customization
of the reconcile job deployment. Furthermore, new fixtures
`oracle_recon_config` and `snowflake_reconcile_config` have been
introduced for `ReconcileConfig` objects with Oracle and Snowflake
specific configurations, respectively. These fixtures are used in the
test functions for deploying jobs, ensuring that the tests are more
focused and better reflect the actual behavior of the code.
* [chore] Make singletons for the relevant `DataType` instances
([databrickslabs#705](databrickslabs#705)). This
commit introduces case objects for various data types, such as NullType,
StringType, and others, effectively making them singletons. This change
simplifies the creation of data type instances and ensures that each
type has a single instance throughout the application. The new case
objects are utilized in building SQL expressions, affecting functions
such as Cast and TRY_CAST in the TSqlExpressionBuilder class.
Additionally, the test file TSqlExpressionBuilderSpec.scala has been
updated to include the new case objects and handle errors by returning
null. The data types tested include integer types, decimal types, date
and time types, string types, binary types, and JSON. The primary goal
of this change is to improve the management and identification of data
types in the codebase, as well as to enhance code readability and
maintainability. (A sketch of the singleton data types appears after
this list.)
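
The items above describe a number of small code-generation techniques. The sketches that follow illustrate several of them in simplified, self-contained Scala; all type and method names (`Expression`, `generate`, and so on) are illustrative stand-ins, not remorph's actual IR or API. First, a minimal sketch of `LIKE` generation with an optional escape character:

```scala
// Illustrative model only; not remorph's actual IR.
sealed trait Expression
case class Column(name: String) extends Expression
case class StringLit(value: String) extends Expression
case class Like(left: Expression, right: Expression, escape: Option[Char]) extends Expression

object LikeSketch {
  def generate(e: Expression): String = e match {
    case Column(n)    => n
    case StringLit(v) => s"'$v'"
    case Like(l, r, esc) =>
      val base = s"${generate(l)} LIKE ${generate(r)}"
      // Append the ESCAPE clause only when an escape character is present.
      esc.fold(base)(c => s"$base ESCAPE '$c'")
  }

  def main(args: Array[String]): Unit = {
    println(generate(Like(Column("name"), StringLit("J%"), None)))       // name LIKE 'J%'
    println(generate(Like(Column("name"), StringLit("J!%"), Some('!')))) // name LIKE 'J!%' ESCAPE '!'
  }
}
```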
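
A sketch of `LIMIT` and `OFFSET` clause generation over a toy plan type. Rejecting percentage limits mirrors the behavior described above, although the real transpile exception type is approximated here with `UnsupportedOperationException`:

```scala
// Illustrative plan model; not remorph's actual logical plan classes.
sealed trait LogicalPlan
case class NamedTable(name: String) extends LogicalPlan
case class Limit(input: LogicalPlan, limit: Long,
                 isPercentage: Boolean, withTies: Boolean) extends LogicalPlan
case class Offset(input: LogicalPlan, offset: Long) extends LogicalPlan

object LimitOffsetSketch {
  def generate(plan: LogicalPlan): String = plan match {
    case NamedTable(n) => s"SELECT * FROM $n"
    case Limit(in, n, isPercentage, _) =>
      // Percentage limits are not supported, per the release note above.
      if (isPercentage) throw new UnsupportedOperationException("LIMIT ... PERCENT is unsupported")
      s"${generate(in)} LIMIT $n"
    case Offset(in, n) => s"${generate(in)} OFFSET $n"
  }

  def main(args: Array[String]): Unit =
    // SELECT * FROM t LIMIT 10 OFFSET 5
    println(generate(Offset(Limit(NamedTable("t"), 10, isPercentage = false, withTies = false), 5)))
}
```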
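
A sketch of sort-direction and null-ordering enumerations feeding `ORDER BY` rendering; the enumeration and field names are assumptions made for illustration:

```scala
// Hypothetical names; the real SortOrder case class may differ.
sealed trait SortDirection
case object Ascending extends SortDirection
case object Descending extends SortDirection

sealed trait NullOrdering
case object NullsFirst extends NullOrdering
case object NullsLast extends NullOrdering

case class SortOrder(expr: String, direction: SortDirection, nulls: NullOrdering)

object OrderBySketch {
  def orderBy(orders: Seq[SortOrder]): String = {
    val rendered = orders.map { o =>
      val dir = o.direction match { case Ascending => "ASC"; case Descending => "DESC" }
      val nul = o.nulls match { case NullsFirst => "NULLS FIRST"; case NullsLast => "NULLS LAST" }
      s"${o.expr} $dir $nul"
    }
    rendered.mkString("ORDER BY ", ", ", "")
  }

  def main(args: Array[String]): Unit =
    // ORDER BY a ASC NULLS LAST, b DESC NULLS FIRST
    println(orderBy(Seq(SortOrder("a", Ascending, NullsLast), SortOrder("b", Descending, NullsFirst))))
}
```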
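
A sketch of set-operation generation with an `explicitDistinct` flag, as in the `UNION` / `EXCEPT` / `INTERSECT` item; the rendering helper is hypothetical:

```scala
// Set-operation tags as described in the note; UnspecifiedSetOp omitted for brevity.
sealed trait SetOp
case object UnionSetOp extends SetOp
case object IntersectSetOp extends SetOp
case object ExceptSetOp extends SetOp

object SetOpSketch {
  // explicitDistinct forces the DISTINCT keyword; the bare operator
  // (which is already distinct in standard SQL) is emitted otherwise.
  def generate(left: String, right: String, op: SetOp, explicitDistinct: Boolean): String = {
    val keyword = op match {
      case UnionSetOp     => "UNION"
      case IntersectSetOp => "INTERSECT"
      case ExceptSetOp    => "EXCEPT"
    }
    val distinct = if (explicitDistinct) " DISTINCT" else ""
    s"($left) $keyword$distinct ($right)"
  }

  def main(args: Array[String]): Unit =
    // (SELECT a FROM t1) UNION DISTINCT (SELECT a FROM t2)
    println(generate("SELECT a FROM t1", "SELECT a FROM t2", UnionSetOp, explicitDistinct = true))
}
```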
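
A minimal `VALUES` rendering sketch matching the expectation quoted above, where `Seq(Seq(1, 2), Seq(3, 4))` produces `VALUES (1,2), (3,4)`:

```scala
object ValuesSketch {
  // Each inner sequence is one row: literals are joined into "(v1,v2)"
  // groups, which are then prefixed with the VALUES keyword.
  def valuesClause(rows: Seq[Seq[Any]]): String =
    rows.map(_.mkString("(", ",", ")")).mkString("VALUES ", ", ", "")

  def main(args: Array[String]): Unit =
    println(valuesClause(Seq(Seq(1, 2), Seq(3, 4)))) // VALUES (1,2), (3,4)
}
```

Note that the comma placement follows the quoted test expectation: no space inside a row, a space between rows.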
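
A sketch of predicate rendering for the new `LessThan` and `LessThanOrEqual` node types; the surrounding `expression` dispatcher is a simplified stand-in for the generator's private `predicate` method:

```scala
// Illustrative model only; not remorph's actual expression hierarchy.
sealed trait Expression
case class Column(name: String) extends Expression
case class IntLit(value: Int) extends Expression
case class LessThan(left: Expression, right: Expression) extends Expression
case class LessThanOrEqual(left: Expression, right: Expression) extends Expression

object PredicateSketch {
  def expression(e: Expression): String = e match {
    case Column(n)             => n
    case IntLit(v)             => v.toString
    case LessThan(l, r)        => s"${expression(l)} < ${expression(r)}"
    case LessThanOrEqual(l, r) => s"${expression(l)} <= ${expression(r)}"
  }

  def main(args: Array[String]): Unit =
    println(expression(LessThanOrEqual(Column("age"), IntLit(30)))) // age <= 30
}
```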
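
A sketch of null-aware date and timestamp literal formatting in the spirit of `dateLiteral` and `timestampLiteral`; `Option` stands in for however the real code models a missing value:

```scala
import java.text.SimpleDateFormat
import java.sql.{Date, Timestamp}

object LiteralSketch {
  private val dateFormat = new SimpleDateFormat("yyyy-MM-dd")
  private val timeFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")

  // None models a missing value and renders as the SQL NULL keyword.
  def dateLiteral(value: Option[Date]): String =
    value.map(d => s"DATE '${dateFormat.format(d)}'").getOrElse("NULL")

  def timestampLiteral(value: Option[Timestamp]): String =
    value.map(t => s"TIMESTAMP '${timeFormat.format(t)}'").getOrElse("NULL")

  def main(args: Array[String]): Unit = {
    println(dateLiteral(Some(Date.valueOf("2024-07-17"))))                    // DATE '2024-07-17'
    println(timestampLiteral(Some(Timestamp.valueOf("2024-07-17 12:34:56")))) // TIMESTAMP '2024-07-17 12:34:56'
    println(timestampLiteral(None))                                           // NULL
  }
}
```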
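
A sketch of the IGNORE NULLS translation, assuming (as the note describes) that the target windowing function accepts a trailing boolean, with RESPECT NULLS as the default; the `WindowFn` model and `render` helper are hypothetical:

```scala
case class WindowFn(name: String, args: Seq[String], ignoreNulls: Boolean)

object NullIgnoreSketch {
  // Appends the trailing boolean only when IGNORE NULLS was specified;
  // RESPECT NULLS is the default and adds nothing.
  def render(fn: WindowFn): String = {
    val args = if (fn.ignoreNulls) fn.args :+ "TRUE" else fn.args
    s"${fn.name}(${args.mkString(", ")})"
  }

  def main(args: Array[String]): Unit = {
    println(render(WindowFn("LEAD", Seq("salary", "1"), ignoreNulls = true)))  // LEAD(salary, 1, TRUE)
    println(render(WindowFn("LEAD", Seq("salary", "1"), ignoreNulls = false))) // LEAD(salary, 1)
  }
}
```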
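
A sketch of the sign handling that results from removing the SIGN fragment: the builder wraps bare literals in `UMinus` / `UPlus` nodes instead of folding the sign into the value. The `buildNumeric` helper is hypothetical:

```scala
// Illustrative model only; not remorph's actual IR.
sealed trait Expression
case class Literal(value: BigDecimal) extends Expression
case class UMinus(operand: Expression) extends Expression
case class UPlus(operand: Expression) extends Expression

object SignSketch {
  // With the SIGN fragment removed from the numeric lexer tokens, a leading
  // sign arrives as its own token, so the builder produces a unary node
  // around the unsigned literal rather than a signed literal.
  def buildNumeric(sign: Option[Char], digits: String): Expression = {
    val lit = Literal(BigDecimal(digits))
    sign match {
      case Some('-') => UMinus(lit)
      case Some('+') => UPlus(lit)
      case _         => lit
    }
  }

  def main(args: Array[String]): Unit = {
    println(buildNumeric(Some('-'), "2")) // UMinus(Literal(2))
    println(buildNumeric(Some('+'), "1")) // UPlus(Literal(1))
  }
}
```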
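
Finally, a sketch of the singleton `DataType` case objects; the SQL type names emitted here are illustrative:

```scala
sealed trait DataType
case object NullType extends DataType
case object StringType extends DataType
case object BooleanType extends DataType
case class DecimalType(precision: Int, scale: Int) extends DataType

object DataTypeSketch {
  // Case objects are single instances, so they compare by reference and
  // need no construction at use sites; parameterized types stay case classes.
  def sql(dt: DataType): String = dt match {
    case NullType          => "VOID"
    case StringType        => "STRING"
    case BooleanType       => "BOOLEAN"
    case DecimalType(p, s) => s"DECIMAL($p,$s)"
  }

  def main(args: Array[String]): Unit =
    println(s"CAST(x AS ${sql(DecimalType(10, 2))})") // CAST(x AS DECIMAL(10,2))
}
```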

Dependency updates:

* Bump sqlglot from 25.1.0 to 25.5.1
([databrickslabs#534](databrickslabs#534)).
* Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0
([databrickslabs#555](databrickslabs#555)).
* Bump sqlglot from 25.5.1 to 25.6.1
([databrickslabs#585](databrickslabs#585)).
sundarshankar89 added a commit to sundarshankar89/remorph that referenced this pull request Jan 3, 2025
Bumps [sqlglot](https://github.com/tobymao/sqlglot) from 25.1.0 to
25.5.1.

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: sundar.shankar <[email protected]>
sundarshankar89 pushed a commit to sundarshankar89/remorph that referenced this pull request Jan 3, 2025
* Added TSql transpiler
([databrickslabs#734](databrickslabs#734)). In this
release, we have developed a new open-source library feature that
enhances the transpilation of T-SQL code to Databricks-compatible code.
The new TSqlToDatabricksTranspiler class has been added, which extends
the Transpiler abstract class and defines the transpile method. This
method converts T-SQL code to Databricks-compatible code by creating a
lexer, token stream, parser, and parsed tree from the input string using
TSqlLexer, CommonTokenStream, and tSqlFile. The parsed tree is then
passed to the TSqlAstBuilder's visit method to generate a logical plan,
which is optimized using an optimizer object with rules such as
PullLimitUpwards and TopPercentToLimitSubquery. The optimized plan is
then passed to the LogicalPlanGenerator's generate method to generate
the final transpiled code. Additionally, a new class,
IsTranspiledFromTSqlQueryRunner, has been added to the QueryRunner
object to transpile T-SQL queries into the format expected by the
Databricks query runner using the new TSqlToDatabricksTranspiler. The
AcceptanceTestRunner class in the com.databricks.labs.remorph.coverage
package has been updated to replace TSqlAstBuilder with
IsTranspiledFromTSqlQueryRunner in the TSqlAcceptanceSuite class,
indicating a change in the code responsible for handling TSql queries.
This new feature aims to provide a smooth and efficient way to convert
T-SQL code to Databricks-compatible code for further processing and
execution.
* Added the missing info for the reconciliation documentation
([databrickslabs#520](databrickslabs#520)). In this
release, we have made significant improvements to the reconciliation
feature of our open-source library. We have added a configuration folder
and provided a template for creating configuration files for specific
table sources. The config files will contain necessary configurations
for table-specific reconciliation. Additionally, we have included a note
in the transformation section detailing the usage of user-defined
functions (UDFs) in transformation expressions, with an example UDF
called `sort_array_input()` provided. The reconcile configuration sample
documentation has also been added, with a JSON configuration for
reconciling tables using various operations like drop, join,
transformation, threshold, filter, and JDBC ReaderOptions. The commit
also includes examples of source and target tables, data overviews, and
reconciliation configurations for various scenarios, such as basic
config and column mapping, user transformations, explicit select,
explicit drop, filters, and thresholds comparison, among others. These
changes aim to make it easier for users to set up and execute the
reconciliation process for specific table sources and provide clear and
concise information about using UDFs in transformation expressions for
reconciliation. This commit is co-authored by Vijay Pavan Nissankararao
and SundarShankar89.
* Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0
([databrickslabs#555](databrickslabs#555)). In this
pull request, the sigstore/gh-action-sigstore-python dependency is being
updated from version 2.1.1 to 3.0.0. This new version includes several
changes and improvements, such as the addition of recursive globbing
with ** to the inputs and the removal of certain settings like
fulcio-url, rekor-url, ctfe, and rekor-root-pubkey. The signature,
certificate, and bundle output settings have also been removed.
Furthermore, the way inputs are parsed has been changed, and they are
now made optional under certain conditions. The default suffix has been
updated to .sigstore.json. The 3.0.0 version also resolves various
deprecations present in sigstore-python's 2.x series and supports CI
runners that use PEP 668 to constrain global package prefixes.
* Bump sqlglot from 25.1.0 to 25.5.1
([databrickslabs#534](databrickslabs#534)). In the
latest release, the `sqlglot` package has been updated from version
25.1.0 to 25.5.1, which includes bug fixes, breaking changes, new
features, and refactors for parsing, analyzing, and rewriting SQL
queries. The new version introduces optimizations for coalesced USING
columns, preserves EXTRACT(date_part FROM datetime) calls, decouples
NVL() from COALESCE(), and supports FROM CHANGES in Snowflake. It also
provides configurable transpilation of Snowflake VARIANT, and supports
view schema binding options for Spark and Databricks. The update
addresses several issues, such as using timestamp with time zone over
timestamptz, switching off table alias column generation, and parsing
the rhs of x::varchar(max) into a type. Additionally, the update cleans up
CurrentTimestamp generation logic for Teradata. The `sqlglot` dependency
has also been updated in the 'experimental.py' file of the
'databricks/labs/remorph/snow' module, along with the addition of a new
private method `_parse_json` to the `Generator` class for parsing JSON
data. Software engineers should review the changes and update their code
accordingly, as conflicts with existing code will be resolved
automatically by Dependabot, as long as the pull request is not altered
manually.
* Bump sqlglot from 25.5.1 to 25.6.1
([databrickslabs#585](databrickslabs#585)). In this
release, the `sqlglot` dependency is updated from version 25.5.1 to
25.6.1 in the 'pyproject.toml' file. This update includes bug fixes,
breaking changes, new features, and improvements. Breaking changes
consist of updates to the QUALIFY clause in queries and the
canonicalization of struct and array inline constructors. New features
include support for ORDER BY ALL, FROM ROWS FROM (...), RPAD & LPAD
functions, and exp.TimestampAdd. Bug fixes address issues related to the
QUALIFY clause in queries, expansion of SELECT * REPLACE, RENAME,
transpiling UDFs from Databricks, and more. The pull request also
includes a detailed changelog, commit history, instructions for
triggering Dependabot actions, and commands for reference, with the
exception of the compatibility score for the new version, which is not
taken into account in this pull request.
* Feature/reconcile table mismatch threshold
([databrickslabs#550](databrickslabs#550)). This
commit enhances the reconciliation process in the open-source library
with several new features, addressing issue
[databrickslabs#504](databrickslabs#504). A new
`get_record_count` method is added to the `Reconcile` class, providing
record count data for source and target tables, facilitating
comprehensive analysis of table mismatches. A `CountQueryBuilder` class
is introduced to build record count queries for different layers and SQL
dialects, ensuring consistency in data processing. The `Thresholds`
class is refactored into `ColumnThresholds` and `TableThresholds`,
allowing for more granular control over comparisons and customizable
threshold settings. New methods `_is_mismatch_within_threshold_limits`
and `_insert_into_metrics_table` are added to the `recon_capture.py`
file, improving fine-grained control over the reconciliation process and
preventing false positives. Additionally, new classes, methods, and data
structures have been implemented in the `execute` module to handle
reconciliation queries and data more efficiently. These improvements
contribute to a more accurate and robust reconciliation system.
* Feature: introduce core transpiler
([databrickslabs#715](databrickslabs#715)). A new
core transpiler, `SnowflakeToDatabricksTranspiler`, has been introduced
to convert Snowflake queries into Databricks SQL, streamlining
integration and compatibility between the two systems. This transpiler
is integrated into the coverage test suites for thorough testing, and is
used to convert various types of logical plans, handling cases such as
`Batch`, `WithCTE`, `Project`, `NamedTable`, `Filter`, and `Join`. The
`SnowflakeToDatabricksTranspiler` class tokenizes input Snowflake query
strings, initializes a `SnowflakeParser` instance, parses the input
Snowflake query, generates a logical plan, and applies the
`LogicalPlanGenerator` to the logical plan to generate the equivalent
Databricks SQL query. Additionally, the `SnowflakeAstBuilder` class has
been updated to alter the way `Batch` logical plans are built and
improve overall functionality of the transpiler.
* Fixed LEFT and RIGHT JOIN syntax in Snowflake ANTLR grammar
([databrickslabs#526](databrickslabs#526)). A fix
has been implemented to address issues with the Snowflake ANTLR grammar
related to the proper parsing of LEFT and RIGHT JOIN statements.
Previously, the keywords LEFT and RIGHT were incorrectly allowed as
identifiers, but they are hard keywords that must be escaped to be used
as column names. This change updates the grammar to escape these
keywords in JOIN statements, improving the overall parsing of queries
that include LEFT and RIGHT JOINs. Additionally, semantic predicates
have been suggested to handle cases where LEFT or RIGHT are used as
column names without escaping, although this is not yet implemented. To
ensure the correctness of the updated grammar, new tests have been added
to the SnowflakeAstBuilderSpec for LEFT and RIGHT JOINs, which check
that the Abstract Syntax Tree (AST) is built correctly for these
queries.
* Fixed Snowflake Acceptance Testcases Failures
([databrickslabs#531](databrickslabs#531)). In this
release, updates have been made to the acceptance testcases for various
SQL functions in the open-source library. The DENSE RANK function's
testcase has been updated with a window specification and ORDER BY
clause in both Snowflake and Databricks SQL syntaxes, ensuring accurate
test results. The LAG function's testcase now includes a PARTITION BY
and ORDER BY clause, as well as the NULLS LAST keyword in Databricks
SQL, for improved accuracy and consistency. The SQL queries in the
Snowflake testcase for the `last_value` function have been updated with
a window specification, ORDER BY clause, and NULLS LAST directive for
Databricks SQL. Test case failures in the Snowflake acceptance testsuite
have been addressed with updates to the LEAD function, MONTH_NAME to
MONTHNAME renaming, and DATE_FORMAT to TO_DATE conversion, improving
reliability and consistency. The ntile function's testcase has been
updated with PARTITION BY and ORDER BY clauses, and the NULLS LAST
keyword has been added to the Databricks SQL query. The SQL query for
null-safe equality comparison has been updated with a conditional
expression compatible with Snowflake. The ranking function's testcase
has been improved with the appropriate partition and order by clauses,
and the NULLS LAST keyword has been added to the Databricks SQL query,
enhancing accuracy and consistency. Lastly, updates have been made to
the ROW_NUMBER function's testcase, ensuring accurate and consistent row
numbering for both Snowflake and Databricks.
* Fixed TSQL transpiler
([databrickslabs#735](databrickslabs#735)). In this
release, we have implemented a fix for the TSQL transpiler, addressing
the issue [databrickslabs#7](databrickslabs#7). This
enhancement allows the library to accurately convert TSQL code into an
equivalent format that is compatible with other databases. The fix
resolves reported bugs related to incorrect syntax interpretation,
thereby improving the overall reliability and functionality of the
transpiler. Software engineers and developers relying on TSQL
compatibility for cross-database operations will benefit from this
improvement. We encourage users to test and provide feedback on this
updated feature.
* Fixed `SELECT TOP X PERCENT` IR translation for TSQL
([databrickslabs#733](databrickslabs#733)). In this
release, we have made several enhancements to the open-source library to
improve compatibility with T-SQL and Catalyst. We have added a new
dependency, `pprint_${scala.binary.version}` version 0.8.1 from the
`com.lihaoyi` group, to provide advanced pretty-printing functionality
for Scala. We have also fixed the translation of the TSQL `SELECT TOP X
PERCENT` feature in the parser for intermediate expressions, addressing
the difference in syntax between TSQL and SQL for limiting the number of
rows returned by a query. Additionally, we have modified the
implementation of the `WITH` clause and added a new expression for the
`SELECT TOP` clause in T-SQL, improving the compatibility of the
codebase with T-SQL and aligning it with Catalyst. We also introduced a
new abstract class `Rule` and a case class `Rules` in the
`com.databricks.labs.remorph.parsers.intermediate` package to fix the
`SELECT TOP X PERCENT` IR translation for TSQL by adding new rules.
Furthermore, we have added a new Scala file, `subqueries.scala`,
containing abstract class `SubqueryExpression` and two case classes that
extend it, and made changes to the `trees.scala` file to improve the
tree string representation for better readability and consistency in the
codebase. These changes aim to improve the overall functionality of the
library and make it easier for new users to understand and adopt the
project.
* Fixed invalid null constraint and FQN
([databrickslabs#517](databrickslabs#517)). In this
change, we addressed issues
[databrickslabs#516](databrickslabs#516) and
[databrickslabs#517](databrickslabs#517), which
involved resolving an invalid null constraint and correcting a fully
qualified name (FQN) in our open-source library. The `read_data`
function in the `databricks.py` file was updated to improve null
constraint handling and ensure the FQN is valid. Previously, the catalog
was always appended to the table name, potentially resulting in an
invalid FQN and null constraint issues. Now, the code checks if the
catalog exists before appending it to the table name, and if not
provided, the schema and table name are concatenated directly.
Additionally, we removed the NOT NULL constraint from the catalog field
in the source_table and target_table structs in the main SQL file,
allowing null values for this field. These changes maintain backward
compatibility and enhance the overall functionality and robustness of
the project, ensuring accurate query results and avoiding potential
errors.
* Generate SQL for arithmetic operators
([databrickslabs#726](databrickslabs#726)). In this
release, we have introduced a new private method `arithmetic` to the
`ExpressionGenerator` class that generates SQL for arithmetic
operations, including unary minus, unary plus, multiplication, division,
modulo, addition, and subtraction. This improves the readability and
maintainability of the code by separating concerns and making the
functionality more explicit. Additionally, we have introduced a new
trait named `Arithmetic` to group arithmetic expressions together, which
enables easier manipulation and identification of arithmetic expressions
in the code. A new test suite has also been added for arithmetic
operations in the `ExpressionGenerator` class, which improves test
coverage and ensures the correct SQL is generated for these operations.
These changes provide a welcome addition for developers looking to
extend the functionality of the `ExpressionGenerator` class for
arithmetic operations.
* Generate SQL for bitwise operators. The ExpressionGenerator class in
the remorph project has been updated to support generating SQL for
bitwise operators (OR, AND, XOR, NOT) through the addition of a new
private method `bitwise` that converts bitwise operations to equivalent
SQL expressions. The `expression` method has also been updated to
utilize the new `bitwise` method for any input ir.Expression that is a
bitwise operation. To facilitate this change, a new trait called
`Bitwise` and updated case classes for bitwise operations, including
`BitwiseNot`, `BitwiseAnd`, `BitwiseOr`, and `BitwiseXor`, have been
implemented. The updated case classes extend the new `Bitwise` trait and
include the `dataType` override method to return the data type of the
left expression. A new test case in ExpressionGeneratorTest for bitwise
operators has been added to validate the functionality, and the
`expression` method in ExpressionGenerator now uses `GeneratorContext()`
instead of `new GeneratorContext()`. These changes
enable the ExpressionGenerator to generate SQL code for bitwise
operations, expanding its capabilities.
* Generate `.. LIKE ..`
([databrickslabs#723](databrickslabs#723)). In this
commit, the `ExpressionGenerator` class has been enhanced with a new
method, `like`, which generates SQL `LIKE` expressions. The method takes
a `GeneratorContext` object and an `ir.Like` object as arguments, and
returns a string representation of the `LIKE` expression. It uses the
`expression` method to generate the left and right sides of the `LIKE`
operator, and also handles the optional escape character. Additionally,
the `timestampLiteral` and `dateLiteral` methods have been updated to
take an `ir.Literal` object and better handle `NULL` values. A new test
case has also been added for the `ExpressionGenerator` class, which
checks the `like` function and includes examples for basic usage and
usage with an escape character. This commit improves the functionality
of the `ExpressionGenerator` class, allowing it to handle `LIKE`
expressions and better handle `NULL` values for timestamps and dates.
* Generate `DISTINCT`, `*`
([databrickslabs#739](databrickslabs#739)). In this
release, we've enhanced the ExpressionGenerator class to support
generating `DISTINCT` and `*` (star) expressions in SQL queries.
Previously, the class did not handle these cases, resulting in
incomplete or incorrect SQL queries. With the introduction of the
`distinct` method, the class can now generate the `DISTINCT` keyword
followed by the expression to be applied to, and the `star` method
produces the `*` symbol, optionally followed by the name of the object
(table or subquery) to which it applies. These improvements make the
ExpressionGenerator class more robust and compatible with various SQL
dialects, resulting in more accurate query outcomes. We've also added
new test cases for the `ExpressionGenerator` class to ensure that
`DISTINCT` and `*` expressions are generated correctly. Additionally,
support for generating SQL `DISTINCT` and `*` (wildcard) has been added
to the transpilation of Logical Plans to SQL, specifically in the
`ir.Project` class. This ensures that the correct SQL `SELECT * FROM
table` syntax is generated when a wildcard is used in the expression
list. These enhancements significantly improve the functionality and
compatibility of our open-source library.
* Generate `LIMIT`
([databrickslabs#732](databrickslabs#732)). A new
method has been added to generate a SQL LIMIT clause for a logical plan
in the data transformation tool. A new `case` branch has been
implemented in the `generate` method of the `LogicalPlanGenerator` class
to handle the `ir.Limit` case, which generates the SQL LIMIT clause. If
a percentage limit is specified, the tool will throw an exception as it
is not currently supported. The `generate` method of the
`ExpressionGenerator` class has been replaced with a new `generate`
method in the `ir.Project`, `ir.Filter`, and new `ir.Limit` cases to
ensure consistent expression generation. A new case class `Limit` has
been added to the
`com.databricks.labs.remorph.parsers.intermediate.relations` package,
which extends the `UnaryNode` class and has four parameters: `input`,
`limit`, `is_percentage`, and `with_ties`. This new class enables
limiting the number of rows returned by a query, with the ability to
specify a percentage of rows or include ties in the result set.
Additionally, a new test case has been added to the
`LogicalPlanGeneratorTest` class to verify the transpilation of a
`Limit` node to its SQL equivalent, ensuring that the `Limit` node is
correctly handled during transpilation.
* Generate `OFFSET` SQL clauses
([databrickslabs#736](databrickslabs#736)). The
Remorph project's latest update introduces a new OFFSET clause
generation feature for SQL queries in the LogicalPlanGenerator class.
This change adds support for skipping a specified number of rows before
returning results in a SQL query, enhancing the library's query
generation capabilities. The implementation includes a new case in the
match statement of the generate function to handle ir.Offset nodes,
creating a string representation of the OFFSET clause using the provide
offset expression. Additionally, the commit includes a new test case in
the LogicalPlanGeneratorTest class to validate the OFFSET clause
generation, ensuring that the LogicalPlanGenerator can translate Offset
AST nodes into corresponding SQL statements. Overall, this update
enables the generation of more comprehensive SQL queries with OFFSET
support, providing software engineers with greater flexibility for
pagination and other data processing tasks.
* Generate `ORDER BY` SQL clauses
([databrickslabs#737](databrickslabs#737)). This
commit introduces new classes and enumerations for sort direction and
null ordering, as well as an updated SortOrder case class, enabling the
generation of ORDER BY SQL clauses. The LogicalPlanGenerator and
SnowflakeExpressionBuilder classes have been modified to utilize these
changes, allowing for more flexible and customizable sorting and null
ordering when generating SQL queries. Additionally, the
TSqlRelationBuilderSpec test suite has been updated to reflect these
changes, and new test cases have been added to ensure the correct
transpilation of ORDER BY clauses. Overall, these improvements enhance
the Remorph project's capability to parse and generate SQL expressions
with various sorting scenarios, providing a more robust and maintainable
codebase.
* Generate `UNION` / `EXCEPT` / `INTERSECT`
([databrickslabs#731](databrickslabs#731)). In this
release, we have introduced support for the `UNION`, `EXCEPT`, and
`INTERSECT` set operations in our data processing system's generator. A
new `unknown` method has been added to the `Generator` trait to return a
`TranspileException` when encountering unsupported operations, allowing
for better error handling and more informative error messages. The
`LogicalPlanGenerator` class in the remorph project has been extended to
support generating `UNION`, `EXCEPT`, and `INTERSECT` SQL operations
with the addition of a new parameter, `explicitDistinct`, to enable
explicit specification of `DISTINCT` for these set operations. A new
test suite has been added to the `LogicalPlanGenerator` to test the
generation of these operations using the `SetOperation` class, which now
has four possible set operations: `UnionSetOp`, `IntersectSetOp`,
`ExceptSetOp`, and `UnspecifiedSetOp`. With these changes, our system
can handle a wider range of input and provide meaningful error messages
for unsupported operations, making it more versatile in handling complex
SQL queries.
* Generate `VALUES` SQL clauses
([databrickslabs#738](databrickslabs#738)). The
latest commit introduces a new feature to generate `VALUES` SQL clauses
in the context of the logical plan generator. A new case branch has been
implemented in the `generate` method to manage `ir.Values` expressions,
converting input data (lists of lists of expressions) into a string
representation compatible with `VALUES` clauses. The existing
functionality remains unchanged. Additionally, a new test case has been
added for the `LogicalPlanGenerator` class, which checks the correct
transpilation to `VALUES` SQL clauses. This test case ensures that the
`ir.Values` method, which takes a sequence of sequences of literals,
generates the corresponding `VALUES` SQL clause, specifically checking
the input `Seq(Seq(ir.Literal(1), ir.Literal(2)), Seq(ir.Literal(3),
ir.Literal(4)))` against the SQL clause `"VALUES (1,2), (3,4)"`. This
change enables testing the functionality of generating `VALUES` SQL
clauses using the `LogicalPlanGenerator` class.
* Generate predicate expressions. This commit introduces the generation
of predicate expressions as part of the SQL ExpressionGenerator in the
`com.databricks.labs.remorph.generators.sql` package, enabling the
creation of more complex SQL expressions. The changes include the
addition of a new private method, `predicate(ctx: GeneratorContext,
expr: Expression)`, to handle predicate expressions, and the
introduction of two new predicate expression types, LessThan and
LessThanOrEqual, for comparing the relative ordering of two expressions.
Existing predicate expression types have been updated with consistent
naming. Additionally, the commit incorporates improvements to the
handling of comparison operators in the SnowflakeExpressionBuilder and
TSqlExpressionBuilder classes, addressing bugs and ensuring precise
predicate expression generation. The `ParserTestCommon` trait has also
been updated to reorder certain operators in a logical plan, maintaining
consistent comparison results in tests. New test cases have been added
to several test suites to ensure the correct interpretation and
generation of predicate expressions involving different data types and
search conditions. Overall, these enhancements provide more fine-grained
comparison of expressions, enable more nuanced condition checking, and
improve the robustness and accuracy of the SQL expression generation
process.
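A condensed sketch of how predicate rendering composes, with a toy `Expression` hierarchy standing in for remorph's IR:

```scala
sealed trait Expression
case class Column(name: String) extends Expression
case class Literal(value: Int) extends Expression
case class LessThan(left: Expression, right: Expression) extends Expression
case class LessThanOrEqual(left: Expression, right: Expression) extends Expression

// Predicates recurse into their operands, so nested comparisons
// render naturally.
def generate(expr: Expression): String = expr match {
  case Column(n)             => n
  case Literal(v)            => v.toString
  case LessThan(l, r)        => s"${generate(l)} < ${generate(r)}"
  case LessThanOrEqual(l, r) => s"${generate(l)} <= ${generate(r)}"
}

// generate(LessThanOrEqual(Column("a"), Literal(10))) => "a <= 10"
```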
* Merge remote-tracking branch 'origin/main'. In this update, the
`ExpressionGenerator` class in the
`com.databricks.labs.remorph.generators.sql` package has been enhanced
with two new private methods: `dateLiteral` and `timestampLiteral`.
These methods are designed to generate the SQL literal representation of
`DateType` and `TimestampType` expressions, respectively. The
introduction of these methods addresses the previous limitations of
formatting date and timestamp values directly, which lacked
extensibility and required duplicated code for handling null values. By
extracting the formatting logic into separate methods, this commit
significantly improves code maintainability and reusability, enhancing
the overall readability and understandability of the
`ExpressionGenerator` class for developers. The `dateLiteral` method
handles `DateType` values by formatting them using the `dateFormat`
`SimpleDateFormat` instance, returning `NULL` if the value is missing.
Likewise, the `timestampLiteral` method formats `TimestampType` values
using the `timeFormat` `SimpleDateFormat` instance, returning `NULL` if
the value is missing. Centralizing the formatting in these two helpers
makes the class easier to understand and to extend.
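A sketch of the extracted helpers; the format patterns and option-based null handling here are assumptions rather than the exact remorph signatures:

```scala
import java.sql.{Date, Timestamp}
import java.text.SimpleDateFormat

object LiteralFormat {
  private val dateFormat = new SimpleDateFormat("yyyy-MM-dd")
  private val timeFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")

  // Render a DateType literal, falling back to NULL when absent.
  def dateLiteral(value: Option[Date]): String =
    value.map(d => s"'${dateFormat.format(d)}'").getOrElse("NULL")

  // Render a TimestampType literal the same way.
  def timestampLiteral(value: Option[Timestamp]): String =
    value.map(t => s"'${timeFormat.format(t)}'").getOrElse("NULL")
}
```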
* Modified dataclass for table threshold and added documentation
([databrickslabs#714](databrickslabs#714)). A series
of modifications have been implemented to enhance the threshold
configuration and validation for table reconciliation in the open-source
library. The `TableThresholds` dataclass has been updated to accept a
string for the `model` attribute, replacing the previously used
`TableThresholdModel` Enum. Additionally, a new
`validate_threshold_model` method has been added to `TableThresholds` to
ensure proper validation of the `model` attribute. A new exception
class, `InvalidModelForTableThreshold`, has been introduced to handle
invalid settings. Column-specific thresholds can now be set using the
`ColumnThresholds` configuration option. The `recon_capture.py` and
`recon_config.py` files have been updated accordingly, and the
documentation has been revised to clarify these changes. These
improvements offer greater flexibility and control for users configuring
thresholds while also refining validation and error handling.
* Support CTAS in TSQL Grammar and add more comparison operators
([databrickslabs#545](databrickslabs#545)). In this
release, we have added support for the CTAS (CREATE TABLE AS SELECT) statement
in the TSQL (T-SQL) grammar, as well as introduced new comparison
operators: !=, !<, and !>. The CTAS statement allows for the creation of
a new table by selecting from an existing table or query, potentially
improving code readability and performance. The new comparison operators
provide alternative ways of expressing inequalities, increasing
flexibility for developers. The keyword `REPLICATE` has also been added
for creating a full copy of a database or availability group. These
changes enhance the overall functionality of the TSQL grammar and
improve the user's ability to express various operations in TSQL. The
CTAS statement is implemented as a new rule, and the new comparison
operators are added as methods in the TSqlExpressionBuilder class. These
changes provide increased capability and flexibility for TSQL parsing
and query handling. The new methods are not adding any new external
dependencies, and the project remains self-contained. The additions have
been tested with the TSqlExpressionBuilderSpec test suite, ensuring the
functionality and compatibility of the TSQL parser.
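The negated comparison operators map onto ordinary IR nodes; a sketch with illustrative node names:

```scala
sealed trait Expression
case class Column(name: String) extends Expression
case class NotEquals(l: Expression, r: Expression) extends Expression
case class GreaterThanOrEqual(l: Expression, r: Expression) extends Expression
case class LessThanOrEqual(l: Expression, r: Expression) extends Expression

def buildComparison(op: String, l: Expression, r: Expression): Expression =
  op match {
    case "!=" => NotEquals(l, r)          // same meaning as <>
    case "!<" => GreaterThanOrEqual(l, r) // "not less than" is >=
    case "!>" => LessThanOrEqual(l, r)    // "not greater than" is <=
    case other => throw new IllegalArgumentException(s"unknown operator $other")
  }
```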
* Support translation of TSQL IGNORE NULLS clause in windowing functions
([databrickslabs#511](databrickslabs#511)). The
latest change introduces support for translating the TSQL IGNORE NULLS
and RESPECT NULLS clauses in windowing functions to their equivalents in
Databricks SQL. In TSQL, these clauses appear after the function name
and before the OVER clause, affecting how the functions handle null
values. Databricks SQL represents this functionality with an optional
trailing boolean parameter for specific windowing functions. With this
update, when the IGNORE NULLS clause is specified in TSQL, a boolean
option is appended to the corresponding Databricks SQL windowing
functions, with RESPECT NULLS as the default. This enhancement is
facilitated by a new private method, `buildNullIgnore`, which adds the
boolean parameter to the original expression when IGNORE NULLS is
specified in the OVER clause. The alteration is exemplified in new test
examples for the TSqlFunctionSpec, which include testing the LEAD
function with and without the IGNORE NULLS clause, and updates to the
translation of functions with non-standard syntax.
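A sketch of the flag-appending step; `CallFunction` and the argument shape here are simplified assumptions:

```scala
case class BoolLiteral(value: Boolean)
case class CallFunction(name: String, args: Seq[Any])

// When IGNORE NULLS precedes the OVER clause, append `true` as a
// trailing argument; RESPECT NULLS is the default and adds nothing.
def buildNullIgnore(call: CallFunction, ignoreNulls: Boolean): CallFunction =
  if (ignoreNulls) call.copy(args = call.args :+ BoolLiteral(true)) else call

// buildNullIgnore(CallFunction("LEAD", Seq("salary")), ignoreNulls = true)
// => CallFunction("LEAD", Seq("salary", BoolLiteral(true)))
```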
* TSQL: Implement TSQL UPDATE/DELETE statements
([databrickslabs#540](databrickslabs#540)). In this
release, we have added support for TSQL UPDATE and DELETE statements in
all syntactical forms, including UDF column transformations, in the
TSqlParser.g4 file. The implementation of both statements is done in a
single file and they share many common clauses. We have also introduced
two new case classes, UpdateTable and MergeTables, in the
extensions.scala file to implement the TSQL UPDATE and DELETE
statements, respectively. Additionally, we have added new methods to
handle various clauses and elements associated with these statements in
the TSqlErrorStrategy class. A series of tests have been included to
ensure the correct translation of various UPDATE and DELETE queries to
their respective Abstract Syntax Trees (ASTs). These changes bring TSQL
UPDATE and DELETE statement functionality to the project and allow for
their use in a variety of contexts, providing developers with more
flexibility and control when working with TSQL UPDATE and DELETE
statements in the parser.
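The new IR nodes might look roughly like the following; the field names are assumptions, not the exact shapes in extensions.scala:

```scala
sealed trait Expression
sealed trait LogicalPlan
case class NamedTable(name: String) extends LogicalPlan
case class Assign(column: String, value: Expression) extends Expression

// An UPDATE carries its target, SET assignments, and optional predicate.
case class UpdateTable(
    target: LogicalPlan,
    set: Seq[Assign],
    where: Option[Expression]) extends LogicalPlan
```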
* TSQL: Implement translation of INSERT statement
([databrickslabs#515](databrickslabs#515)). In this
release, we have implemented the TSQL INSERT statement in its entirety,
including all target options, optional clauses, and Common Table
Expressions (CTEs) in our open-source library. The change includes
updates to the TSqlParser.g4 file to support the INSERT statement's
various clauses, such as TOP, INTO, WITH TABLE HINTS, outputClause, and
optionClause. We have also added new case classes to the TSQL AST to
handle various aspects of the INSERT statement, including LocalVarTable,
Output, InsertIntoTable, DerivedRows, DefaultValues, and Default. The
TSqlExpressionBuilder and TSqlRelationBuilder classes have been updated
to support the new INSERT statement, including handling output column
lists, aliases, and JSON clauses. We have added specification tests to
TSqlAstBuilderSpec.scala to demonstrate the various ways that the INSERT
statement can be written and to ensure the correct translation of TSQL
INSERT statements into the remorph project.
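A rough sketch of how such case classes could fit together; the exact remorph fields may differ:

```scala
sealed trait LogicalPlan
case class NamedTable(name: String) extends LogicalPlan
case class DerivedRows(rows: Seq[Seq[String]]) extends LogicalPlan
case object DefaultValues extends LogicalPlan

// An INSERT names its target, an optional column list, and a source
// that may be VALUES rows, a query, or DEFAULT VALUES.
case class InsertIntoTable(
    target: LogicalPlan,
    columns: Option[Seq[String]],
    source: LogicalPlan) extends LogicalPlan
```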
* TSQL: Remove the SIGN fragment from numerical tokens
([databrickslabs#547](databrickslabs#547)). In this
release, we have made changes to the TSQL expression builder in the
remorph project that affect how negative and positive numerical literals
are parsed. Negative literals, such as -2, will now be parsed as
UMinus(Literal(2)) instead of Literal(-2), and positive literals, such
as +1, will be parsed as UPlus(Literal(1)) instead of Literal(1). This
change was made to address issue
[databrickslabs#546](databrickslabs#546), but it is
not an ideal solution as it may cause inconvenience in downstream
processes. The affected numerical tokens include INT, HEX, FLOAT, REAL,
and MONEY, which have been simplified by removing the SIGN fragment. We
have updated the buildPrimitive method to handle INT, REAL, and FLOAT
token types and ensure that numerical tokens are parsed correctly. We
intend to keep issue
[databrickslabs#546](databrickslabs#546) open for
further exploration of a better solution. The tests have been updated to
reflect these changes.
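With the SIGN fragment gone, the sign reaches the builder as a separate unary operator; a sketch:

```scala
sealed trait Expression
case class Literal(value: Int) extends Expression
case class UMinus(child: Expression) extends Expression
case class UPlus(child: Expression) extends Expression

def buildUnary(sign: String, operand: Expression): Expression = sign match {
  case "-" => UMinus(operand) // -2 becomes UMinus(Literal(2)), not Literal(-2)
  case "+" => UPlus(operand)  // +1 becomes UPlus(Literal(1))
  case other => throw new IllegalArgumentException(s"unknown sign $other")
}
```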
* TSQL: Simplifies named table tableSource, implements columnAlias list
([databrickslabs#512](databrickslabs#512)). This
change introduces significant updates to the TSqlParser's grammar for
tableSource, simplifying and consolidating rules related to table
aliases and column alias lists. A new Relation called TableWithHints has
been added to collect and process table hints, some of which have direct
counterparts in the Catalyst optimizer or can be used as comments for
migration purposes. The TSQLExpressionBuilder and TSqlRelationBuilder
classes have been modified to handle table hints and column aliases, and
the TSqlAstBuilderSpec test suite has been updated to include new tests
for table hints in T-SQL SELECT statements. These changes aim to improve
parsing, handling, and optimization of table sources, table hints, and
column aliases in TSQL queries.
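A sketch of a hint-carrying relation; the hint modelling is illustrative:

```scala
sealed trait Relation
case class NamedTable(name: String) extends Relation
case class TableHint(name: String, args: Seq[String] = Nil)

// Wraps a table source together with whatever hints were attached to it.
case class TableWithHints(child: Relation, hints: Seq[TableHint]) extends Relation

// A T-SQL source such as `FROM t WITH (NOLOCK)` could become
// TableWithHints(NamedTable("t"), Seq(TableHint("NOLOCK")))
```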
* TSQL: Support generic FOR options
([databrickslabs#525](databrickslabs#525)). In this
release, we have added support for parsing T-SQL (Transact-SQL) options
that contain the keyword `FOR` using the standard syntax `[FOR]`. This
change is necessary because `FOR` cannot be used directly as an
identifier without escaping; it would otherwise be parsed as a table
alias in a `SELECT` statement. The ANTLR rule for parsing generic
options has been
expanded to handle these special cases correctly, allowing for the
proper parsing of options such as `OPTIMIZE FOR UNKNOWN` in a `SELECT`
statement. Additionally, a new case has been added to the OptionBuilder
class to handle the `FOR` keyword and elide it, converting `OPTIMIZE FOR
UNKNOWN` to `OPTIMIZE` with an id of 'UNKNOWN'. This ensures the proper
handling of options containing `FOR` and avoids any conflicts with the
`FOR` clause in T-SQL statements. This change was implemented by
Valentin Kasas and involves adding a new alternative to the
`genericOption` rule in the TSqlParser.g4 file, but no new methods have
been added.
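The elision can be pictured as follows; the token-sequence shape and `OptionString` type are illustrative assumptions:

```scala
case class OptionString(name: String, value: String)

def buildGenericOption(tokens: Seq[String]): OptionString = tokens match {
  case Seq(name, "FOR", id) => OptionString(name, id) // drop the FOR keyword
  case Seq(name, id)        => OptionString(name, id)
  case Seq(name)            => OptionString(name, "")
  case other => throw new IllegalArgumentException(s"unexpected option: $other")
}

// buildGenericOption(Seq("OPTIMIZE", "FOR", "UNKNOWN"))
// => OptionString("OPTIMIZE", "UNKNOWN")
```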
* Updated Dialect Variable Name
([databrickslabs#535](databrickslabs#535)). In this
release, the `source` variable name in the `QueryBuilder` class, which
refers to the `Dialect` instance, has been updated to `engine` to
accurately reflect its meaning as referring to either `Source` or
`Target`. This change includes updating the usage of `source` to
`engine` in the `build_query`, `_get_with_clause`, and
`build_threshold_query` methods, as well as removing unnecessary
parentheses in a list. These changes improve the code's clarity,
accuracy, and readability, while maintaining the overall functionality
of the affected methods.
* Use Oracle library only if the recon source is Oracle
([databrickslabs#532](databrickslabs#532)). In this
release, we have added a new `ReconcileConfig` configuration object and
a `SourceType` enumeration in the `databricks.labs.remorph.config` and
`databricks.labs.remorph.reconcile.constants` modules, respectively.
These objects are introduced to determine whether to include the Oracle
JDBC driver library in a reconciliation job's task libraries. The
`deploy_job` method and the `_job_recon_task` method have been updated
to use the `_recon_config` attribute to decide whether to include the
Oracle JDBC driver library. Additionally, the `_deploy_reconcile_job`
method in the `install.py` file has been modified to include a new
parameter called `reconcile`, which is passed as an argument from the
`_config` object. This change enhances the flexibility and customization
of the reconcile job deployment. Furthermore, new fixtures
`oracle_recon_config` and `snowflake_reconcile_config` have been
introduced for `ReconcileConfig` objects with Oracle and Snowflake
specific configurations, respectively. These fixtures are used in the
test functions for deploying jobs, ensuring that the tests are more
focused and better reflect the actual behavior of the code.
* [chore] Make singletons for the relevant `DataType` instances
([databrickslabs#705](databrickslabs#705)). This
commit introduces case objects for various data types, such as NullType,
StringType, and others, effectively making them singletons. This change
simplifies the creation of data type instances and ensures that each
type has a single instance throughout the application. The new case
objects are utilized in building SQL expressions, affecting functions
such as Cast and TRY_CAST in the TSqlExpressionBuilder class.
Additionally, the test file TSqlExpressionBuilderSpec.scala has been
updated to include the new case objects and handle errors by returning
null. The data types tested include integer types, decimal types, date
and time types, string types, binary types, and JSON. The primary goal
of this change is to improve the management and identification of data
types in the codebase, as well as to enhance code readability and
maintainability.
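The pattern is the standard Scala one; a sketch with an illustrative SQL rendering:

```scala
sealed trait DataType
case object NullType extends DataType
case object StringType extends DataType
case object IntegerType extends DataType

// Case objects are singletons, so equality checks and pattern matches
// are cheap and unambiguous when generating CAST / TRY_CAST expressions.
def sql(dt: DataType): String = dt match {
  case NullType    => "VOID"   // illustrative mapping
  case StringType  => "STRING"
  case IntegerType => "INT"
}

// s"CAST(x AS ${sql(StringType)})" => "CAST(x AS STRING)"
```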

Dependency updates:

* Bump sqlglot from 25.1.0 to 25.5.1
([databrickslabs#534](databrickslabs#534)).
* Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0
([databrickslabs#555](databrickslabs#555)).
* Bump sqlglot from 25.5.1 to 25.6.1
([databrickslabs#585](databrickslabs#585)).