
[BUG]: Reconcile is not exiting smoothly for the TABLE NOT FOUND ERROR #388

Closed
1 task done
ganeshdogiparthi-db opened this issue May 24, 2024 · 0 comments · Fixed by #392
Labels
bug Something isn't working

Comments

@ganeshdogiparthi-db
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Category of Bug / Issue

Other

Current Behavior

No response

Expected Behavior

No response

Steps To Reproduce

No response

Relevant log output or Exception details

No response

Sample Query

No response

Operating System

macOS

Version

latest via Databricks CLI

@ganeshdogiparthi-db ganeshdogiparthi-db added the bug Something isn't working label May 24, 2024
@ganeshdogiparthi-db ganeshdogiparthi-db changed the title [BUG]: Reconcile is not existing smoothly for the TABLE NOT FOUND ERROR [BUG]: Reconcile is not exiting smoothly for the TABLE NOT FOUND ERROR May 25, 2024
github-merge-queue bot pushed a commit that referenced this issue May 27, 2024
nfx added a commit that referenced this issue May 29, 2024
* Capture Reconcile metadata in delta tables for dashboards ([#369](#369)). In this release, changes have been made to improve version control management, reduce repository size, and enhance build times. A new directory, "spark-warehouse/", has been added to the Git ignore file to prevent unnecessary files from being tracked and included in the project. The `WriteToTableException` class has been added to the `exception.py` file to raise an error when a runtime exception occurs while writing data to a table. A new `ReconCapture` class has been implemented in the `reconcile` package to capture and persist reconciliation metadata in delta tables. The `recon` function has been updated to initialize this new class, passing in the required parameters. Additionally, a new file, `recon_capture.py`, has been added to the reconcile package, which implements the `ReconCapture` class responsible for capturing metadata related to data reconciliation. The `recon_config.py` file has been modified to introduce a new class, `ReconcileProcessDuration`, and restructure the classes `ReconcileOutput`, `MismatchOutput`, and `ThresholdOutput`. The commit also captures reconcile metadata in delta tables for dashboards in the context of unit tests in the `test_execute.py` file and includes a new file, `test_recon_capture.py`, to test the reconcile capture functionality of the `ReconCapture` class.
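The failure path described above can be sketched as a small wrapper. Note that `persist` and `write_fn` are hypothetical stand-ins (the real `ReconCapture` writes Spark DataFrames to Delta tables); the sketch only illustrates surfacing a write failure as `WriteToTableException`:

```python
class WriteToTableException(RuntimeError):
    """Raised when persisting reconcile metadata to a table fails."""

def persist(write_fn, table: str, rows) -> None:
    # write_fn stands in for a real writer (e.g. a Delta append via Spark);
    # any runtime failure is re-raised as WriteToTableException.
    try:
        write_fn(table, rows)
    except Exception as exc:
        raise WriteToTableException(f"write to {table} failed: {exc}") from exc
```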
* Expand translation of Snowflake `expr` ([#351](#351)). In this release, the translation of the `expr` category in the Snowflake language has been significantly expanded, addressing uncovered grammar areas, incorrect interpretations, and duplicates. The `subquery` is now excluded as a valid `expr`, and new case classes such as `NextValue`, `ArrayAccess`, `JsonAccess`, `Collate`, and `Iff` have been added to the `Expression` class. These changes improve the comprehensiveness and accuracy of the Snowflake parser, allowing for a more flexible and accurate translation of various operations. Additionally, the `SnowflakeExpressionBuilder` class has been updated to handle previously unsupported cases, enhancing the parser's ability to parse Snowflake SQL expressions.
* Fixed Oracle missing datatypes ([#333](#333)). In the latest release, the Oracle class of the Tokenizer in the open-source library has undergone a fix to address missing datatypes. Previously, the KEYWORDS mapping did not require Tokens for keys, which led to unsupported Oracle datatypes. This issue has been resolved by modifying the test_schema_compare.py file to ensure that all Oracle datatypes, including LONG, NCLOB, ROWID, UROWID, ANYTYPE, ANYDATA, ANYDATASET, XMLTYPE, SDO_GEOMETRY, SDO_TOPO_GEOMETRY, and SDO_GEORASTER, are now mapped to the TEXT TokenType. This improvement enhances the compatibility of the code with Oracle datatypes and increases the reliability of the schema comparison functionality, as demonstrated by the test function test_schema_compare, which now returns is_valid as True and a count of 0 for is_valid = `false` in the resulting dataframe.
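The fix amounts to giving every listed Oracle type an explicit token mapping. A minimal sketch, using a stand-in `TokenType` enum rather than sqlglot's real one:

```python
from enum import Enum, auto

class TokenType(Enum):  # stand-in for sqlglot's TokenType
    TEXT = auto()

ORACLE_ONLY_TYPES = [
    "LONG", "NCLOB", "ROWID", "UROWID", "ANYTYPE", "ANYDATA", "ANYDATASET",
    "XMLTYPE", "SDO_GEOMETRY", "SDO_TOPO_GEOMETRY", "SDO_GEORASTER",
]

# Every Oracle-specific datatype now tokenizes as TEXT instead of being unsupported.
KEYWORDS = {name: TokenType.TEXT for name in ORACLE_ONLY_TYPES}
```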
* Fixed the recon_config functions to handle null values ([#399](#399)). In this release, the recon_config functions have been enhanced to manage null values and provide more flexible column mapping for reconciliation purposes. A `__post_init__` method has been added to certain classes to convert specified attributes to lowercase and handle null values. A new helper method, `_get_is_string`, has been introduced to determine if a column is of string type. Additionally, new functions such as `get_tgt_to_src_col_mapping_list`, `get_layer_tgt_to_src_col_mapping`, `get_src_to_tgt_col_mapping_list`, and `get_layer_src_to_tgt_col_mapping` have been added to retrieve column mappings, enhancing the overall functionality and robustness of the reconciliation process. These improvements will benefit software engineers by ensuring more accurate and reliable configuration handling, as well as providing more flexibility in mapping source and target columns during reconciliation.
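A minimal sketch of the `__post_init__` normalisation described above; `ColumnMapping` and its fields are hypothetical illustrations, not the actual recon_config classes:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ColumnMapping:
    source_name: Optional[str] = None
    target_name: Optional[str] = None

    def __post_init__(self):
        # Lowercase each attribute when present; tolerate null (None) values.
        if self.source_name is not None:
            self.source_name = self.source_name.lower()
        if self.target_name is not None:
            self.target_name = self.target_name.lower()

m = ColumnMapping(source_name="CUSTOMER_ID")
print(m.source_name, m.target_name)  # customer_id None
```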
* Improve Exception handling ([#392](#392)). The commit titled `Improve Exception Handling` enhances error handling in the project, addressing issues [#388](#388) and [#392](#392). Changes include refactoring the `create_adapter` method in the `DataSourceAdapter` class, updating method arguments in test functions, and adding new methods in the `test_execute.py` file for better test doubles. The `DataSourceAdapter` class is replaced with the `create_adapter` function, which takes the same arguments and returns an instance of the appropriate `DataSource` subclass based on the provided `engine` parameter. The diff also modifies the behavior of certain test methods to raise more specific and accurate exceptions. Overall, these changes improve exception handling, streamline the codebase, and provide clearer error messages for software engineers.
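The factory replacing the `DataSourceAdapter` class can be sketched as below; the subclass names and `engine` keys are illustrative, not the exact remorph API:

```python
from abc import ABC, abstractmethod

class DataSource(ABC):
    @abstractmethod
    def read_schema(self, table: str) -> str: ...

class SnowflakeDataSource(DataSource):
    def read_schema(self, table: str) -> str:
        return f"snowflake://{table}"

class OracleDataSource(DataSource):
    def read_schema(self, table: str) -> str:
        return f"oracle://{table}"

_ADAPTERS = {"snowflake": SnowflakeDataSource, "oracle": OracleDataSource}

def create_adapter(engine: str) -> DataSource:
    # A plain function replaces the former DataSourceAdapter class: it picks
    # the DataSource subclass for the given engine and instantiates it.
    try:
        return _ADAPTERS[engine]()
    except KeyError:
        raise ValueError(f"unsupported engine: {engine}") from None
```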
* Introduced morph_sql and morph_column_expr functions for inline transpilation and validation ([#328](#328)). Two new classes, TranspilationResult and ValidationResult, have been added to the config module of the remorph package to store the results of transpilation and validation. The morph_sql and morph_column_exp functions have been introduced to support inline transpilation and validation of SQL code and column expressions. A new class, Validator, has been added to the validation module to handle validation, and the validate_format_result method within this class has been updated to return a ValidationResult object. The _query method has also been added to the class, which executes a given SQL query and returns a tuple containing a boolean indicating success, any exception message, and the result of the query. Unit tests for these new functions have been updated to ensure proper functionality.
* Output for the reconcile function ([#389](#389)). A new function `get_key_form_dialect` has been added to the `config.py` module, which takes a `Dialect` object and returns the corresponding key used in the `SQLGLOT_DIALECTS` dictionary. Additionally, the `MorphConfig` dataclass has been updated to include a new attribute `__file__`, which sets the filename to "config.yml". The `get_dialect` function remains unchanged. Two new exceptions, `WriteToTableException` and `InvalidInputException`, have been introduced, and the existing `DataSourceRuntimeException` has been modified in the same module to improve error handling. The `execute.py` file's reconcile function has undergone several changes, including adding imports for `InvalidInputException`, `ReconCapture`, and `generate_final_reconcile_output` from `recon_exception` and `recon_capture` modules, and modifying the `ReconcileOutput` type. The `hash_query.py` file's reconcile function has been updated to include a new `_get_with_clause` method, which returns a `Select` object for a given DataFrame, and the `build_query` method has been updated to include a new query construction step using the `with_clause` object. The `threshold_query.py` file's reconcile function's output has been updated to include query and logger statements, a new method for allowing user transformations on threshold aliases, and the dialect specified in the sql method. A new `generate_final_reconcile_output` function has been added to the `recon_capture.py` file, which generates a reconcile output given a recon_id and a SparkSession. New classes and dataclasses, including `SchemaReconcileOutput`, `ReconcileProcessDuration`, `StatusOutput`, `ReconcileTableOutput`, and `ReconcileOutput`, have been introduced in the `reconcile/recon_config.py` file. 
The `tests/unit/reconcile/test_execute.py` file has been updated to include new test cases for the `recon` function, including tests for different report types and scenarios, such as data, schema, and all report types, exceptions, and incorrect report types. A new test case, `test_initialise_data_source`, has been added to test the `initialise_data_source` function, and the `test_recon_for_wrong_report_type` test case has been updated to expect an `InvalidInputException` when an incorrect report type is passed to the `recon` function. The `test_reconcile_data_with_threshold_and_row_report_type` test case has been added to test the `reconcile_data` method of the `Reconciliation` class with a row report type and threshold options. Overall, these changes improve the functionality and robustness of the reconcile process by providing more fine-grained control over the generation of the final reconcile output and better handling of exceptions and errors.
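The reverse lookup performed by `get_key_form_dialect` can be sketched as below; the contents of `SQLGLOT_DIALECTS` here are placeholder strings, not the real dialect objects:

```python
SQLGLOT_DIALECTS = {"snowflake": "Snowflake", "oracle": "Oracle", "databricks": "Databricks"}

def get_key_form_dialect(dialect) -> str:
    # Walk the mapping and return the key whose value matches the dialect.
    for key, value in SQLGLOT_DIALECTS.items():
        if value == dialect:
            return key
    raise ValueError(f"unknown dialect: {dialect}")
```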
* Threshold Source and Target query builder ([#348](#348)). In this release, we've introduced a new method, `build_threshold_query`, that constructs a customizable threshold query based on a table's partition, join, and threshold columns configuration. The method identifies necessary columns, applies specified transformations, and includes a WHERE clause based on the filter defined in the table configuration. The resulting query is then converted to a SQL string using the dialect of the source database. Additionally, we've updated the test file for the threshold query builder in the reconcile package, including refactoring of function names and updated assertions for query comparison. We've added two new test methods: `test_build_threshold_query_with_single_threshold` and `test_build_threshold_query_with_multiple_thresholds`. These changes enhance the library's functionality, providing a more robust and customizable threshold query builder, and improve test coverage for various configurations and scenarios.
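A simplified sketch of the query construction described above, assuming a hypothetical signature: select the join and threshold columns, then apply the table-level filter as a WHERE clause. The real builder additionally applies per-column transformations and renders SQL in the source database's dialect:

```python
def build_threshold_query(table, join_columns, threshold_columns, filter_clause=None):
    # Select the columns needed for threshold comparison...
    cols = ", ".join(list(join_columns) + list(threshold_columns))
    query = f"SELECT {cols} FROM {table}"
    # ...and append the table configuration's filter, if any.
    if filter_clause:
        query += f" WHERE {filter_clause}"
    return query

print(build_threshold_query("orders", ["order_id"], ["amount"], "status = 'OPEN'"))
# SELECT order_id, amount FROM orders WHERE status = 'OPEN'
```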
* Unpack nested alias ([#336](#336)). This release introduces a significant update to the 'lca_utils.py' file, addressing the limitation of not handling nested aliases in window expressions and where clauses, which resolves issue [#334](#334). The `unalias_lca_in_select` method has been implemented to recursively parse nested selects and unalias lateral column aliases, thereby identifying and handling unsupported lateral column aliases. This method is utilized in the `check_for_unsupported_lca` method to handle unsupported lateral column aliases in the input SQL string. Furthermore, the 'test_lca_utils.py' file has undergone changes, impacting several test functions and introducing two new ones, `test_fix_nested_lca` and 'test_fix_nested_lca_with_no_scope', to ensure the code's reliability and accuracy by preventing unnecessary assumptions and hallucinations. These updates demonstrate our commitment to improving the library's functionality and test coverage.
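The unaliasing idea can be illustrated without sqlglot: substitute each earlier alias's defining expression wherever that alias is referenced later in the select list. A naive string-substitution sketch (the real `unalias_lca_in_select` walks a proper AST and recurses into nested selects):

```python
def unalias(select_cols):
    # select_cols: ordered (alias, expression) pairs from one SELECT list.
    seen = {}
    out = []
    for alias, expr in select_cols:
        for name, definition in seen.items():
            # Naive textual substitution; the real code rewrites AST nodes,
            # which avoids matching substrings of unrelated identifiers.
            expr = expr.replace(name, f"({definition})")
        seen[alias] = expr
        out.append((alias, expr))
    return out

# SELECT a + 1 AS b, b * 2 AS c  ->  c expands to (a + 1) * 2
print(unalias([("b", "a + 1"), ("c", "b * 2")]))
```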
@nfx nfx mentioned this issue May 29, 2024
nfx added a commit that referenced this issue May 29, 2024
sundarshankar89 added a commit to sundarshankar89/remorph that referenced this issue Jan 2, 2025
sundarshankar89 pushed a commit to sundarshankar89/remorph that referenced this issue Jan 2, 2025
* Capture Reconcile metadata in delta tables for dashbaords
([databrickslabs#369](databrickslabs#369)). In this
release, changes have been made to improve version control management,
reduce repository size, and enhance build times. A new directory,
"spark-warehouse/", has been added to the Git ignore file to prevent
unnecessary files from being tracked and included in the project. The
`WriteToTableException` class has been added to the `exception.py` file
to raise an error when a runtime exception occurs while writing data to
a table. A new `ReconCapture` class has been implemented in the
`reconcile` package to capture and persist reconciliation metadata in
delta tables. The `recon` function has been updated to initialize this
new class, passing in the required parameters. Additionally, a new file,
`recon_capture.py`, has been added to the reconcile package, which
implements the `ReconCapture` class responsible for capturing metadata
related to data reconciliation. The `recon_config.py` file has been
modified to introduce a new class, `ReconcileProcessDuration`, and
restructure the classes `ReconcileOutput`, `MismatchOutput`, and
`ThresholdOutput`. The commit also captures reconcile metadata in delta
tables for dashboards in the context of unit tests in the
`test_execute.py` file and includes a new file, `test_recon_capture.py`,
to test the reconcile capture functionality of the `ReconCapture` class.
* Expand translation of Snowflake `expr`
([databrickslabs#351](databrickslabs#351)). In this
release, the translation of the `expr` category in the Snowflake
language has been significantly expanded, addressing uncovered grammar
areas, incorrect interpretations, and duplicates. The `subquery` is now
excluded as a valid `expr`, and new case classes such as `NextValue`,
`ArrayAccess`, `JsonAccess`, `Collate`, and `Iff` have been added to the
`Expression` class. These changes improve the comprehensiveness and
accuracy of the Snowflake parser, allowing for a more flexible and
accurate translation of various operations. Additionally, the
`SnowflakeExpressionBuilder` class has been updated to handle previously
unsupported cases, enhancing the parser's ability to parse Snowflake SQL
expressions.
* Fixed orcale missing datatypes
([databrickslabs#333](databrickslabs#333)). In the
latest release, the Oracle class of the Tokenizer in the open-source
library has undergone a fix to address missing datatypes. Previously,
the KEYWORDS mapping did not require Tokens for keys, which led to
unsupported Oracle datatypes. This issue has been resolved by modifying
the test_schema_compare.py file to ensure that all Oracle datatypes,
including LONG, NCLOB, ROWID, UROWID, ANYTYPE, ANYDATA, ANYDATASET,
XMLTYPE, SDO_GEOMETRY, SDO_TOPO_GEOMETRY, and SDO_GEORASTER, are now
mapped to the TEXT TokenType. This improvement enhances the
compatibility of the code with Oracle datatypes and increases the
reliability of the schema comparison functionality, as demonstrated by
the test function test_schema_compare, which now returns is_valid as
True and a count of 0 for is_valid = `false` in the resulting dataframe.
* Fixed the recon_config functions to handle null values
([databrickslabs#399](databrickslabs#399)). In this
release, the recon_config functions have been enhanced to manage null
values and provide more flexible column mapping for reconciliation
purposes. A `__post_init__` method has been added to certain classes to
convert specified attributes to lowercase and handle null values. A new
helper method, `_get_is_string`, has been introduced to determine if a
column is of string type. Additionally, new functions such as
`get_tgt_to_src_col_mapping_list`, `get_layer_tgt_to_src_col_mapping`,
`get_src_to_tgt_col_mapping_list`, and
`get_layer_src_to_tgt_col_mapping` have been added to retrieve column
mappings, enhancing the overall functionality and robustness of the
reconciliation process. These improvements will benefit software
engineers by ensuring more accurate and reliable configuration handling,
as well as providing more flexibility in mapping source and target
columns during reconciliation.
* Improve Exception handling
([databrickslabs#392](databrickslabs#392)). The
commit titled `Improve Exception Handling` enhances error handling in
the project, addressing issues
[databrickslabs#388](databrickslabs#388) and
[databrickslabs#392](databrickslabs#392). Changes
include refactoring the `create_adapter` method in the
`DataSourceAdapter` class, updating method arguments in test functions,
and adding new methods in the `test_execute.py` file for better test
doubles. The `DataSourceAdapter` class is replaced with the
`create_adapter` function, which takes the same arguments and returns an
instance of the appropriate `DataSource` subclass based on the provided
`engine` parameter. The diff also modifies the behavior of certain test
methods to raise more specific and accurate exceptions. Overall, these
changes improve exception handling, streamline the codebase, and provide
clearer error messages for software engineers.
* Introduced morph_sql and morph_column_expr functions for inline
transpilation and validation
([databrickslabs#328](databrickslabs#328)). Two new
classes, TranspilationResult and ValidationResult, have been added to
the config module of the remorph package to store the results of
transpilation and validation. The morph_sql and morph_column_exp
functions have been introduced to support inline transpilation and
validation of SQL code and column expressions. A new class, Validator,
has been added to the validation module to handle validation, and the
validate_format_result method within this class has been updated to
return a ValidationResult object. The _query method has also been added
to the class, which executes a given SQL query and returns a tuple
containing a boolean indicating success, any exception message, and the
result of the query. Unit tests for these new functions have been
updated to ensure proper functionality.
* Output for the reconcile function
([databrickslabs#389](databrickslabs#389)). A new
function `get_key_form_dialect` has been added to the `config.py`
module, which takes a `Dialect` object and returns the corresponding key
used in the `SQLGLOT_DIALECTS` dictionary. Additionally, the
`MorphConfig` dataclass has been updated to include a new attribute
`__file__`, which sets the filename to "config.yml". The `get_dialect`
function remains unchanged. Two new exceptions, `WriteToTableException`
and `InvalidInputException`, have been introduced, and the existing
`DataSourceRuntimeException` has been modified in the same module to
improve error handling. The `execute.py` file's reconcile function has
undergone several changes, including adding imports for
`InvalidInputException`, `ReconCapture`, and
`generate_final_reconcile_output` from `recon_exception` and
`recon_capture` modules, and modifying the `ReconcileOutput` type. The
`hash_query.py` file's reconcile function has been updated to include a
new `_get_with_clause` method, which returns a `Select` object for a
given DataFrame, and the `build_query` method has been updated to
include a new query construction step using the `with_clause` object.
The `threshold_query.py` file's reconcile function's output has been
updated to include query and logger statements, a new method for
allowing user transformations on threshold aliases, and the dialect
specified in the sql method. A new `generate_final_reconcile_output`
function has been added to the `recon_capture.py` file, which generates
a reconcile output given a recon_id and a SparkSession. New classes and
dataclasses, including `SchemaReconcileOutput`,
`ReconcileProcessDuration`, `StatusOutput`, `ReconcileTableOutput`, and
`ReconcileOutput`, have been introduced in the
`reconcile/recon_config.py` file. The
`tests/unit/reconcile/test_execute.py` file has been updated to include
new test cases for the `recon` function, including tests for different
report types and scenarios, such as data, schema, and all report types,
exceptions, and incorrect report types. A new test case,
`test_initialise_data_source`, has been added to test the
`initialise_data_source` function, and the
`test_recon_for_wrong_report_type` test case has been updated to expect
an `InvalidInputException` when an incorrect report type is passed to
the `recon` function. The
`test_reconcile_data_with_threshold_and_row_report_type` test case has
been added to test the `reconcile_data` method of the `Reconciliation`
class with a row report type and threshold options. Overall, these
changes improve the functionality and robustness of the reconcile
process by providing more fine-grained control over the generation of
the final reconcile output and better handling of exceptions and errors.
* Threshold Source and Target query builder
([databrickslabs#348](databrickslabs#348)). In this
release, we've introduced a new method, `build_threshold_query`, that
constructs a customizable threshold query based on a table's partition,
join, and threshold columns configuration. The method identifies
necessary columns, applies specified transformations, and includes a
WHERE clause based on the filter defined in the table configuration. The
resulting query is then converted to a SQL string using the dialect of
the source database. Additionally, we've updated the test file for the
threshold query builder in the reconcile package, including refactoring
of function names and updated assertions for query comparison. We've
added two new test methods:
`test_build_threshold_query_with_single_threshold` and
`test_build_threshold_query_with_multiple_thresholds`. These changes
enhance the library's functionality, providing a more robust and
customizable threshold query builder, and improve test coverage for
various configurations and scenarios.
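As a rough illustration of the pattern described above (not the actual remorph implementation, which renders SQL through the source database's dialect rather than by string concatenation), a threshold query can be assembled from a table configuration like this. The `table_conf` keys used here are illustrative assumptions, not remorph's actual configuration schema.

```python
# Hypothetical sketch of the build_threshold_query pattern: pick the join
# and threshold columns, apply any configured per-column transformation,
# and append a WHERE clause taken from the table configuration's filter.

def build_threshold_query(table_conf: dict) -> str:
    cols = list(table_conf["join_columns"]) + [
        t["column"] for t in table_conf["thresholds"]
    ]
    transforms = table_conf.get("transformations", {})
    # use the transformed expression where one is configured, else the column
    select_list = ", ".join(transforms.get(c, c) for c in cols)
    query = f"SELECT {select_list} FROM {table_conf['table']}"
    if table_conf.get("filter"):
        # WHERE clause comes from the filter defined in the table configuration
        query += f" WHERE {table_conf['filter']}"
    return query

conf = {
    "table": "orders",
    "join_columns": ["order_id"],
    "thresholds": [{"column": "amount"}],
    "transformations": {"amount": "TRIM(amount)"},
    "filter": "region = 'US'",
}
print(build_threshold_query(conf))
# SELECT order_id, TRIM(amount) FROM orders WHERE region = 'US'
```

The real builder additionally converts the final expression tree to a SQL string using the dialect of the source database, which this string-based sketch omits.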
* Unpack nested alias
([databrickslabs#336](databrickslabs#336)). This
release introduces a significant update to the `lca_utils.py` file,
addressing the limitation of not handling nested aliases in window
expressions and WHERE clauses, which resolves issue
[databrickslabs#334](databrickslabs#334). The
`unalias_lca_in_select` method has been implemented to recursively parse
nested selects and unalias lateral column aliases, thereby identifying
and handling unsupported lateral column aliases. This method is utilized
in the `check_for_unsupported_lca` method to handle unsupported lateral
column aliases in the input SQL string. Furthermore, the
`test_lca_utils.py` file has been updated, with changes to several
existing test functions and two new ones, `test_fix_nested_lca` and
`test_fix_nested_lca_with_no_scope`, added to verify the code's
reliability and accuracy. These updates demonstrate our commitment to
improving the library's functionality and test coverage.
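The unaliasing idea can be sketched on a flat select list. Note the assumptions: the real `unalias_lca_in_select` parses the SQL expression tree and recurses into nested selects, whereas the helper below (its name, signature, and regex-based substitution are all hypothetical) only handles one flat SELECT.

```python
import re

# Hypothetical flat-select sketch of lateral-column-alias (LCA) unaliasing:
# each select item that refers to an earlier alias gets that alias replaced
# by its fully expanded expression.

def unalias_lca(select_items):
    """select_items: ordered (expression, alias) pairs from one SELECT."""
    seen = {}       # alias -> fully expanded expression
    resolved = []
    for expr, alias in select_items:
        # replace references to earlier aliases with their expanded form
        for name, repl in seen.items():
            expr = re.sub(rf"\b{re.escape(name)}\b", f"({repl})", expr)
        seen[alias] = expr
        resolved.append((expr, alias))
    return resolved

print(unalias_lca([("a + 1", "b"), ("b * 2", "c")]))
# [('a + 1', 'b'), ('(a + 1) * 2', 'c')]
```

Here `c` is defined in terms of the lateral alias `b`, so the unaliased form substitutes `(a + 1)` for `b`, which is the rewrite dialects without LCA support require.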
sundarshankar89 added a commit to sundarshankar89/remorph that referenced this issue Jan 3, 2025
sundarshankar89 pushed a commit to sundarshankar89/remorph that referenced this issue Jan 3, 2025