Release v0.52.0 (#3445)

* Added handling for Databricks errors during workspace listings in the table migration status refresher ([#3378](#3378)). In this release, we have implemented changes to enhance error handling and improve the stability of the table migration status refresher in the open-source library. We have resolved issue [#3262](#3262), which addressed Databricks errors during workspace listings. The `assessment` workflow has been updated, and new unit tests have been added to ensure proper error handling. The changes include the import of `DatabricksError` from the `databricks.sdk.errors` module and the addition of a new method `_iter_catalogs` to list catalogs with error handling for `DatabricksError`. The `_iter_schemas` method now replaces `_ws.catalogs.list()` with `self._iter_catalogs()`, also including error handling for `DatabricksError`. Furthermore, new unit tests have been developed to check the logging of the `TableMigration` class when listing tables in the Databricks workspace, focusing on handling errors during catalog, schema, and table listings. These changes improve the library's robustness and ensure that it can gracefully handle errors during the table migration status refresher process.
* Convert READ_METADATA to UC BROWSE permission for tables, views and database ([#3403](#3403)). The `uc_grant_sql` method in the `grants.py` file has been modified to convert `READ_METADATA` permissions to `BROWSE` permissions for tables, views, and databases. This change involves adding new entries to the dictionary used to map permission types to their corresponding UC actions and has been manually tested. The behavior of the `grant_loader` function in the `hive_metastore` module has also been modified to change the action type of a grant from `READ_METADATA` to `EXECUTE` for a specific case. Additionally, the `test_grants.py` unit test file has been updated to include a new test case that verifies the conversion of `READ_METADATA` to `BROWSE` for a grant on a database and handles the conversion of `READ_METADATA` permission to `UC BROWSE` for a new `udf="function"` parameter. These changes resolve issue [#2023](#2023) and have been tested through manual testing and unit tests. No new methods have been added, and existing functionality has been changed in a limited scope. No new unit or integration tests have been added as it is assumed that the existing tests will continue to pass after these changes have been made.
* Migrates Pipelines crawled during the assessment phase ([#2778](#2778)). A new utility class, `PipelinesMigrator`, has been introduced in this release to facilitate the migration of Delta Live Tables (DLT) pipelines. This class is used in a new workflow that tests pipeline migration, which involves cloning DLT pipelines crawled in the assessment phase, with specific configurations, to a new Unity Catalog (UC) pipeline. The migration can be skipped for certain pipelines by specifying their pipeline IDs in a list. Three test scenarios, each with different pipeline specifications, are defined to ensure the proper functioning of the migration process under various conditions. The class and the migration process are tested with manual testing, unit tests, and integration tests, with no reliance on a staging environment. The migration process takes into account the `WorkspaceClient`, `WorkspaceContext`, `AccountClient`, and a flag for running the command as a collection. The `PipelinesMigrator` class uses a `PipelinesCrawler` and `JobsCrawler` to perform the migration. The commit also introduces a new command, `migrate_dlt_pipelines`, to the CLI of the ucx package, which helps migrate DLT pipelines. The tests use a mock installation and cover the scenario where the installation has two jobs, `test` and `assessment`, with job IDs `123` and `456` respectively. The state of the installation is recorded in a `state.json` file. A configuration file `pipeline_mapping.csv` is used to map the source pipeline ID to the target catalog, schema, pipeline, and workspace names.
* Removed `try-except` around verifying the migration progress prerequisites in the `migrate-tables` cli command ([#3439](#3439)). The `migrate-tables` CLI command of the `ucx` package changes how it handles the progress tracking prerequisites. The previous `try-except` block surrounding the verification has been removed, and the `RuntimeWarning` is now propagated, providing a more specific and helpful error message. If the prerequisites are not met, the `verify` method raises an exception and the migration does not proceed. The tests for `migrate_tables` have been updated accordingly, including a new test case, `test_migrate_tables_errors_out_before_assessment`, that checks that the migration does not proceed when the verification fails. This change affects the existing `databricks labs ucx migrate-tables` command.
* Removed redundant internal methods from create_account_group ([#3395](#3395)). In this change, the `create_account_group` function's internal methods have been removed, and its signature has been modified to retrieve the workspace ID from `accountworkspace._workspaces()` instead of passing it as a parameter. This resolves issue [#3170](#3170) and improves code efficiency by removing unnecessary parameters and methods. The `AccountWorkspaces` class now accepts a list of workspace IDs upon instantiation, enhancing code readability and eliminating redundancy. The function has been tested with unit tests, ensuring it creates a group if it doesn't exist, throws an exception if a group already exists, filters system groups, and handles cases where a group already has the required number of members in a workspace. These changes simplify the codebase, eliminate redundancy, and improve the maintainability of the project.
* Updated sqlglot requirement from <25.33,>=25.5.0 to >=25.5.0,<25.34 ([#3407](#3407)). In this release, the upper bound of the sqlglot requirement is raised from `<25.33` to `<25.34`, so the 25.33.x releases are now allowed in addition to 25.5.0 through 25.32.x. This update allows us to use the latest version of sqlglot, which includes various bug fixes and new features. In v25.33.0, there were two breaking changes: the TIMESTAMP data type now maps to Type.TIMESTAMPTZ, and the NEXT keyword is now treated as a function keyword. Several new features were also introduced, including support for generated columns in PostgreSQL and the ability to preserve tables in the replace_table method. Additionally, there were several bug fixes, including fixes for issues related to BigQuery, Presto, and Spark. The v25.32.1 release contained two bug fixes related to BigQuery and one bug fix related to Presto. Furthermore, v25.32.0 had three breaking changes: support for ATTACH/DETACH statements, tokenization of hints as comments, and a fix to datetime coercion in the canonicalize rule. This release also introduced new features, such as support for TO_TIMESTAMP\* variants in Snowflake and improved error messages in the Redshift transpiler. Lastly, there were several bug fixes, including fixes for issues related to SQL Server, MySQL, and PostgreSQL.
* Updated sqlglot requirement from <25.33,>=25.5.0 to >=25.5.0,<25.35 ([#3413](#3413)). In this release, the `sqlglot` dependency has been updated from a version range that allows up to `25.33`, but excludes `25.34`, to a version range that allows `25.5.0` and above, but excludes `25.35`. This update was made to enable the latest version of `sqlglot`, which includes one breaking change related to the alias expansion of USING STRUCT fields. This version also introduces two new features, an optimization for alias expansion of USING STRUCT fields, and support for generated columns in PostgreSQL. Additionally, two bug fixes were implemented, addressing proper consumption of dashed table parts and removal of parentheses from CURRENT_USER in Presto. The update also includes a fix to make TIMESTAMP map to Type.TIMESTAMPTZ, a fix to parse DEFAULT in VALUES clause into a Var, and changes to the BigQuery and Snowflake dialects to improve transpilation and JSONPathTokenizer leniency. The commit message includes a reference to issue `[#3413](https://github.com/databrickslabs/ucx/issues/3413)` and a link to the `sqlglot` changelog for further reference.
* Updated sqlglot requirement from <25.35,>=25.5.0 to >=25.5.0,<26.1 ([#3433](#3433)). In this release, the required version of the `sqlglot` library is updated to a range that includes version 25.5.0 and excludes version 26.1, so the `sqlglot` v26.0.x releases are now allowed. The commit message includes the changelog for `sqlglot` v26.0.0, which highlights the breaking changes, new features, bug fixes, and other modifications in this version, along with a list of commits merged into the `sqlglot` repository. Because v26.0.0 contains breaking changes, thorough testing is advised to ensure the updated version does not introduce any new issues, and future `sqlglot` updates should be tracked to keep the project up to date with the library.
* changing table_migration to user_isolation ([#3389](#3389)). In this release, the job cluster name in the Hive Metastore to Unity Catalog migration workflows has been changed from `table_migration` to `user_isolation`. This rename affects all references to the job cluster in the methods `convert_managed_table`, `migrate_external_tables_sync`, `migrate_dbfs_root_delta_tables`, `migrate_dbfs_root_non_delta_tables`, `migrate_views`, `migrate_hive_serde_in_place`, and `update_migration_status`, as well as the `job_task` decorators that specify the job cluster. This change enhances user isolation during the migration process and resolves issue [#3172](#3172). Engineers should note that this change purely affects naming and does not modify the functionality of the code.
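
The error handling described for [#3378](#3378) can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the code from the commit: `DatabricksError` here is a local stand-in for the class in `databricks.sdk.errors` so the snippet runs without the SDK, and `iter_catalogs` and its `list_catalogs` callable are hypothetical names chosen for this example.

```python
import logging
from collections.abc import Callable, Iterable, Iterator

logger = logging.getLogger(__name__)


class DatabricksError(Exception):
    """Local stand-in for databricks.sdk.errors.DatabricksError (assumption for this sketch)."""


def iter_catalogs(list_catalogs: Callable[[], Iterable[str]]) -> Iterator[str]:
    """Yield catalog names; log and stop instead of raising when the listing fails."""
    try:
        # Errors raised lazily while iterating are caught here as well.
        yield from list_catalogs()
    except DatabricksError as e:
        logger.error(f"Cannot list catalogs: {e}")


def healthy_listing() -> Iterator[str]:
    yield "main"
    yield "dev"


def failing_listing() -> Iterator[str]:
    yield "main"
    raise DatabricksError("listing failed")  # simulated API failure mid-listing


print(list(iter_catalogs(healthy_listing)))
print(list(iter_catalogs(failing_listing)))
```

With this shape, a failure part-way through a listing keeps the items already yielded and leaves a log record, rather than aborting the whole refresh.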
databrickslabs · Dec 12, 2024 · 136c536
1 parent 4ded40e
commit 136c536
Showing 2 changed files with 14 additions and 1 deletion.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,18 @@
 # Version changelog
 
+## 0.52.0
+
+* Added handling for Databricks errors during workspace listings in the table migration status refresher ([#3378](https://github.com/databrickslabs/ucx/issues/3378)). In this release, we have implemented changes to enhance error handling and improve the stability of the table migration status refresher in the open-source library. We have resolved issue [#3262](https://github.com/databrickslabs/ucx/issues/3262), which addressed Databricks errors during workspace listings. The `assessment` workflow has been updated, and new unit tests have been added to ensure proper error handling. The changes include the import of `DatabricksError` from the `databricks.sdk.errors` module and the addition of a new method `_iter_catalogs` to list catalogs with error handling for `DatabricksError`. The `_iter_schemas` method now replaces `_ws.catalogs.list()` with `self._iter_catalogs()`, also including error handling for `DatabricksError`. Furthermore, new unit tests have been developed to check the logging of the `TableMigration` class when listing tables in the Databricks workspace, focusing on handling errors during catalog, schema, and table listings. These changes improve the library's robustness and ensure that it can gracefully handle errors during the table migration status refresher process.
+* Convert READ_METADATA to UC BROWSE permission for tables, views and database ([#3403](https://github.com/databrickslabs/ucx/issues/3403)). The `uc_grant_sql` method in the `grants.py` file has been modified to convert `READ_METADATA` permissions to `BROWSE` permissions for tables, views, and databases. This change involves adding new entries to the dictionary used to map permission types to their corresponding UC actions and has been manually tested. The behavior of the `grant_loader` function in the `hive_metastore` module has also been modified to change the action type of a grant from `READ_METADATA` to `EXECUTE` for a specific case. Additionally, the `test_grants.py` unit test file has been updated to include a new test case that verifies the conversion of `READ_METADATA` to `BROWSE` for a grant on a database and handles the conversion of `READ_METADATA` permission to `UC BROWSE` for a new `udf="function"` parameter. These changes resolve issue [#2023](https://github.com/databrickslabs/ucx/issues/2023) and have been tested through manual testing and unit tests. No new methods have been added, and existing functionality has been changed in a limited scope. No new unit or integration tests have been added as it is assumed that the existing tests will continue to pass after these changes have been made.
+* Migrates Pipelines crawled during the assessment phase ([#2778](https://github.com/databrickslabs/ucx/issues/2778)). A new utility class, `PipelinesMigrator`, has been introduced in this release to facilitate the migration of Delta Live Tables (DLT) pipelines. This class is used in a new workflow that tests pipeline migration, which involves cloning DLT pipelines crawled in the assessment phase, with specific configurations, to a new Unity Catalog (UC) pipeline. The migration can be skipped for certain pipelines by specifying their pipeline IDs in a list. Three test scenarios, each with different pipeline specifications, are defined to ensure the proper functioning of the migration process under various conditions. The class and the migration process are tested with manual testing, unit tests, and integration tests, with no reliance on a staging environment. The migration process takes into account the `WorkspaceClient`, `WorkspaceContext`, `AccountClient`, and a flag for running the command as a collection. The `PipelinesMigrator` class uses a `PipelinesCrawler` and `JobsCrawler` to perform the migration. The commit also introduces a new command, `migrate_dlt_pipelines`, to the CLI of the ucx package, which helps migrate DLT pipelines. The tests use a mock installation and cover the scenario where the installation has two jobs, `test` and `assessment`, with job IDs `123` and `456` respectively. The state of the installation is recorded in a `state.json` file. A configuration file `pipeline_mapping.csv` is used to map the source pipeline ID to the target catalog, schema, pipeline, and workspace names.
+* Removed `try-except` around verifying the migration progress prerequisites in the `migrate-tables` cli command ([#3439](https://github.com/databrickslabs/ucx/issues/3439)). The `migrate-tables` CLI command of the `ucx` package changes how it handles the progress tracking prerequisites. The previous `try-except` block surrounding the verification has been removed, and the `RuntimeWarning` is now propagated, providing a more specific and helpful error message. If the prerequisites are not met, the `verify` method raises an exception and the migration does not proceed. The tests for `migrate_tables` have been updated accordingly, including a new test case, `test_migrate_tables_errors_out_before_assessment`, that checks that the migration does not proceed when the verification fails. This change affects the existing `databricks labs ucx migrate-tables` command.
+* Removed redundant internal methods from create_account_group ([#3395](https://github.com/databrickslabs/ucx/issues/3395)). In this change, the `create_account_group` function's internal methods have been removed, and its signature has been modified to retrieve the workspace ID from `accountworkspace._workspaces()` instead of passing it as a parameter. This resolves issue [#3170](https://github.com/databrickslabs/ucx/issues/3170) and improves code efficiency by removing unnecessary parameters and methods. The `AccountWorkspaces` class now accepts a list of workspace IDs upon instantiation, enhancing code readability and eliminating redundancy. The function has been tested with unit tests, ensuring it creates a group if it doesn't exist, throws an exception if a group already exists, filters system groups, and handles cases where a group already has the required number of members in a workspace. These changes simplify the codebase, eliminate redundancy, and improve the maintainability of the project.
+* Updated sqlglot requirement from <25.33,>=25.5.0 to >=25.5.0,<25.34 ([#3407](https://github.com/databrickslabs/ucx/issues/3407)). In this release, the upper bound of the sqlglot requirement is raised from `<25.33` to `<25.34`, so the 25.33.x releases are now allowed in addition to 25.5.0 through 25.32.x. This update allows us to use the latest version of sqlglot, which includes various bug fixes and new features. In v25.33.0, there were two breaking changes: the TIMESTAMP data type now maps to Type.TIMESTAMPTZ, and the NEXT keyword is now treated as a function keyword. Several new features were also introduced, including support for generated columns in PostgreSQL and the ability to preserve tables in the replace_table method. Additionally, there were several bug fixes, including fixes for issues related to BigQuery, Presto, and Spark. The v25.32.1 release contained two bug fixes related to BigQuery and one bug fix related to Presto. Furthermore, v25.32.0 had three breaking changes: support for ATTACH/DETACH statements, tokenization of hints as comments, and a fix to datetime coercion in the canonicalize rule. This release also introduced new features, such as support for TO_TIMESTAMP\* variants in Snowflake and improved error messages in the Redshift transpiler. Lastly, there were several bug fixes, including fixes for issues related to SQL Server, MySQL, and PostgreSQL.
+* Updated sqlglot requirement from <25.33,>=25.5.0 to >=25.5.0,<25.35 ([#3413](https://github.com/databrickslabs/ucx/issues/3413)). In this release, the `sqlglot` dependency has been updated from a version range that allows up to `25.33`, but excludes `25.34`, to a version range that allows `25.5.0` and above, but excludes `25.35`. This update was made to enable the latest version of `sqlglot`, which includes one breaking change related to the alias expansion of USING STRUCT fields. This version also introduces two new features, an optimization for alias expansion of USING STRUCT fields, and support for generated columns in PostgreSQL. Additionally, two bug fixes were implemented, addressing proper consumption of dashed table parts and removal of parentheses from CURRENT_USER in Presto. The update also includes a fix to make TIMESTAMP map to Type.TIMESTAMPTZ, a fix to parse DEFAULT in VALUES clause into a Var, and changes to the BigQuery and Snowflake dialects to improve transpilation and JSONPathTokenizer leniency. The commit message includes a reference to issue `[#3413](https://github.com/databrickslabs/ucx/issues/3413)` and a link to the `sqlglot` changelog for further reference.
+* Updated sqlglot requirement from <25.35,>=25.5.0 to >=25.5.0,<26.1 ([#3433](https://github.com/databrickslabs/ucx/issues/3433)). In this release, the required version of the `sqlglot` library is updated to a range that includes version 25.5.0 and excludes version 26.1, so the `sqlglot` v26.0.x releases are now allowed. The commit message includes the changelog for `sqlglot` v26.0.0, which highlights the breaking changes, new features, bug fixes, and other modifications in this version, along with a list of commits merged into the `sqlglot` repository. Because v26.0.0 contains breaking changes, thorough testing is advised to ensure the updated version does not introduce any new issues, and future `sqlglot` updates should be tracked to keep the project up to date with the library.
+* changing table_migration to user_isolation ([#3389](https://github.com/databrickslabs/ucx/issues/3389)). In this release, the job cluster name in the Hive Metastore to Unity Catalog migration workflows has been changed from `table_migration` to `user_isolation`. This rename affects all references to the job cluster in the methods `convert_managed_table`, `migrate_external_tables_sync`, `migrate_dbfs_root_delta_tables`, `migrate_dbfs_root_non_delta_tables`, `migrate_views`, `migrate_hive_serde_in_place`, and `update_migration_status`, as well as the `job_task` decorators that specify the job cluster. This change enhances user isolation during the migration process and resolves issue [#3172](https://github.com/databrickslabs/ucx/issues/3172). Engineers should note that this change purely affects naming and does not modify the functionality of the code.
+
+
 ## 0.51.0
 
 * Added `assign-owner-group` command ([#3111](https://github.com/databrickslabs/ucx/issues/3111)). The Databricks Labs Unity Catalog Exporter (UCX) tool now includes a new `assign-owner-group` command, allowing users to assign an owner group to the workspace. This group will be designated as the owner for all migrated tables and views, providing better control and organization of resources. The command can be executed in the context of a specific workspace or across multiple workspaces. The implementation includes new classes, methods, and attributes in various files, such as `cli.py`, `config.py`, and `groups.py`, enhancing ownership management functionality. The `assign-owner-group` command replaces the functionality of issue [#3075](https://github.com/databrickslabs/ucx/issues/3075) and addresses issue [#2890](https://github.com/databrickslabs/ucx/issues/2890), ensuring proper schema ownership and handling of crawled grants. Developers should be aware that running the `migrate-tables` workflow will result in assigning a new owner group for the Hive Metastore instance in the workspace installation.
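
The behavior described in the 0.52.0 entry for [#3439](https://github.com/databrickslabs/ucx/issues/3439), where a failed prerequisite check now propagates instead of being caught, might look roughly like the sketch below. The class `PrerequisitesVerifier`, the function signature, and the warning text are hypothetical and chosen only for illustration; only the `verify` method name and the `RuntimeWarning` type come from the changelog.

```python
from collections.abc import Callable


class PrerequisitesVerifier:
    """Hypothetical stand-in for the object that checks progress-tracking prerequisites."""

    def __init__(self, assessment_completed: bool) -> None:
        self._assessment_completed = assessment_completed

    def verify(self) -> None:
        if not self._assessment_completed:
            # Illustrative message, not the text used by ucx.
            raise RuntimeWarning("Assessment has not completed; run it before migrating tables.")


def migrate_tables(verifier: PrerequisitesVerifier, run_workflow: Callable[[], None]) -> None:
    # No try/except around verify(): a failed check propagates to the caller,
    # so the workflow below is never started.
    verifier.verify()
    run_workflow()


started: list[str] = []
migrate_tables(PrerequisitesVerifier(True), lambda: started.append("migrate-tables"))
print(started)
```

Calling `migrate_tables` with `PrerequisitesVerifier(False)` raises `RuntimeWarning` before `run_workflow` is invoked, which is the kind of condition a test such as `test_migrate_tables_errors_out_before_assessment` would assert.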

diff --git a/src/databricks/labs/ucx/__about__.py b/src/databricks/labs/ucx/__about__.py
@@ -1,2 +1,2 @@
 # DO NOT MODIFY THIS FILE
-__version__ = "0.51.0"
+__version__ = "0.52.0"
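
To make the `sqlglot` version ranges in the changelog concrete, here is a small sketch of how a `>=lower,<upper` range admits or rejects a release. It is a deliberate simplification that compares numeric release segments only; real installers follow the full version-specifier rules (pre-releases, epochs, and so on), and the helper names `parse` and `in_range` are made up for this example.

```python
def parse(version: str) -> tuple[int, ...]:
    """Split a plain numeric version such as '25.33.0' into integer segments."""
    return tuple(int(part) for part in version.split("."))


def in_range(version: str, lower: str, upper: str) -> bool:
    """Return True when lower <= version < upper, padding shorter versions with zeros."""
    segments = [parse(version), parse(lower), parse(upper)]
    width = max(len(s) for s in segments)
    v, lo, hi = (s + (0,) * (width - len(s)) for s in segments)
    return lo <= v < hi


# Ranges taken from the changelog entries above.
print(in_range("25.33.0", "25.5.0", "25.34"))
print(in_range("25.34.0", "25.5.0", "25.34"))
print(in_range("26.0.0", "25.5.0", "26.1"))
print(in_range("26.1.0", "25.5.0", "26.1"))
```

Under this simplified comparison, the final range `>=25.5.0,<26.1` accepts 26.0.0 and rejects 26.1.0.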