
Release v0.54.0 #3570

Merged: 1 commit merged from prepare/0.54.0 into main on Jan 23, 2025
Conversation

@gueniai (Collaborator) commented on Jan 23, 2025

* Implement disposition field in SQL backend ([#3477](#3477)). This commit adds a `query_statement_disposition` configuration option for the SQL backend in the UCX tool, allowing users to specify the disposition of SQL statements during assessment results export and preventing failures when dealing with large workspaces and a large number of findings. The new configuration option is added to the `config.yml` file and used by the `SqlBackend` definition. The `databricks labs install ucx` and `databricks labs ucx export-assessment` commands have been modified to support this new functionality. A new `Disposition` enum has been added to the `databricks.sdk.service.sql` module. This change resolves issue [#3447](#3447) and is related to pull request [#3455](#3455). The functionality has been manually tested.
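
  A quick way to see the two dispositions in question; the enum below is the real `databricks.sdk.service.sql.Disposition`, while the comments describe standard statement-execution semantics rather than UCX internals:

  ```python
  from databricks.sdk.service.sql import Disposition

  # INLINE returns rows directly in the statement-execution response, which
  # can hit response-size limits on large exports; EXTERNAL_LINKS returns
  # presigned links to result chunks instead, avoiding those failures.
  print(Disposition.INLINE.value)          # "INLINE"
  print(Disposition.EXTERNAL_LINKS.value)  # "EXTERNAL_LINKS"
  ```
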
* AWS role issue with external locations pointing to the root of a storage account ([#3510](#3510)). The `AWSResources` class in the `aws.py` file has been updated to enhance the regular expression pattern for matching S3 bucket names, now including an optional group for trailing slashes and any subsequent characters. This allows for recognition of external locations pointing to the root of a storage account, addressing issue [#3505](#3505). The `access.py` file within the AWS module has also been updated, introducing a new `path` variable and updating a for loop condition to accurately identify missing paths in external locations referencing the root of a storage account. New unit tests have been added to `tests/unit/aws/test_access.py`, including a `test_uc_roles_create_all_roles` method that checks the creation of all possible UC roles when none exist and external locations with and without folders. Additionally, the `backend` fixture has been updated to include a new external location `s3://BUCKET4`, and various tests have been updated to incorporate this location and handle errors appropriately.
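
  An illustrative regular expression, not the exact pattern in `aws.py`, showing how an optional trailing group makes root-of-bucket locations match:

  ```python
  import re

  # Hypothetical pattern: group 1 captures the bucket name; the optional
  # final group tolerates a trailing slash and any key prefix after it.
  S3_ROOT = re.compile(r"^s3a?://([a-z0-9][a-z0-9.-]{1,61}[a-z0-9])(/.*)?$")

  for location in ("s3://bucket4", "s3://bucket4/", "s3://bucket4/folder/"):
      match = S3_ROOT.match(location)
      print(location, "->", match.group(1) if match else "no match")
  ```
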
* Added assert to make sure installation is finished before re-installation ([#3546](#3546)). In this release, we have added an assertion to ensure that the installation process is completed before attempting to reinstall, addressing a previous issue where the reinstallation was starting before the first installation was finished, causing a warning to not be raised and resulting in a test failure. We have introduced a new function `wait_for_installation_to_finish()`, which retries loading the installation if it is not found, with a timeout of 2 minutes. This function is utilized in the `test_compare_remote_local_install_versions` test to ensure that the installation is finished before proceeding. Furthermore, we have extracted the warning message to a variable `error_message` for better readability. This change enhances the reliability of the installation process.
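
  A minimal sketch of the retry shape, using the SDK's `retried` decorator; `_load_installation` is a hypothetical stand-in for the test's actual loading call:

  ```python
  from datetime import timedelta

  from databricks.sdk.errors import NotFound
  from databricks.sdk.retries import retried

  def _load_installation():
      """Hypothetical stand-in: raises NotFound until the install completes."""
      raise NotFound("installation not found")

  # Keep retrying while the installation is still missing; give up (with a
  # TimeoutError) after two minutes.
  @retried(on=[NotFound], timeout=timedelta(minutes=2))
  def wait_for_installation_to_finish():
      return _load_installation()
  ```
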
* Added dashboards to migration progress dashboard ([#3314](#3314)). This commit introduces significant updates to the migration progress dashboard, adding dashboards, linting resources, and modifying existing components. The changes include a new dashboard displaying the number of dashboards pending migration, with the data sourced from the `ucx_catalog.multiworkspace.objects_snapshot` table. The existing 'Migration [main]' dashboard has been updated, and unit and integration tests have been adapted accordingly. The commit also renames several SQL files, updates the percentage UDF, grant, job, cluster, table, and pipeline migration progress queries, and resolves linting compatibility issues related to Unity Catalog. The changes depend on issue [#3424](#3424), progress issue [#3045](#3045), and break up issue [#3112](#3112). The new dashboard aims to enhance the migration process and ensure a smooth transition to the Unity Catalog.
* Added history log encoder for dashboards ([#3424](#3424)). A new history log encoder for dashboards has been added, addressing issues [#3368](#3368) and [#3369](#3369), and modifying the existing `experimental-migration-progress` workflow. This update includes the addition of the `DashboardOwnership` class, used to generate ownership information for dashboards, and the `DashboardProgressEncoder` class, responsible for encoding progress data related to dashboards. The new functionality is tested through manual, unit, and integration testing. In the `Table` class, the `from_table_info` and `from_historical_data` methods have been added, allowing for the creation of `Table` instances from `TableInfo` objects and historical data dictionaries with more flexibility and safety. The `test_tables.py` file in the `integration/progress` directory has also been updated to include a new test function for checking table failures. These changes improve the tracking and management of dashboard IDs, enhance user name retrieval, and ensure the accurate determination of object ownership.
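
  A rough sketch of the classmethod pattern described; the field set here is illustrative rather than the actual `Table` schema:

  ```python
  from dataclasses import dataclass

  @dataclass
  class Table:
      catalog: str
      database: str
      name: str

      @classmethod
      def from_historical_data(cls, data: dict) -> "Table":
          # Fail fast with a KeyError if a required key is absent, rather
          # than constructing a half-populated record.
          return cls(catalog=data["catalog"], database=data["database"], name=data["name"])

  print(Table.from_historical_data({"catalog": "hive_metastore", "database": "db", "name": "t"}))
  ```
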
* Create specific failure for Python syntax error while parsing with Astroid ([#3498](#3498)). This commit enhances the Python linting functionality in our open-source library by introducing a specific failure message, `python-parse-error`, for syntax errors encountered during code parsing using Astroid. Previously, a generic `system-error` message was used, which has been renamed to maintain consistency with the existing `sql-parse-error` message. This change provides clearer failure indicators and includes more detailed information about the error location. Additionally, modifications to Python linting-related code, unit test additions, and updates to the README guide users on handling these new error types have been implemented. A new method, `Tree.maybe_parse()`, has been introduced to parse Python code and detect syntax errors, ensuring more precise error handling for users.
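
  An illustrative version of the maybe-parse pattern with Astroid (the real `Tree.maybe_parse()` signature may differ):

  ```python
  import astroid
  from astroid.exceptions import AstroidSyntaxError

  def maybe_parse(code: str):
      # Return (tree, None) on success, or (None, failure) carrying the
      # python-parse-error details when the source has a syntax error.
      try:
          return astroid.parse(code), None
      except AstroidSyntaxError as error:
          return None, f"python-parse-error: {error}"

  tree, failure = maybe_parse("def broken(:\n    pass")
  print(failure)
  ```
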
* DBR 16 and later support ([#3481](#3481)). This pull request introduces support for Databricks Runtime (DBR) 16 and later in the code that converts Hive Metastore (HMS) tables to external tables within the `migrate-tables` workflow. The changes include the addition of a new static method `_get_entity_storage_locations` to handle the new `entityStorageLocations` property in DBR 16 and the modification of the `_convert_hms_table_to_external` method to account for this property. Additionally, the `run_workflow` function in the `assessment` workflow now has the `skip_job_wait` parameter set to `True`, which allows the workflow to continue running even if a job within it fails. The changes have been manually tested for DBR 16, verified in a staging environment, and existing integration tests have been run for DBR 15. The diff also includes updates to the `test_table_migration_convert_manged_to_external` method to skip job waiting during testing, enabling the test to run successfully on DBR 16.
* Delete stale code: `NotebookLinter._load_source_from_run_cell` ([#3529](#3529)). In this update, we have removed the stale code `NotebookLinter._load_source_from_run_cell`, which was responsible for loading the source code from a run cell in a notebook. This change is a part of the ongoing effort to address issue [#3514](#3514) and enhances the overall codebase. Additionally, we have modified the existing `databricks labs ucx lint-local-code` command to update the code linting functionality. We have conducted manual testing to ensure that the changes function as intended and have added and modified several unit tests. The `_load_source_from_run_cell` method is no longer needed, as it was part of a deprecated functionality. The modifications to the `databricks labs ucx lint-local-code` command impact the way code linting is performed, ultimately improving the efficiency and maintainability of the codebase.
* Exclude ucx dashboards from Lakeview dashboard crawler ([#3450](#3450)). In this release, we have enhanced the `lakeview_crawler` method in the open-source library to exclude UCX dashboards and prevent false positives. This has been achieved by adding a new optional argument, `exclude_dashboard_ids`, to the `__init__` method, which takes a list of dashboard IDs to exclude from the crawler. The `_crawl` method has been updated to skip dashboards whose IDs match the ones in the `exclude_dashboard_ids` list. The change includes unit tests and manual testing to ensure proper functionality and has been verified on the staging environment. These updates improve the accuracy and reliability of the dashboard crawler, providing better results for software engineers utilizing this library.
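
  The skip check reduces to a set-membership test; this is an illustrative shape, not the crawler's actual code:

  ```python
  def crawl_dashboards(dashboards, exclude_dashboard_ids=None):
      excluded = set(exclude_dashboard_ids or [])
      for dashboard in dashboards:
          if dashboard["id"] in excluded:
              continue  # skip UCX's own dashboards to avoid false positives
          yield dashboard

  found = list(crawl_dashboards([{"id": "d1"}, {"id": "ucx-1"}], exclude_dashboard_ids=["ucx-1"]))
  print(found)  # [{'id': 'd1'}]
  ```
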
* Fixed issue in installing UCX on UC enabled workspace ([#3501](#3501)). This PR introduces changes to the `ClusterPolicyInstaller` class, updating the `spark_version` policy definition from a fixed value to an allowlist with a default value. This resolves an issue where, when UC is enabled on a workspace, the cluster definition takes on `single_user` and `user_isolation` values instead of `Legacy_Single_User` and `Legacy_Table_ACL`. The job definition is also updated to use the default value when not explicitly provided. These changes improve compatibility with UC-enabled workspaces, ensuring the correct values for `spark_version` in the cluster definition. The PR includes updates to unit tests and installation tests, addressing issue [#3420](#3420).
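
  For reference, a cluster-policy allowlist with a default looks roughly like this (the runtime versions below are placeholders, not the values UCX ships):

  ```python
  import json

  # "allowlist" permits any of the listed values; "defaultValue" applies
  # when a cluster or job does not set spark_version explicitly.
  policy_definition = {
      "spark_version": {
          "type": "allowlist",
          "values": ["15.4.x-scala2.12", "16.1.x-scala2.12"],
          "defaultValue": "15.4.x-scala2.12",
      }
  }
  print(json.dumps(policy_definition, indent=2))
  ```
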
* Fixed typo in workflow name (in error message) ([#3491](#3491)). This PR addresses a minor typo in the error message displayed by the `validate_groups_permissions` method in the `workflows.py` file. The typo occurred in the workflow name mentioned in the error message, where `group` was incorrectly spelled as "groups"; the corrected name is `validate-groups-permissions`. This change does not introduce any new methods or modify any existing functionality; it focuses on the clarity and accuracy of the error message, since accurate error messages make issues easier to troubleshoot.
* HMS Federation Glue Support ([#3526](#3526)). This commit introduces support for HMS Federation Glue in the open-source library, resolving issue [#3011](#3011). The changes include adding a new command, `migrate-glue-credentials`, to migrate Glue credentials to UC storage credentials in the federation glue for HMS. The `AWSResourcePermissions` class has been updated to include a new parameter `config` for HMS Federation Glue configuration and the `load_uc_compatible_roles` method now accepts an optional `resource_type` parameter for filtering compatible roles based on the provided type. Additionally, the `ExternalLocations` class has been updated to handle S3 resource type when identifying missing external locations. The commit also includes several bug fixes, new classes, methods, and changes to the existing methods to handle AWS Glue resources, and updates to the integration tests. Overall, these changes add significant functionality for AWS Glue support in the HMS Federation Glue feature.
* Make link to issue template url safe ([#3508](#3508)). In this release, we have updated the `python_ast.py` file to enhance the encoding of the link to the issue template for bug reports. By utilizing the `urllib.parse.quote_plus()` function from Python's standard library, we have ensured that any special characters in the provided source code will be properly encoded. This eliminates the risk of issues arising from incorrectly interpreted characters, enhancing the reliability of the bug reporting process. This change, initially introduced in issue [#3498](#3498), has been thoroughly tested to guarantee its correct functioning. The rest of the file remains unaffected, preserving its original functionality.
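
  Since `quote_plus` is standard library, the behavior is easy to verify directly (the snippet is an arbitrary example):

  ```python
  from urllib.parse import quote_plus

  # Percent-encode reserved characters (spaces become '+') so arbitrary
  # source code survives inside an issue-template query string.
  snippet = "def f(x):\n    return x & y"
  print(quote_plus(snippet))  # def+f%28x%29%3A%0A++++return+x+%26+y
  ```
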
* Refactor `PipelineMigrator` to add `include_pipeline_ids` ([#3495](#3495)). In this release, the `PipelineMigrator` class has been refactored to enhance pipeline migration functionality. The `skip-pipeline-ids` flag has been replaced with `include-pipeline-ids`, allowing users to specify a list of pipelines to migrate, rather than listing pipelines to skip. Additionally, the `exclude_pipeline_ids` functionality has been added to provide even more granularity in pipeline selection. The `migrate_pipelines` method now prioritizes `include_pipeline_ids` and `exclude_pipeline_ids` parameters to determine the list of pipelines for migration. The `_migrate_pipeline` method has been updated to accept a string pipeline ID and now checks if the pipeline has already been migrated. Several support methods, such as `_clone_pipeline`, have also been refactored for improved functionality. Although no new methods were added, the behavior of the `migrate_pipelines` method has changed. While unit tests have been updated to cover the changes, integration tests have not been modified yet. Ensure thorough testing to prevent any new issues or breaks in existing functionality.
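
  The selection precedence amounts to set arithmetic; a hypothetical reduction, not the migrator's actual code:

  ```python
  def pipelines_to_migrate(all_ids, include_pipeline_ids=None, exclude_pipeline_ids=None):
      # Start from the include list when given, otherwise from everything,
      # then subtract any explicit exclusions.
      selected = set(include_pipeline_ids) if include_pipeline_ids else set(all_ids)
      if exclude_pipeline_ids:
          selected -= set(exclude_pipeline_ids)
      return sorted(selected)

  print(pipelines_to_migrate(["p1", "p2", "p3"], include_pipeline_ids=["p1", "p2"], exclude_pipeline_ids=["p2"]))  # ['p1']
  ```
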
* Release v0.54.0 ([#3530](#3530)). 0.54.0 brings several enhancements and bug fixes to the UCX library. A `query_statement_disposition` option is added to the SQL backend to handle large SQL queries during assessment results export, preventing potential failures in large workspaces with high volumes of findings. AWS role compatibility checks are improved for external locations pointing to the root of a storage account. Dashboards are enhanced with added migration progress dashboards and a history log encoder. New failure types are introduced for Python syntax errors during parsing and SQL parsing errors. The library now supports DBR 16 and later versions, with optional conversion of Hive Metastore tables to external tables in the `migrate-tables` workflow. The `PipelineMigrator` functionality is refactored to add an `include_pipeline_ids` parameter for better control over the migration process. Various dependency updates, including `databricks-labs-blueprint`, `databricks-sdk`, and `sqlglot`, are included in this release, which bring new features, improvements, and bug fixes, as well as API changes. Please thoroughly test and review the changes to ensure seamless functionality.
* Rename Python AST's `Tree` methods for clarity ([#3524](#3524)). In this release, we have made significant improvements to the clarity of the Python AST's `Tree` methods in the `python_analyzer.py` file. The `append_` and `extend_` methods have been renamed to `attach_` to better reflect their functionality. These methods now always return `None`. New methods such as `attach_child_tree`, `attach_nodes`, and `extend_globals` have been introduced to enhance the functionality of the library. The `attach_child_tree` method allows for attaching one tree as a child of another tree, propagating module references and enabling traversal from both the parent and child trees. The `attach_nodes` method sets the parent of the attached nodes and adds them to the body of the tree. Additionally, docstrings have been added, and unit testing has been expanded. The changes include modifications to code linting, existing command functionalities, and manual testing to ensure compatibility. These enhancements improve the clarity, functionality, and flexibility of the Python AST's `Tree` methods.
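
  A toy sketch of the attach-nodes contract described, using plain objects rather than astroid internals: re-parent each node, append it to the body, and return `None`:

  ```python
  class Node:
      def __init__(self, name: str):
          self.name = name
          self.parent = None

  class Tree:
      def __init__(self):
          self.body: list = []

      def attach_nodes(self, nodes) -> None:  # deliberately returns None
          for node in nodes:
              node.parent = self  # attached nodes learn their new parent
              self.body.append(node)

  tree = Tree()
  tree.attach_nodes([Node("a"), Node("b")])
  print([n.name for n in tree.body])  # ['a', 'b']
  ```
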
* Revert "Release v0.54.0" ([#3569](#3569)). In version 0.53.1, we have reverted changes from 0.54.0 to address issues with the previous release and ensure proper propagation to PyPI. This version includes various updates such as implementing a disposition field in the SQL backend, improving ARN pattern matching for AWS roles, adding dashboards to migration progress, enhancing Python linting functionality, and adding support for DBR 16 in converting Hive Metastore tables to external tables. We have also excluded UCX dashboards from the Lakeview dashboard crawler, refactored PipelineMigrator's to add include_pipeline_ids, and updated the sqlglot and databricks-labs-blueprint requirements. Additionally, several issues related to installation, typo in workflow name, and table-migration workflows have been fixed. The sqlglot requirement has been updated from <26.1,>=25.5.0 to >=25.5.0,<26.2, and databricks-labs-blueprint from <0.10,>=0.9.1 to >=0.9.1,<0.11. This release does not introduce any new methods or change existing functionality, but focuses on addressing bugs and improving functionality.
* Schedule the migration progress workflow to run daily ([#3485](#3485)). This PR introduces a daily scheduling mechanism for the UCX installation's migration progress workflow, allowing it to run automatically once per day at 5 a.m. UTC. It includes refactoring the plumbing for managing and installing workflows, enabling them to have a Cron-based schedule. Relevant user documentation has been updated, and existing unit and integration tests have been added to ensure the changes function as intended. A new test has been added to verify the migration-progress workflow is installed with a schedule attached, checking the workflow schedule's quartz cron expression, time zone, and pause status, as well as confirming that the workflow is unpaused upon installation. The PR also introduces new methods to manage workflow scheduling and configure cron-based schedules.
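
  In the Jobs API's terms, a daily 5 a.m. UTC schedule looks like the following; the exact expression UCX installs may differ:

  ```python
  from databricks.sdk.service.jobs import CronSchedule, PauseStatus

  schedule = CronSchedule(
      # Quartz fields: second minute hour day-of-month month day-of-week
      quartz_cron_expression="0 0 5 * * ?",
      timezone_id="Etc/UTC",
      pause_status=PauseStatus.UNPAUSED,
  )
  ```
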
* Scope crawled pipelines in PipelineCrawler ([#3513](#3513)). In the latest release, we have introduced a new optional argument, `include_pipeline_ids`, in the constructor of the `PipelinesCrawler` class located in the `databricks/labs/ucx/assessment` module. This argument allows users to filter pipelines based on a list of pipeline IDs, improving the crawler's flexibility and efficiency in processing pipelines. In the `_crawl` method of the `PipelinesCrawler` class, a new behavior has been implemented based on the value of `include_pipeline_ids`. If the argument is not `None`, then the method uses the pipeline IDs from this list instead of retrieving all pipelines. Additionally, two unit tests have been added to verify the functionality of this new argument and ensure that the crawler handles cases where a pipeline is not found or its specification is missing. A new parameter, `force_refresh`, has also been added to the `snapshot` function. This release aims to provide a more efficient and customizable pipeline crawling experience for users.
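
  The branch on `include_pipeline_ids` reduces to the following shape (illustrative; `ws` is assumed to be a `WorkspaceClient`, and `pipelines.list_pipelines` is the SDK's real listing call):

  ```python
  def _pipeline_ids_to_crawl(ws, include_pipeline_ids=None):
      if include_pipeline_ids is not None:
          return include_pipeline_ids  # crawl only the requested pipelines
      return [p.pipeline_id for p in ws.pipelines.list_pipelines()]
  ```
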
* Updated databricks-labs-blueprint requirement from <0.10,>=0.9.1 to >=0.9.1,<0.11 ([#3519](#3519)). In this update, the requirement for the `databricks-labs-blueprint` library has been changed from the version range `<0.10,>=0.9.1` to a new range of `>=0.9.1,<0.11`. This change allows for the use of the latest version of the library while maintaining compatibility with the current project setup, and is based on information from the library's releases and changelog. The commit includes a list of commits and dependencies for the updated library. This update was automatically implemented by Dependabot, a tool that handles dependency updates and conflict resolution, ensuring a seamless integration process for engineers adopting the project.
* Updated databricks-sdk requirement from <0.41,>=0.40 to >=0.40,<0.42 ([#3553](#3553)). In this release, we have updated the `databricks-sdk` package requirement to permit version 0.41 while excluding version 0.42. This update includes several improvements and new features in version 0.41, such as the addition of the `serving.http_request` method for calling external functions and enhancements to the Files API client to recover from download failures. The commit also includes bug fixes, internal changes, and updates to the API for better functionality and compatibility. It is essential to note that these changes have been made to ensure compatibility with the latest features and improvements in the `databricks-sdk` package.
* Updated sqlglot requirement from <26.1,>=25.5.0 to >=25.5.0,<26.2 ([#3500](#3500)). In this release, we have updated the version requirement for the `sqlglot` package: the allowed range is now `>=25.5.0,<26.2`, where previously it was `>=25.5.0,<26.1`. This change allows the most recent version of sqlglot to be installed, while still maintaining compatibility with the current codebase. The update is necessary due to breaking changes introduced in version 26.1.0 of sqlglot, including normalizing before qualifying tables, requiring the `AS` token in CTEs for all dialects except spark and databricks, supporting Unicode in sqlite, mysql, tsql, postgres, and oracle, parsing ASCII into Unicode to facilitate transpilation, and improving transpilation of CHAR[ACTER]_LENGTH. Additionally, several bug fixes and new features have been added in this update.
* Updated sqlglot requirement from <26.2,>=25.5.0 to >=25.5.0,<26.3 ([#3528](#3528)). In this release, we have updated the version constraint for the `sqlglot` dependency in our project's `pyproject.toml` file. The previous constraint allowed versions between 25.5.0 and 26.2, while the new constraint allows versions between 25.5.0 and 26.3. This change was made to ensure that we can use the latest version of sqlglot while also preventing the version from exceeding 26.3. Additionally, the commit includes detailed information about the specific commits and changes made in the updated version of sqlglot, providing valuable insights for software engineers working with this open-source library.
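
  The effect of the widened constraint can be checked with the `packaging` library:

  ```python
  from packaging.specifiers import SpecifierSet

  old = SpecifierSet(">=25.5.0,<26.2")
  new = SpecifierSet(">=25.5.0,<26.3")
  print("26.2.0" in old)  # False
  print("26.2.0" in new)  # True
  ```
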
* Updated table-migration workflows to also capture updated migration progress into the history log ([#3239](#3239)). The table-migration workflows have been updated to log not only the tables that still need to be migrated, but also the updated progress information into the history log, ensuring a more comprehensive record of migration progress. The affected workflows include `migrate-tables`, `migrate-external-hiveserde-tables-in-place-experimental`, `migrate-external-tables-ctas`, `scan-tables-in-mounts-experimental`, and `migrate-tables-in-mounts-experimental`. The encoder for table-history has been updated to prevent implicit refresh of `TableMigrationStatus` data during initialization. Additionally, the documentation has been updated to reflect which workflows update which tables. New and updated unit and integration tests, as well as manual testing, have been conducted to ensure the functionality of the changes.

Dependency updates:

 * Updated sqlglot requirement from <26.1,>=25.5.0 to >=25.5.0,<26.2 ([#3500](#3500)).
 * Updated databricks-labs-blueprint requirement from <0.10,>=0.9.1 to >=0.9.1,<0.11 ([#3519](#3519)).
 * Updated databricks-sdk requirement from <0.41,>=0.40 to >=0.40,<0.42 ([#3553](#3553)).
@gueniai requested a review from a team as a code owner January 23, 2025 22:11
@gueniai deployed to account-admin January 23, 2025 22:11 with GitHub Actions

✅ 2/2 passed, 17s total

Running from acceptance #8085

@gueniai merged commit ebe97e0 into main Jan 23, 2025
7 checks passed
@gueniai deleted the prepare/0.54.0 branch January 23, 2025 22:15