Release v0.54.0 #3530
Merged
* Implement disposition field in SQL backend ([#3477](#3477)). This commit introduces the `query_statement_disposition` configuration value to handle large SQL queries during assessment results export for workspaces with numerous findings. A new parameter is added to the `config.yml` file, allowing users to specify the disposition method for running large SQL statements. The modification includes changes to the `databricks labs install ucx` and `databricks labs ucx export-assessment` commands and updates to the SqlBackend definition. The `Disposition` enum is utilized to specify the disposition method in tests, which have been manually verified. This feature, developed by Michele Daddetta and Guenia Izquierdo Delgado, resolves issue [#3447](#3447) and is based on changes from PR [#3455](#3455).
* AWS role issue with external locations pointing to the root of a storage account ([#3510](#3510)). In this release, the `AWSResources` class in `aws.py` has been updated to improve S3 bucket ARN pattern matching by modifying the regular expression pattern for matching. The `_identify_missing_paths` function in `access.py` has been enhanced to check for AWS role compatibility with external locations that point to the root of a storage account using the `PurePath` class. Additionally, new unit tests have been added to `tests/unit/aws/test_access.py` to ensure the correct creation of all necessary UC roles, including the new external location `s3://BUCKET4` with an appropriate access level. These changes improve the accuracy of ARN pattern matching and enhance compatibility checking and testing for AWS roles and external locations. This release is part of the ongoing development of the AWS assessment tool and addresses issues [#3510](#3510) and [#3505](#3505).
* Added dashboards to migration progress dashboard ([#3314](#3314)).
This commit, co-authored by Guenia Izquierdo Delgado, modifies the migration progress dashboard to include linting resources, adds new dashboards, and improves overall functionality and maintainability. The changes include modifying the existing 'Migration [main]' dashboard and updating associated unit and integration tests. New dashboards such as `Dashboards migrated` and `Dashboard pending migration` provide valuable insights into the migration progress, displaying successful migrations and pending migration status by owner. The commit also reorganizes some existing queries and adds new methods to support the new functionality, addressing dependencies from issue [#3424](#3424) and progressing work on issue [#3045](#3045), while breaking up issue [#3112](#3112).
* Added history log encoder for dashboards ([#3424](#3424)). This commit introduces a history log encoder for dashboards in the context of a larger application, addressing issues [#3368](#3368) and [#3369](#3369). The `experimental-migration-progress` workflow has been modified, and new classes, properties, and methods have been added to handle dashboard-related progress encoding. Specifically, the `Dashboard` class, `DashboardOwnership` class, and `DashboardProgressEncoder` class have been introduced, along with several methods for assessing dashboard ownership. These changes are tested through manual testing, unit tests, and integration tests. Additionally, the existing `TableProgressEncoder` class has been updated with new tests for failure scenarios involving tables that have not been migrated. The `WorkspacePathOwnership` method has been added to determine the owner of a given workspace path, and a new unit test has been added to test table creation from historical data.
* Create specific failure for Python syntax error while parsing with Astroid ([#3498](#3498)).
This commit enhances the Python linting functionality in our open-source library by introducing a specific failure message for syntax errors that occur during code parsing with Astroid. Previously, a generic `system-error` message was displayed, which provided limited guidance for users. Now, a new failure type called `python-parse-error` is displayed when a SyntaxError is raised during parsing, with detailed information such as the error message, line, and column numbers. This change aligns the failure type with `sql-parse-error` and adds a default GitHub issue template to report the error. Additionally, the commit renames `system-error` to `python-parse-error` to maintain consistency and updates the README to explain the new failure type. The commit also includes new unit tests to ensure that the new failure type is being handled correctly, and modifies the Python linting-related code to add a new method `Tree.maybe_parse()` to handle syntax errors.
* DBR 16 and later support ([#3481](#3481)). This pull request introduces support for Databricks Runtime (DBR) 16 in the optional conversion of Hive Metastore (HMS) tables to external tables within the `migrate-tables` workflow. The update includes modifications to the existing `migrate-tables` workflow, such as the addition of a `_get_entity_storage_locations` method to check for the presence of the `entityStorageLocations` property in the table metadata, which is required for the `CatalogTable` constructor in DBR 16.0. The changes have been tested manually on DBR16, passed integration tests on DBR15, and verified on a staging environment using DBR16. Additionally, the `test_running_real_assessment_job` function in `test_workflows.py` has been updated to include the `skip_job_wait=True` parameter when running the `run_workflow` method for the `assessment` workflow, improving testing efficiency.
The commit also includes a deprecated test case for converting managed tables to external before migrating, with a note about its failure from DBR 16.0 onwards due to a JDK update. The test case remains unchanged, but the note serves as a reminder for further investigation. The `run_workflow` function in the test cases has been modified to include a `skip_job_wait` parameter, allowing tests to bypass waiting for job completion, reducing overall test runtime and improving the developer experience.
* Exclude ucx dashboards from Lakeview dashboard crawler ([#3450](#3450)). In this release, we have introduced modifications to the `assessment` workflow, specifically in the `dashboards.py` file, to exclude dashboards from the UCX package in the Lakeview dashboard crawler and prevent false positives. The `lakeview_crawler` method in the `application.py` file has been updated to include a new argument `exclude_dashboard_ids`, set to the list of dashboard IDs in the `install_state.dashboards` object. This ensures that these dashboards are excluded from the crawler. Additionally, two new unit tests have been added to ensure the exclusion functionality works correctly. The first test checks if the crawler skips the dashboard with the ID specified in the `exclude_dashboard_ids` parameter, and the second test ensures that the `exclude_dashboard_ids` parameter takes priority over the `include_dashboard_ids` parameter when both are provided. The changes have been manually tested and verified on the staging environment, and the linked issue [#3441](#3441) has been resolved.
* Fixed issue in installing UCX on UC enabled workspace ([#3501](#3501)). In this release, we have updated the UCX policy definition for `spark_version` from a fixed value to an allowlist with a default value. This change resolves an issue where enabling UC on a workspace caused the cluster definition to take on `single_user` and `user_isolation` values instead of `Legacy_Single_User` and `Legacy_Table_ACL`.
The policy was found to be overriding these values, and changing `spark_version` from fixed to allowlist resolved the issue. Additionally, the job definition now uses the default value if no value is provided by setting `apply_policy_default_values` to true. This change resolves issue [#3420](#3420). No new methods have been added, and existing functionality has not been significantly altered. To test this change, updated unit tests, integration tests, and a static installation test should be performed. The code modification includes a new method called `test_job_cluster_on_uc_enabled_workspace` which tests the behavior of installation on a UC-enabled workspace, verifying that the correct data security modes are set for different job clusters. The changes in this release are backward compatible and do not affect existing functionality. The modification to the UCX policy ensures that the correct spark version and node type are selected, while also allowing for flexibility in data security modes. The updated tests provide confidence in the correct behavior of the installation process on both standard and UC-enabled workspaces.
* Fixed typo in workflow name (in error message) ([#3491](#3491)). This PR fixes a minor typo in an error message that appears when group permissions fail to migrate successfully. The typo, found in the name of the workflow for validating permissions, has been corrected from `validate-group-permissions` to `validate-groups-permissions`. This change enhances the user experience by providing clearer instructions for addressing issues with group permissions during migration. No new methods have been introduced, and existing functionality has been modified solely for the correction of the typo. The change does not impact any other parts of the codebase. This project is geared towards software engineers who seek to utilize its features.
* Refactor `PipelineMigrator`'s to add `include_pipeline_ids` ([#3495](#3495)).
In this release, the `PipelineMigrator` class in the `pipelines_migrate.py` file has been refactored to enhance the pipeline migration process. The refactor introduces a new parameter `include_pipeline_ids`, which allows users to specify a list of pipelines to migrate. Previously, users could only skip pipelines that were already migrated or explicitly specified using the `skip_pipeline_ids` parameter. With this refactor, users now have more control over the migration process by being able to explicitly include and exclude pipelines using the `include_pipeline_ids` and `exclude_pipeline_ids` parameters, respectively. Additionally, the implementation of the `PipelineMigrator` class has been simplified, and unit tests and integration tests have been updated to reflect these changes. As a software engineer, it is important to thoroughly test and validate this new behavior to ensure compatibility with existing systems.
* Schedule the migration progress workflow to run daily ([#3485](#3485)). This PR introduces a daily schedule for the UCX installation's migration progress workflow, refactoring workflow management/installation plumbing to enable Cron-based scheduling and setting the default schedule for the migration progress workflow to run at 5 a.m. UTC. Relevant user documentation has been updated, and the existing `migration-progress-experimental` workflow has been modified. New test methods have been added to check for the presence of workflows and tasks, as well as validate the workflow's schedule and pause status. These changes improve automation and maintainability of the UCX installation process, while ensuring that existing functionalities are working correctly.
* Scope crawled pipelines in PipelineCrawler ([#3513](#3513)). In this release, the `PipelineCrawler` class in the `databricks/labs/ucx/assessment` directory has been updated with a new optional argument `include_pipeline_ids` in the constructor.
This argument is a list of strings that represent the IDs of pipelines to be crawled. If not provided, all pipelines will be crawled. The `_crawl` method has been modified to accept a list of pipeline IDs and now obtains a list of pipeline IDs instead of pipeline objects. For each pipeline ID, the method tries to get the pipeline and extract its configuration, while also checking for any failures. Additionally, assertions have been added to ensure that the `pipeline_id` and `spec.configuration` attributes are not `None`. A new test function `test_include_pipeline_ids()` has been introduced to verify the functionality of this argument. These changes improve the functionality of the `PipelineCrawler` class by allowing users to crawl specific pipelines based on their IDs.
* Updated databricks-labs-blueprint requirement from <0.10,>=0.9.1 to >=0.9.1,<0.11 ([#3519](#3519)). In this update, the requirement for the `databricks-labs-blueprint` package has been updated to a version greater than or equal to 0.9.1 and strictly less than 0.11; previously it was greater than or equal to 0.9.1 and strictly less than 0.10. This change allows the latest version of the package to be used. Additionally, the commit includes release notes, a changelog, and commit information for the updated package, as well as instructions for Dependabot commands and options. The changes are limited to the `pyproject.toml` file and do not have any impact on other parts of the codebase.
* Updated sqlglot requirement from <26.1,>=25.5.0 to >=25.5.0,<26.2 ([#3500](#3500)). In this pull request, we have updated the version requirement for the `sqlglot` dependency in the `pyproject.toml` file. The previous version constraint was for a version greater than or equal to 25.5.0 and less than 26.1, but it has been relaxed to permit versions greater than or equal to 25.5.0 and less than 26.2.
This change was made to enable the use of the latest version of `sqlglot`, which includes several new features, bug fixes, and breaking changes as detailed in the 26.1.0 changelog. We have also included the commit history for the `sqlglot` repository to provide further context and reference. This update aims to ensure compatibility with the latest version of `sqlglot` while also providing transparency regarding the changes implemented.
* Updated sqlglot requirement from <26.2,>=25.5.0 to >=25.5.0,<26.3 ([#3528](#3528)). In this release, we have updated the required version constraint of the `sqlglot` library in the `pyproject.toml` file. The previous constraint `>=25.5.0,<26.2` has been updated to `>=25.5.0,<26.3`. This change allows the project to utilize the latest version of `sqlglot` within the newly specified range while maintaining compatibility with the project's existing requirements. Notably, this update does not introduce any new methods to the project; it only affects the version constraint for the `sqlglot` library. Software engineers integrating this project can now benefit from the latest `sqlglot` versions within the specified range.
* Updated table-migration workflows to also capture updated migration progress into the history log ([#3239](#3239)). This pull request enhances the table-migration workflows by logging updated migration progress in the history log, providing improved visibility into the migration process. The workflows, including `migrate-tables`, `migrate-external-hiveserde-tables-in-place-experimental`, `migrate-external-tables-ctas`, `scan-tables-in-mounts-experimental`, and `migrate-tables-in-mounts-experimental`, have been updated to include this new logging functionality. In addition to these changes, the documentation has been updated to reflect which workflows update which tables, and the `TableMigrationStatus` data initialization behavior has been modified.
New and updated unit and integration tests have been manually tested to ensure the changes are functioning correctly. Co-authored by Serge Smertin and Cor Zuurmond.

Dependency updates:

* Updated sqlglot requirement from <26.1,>=25.5.0 to >=25.5.0,<26.2 ([#3500](#3500)).
* Updated databricks-labs-blueprint requirement from <0.10,>=0.9.1 to >=0.9.1,<0.11 ([#3519](#3519)).
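The root-of-bucket compatibility check described for [#3510](#3510) can be illustrated with a minimal sketch. This is not the UCX implementation: the function name `role_covers_location` and the way trailing `/*` is stripped are assumptions made for illustration. It only shows why comparing path components (here with the standard library's `PurePosixPath`) treats a role on the bucket root as covering both the root itself and anything below it.

```python
from pathlib import PurePosixPath


def role_covers_location(role_resource_path: str, location_url: str) -> bool:
    """Hypothetical helper: True when the role path equals the location or is a parent of it."""
    # Drop a trailing "/*" or "/" so "s3://BUCKET4/*" and "s3://BUCKET4" compare equal.
    role = PurePosixPath(role_resource_path.rstrip("/*"))
    location = PurePosixPath(location_url.rstrip("/"))
    # Component-wise comparison: "s3://BUCKET4/fol" is NOT a parent of "s3://BUCKET4/folder".
    return role == location or role in location.parents


print(role_covers_location("s3://BUCKET4/*", "s3://BUCKET4"))
print(role_covers_location("s3://BUCKET4", "s3://BUCKET4/folder/a"))
print(role_covers_location("s3://BUCKET4/fol", "s3://BUCKET4/folder"))
```

Comparing path components rather than raw string prefixes is the design point here: a plain `startswith` check would wrongly accept `s3://BUCKET4/fol` as covering `s3://BUCKET4/folder`.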
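The include/exclude scoping described for the dashboard and pipeline crawlers ([#3450](#3450), [#3513](#3513)) follows a common pattern, sketched below under stated assumptions. The function `select_ids` and its parameter names are hypothetical and not part of UCX; the sketch only mirrors the behaviours the notes mention: crawl everything when no include list is given, let the exclude list win over the include list, and log a warning instead of failing when a requested ID is not found.

```python
import logging

logger = logging.getLogger(__name__)


def select_ids(known_ids, include_ids=None, exclude_ids=None):
    """Hypothetical helper: pick the IDs to crawl from the IDs that actually exist."""
    known = set(known_ids)
    # No include list means "crawl everything that exists".
    candidates = list(known_ids) if include_ids is None else list(include_ids)
    excluded = set(exclude_ids or ())
    selected = []
    for item_id in candidates:
        if item_id in excluded:
            # Exclusion takes priority over inclusion.
            continue
        if item_id not in known:
            # Warn and move on rather than raising for a missing item.
            logger.warning("Skipping %s: not found", item_id)
            continue
        selected.append(item_id)
    return selected


print(select_ids(["a", "b", "c"], include_ids=["b", "x"]))
print(select_ids(["a", "b"], include_ids=["a", "b"], exclude_ids=["a"]))
```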
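The `python-parse-error` change ([#3498](#3498)) turns a `SyntaxError` raised during parsing into a specific failure record with a message, line, and column. The sketch below shows that idea only. It swaps in the standard library's `ast` module for Astroid, and the `Failure` dataclass and the free function `maybe_parse` are simplified stand-ins, not the UCX `Tree.maybe_parse()` API.

```python
from __future__ import annotations

import ast
from dataclasses import dataclass


@dataclass
class Failure:
    """Simplified stand-in for a linter failure record."""
    code: str
    message: str
    line: int
    column: int


def maybe_parse(source: str) -> tuple[ast.Module | None, Failure | None]:
    """Return (tree, None) on success or (None, failure) on a syntax error."""
    try:
        return ast.parse(source), None
    except SyntaxError as e:
        # lineno and offset can be None for some errors, so fall back to 0.
        failure = Failure(
            code="python-parse-error",
            message=f"Failed to parse code due to invalid syntax: {e.msg}",
            line=e.lineno or 0,
            column=e.offset or 0,
        )
        return None, failure


tree, failure = maybe_parse("def f(:\n    pass")
print(failure)
```

Returning the failure as a value instead of letting the exception propagate lets the caller keep linting other files and report every parse problem in one pass.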
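To make the effect of the `sqlglot` constraint changes concrete, the sketch below checks a few version numbers against the old and new upper bounds. It is a simplified illustration that only handles plain dotted numeric versions; the helper names are hypothetical, and real tooling would use a full PEP 440 implementation such as `packaging.specifiers.SpecifierSet`.

```python
def as_tuple(version: str, width: int = 3) -> tuple[int, ...]:
    """Turn '26.2' into (26, 2, 0) so short and long forms compare consistently."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def in_range(version: str, lower: str, upper: str) -> bool:
    """Check lower <= version < upper, the shape of '>=25.5.0,<26.3'."""
    return as_tuple(lower) <= as_tuple(version) < as_tuple(upper)


for candidate in ("26.1.0", "26.2.0", "26.3.0"):
    print(candidate, in_range(candidate, "25.5.0", "26.2"), in_range(candidate, "25.5.0", "26.3"))
```

Under these assumptions, 26.2.0 falls outside `>=25.5.0,<26.2` but inside `>=25.5.0,<26.3`, which is the only behavioural difference the second bump introduces.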
✅ 2/2 passed, 19s total. Running from acceptance #8075
JCZuurmond had a problem deploying to account-admin on January 16, 2025 16:37 with GitHub Actions (Failure)
To be more sure, we can wait for CI to pass: https://github.com/databrickslabs/ucx/actions/runs/12813437903. But that is a bit redundant.
JCZuurmond approved these changes on Jan 16, 2025
🔥 🔥 🔥
JCZuurmond had a problem deploying to account-admin on January 16, 2025 18:50 with GitHub Actions (Failure)
JCZuurmond had a problem deploying to account-admin on January 17, 2025 14:56 with GitHub Actions (Failure)
gueniai had a problem deploying to account-admin on January 21, 2025 21:33 with GitHub Actions (Failure)
* `query_statement_disposition` configuration option for the SQL backend used in the `databricks labs ucx` command-line interface. This option allows users to choose the disposition method for running large SQL queries during assessment results export, preventing failures in cases of large workspaces with high volumes of findings. The new option is included in the `config.yml` file and used in the SqlBackend definition. The commit also includes updates to the `workspace_cli.py` file and addresses issue #3447. The `disposition` parameter has been added to the `StatementExecutionBackend` method, and the `Disposition` enum from the `databricks.sdk.service.sql` module has been added to the `config.py` file. The changes have been manually tested and are included in the modified `databricks labs install ucx` and `databricks labs ucx export-assessment` commands.
* `aws.py`
file in the `src/databricks/labs/ucx/assessment/` directory has been updated to improve S3 bucket ARN pattern matching, now allowing optional trailing slashes for greater flexibility. In the `access.py` file within the `aws` directory of the `databricks/labs/ucx` package, the `_identify_missing_paths` method now checks if the `role.resource_path` is a parent of the external location path or if they match exactly, allowing root-level external locations to be recognized as compatible with AWS roles. A new method, `AWSUCRoleCandidate`, has been added to the `AWSResources` class, and several test cases have been updated or added to ensure proper functionality with UC roles and AWS resources, including handling cases with multiple role creations.
* `wait_for_installation_to_finish`
has been introduced to manage the waiting process. Furthermore, we have updated the `test_compare_remote_local_install_versions` function to accept `installation_ctx` instead of `ws` as a parameter, ensuring proper configuration and loading of the installation before test execution. These changes guarantee that the test will pass if the installation is finished before the reinstallation is attempted.
* `experimental-migration-progress`
workflow. This enhancement introduces a `DashboardProgressEncoder` class that encodes Dashboard objects into Historical records, appending inventory snapshots to the history table. The changes include adding new methods for handling object types such as directories, and updating the `is_delta` property of the `Table` class. The commit also includes new tests: manually tested, unit tests added, and integration tests added. Specifically, `test_table_progress_encoder_table_failures` has been updated to include a new parameter, `is_migrated_table`, which, if set to False, adds `Pending migration` to the list of failures. The `is_used_table` parameter has been removed, and its functionality is no longer part of this commit. The changes are tested through manual, unit, and integration testing, ensuring the proper encoding of migration progress and identifying relevant failures.
* `system-error`
message, but with this change, a new failure type called `python-parse-error` has been introduced. This new error type includes the start and end line and column numbers of the error and is accompanied by a new issue URL for reporting the error on the UCX GitHub. The `system-error` failure type has been renamed to `python-parse-error` to maintain consistency with the `sql-parse-error` failure type. Additionally, a new method `Tree.maybe_parse()` has been introduced to improve error detection and reporting during Python linting. A unit test has been added to ensure the new failure type is working as intended, and a generic failure is kept for directing users to create GitHub issues for surfacing other issues.
* `migrate-tables`
workflow. The change includes a new static method `_get_entity_storage_locations` to check for the presence of the `entityStorageLocations` property on table metadata. The existing `_convert_hms_table_to_external` method has been updated to use this new method and to include the `entityStorageLocations` constructor argument if present. The changes have been manually tested for DBR 16, tested with existing integration tests for DBR 15, and verified on the staging environment with DBR 16. Additionally, the `skip_job_wait=True` parameter has been added to specific test function calls to improve test execution time. This release also resolves an issue with a failed test in DBR 16 due to a JDK update.
* `NotebookLinter._load_source_from_run_cell`
(#3529). In this release, we have improved the code linting functionality in the NotebookLinter class of our open-source library by removing the `_load_source_from_run_cell` method in the sources.py file. This method, previously used to load source code from run cells in a notebook, has been identified as stale code and is no longer required. Consequently, this change affects the `databricks labs ucx lint-local-code` command and results in cleaner and more maintainable code. Furthermore, updated and added unit tests have been included in this commit, which have been manually tested to ensure that the changes do not adversely impact existing functionality, thus progressing issue #3514.
* `assessment`
workflow has been improved to exclude certain dashboard IDs from the Lakeview dashboard crawler. This change has been made to address the issue of false positive dashboards and affects the `_crawl` method in the `dashboards.py` file. The excluded dashboard IDs are now obtained from the `install_state.dashboards` object. Additionally, new methods have been added to the `test_dashboards.py` file in the `unit/assessment` directory to test the exclusion functionality, including a test to ensure that the exclude parameter takes priority over the include parameter. The commit also includes unit tests, manual tests, and screenshots to verify the changes on the staging environment. Overall, this modification enhances the accuracy of the dashboard crawler and simplifies the process of identifying and assessing relevant dashboards.
* `spark_version`
parameter from `fixed` to `allowlist` with a default value, allowing the cluster definition to take `single_user` and `user_isolation` values instead of `Legacy_Single_User` and `Legacy_Table_ACL`. Additionally, the job definition has been updated to use the default value when not explicitly provided. The changes are implemented in the `test_policy.py` file and impact the `test_job_cluster_policy` and `test_job_cluster_on_uc_enabled_workspace` methods. The pull request also includes updates to unit tests and integration tests to ensure the correct behavior of the updated UCX policy and job definition. The target audience is software engineers adopting this project, with changes involving adjusting policy definitions and testing job cluster behavior under different configurations. Issue #3501 is also resolved with these changes.
* `validate_groups_permissions`
method in the `workflows.py` file. The typo resulted in the incorrect spelling of `groups` as `group` in the workflow name. The fix changes `group` to `groups` in the error message, ensuring accurate workflow name display. The functionality of the code remains unaffected by this change, and no new methods have been added. To clarify, the `validate_groups_permissions` method verifies whether group permissions have been migrated correctly, and if not, raises a ValueError with an error message suggesting the use of the `validate-groups-permissions` workflow for validation after the API has caught up. This fix resolves the typo issue and maintains the expected behavior of the code.
* `_definitely_failure`
function in the `python_ast.py` file has been modified to make the link to the issue template URL safe using Python's `urllib`. This change ensures that any special characters in the source code passed to the function will be properly displayed in the issue template. If the source code cannot be parsed, the function creates a link to the issue template for reporting a bug in the UCX library, including the source code as part of the issue body. With this commit, the source code is now passed through the `urllib.parse.quote_plus` function before being added to the issue body, making it URL-safe and improving the robustness and user-friendliness of the library. This change has been introduced in issue #3498 and has been manually tested.
* Refactor `PipelineMigrator`'s to add `include_pipeline_ids`
(#3495). In this refactoring, the `PipelineMigrator` has been updated to introduce an `include_pipeline_ids` option, replacing the previous `skip_pipeline_ids` flag. This change allows users to specify the list of pipelines to migrate, providing better control over the migration process. The `PipelinesMigrator` constructor, `_get_pipelines_to_migrate`, and `migrate_pipelines` methods have been modified to accommodate this new flag. The `_migrate_pipeline` method now accepts the pipeline ID instead of a `PipelineInfo` object. Additionally, the unit tests have been updated to include the new `include_flag` parameter, which facilitates testing various scenarios with different pipeline lists. Although the commit does not show changes to test files, integration tests should be updated to reflect the new `include-pipeline-ids` flag functionality. This improvement resolves issue #3492 and enhances the overall flexibility of the `PipelineMigrator`.
* `Tree`
methods for clarity (#3524). In this release, the `Tree` class in the Python AST library has been updated for improved code clarity and functionality. The `append_` methods have been renamed to `attach_` for better accuracy, and now include docstrings for increased understanding. These methods have been updated to always return `None`. A new method, `attach_child_tree`, has been added, allowing for traversal from both parent and child and propagating any module references. Several new methods and functionalities have been introduced to improve the class, while extensive unit testing has been conducted to ensure functionality. Additionally, the diff includes test cases for various functionalities, such as inferring values when attaching trees and verifying spark module propagation, as well as tests to ensure that certain operations are not supported. This change, linked to issues #3514 and #3520, may affect any code that calls these methods and relies on their return values. However, the added docstrings and unit tests will help ensure your code continues to function correctly.
* `migration-progress-experimental`
workflow has been modified. Additionally, unit and integration tests have been added/modified to ensure the proper functioning of the updated code, and new functions have been added to verify the workflow's schedule and task detection.
* `PipelineCrawler`
class in the `pipelines.py` file has been updated to include a new optional argument `include_pipeline_ids` in its constructor. This argument allows users to filter the pipelines that are crawled by specifying a list of pipeline IDs. The `_crawl` method has been modified to check if `include_pipeline_ids` is not `None` and to filter the list of pipelines accordingly. The class now also checks if each pipeline exists before getting its configuration, and logs a warning message if the pipeline is not found. Previously, a `NotFound` exception was raised. Additionally, the code has been updated to use `pipeline.spec.configuration` instead of `pipeline_response.spec.configuration` to get the pipeline configuration. These changes have been tested through new and updated unit tests, including a test for handling creators' user names. Overall, these updates provide improved functionality and flexibility for crawling pipelines.
* `databricks-labs-blueprint`
package to be greater than or equal to 0.9.1 and less than 0.11. This change allows us to use the latest version of the package and includes bug fixes and dependency updates. The hosted runner has been patched in version 0.10.1 to address issues with publishing artifacts in the release workflow. Release notes for previous versions are also provided in the commit. These updates are intended to improve the overall functionality and stability of the library.
* `databricks-sdk`
package requirement has been updated to version 0.41.0, which brings new features, improvements, bug fixes, and API changes. Among the new features are the addition of `serving.http_request` for calling external functions, and recovery on download failures in the Files API client. Although the specifics of the functionality added and changed are not detailed, the focus of this release appears to be on bug fixes and internal enhancements. Additionally, the API has undergone changes, including added and altered methods and fields; however, specific information about these changes has not been provided in the release notes.
* `sqlglot`
package, which has been updated to version 25.5.0 or higher, but less than 26.2. This change makes it possible to use sqlglot 26.1. The new version includes several breaking changes, new features, bug fixes, and modifications to various dialects such as hive, postgres, tsql, and sqlite. Moreover, the tokenizer has been updated to accept underscore-separated number literals. However, the specific impact of these changes on the project is not detailed in the commit message, and software engineers should thoroughly test and review the changes to ensure seamless functionality.
* `sqlglot`
dependency from `>=25.5.0,<26.2` to `>=25.5.0,<26.3` in the `pyproject.toml` file. Sqlglot is a Python-based SQL parser and optimizer, and this change allows us to adopt the latest version of sqlglot within the specified version range. This update addresses potential security vulnerabilities and incorporates performance enhancements and bug fixes, ensuring that our library remains up-to-date and secure.
* `migrate-tables`, `migrate-external-hiveserde-tables-in-place-experimental`, `migrate-external-tables-ctas`, `scan-tables-in-mounts-experimental`, and `migrate-tables-in-mounts-experimental`. The encoder for table-history has been refactored to improve control over when the `TableMigrationStatus` data is refreshed. The documentation has been updated to reflect the changes in each workflow. Additionally, both unit and integration tests have been added and updated to ensure the changes work as intended and resolve any conflicts. A new `ProgressTrackingInstallation` class has been added to support this functionality. The changes have been manually tested and include modifications to the existing workflows, new methods, and a renamed method. The `mock_workspace_client` function has been replaced, and the `external_locations.resolve_mount` method and other methods have not been called. The `TablesCrawler` object's `snapshot` method has been called once to retrieve the list of tables in the Hive metastore. The migration record workflow run is also updated to include the workflow run information in the `workflow_runs` table. These changes are expected to improve the accuracy and reliability of the table-migration workflows.

Dependency updates:
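The URL-quoting change to `_definitely_failure` described in the notes above can be sketched as follows. This is an illustration rather than the UCX code: the function `issue_link`, the body wording, and the use of GitHub's `title` and `body` query parameters on the new-issue page are assumptions. Only `urllib.parse.quote_plus` is taken from the notes.

```python
from urllib.parse import quote_plus

# Assumed target: the new-issue page of the UCX repository.
ISSUE_URL = "https://github.com/databrickslabs/ucx/issues/new"


def issue_link(title: str, source_code: str) -> str:
    """Hypothetical helper: build a pre-filled issue link with URL-safe title and body."""
    body = f"Source code that failed to parse:\n\n{source_code}"
    # quote_plus escapes characters such as '&', '#', '=' and newlines and turns spaces
    # into '+', so the source code cannot break out of the 'body' query parameter.
    return f"{ISSUE_URL}?title={quote_plus(title)}&body={quote_plus(body)}"


print(issue_link("Python parse error", "a = 1 & b # c"))
```

Without quoting, a `&` or `#` inside the offending source code would end the `body` parameter early, and the reported snippet would arrive truncated.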