Update databricks-labs-lsql requirement from <0.14,>=0.5 to >=0.5,<0.15 #3321

Merged
nfx merged 1 commit into main from dependabot/pip/databricks-labs-lsql-gte-0.5-and-lt-0.15 on Nov 18, 2024

Conversation

dependabot[bot]
Contributor

@dependabot dependabot bot commented on behalf of github Nov 18, 2024

Fix #3307
Fix #3306
Fix #3303
Fix #3302
Fix #3301
Fix #3300
Fix #3299
Fix #3298
Fix #3297
Fix #3296
Fix #3295
Fix #3294
Fix #3293
Fix #3292
Fix #3291
Fix #3290
Fix #3289
Fix #3288
Fix #3287
Fix #3286
Fix #3285
Fix #3284
Fix #3283
Fix #3282
Fix #3281
Fix #3280
Fix #3279
Fix #3278
Fix #3277
Fix #3276

Updates the requirements on databricks-labs-lsql to permit the latest version.
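
To illustrate what the widened requirement permits, here is a minimal sketch that checks a version against a lower-inclusive, upper-exclusive bound. The helper names `parse_version` and `permits` are hypothetical, and the parser assumes plain dotted numeric versions only; it does not implement the full PEP 440 grammar.

```python
def parse_version(version: str, width: int = 3) -> tuple[int, ...]:
    """Turn '0.14.0' into (0, 14, 0), padding short versions with zeros."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def permits(version: str, lower: str, upper: str) -> bool:
    """Return True when lower <= version < upper."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


old_range = ("0.5", "0.14")  # corresponds to <0.14,>=0.5
new_range = ("0.5", "0.15")  # corresponds to >=0.5,<0.15

print(permits("0.14.0", *old_range))  # False: 0.14.0 was excluded before
print(permits("0.14.0", *new_range))  # True: 0.14.0 is now allowed
```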

Release notes

Sourced from databricks-labs-lsql's releases.

v0.14.0

  • Added nightly tests run at 4:45am UTC (#318). A new nightly workflow has been added to the codebase, designed to automate a series of jobs every day at 4:45am UTC on the larger environment. The workflow includes permissions for writing id-tokens, accessing issues, reading contents and pull-requests. It checks out the code with a full fetch-depth, installs Python 3.10, and uses hatch 1.9.4. The key step in this workflow is the execution of nightly tests using the databrickslabs/sandbox/acceptance action, which creates issues if necessary. The workflow utilizes several secrets, including VAULT_URI, GITHUB_TOKEN, ARM_CLIENT_ID, and ARM_TENANT_ID, and sets the TEST_NIGHTLY environment variable to true. Additionally, the workflow is part of a concurrency group called "single-acceptance-job-per-repo", ensuring that only one acceptance job runs at a time per repository.
  • Bump codecov/codecov-action from 4 to 5 (#319). In this version update, the Codecov GitHub Action has been upgraded from 4 to 5, bringing improved functionality and new features. This new version utilizes the Codecov Wrapper to encapsulate the CLI, enabling faster updates. Additionally, an opt-out feature has been introduced for tokens in public repositories, allowing contributors and other members to upload coverage reports without requiring access to the Codecov token. The upgrade also includes changes to the arguments: file is now deprecated and replaced with files, and plugin is deprecated and replaced with plugins. New arguments have been added, including binary, gcov_args, gcov_executable, gcov_ignore, gcov_include, report_type, skip_validation, and swift_project. Comprehensive documentation on these changes can be found in the release notes and changelog.
  • Fixed RuntimeBackend exception handling (#328). In this release, we have made significant improvements to the exception handling in the RuntimeBackend component, addressing issues reported in tickets #328, #327, #326, and #325. We have updated the execute and fetch methods to handle exceptions more gracefully and changed exception handling from catching Exception to catching BaseException for more comprehensive error handling. Additionally, we have updated the pyproject.toml file to use a newer version of the databricks-labs-pytester package (0.2.1 to 0.5.0) which may have contributed to the resolution of these issues. Furthermore, the test_backends.py file has been updated to improve the readability and user-friendliness of the test output for the functions testing if a NotFound, BadRequest, or Unknown exception is raised when executing and fetching statements. The test_runtime_backend_use_statements function has also been updated to print PASSED or FAILED instead of returning those values. These changes enhance the robustness of the exception handling mechanism in the RuntimeBackend class and update related unit tests.
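
The switch from catching Exception to catching BaseException matters because some errors do not derive from Exception. The sketch below is not the RuntimeBackend code; `LowLevelFailure` and both wrapper functions are hypothetical names used only to show the difference between the two `except` clauses.

```python
class LowLevelFailure(BaseException):
    """Hypothetical error that derives from BaseException, not Exception."""


def failing():
    raise LowLevelFailure("boom")


def run_catching_exception(fn):
    """Handles only errors that derive from Exception."""
    try:
        fn()
    except Exception as err:
        return f"handled: {type(err).__name__}"
    return "ok"


def run_catching_base_exception(fn):
    """Handles every raised error, including non-Exception ones."""
    try:
        fn()
    except BaseException as err:
        return f"handled: {type(err).__name__}"
    return "ok"


print(run_catching_base_exception(failing))  # handled: LowLevelFailure
# run_catching_exception(failing) would let LowLevelFailure propagate.
```

Catching BaseException also intercepts KeyboardInterrupt and SystemExit, so code that does this usually converts or re-raises the error rather than swallowing it.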

Dependency updates:

  • Bump codecov/codecov-action from 4 to 5 (#319).

Contributors: @nfx, @JCZuurmond, @dependabot[bot]

Changelog

Sourced from databricks-labs-lsql's changelog.

0.13.0

  • Added escape_name function to escape individual SQL names and escape_full_name function to escape dot-separated full names (#316). Two new functions, escape_name and escape_full_name, have been added to the databricks.labs.lsql.escapes module for escaping SQL names. The escape_name function takes a single name as an input and returns it enclosed in backticks, while escape_full_name handles dot-separated full names by escaping each individual component. These functions have been ported from the databrickslabs/ucx repository and are designed to provide a consistent way to escape names and full names in SQL statements, improving the robustness of the system by preventing issues caused by unescaped special characters in SQL names. The test suite includes various cases, including single names, full names with different combinations of escaped and unescaped components, and special characters, with a specific focus on the scenario where the column name contains a period.
  • Bump actions/checkout from 4.2.0 to 4.2.1 (#304). In this pull request, the actions/checkout dependency is updated from version 4.2.0 to 4.2.1 in the .github/workflows/release.yml file. This update includes a new feature where refs/* are checked out by commit if provided, falling back to the ref specified by the @orhantoy user. This change improves the flexibility of the action, allowing users to specify a commit or branch for checkout. The pull request also introduces a new contributor, @Jcambass, who added a workflow file for publishing releases to an immutable action package. The commits for this release include changes to prepare for the 4.2.1 release, add a workflow file for publishing releases, and check out other refs/* by commit if provided, falling back to ref. This pull request has been reviewed and approved by Dependabot.
  • Bump actions/checkout from 4.2.1 to 4.2.2 (#310). This is a pull request to update the actions/checkout dependency from version 4.2.1 to 4.2.2, which includes improvements to the url-helper.ts file that now utilize well-known environment variables and expanded unit test coverage for the isGhes function. The actions/checkout action is commonly used in GitHub Actions workflows for checking out a repository at a specific commit or branch. The changes in this update are internal to the actions/checkout action and should not affect the functionality of the project utilizing this action. The pull request also includes details on the commits and compatibility score for the upgrade, and reviewers can manage and merge the request using Dependabot commands once the changes have been verified.
  • Bump databrickslabs/sandbox from acceptance/v0.3.0 to 0.3.1 (#307). In this release, the databrickslabs/sandbox dependency has been updated from version acceptance/v0.3.0 to 0.3.1. This update includes previously tagged commits, bug fixes for git-related libraries, and resolution of the unsupported protocol scheme error. The README has been updated with more information on using the databricks labs sandbox command, and installation instructions have been improved. Additionally, there have been dependency updates for go-git libraries and golang.org/x/crypto in the /go-libs and /runtime-packages directories. New commits in this release allow larger logs from acceptance tests and implement experimental OIDC refresh functionality. Ignore conditions have been applied to prevent conflicts with previous versions of the dependency. This update is recommended for users who want to take advantage of the latest bug fixes and improvements.
  • Bump databrickslabs/sandbox from acceptance/v0.3.1 to 0.4.2 (#315). In this release, the databrickslabs/sandbox dependency has been updated from version acceptance/v0.3.1 to 0.4.2. This update includes bug fixes, dependency updates, and additional go-git libraries. Specifically, the Run integration tests job in the GitHub Actions workflow has been updated to use the new version of the databrickslabs/sandbox/acceptance Docker image. The updated version also includes install instructions, usage instructions in the README, and a modification to provide more git-related libraries. Additionally, there were several updates to dependencies, including golang.org/x/crypto version 0.16.0 to 0.17.0. Dependabot, a tool that manages dependencies in GitHub projects, is responsible for the update and provides instructions for resolving any conflicts or merging the changes into the project. This update is intended to improve the functionality and reliability of the databrickslabs/sandbox dependency.
  • Deprecate Row.as_dict() (#309). In this release, we are introducing a deprecation warning for the as_dict() method in the Row class, which will be removed in favor of the asDict() method. This change aims to maintain consistency with Spark's Row behavior and prevent subtle bugs when switching between different backends. The deprecation warning will be implemented using Python's warnings mechanism, including the new annotation in Python 3.13 for static code analysis. The existing functionality of fetching values from the database through StatementExecutionExt remains unchanged. We recommend that clients update their code to use .asDict() instead of .as_dict() to avoid any disruptions. A new test case test_row_as_dict_deprecated() has been added to verify the deprecation warning for Row.as_dict().
  • Minor improvements for .save_table(mode="overwrite") (#298). In this release, the .save_table() method has been improved, particularly when using the overwrite mode. If no rows are supplied, the table will now be truncated, ensuring consistency with the mock backend behavior. This change has been optimized for SQL-based backends, which now perform truncation as part of the insert for the first batch. Type hints on the abstract method have been updated to match the concrete implementations. Unit tests and integration tests have been updated to cover the new functionality, and new methods have been added to test the truncation behavior in overwrite mode. These improvements enhance the consistency and efficiency of the .save_table() method when using overwrite mode across different backends.
  • Updated databrickslabs/sandbox requirement to acceptance/v0.3.0 (#305). In this release, we have updated the requirement for the databrickslabs/sandbox package to version acceptance/v0.3.0 in the downstreams.yml file. This update is necessary to use the latest version of the package, which includes several bug fixes and dependency updates. The databrickslabs/sandbox package is used in the acceptance tests, which are run as part of the CI/CD pipeline. It provides a set of tools and utilities for developing and testing code in a sandbox environment. The changelog for this version includes the addition of install instructions, more git-related libraries, and the modification of the README to include information about how to use it with the databricks labs sandbox command. Specifically, the version of the databrickslabs/sandbox package used in the acceptance job has been updated from acceptance/v0.1.4 to acceptance/v0.3.0, allowing the integration tests to be run using the latest version of the package. The ignore conditions for this PR ensure that Dependabot will resolve any conflicts that may arise and can be manually triggered with the @dependabot rebase command.
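
The escaping behaviour described for escape_name and escape_full_name in 0.13.0 can be sketched as follows. This is a standalone illustration, not the code in databricks.labs.lsql.escapes: stripping backticks that are already present, doubling embedded backticks, and treating everything after the second dot as a single component are all assumptions made here.

```python
def escape_name(name: str) -> str:
    """Wrap a single SQL name in backticks, doubling any embedded backtick."""
    return f"`{name.strip('`').replace('`', '``')}`"


def escape_full_name(full_name: str) -> str:
    """Escape each component of a dot-separated name (assumed: at most three)."""
    parts = full_name.split(".", maxsplit=2)
    return ".".join(escape_name(part) for part in parts)


print(escape_name("table"))                           # `table`
print(escape_full_name("catalog.schema.table"))       # `catalog`.`schema`.`table`
print(escape_full_name("catalog.`schema`.my.column")) # `catalog`.`schema`.`my.column`
```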

Dependency updates:

  • Bump actions/checkout from 4.2.0 to 4.2.1 (#304).
  • Updated databrickslabs/sandbox requirement to acceptance/v0.3.0 (#305).
  • Bump databrickslabs/sandbox from acceptance/v0.3.0 to 0.3.1 (#307).
  • Bump actions/checkout from 4.2.1 to 4.2.2 (#310).
  • Bump databrickslabs/sandbox from acceptance/v0.3.1 to 0.4.2 (#315).
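
The Row.as_dict() deprecation in 0.13.0 relies on Python's warnings mechanism. Below is a minimal sketch of that pattern with a hypothetical stand-in `Row` class; the real class and its warning text may differ. On Python 3.13 the `warnings.deprecated` decorator can additionally mark the method for static analysis, which this sketch leaves out so that it runs on older interpreters.

```python
import warnings


class Row:
    """Hypothetical stand-in for a result row, for illustration only."""

    def __init__(self, **fields):
        self._fields = dict(fields)

    def asDict(self) -> dict:
        """Preferred accessor, matching the Spark Row naming."""
        return dict(self._fields)

    def as_dict(self) -> dict:
        """Deprecated alias that warns and delegates to asDict()."""
        warnings.warn(
            "as_dict() is deprecated, use asDict() instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller
        )
        return self.asDict()


row = Row(name="a", value=1)
print(row.asDict())  # {'name': 'a', 'value': 1}
```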

0.12.1

  • Bump actions/checkout from 4.1.7 to 4.2.0 (#295). In this version 4.2.0 release of the actions/checkout library, the team has added Ref and Commit outputs, which provide the ref and commit that were checked out, respectively. The update also includes dependency updates to braces, minor-npm-dependencies, docker/build-push-action, and docker/login-action, all of which were automatically resolved by Dependabot. These updates improve compatibility and stability for users of the library. This release is a result of contributions from new team members @yasonk and @lucacome. Users can find a detailed commit history, pull requests, and release notes in the associated links. The team strongly encourages all users to upgrade to this new version to access the latest features and improvements.
  • Set catalog on SchemaDeployer to overwrite the default hive_metastore (#296). In this release, the default catalog for SchemaDeployer has been changed from hive_metastore to a user-defined catalog, allowing for more flexibility in deploying resources to different catalogs. A new dependency, databricks-labs-pytester, has been added with a version constraint of >=0.2.1, which may indicate the introduction of new testing functionality. The SchemaDeployer class has been updated to accept a catalog parameter and the tests for deploying and deleting schemas, tables, and views have been updated to reflect these changes. The test_deploys_schema, test_deploys_dataclass, and test_deploys_view tests have been updated to accept an inventory_catalog parameter, and the caplog fixture is used to capture log messages and assert that they contain the expected messages. Additionally, a new test function test_statement_execution_backend_overwrites_table has been added to the tests/integration/test_backends.py file to test the functionality of the StatementExecutionBackend class in overwriting a table in the database and retrieving the correct data. Issue #294 has been resolved, and progress has been made on issue #278, but issue #280 has been marked as technical debt and issue #287 is required for the CI to pass.

Dependency updates:

  • Bump actions/checkout from 4.1.7 to 4.2.0 (#295).

0.12.0

  • Added method to detect rows are written to the MockBackend (#292). In this commit, the MockBackend class in the 'backends.py' file has been updated with a new method, 'has_rows_written_for', which allows for differentiation between a table that has never been written to and one with zero rows. This method checks if a specific table has been written to by iterating over the table stubs in the _save_table attribute and returning True if the given full name matches any of the stub full names. Additionally, the class has been supplemented with the rows_written_for method, which takes a table name and mode as input and returns a list of rows written to that table in the given mode. Furthermore, several new test cases have been added to test the functionality of the MockBackend class, including checking if the has_rows_written_for method correctly identifies when there are no rows written, when there are zero rows written, and when rows are written after the first and second write operations. These changes improve the overall testing coverage of the project and aid in testing the functionality of the MockBackend class. The new methods are accompanied by documentation strings that explain their purpose and functionality.
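
The distinction between a table that was never written and a table written with zero rows can be sketched with a tiny stand-in backend. `TinyMockBackend` is hypothetical and much simpler than the real MockBackend; only the method names has_rows_written_for and rows_written_for and the _save_table attribute come from the note above.

```python
class TinyMockBackend:
    """Hypothetical stand-in that records every save_table call."""

    def __init__(self):
        # Each entry is (full_name, rows, mode), in call order.
        self._save_table = []

    def save_table(self, full_name, rows, mode="append"):
        self._save_table.append((full_name, list(rows), mode))

    def has_rows_written_for(self, full_name) -> bool:
        """True if the table was written at all, even with zero rows."""
        return any(name == full_name for name, _, _ in self._save_table)

    def rows_written_for(self, full_name, mode) -> list:
        """Collect the rows recorded for a table in the given mode."""
        rows = []
        for name, stub_rows, stub_mode in self._save_table:
            if name == full_name and stub_mode == mode:
                rows.extend(stub_rows)
        return rows


backend = TinyMockBackend()
print(backend.has_rows_written_for("a.b.c"))  # False: never written
backend.save_table("a.b.c", [], mode="append")
print(backend.has_rows_written_for("a.b.c"))  # True: written with zero rows
```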

0.11.0

  • Added filter spec implementation (#276). In this commit, a new FilterHandler class has been introduced to handle filter files with the suffix .filter.json, which can parse filter specifications in the header of the filter file and validate the filter columns and types. The commit also adds support for three types of filters: DATE_RANGE_PICKER, MULTI_SELECT, and DROPDOWN, which can be linked with multiple visualization widgets. Additionally, a FilterTile class has been added to the Tile class, which represents a filter tile in the dashboard and includes methods to validate the tile, create widgets, and generate filter encodings and queries. The DashboardMetadata class has been updated to include a new method get_datasets() to retrieve the datasets for the dashboard. These changes enhance the functionality of the dashboard by adding support for filtering data using various filter types and linking them with multiple visualization widgets, improving the customization and interactivity of the dashboard, and making it more user-friendly and efficient.
  • Bugfix: MockBackend wasn't mocking savetable properly when the mode is append (#289). This release includes a bugfix and enhancements for the MockBackend component, which is used to mock the SQLBackend. The .save_table() method failed to function as expected in append mode, writing all rows to the same table instead of accumulating them. This bug has been addressed, ensuring that rows accumulate correctly in append mode. Additionally, a new test function, test_mock_backend_save_table_overwrite(), has been added to demonstrate the corrected behavior of overwrite mode, showing that it now replaces only the existing rows for the given table while preserving other tables' contents. The type signature for .save_table() has been updated, restricting the mode parameter to accept only two string literals: "append" and "overwrite". The MockBackend behavior has been updated accordingly, and rows are now filtered to exclude any None or NULL values prior to saving. These improvements to the MockBackend functionality and test suite increase reliability when using the MockBackend as a testing backend for the system.
  • Changed filter spec to use YML instead of JSON (#290). In this release, the filter specification files have been converted from JSON to YAML format, providing a more human-readable format for the filter specifications. The schema for the filter file includes flags for column, columns, type, title, description, order, and id, with the type flag taking on values of DROPDOWN, MULTI_SELECT, or DATE_RANGE_PICKER. This change impacts the FilterHandler, is_filter method, and _from_dashboard_folder method, as well as relevant parts of the documentation. Additionally, the parsing methods have been updated to use yaml.safe_load instead of json.loads, and the is_filter method now checks for .filter.yml suffix. A new file, '00_0_date.filter.yml', has been added to the 'tests/integration/dashboards/filter_spec_basic' directory, containing a sample date filter definition. Furthermore, various tests have been added to validate filter specifications, such as checking for invalid type and both column and columns keys being present. These updates aim to enhance readability, maintainability, and ease of use for filter configuration.
  • Increase testing of generic types storage (#282). A new commit enhances the testing of generic types storage by expanding the test suite to include a list of structs, ensuring more comprehensive testing of the system. The Foo struct has been renamed to Nested for clarity, and two new structs, NestedWithDict and Nesting, have been added. The Nesting struct contains a Nested object, while NestedWithDict includes a string and an optional dictionary of strings. A new test case demonstrates appending complex types to a table by creating and saving a table with two rows, each containing a Nesting struct. The test then fetches the data and asserts the expected number of rows are returned, ensuring the proper functioning of the storage system with complex data types.
  • Minor Changes to avoid redundancy in code and follow code patterns (#279). In this release, we have made significant improvements to the dashboards.py file to make the code more concise, maintainable, and in line with the standard library's recommended usage. The export_to_zipped_csv method has undergone major changes, including the removal of the BytesIO module import and the use of StringIO for handling strings as files. The method no longer creates a separate ZIP file for the CSV files, instead using the provided export_path. Additionally, the method skips tiles that don't contain queries. We have also introduced a new method, dataclass_transform, which transforms a given dataclass into a new one with specific attributes and behavior. This method creates a new dataclass with a custom metaclass and adds a new method, to_dict(), which converts the instances of the new dataclass to dictionaries. These changes promote code reusability and reduce redundancy in the codebase, making it easier for software engineers to work with.
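
The filter specification checks mentioned for 0.11.0 (an invalid type, and both column and columns being present) can be sketched as a validation over an already parsed spec. The function name `validate_filter_spec` is hypothetical, and the input is a plain dict here so the example needs no YAML parser; per the note above, the real code loads the file with yaml.safe_load.

```python
ALLOWED_TYPES = {"DROPDOWN", "MULTI_SELECT", "DATE_RANGE_PICKER"}


def validate_filter_spec(spec: dict) -> list[str]:
    """Return a list of problems found in a parsed filter spec."""
    problems = []
    if spec.get("type") not in ALLOWED_TYPES:
        problems.append(f"invalid type: {spec.get('type')!r}")
    if "column" in spec and "columns" in spec:
        problems.append("both 'column' and 'columns' are present")
    return problems


good = {"column": "sales_date", "type": "DATE_RANGE_PICKER", "title": "Date"}
bad = {"column": "a", "columns": ["a", "b"], "type": "SLIDER"}

print(validate_filter_spec(good))  # []
print(validate_filter_spec(bad))
```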

... (truncated)

Commits
  • 383bb80 Release v0.14.0 (#329)
  • 7ba1ca0 Fixed RuntimeBackend exception handling (#328)
  • 4f5ef74 Added nightly tests run at 4:45am UTC (#318)
  • 5bcb6cc Bump codecov/codecov-action from 4 to 5 (#319)
  • 69c6e97 Handle changes from Databricks Python SDK 0.37.0 (#320)
  • 48c287e Release v0.13.0 (#317)
  • 776e1cf Added escape_name function to escape individual SQL names and `escape_full_...
  • aae9ea1 Bump databrickslabs/sandbox from acceptance/v0.3.1 to 0.4.2 (#315)
  • 9aace9e Deprecate Row.as_dict() (#309)
  • 94ed7f0 Bump actions/checkout from 4.2.1 to 4.2.2 (#310)
  • Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Updates the requirements on [databricks-labs-lsql](https://github.com/databrickslabs/lsql) to permit the latest version.
- [Release notes](https://github.com/databrickslabs/lsql/releases)
- [Changelog](https://github.com/databrickslabs/lsql/blob/main/CHANGELOG.md)
- [Commits](databrickslabs/lsql@v0.5.0...v0.14.0)

---
updated-dependencies:
- dependency-name: databricks-labs-lsql
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
@dependabot dependabot bot requested a review from a team as a code owner November 18, 2024 09:32
@dependabot dependabot bot added the dependencies and python (Pull requests that update Python code) labels Nov 18, 2024
Collaborator

@nfx nfx left a comment

lgtm

@nfx nfx merged commit f68599e into main Nov 18, 2024
6 of 7 checks passed
@nfx nfx deleted the dependabot/pip/databricks-labs-lsql-gte-0.5-and-lt-0.15 branch November 18, 2024 09:41
nfx added a commit that referenced this pull request Nov 18, 2024
* Added `pytesseract` to known list ([#3235](#3235)). A new addition has been made to the `known.json` file, which tracks packages with native code, to include `pytesseract`, an Optical Character Recognition (OCR) tool for Python. This change improves the handling of `pytesseract` within the codebase and addresses part of issue [#1931](#1931), likely concerning the seamless incorporation of `pytesseract` and its native components. However, specific details on the usage of `pytesseract` within the project are not provided in the diff. Thus, further context or documentation may be necessary for a complete understanding of the integration. Nonetheless, this commit simplifies and clarifies the codebase's treatment of `pytesseract` and its native dependencies, making it easier to work with.
* Added hyperlink to database names in database summary dashboard ([#3310](#3310)). The recent change to the `Database Summary` dashboard includes the addition of clickable database names, opening a new tab with the corresponding database page. This has been accomplished by adding a `linkUrlTemplate` property to the `database` field in the `encodings` object within the `overrides` property of the dashboard configuration. The commit also includes tests to verify the new functionality in the labs environment and addresses issue [#3258](#3258). Furthermore, the display of various other statistics, such as the number of tables, views, and grants, has been improved by converting them to links, enhancing the overall usability and navigation of the dashboard.
* Bump codecov/codecov-action from 4 to 5 ([#3316](#3316)). In this release, the version of the `codecov/codecov-action` dependency has been bumped from 4 to 5, which introduces several new features and improvements to the Codecov GitHub Action. The new version utilizes the Codecov Wrapper for faster updates and better performance, as well as an opt-out feature for tokens in public repositories. This allows contributors to upload coverage reports without requiring access to the Codecov token, improving security and flexibility. Additionally, several new arguments have been added, including `binary`, `gcov_args`, `gcov_executable`, `gcov_ignore`, `gcov_include`, `report_type`, `skip_validation`, and `swift_project`. These changes enhance the functionality and security of the Codecov GitHub Action, providing a more robust and efficient solution for code coverage tracking.
* Depend on a Databricks SDK release compatible with 0.31.0 ([#3273](#3273)). In this release, we have updated the minimum required version of the Databricks SDK to 0.31.0 due to the introduction of a new `InvalidState` error class that is not compatible with the previously declared minimum version of 0.30.0. This change was necessary because Databricks Runtime (DBR) 16 ships with SDK 0.30.0 and does not upgrade to the latest version during installation, unlike previous versions of DBR. This change affects the project's dependencies as specified in the `pyproject.toml` file. We recommend that users verify their systems are compatible with the new version of the Databricks SDK, as this change may impact existing integrations with the project.
* Eliminate redundant migration-index refresh and loads during view migration ([#3223](#3223)). In this pull request, we have optimized the view migration process in the `databricks/labs/ucx/hive_metastore/table_metastore.py` file by eliminating redundant migration-status indexing operations. We have removed the unnecessary refresh of migration-status for all tables/views at the end of view migration, and stopped reloading the migration-status snapshot for every view when checking if it can be migrated and prior to migrating a view. We have introduced a new class `TableMigrationIndex` and imported the `TableMigrationStatusRefresher` class. The `_migrate_views` method now takes an additional argument `migration_index`, which is used in the `ViewsMigrationSequencer` and in the `_migrate_view` method. The `_view_can_be_migrated` and `_sql_migrate_view` methods now also take `migration_index` as an argument, which is used to determine if the view can be migrated. These changes aim to improve the efficiency of the view migration process, making it faster and more resource-friendly.
* Fixed backwards compatibility breakage from Databricks SDK ([#3324](#3324)). In this release, we have addressed a backwards compatibility issue (Issue [#3324](#3324)) that was caused by an update to the Databricks SDK. This was done by adding new methods to the `databricks.sdk.service` module to interact with dashboards. Additionally, we have fixed bug [#3322](#3322) and updated the `create` function in the `conftest.py` file to utilize the new `dashboards` module and its `Dashboard` class. The function now returns the dashboard object as a dictionary and calls the `publish` method on this object to publish the dashboard. These changes also include an update to the pyproject.toml file, which affects the test and coverage scripts used in the default environment. The minimum test coverage threshold has been lowered from 90% to 89%: the test command now includes the `--cov-fail-under=89` flag, so the run fails if total coverage drops below 89%, as part of our continuous integration and testing process.
* Fixed issue with cleanup of failed `create-missing-principals` command ([#3243](#3243)). In this update, we have improved the `create_uc_roles` method within the `access.py` file of the `databricks/labs/ucx/aws` directory to handle failures during role creation caused by permission issues. The code that creates a role and adds a policy to it is now wrapped in a try-except block: if a `PermissionDenied` or `NotFound` exception is raised, the method logs an error message, deletes any roles it has already created, and raises the exception again. This restores the system to its initial state and prevents the accumulation of partially created roles. We have also added unit tests covering the scenario where a failure occurs and the roles are successfully deleted.
* Improve error handling for `assess_workflows` task ([#3255](#3255)). This pull request improves error handling and logging for the `assess_workflows` task in the `databricks/labs/ucx` module. The `_temporary_copy` method now catches `DatabricksError` and re-raises it as an `InvalidPath` exception. Additionally, the log level for recursion errors, Unicode decode errors, schema determination errors, and dashboard listing errors has been lowered from `error` to `warning`, which better reflects their severity and avoids unnecessary alarm when these issues occur.
* Require at least 4 cores for UCX VMs ([#3229](#3229)). In this release, the selection of `node_type_id` in the `policy.py` file has been updated to require a minimum of 4 cores for UCX VMs, in addition to requiring local disk and at least 32 GB of memory. This change modifies the definition of the instance pool by altering the `node_type_id` parameter, so that only Virtual Machines (VMs) with at least 4 cores are selected for UCX.
* Skip `test_feature_tables` integration test ([#3326](#3326)). The `test_feature_tables` integration test is now skipped. This change references issues [#3304](#3304) and [#3](#3).
* Speed up `update_migration_status` jobs by eliminating lots of redundant SQL queries ([#3200](#3200)). In this release, the `_retrieve_acls` method in the `grants.py` file has been updated to remove the `_is_migrated` method and inline its functionality, resulting in improved performance for `update_migration_status` jobs. The `_is_migrated` method previously queried the migration status index for each table, but the updated method now refreshes the index once and then uses it for all checks, eliminating redundant SQL queries. Affected workflows include `migrate-tables`, `migrate-external-hiveserde-tables-in-place-experimental`, `migrate-external-tables-ctas`, `scan-tables-in-mounts-experimental`, and `migrate-tables-in-mounts-experimental`, all of which have been updated to utilize the refreshed migration status index and remove dead code. This release also includes updates to existing unit tests and integration tests to ensure the changes' correctness.
* Tech Debt: Fixed issue with Incorrect unit test practice ([#3244](#3244)). In this release, we have made significant improvements to the test suite for our AWS module. Specifically, the test case for `test_get_uc_compatible_roles` in `tests/unit/aws/test_access.py` has been updated to remove mocking code and directly call the `save_uc_compatible_roles` method, improving the accuracy and reliability of the test. Additionally, the MagicMock for the `load` method in the `mock_installation` object has been removed, further simplifying the test code and making it easier to understand. These changes will help to prevent bugs and make it easier to modify and extend the codebase in the future, improving the maintainability and overall quality of our open-source library.
* Updated `migration-progress-experimental` workflow to crawl tables from the `main` cluster ([#3269](#3269)). In this release, we have updated the `migration-progress-experimental` workflow to crawl tables from the `main` cluster instead of the `tacl` one. This change resolves issue [#3268](#3268) and addresses the problem of the Py4j bridge required for crawling not being available in the `tacl` cluster, leading to failures. The `setup_tacl` job task has been removed, and the `crawl_tables` task has been updated to no longer rely on the TACL cluster, instead refreshing the inventory directly. A new dependency has been added to ensure that the `crawl_tables` task runs after the `verify_prerequisites` task. The `refresh_table_migration_status` task and `update_tables_history_log` task have also been updated to assume that the inventory and migration status have been refreshed in the previous step. A TODO has been added to avoid triggering an implicit refresh if either the table or migration-status inventory is empty.
* Updated databricks-labs-lsql requirement from <0.13,>=0.5 to >=0.5,<0.14 ([#3241](#3241)). In this pull request, we have updated the `databricks-labs-lsql` requirement in the `pyproject.toml` file to a range of at least 0.5 and less than 0.14, allowing the use of the latest version of this library. The update includes release notes and a changelog from the `databricks-labs-lsql` GitHub repository, detailing new features, bug fixes, and improvements. Notable changes include the addition of the `escape_name` and `escape_full_name` functions, various dependency updates, and modifications to the `as_dict()` method in the `Row` class. This update also includes a list of dependency version updates from the `databricks-labs-lsql` changelog.
* Updated databricks-labs-lsql requirement from <0.14,>=0.5 to >=0.5,<0.15 ([#3321](#3321)). In this release, the `databricks-labs-lsql` package requirement has been updated to `>=0.5,<0.15` in the `pyproject.toml` file. This update addresses multiple issues and includes several improvements, such as bug fixes, dependency updates, and the addition of go-git libraries. The `RuntimeBackend` component has been improved with better exception handling, and new `escape_name` and `escape_full_name` functions have been added for SQL name escaping. The `Row.as_dict()` method has been deprecated in favor of `asDict()`. The `SchemaDeployer` class now allows overwriting the default `hive_metastore` catalog, and the `MockBackend` component has been improved to properly mock the `savetable` method in `append` mode. Filter specification files have been converted from JSON to YAML format for improved readability. Additionally, the test suite has been expanded, and various methods have been updated to improve codebase readability, maintainability, and ease of use.
* Updated sqlglot requirement from <25.30,>=25.5.0 to >=25.5.0,<25.32 ([#3320](#3320)). In this release, we have updated the project's dependency on sqlglot, keeping the minimum required version at 25.5.0 and raising the upper bound from below 25.30 to below 25.32. This permits more recent sqlglot releases, including the fixes and improvements detailed in its changelog.
* Use internal Permissions Migration API by default ([#3230](#3230)). This pull request introduces support for both legacy and new permission migration workflows in the Databricks UCX project. A new configuration option, `use_legacy_permission_migration`, has been added to `WorkspaceConfig` to toggle between the two workflows. When the legacy workflow is not enabled, certain steps in `workflows.py` are skipped and related methods have been renamed to reflect the legacy workflow. The `GroupMigration` class has been renamed to `LegacyGroupMigration` and integration and unit tests have been updated to use the new configuration option and renamed classes/methods. The new workflow no longer queries the `hive_metastore`.`ucx`.`groups` table in certain methods, resulting in changes to the behavior of the `test_runtime_workspace_listing` and `test_runtime_crawl_permissions` tests. Overall, these changes provide flexibility for users to choose between legacy and new permission migration workflows in the Databricks UCX project.
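
The cleanup behaviour described for `create-missing-principals` ([#3243](#3243)) can be sketched as follows. This is a minimal illustration, not the actual `create_uc_roles` code: the function name `create_roles_with_rollback`, its callback parameters, and the local `PermissionDenied` and `NotFound` classes are hypothetical stand-ins introduced so that the example runs on its own.

```python
import logging

logger = logging.getLogger(__name__)


class PermissionDenied(Exception):
    """Hypothetical local stand-in for the SDK error of the same name."""


class NotFound(Exception):
    """Hypothetical local stand-in for the SDK error of the same name."""


def create_roles_with_rollback(role_names, create_role, attach_policy, delete_role):
    """Create each role and attach its policy; undo everything on failure.

    The three callbacks are assumptions for this sketch; they stand for
    whatever performs the real cloud operations.
    """
    created = []
    try:
        for name in role_names:
            create_role(name)
            # Track the role before attaching the policy, so a role whose
            # policy attachment fails is cleaned up as well.
            created.append(name)
            attach_policy(name)
    except (PermissionDenied, NotFound):
        logger.error("Role creation failed; deleting %d created role(s)", len(created))
        for name in created:
            delete_role(name)
        # Re-raise so the caller still sees the original failure.
        raise
    return created
```

Recording each role before its policy is attached is a deliberate choice in this sketch: it ensures that a half-configured role is removed too, which matches the stated goal of not leaving partially created roles behind.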

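Two entries above ([#3223](#3223) and [#3200](#3200)) describe the same optimization: load the migration-status index once and pass it to every check instead of reloading it per table or view. The sketch below illustrates that idea only; `CountingBackend`, `check_views_reloading`, and `check_views_shared_index` are hypothetical names and do not correspond to UCX classes or methods.

```python
class CountingBackend:
    """Hypothetical backend that counts how many snapshot queries it serves."""

    def __init__(self, migrated):
        self._migrated = set(migrated)
        self.queries = 0

    def fetch_migrated(self):
        # Each call stands for one round of SQL queries against the inventory.
        self.queries += 1
        return set(self._migrated)


def check_views_reloading(backend, views):
    """Old pattern: reload the snapshot for every view that is checked."""
    result = {}
    for view, dependencies in views.items():
        index = backend.fetch_migrated()
        result[view] = all(dep in index for dep in dependencies)
    return result


def check_views_shared_index(backend, views):
    """New pattern: load the snapshot once and reuse it for all views."""
    index = backend.fetch_migrated()
    return {
        view: all(dep in index for dep in dependencies)
        for view, dependencies in views.items()
    }
```

Both functions return the same answers for the same snapshot; the second issues one load regardless of how many views are checked, whereas the first issues one load per view.
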
Dependency updates:

 * Updated databricks-labs-lsql requirement from <0.13,>=0.5 to >=0.5,<0.14 ([#3241](#3241)).
 * Updated databricks-labs-lsql requirement from <0.14,>=0.5 to >=0.5,<0.15 ([#3321](#3321)).
 * Updated sqlglot requirement from <25.30,>=25.5.0 to >=25.5.0,<25.32 ([#3320](#3320)).
 * Bump codecov/codecov-action from 4 to 5 ([#3316](#3316)).
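
To make the effect of the specifier changes concrete, the sketch below checks whether a version falls inside a `>=lower,<upper` range. It is a simplified illustration that handles only plain dotted numeric versions; real tooling follows the full PEP 440 rules. The helper names `parse` and `permits`, and the sample version strings, are assumptions chosen for this example.

```python
def parse(version, width=3):
    """Turn '0.14' into (0, 14, 0) so shorter versions compare correctly."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def permits(version, lower, upper):
    """Return True when lower <= version < upper."""
    return parse(lower) <= parse(version) < parse(upper)


# The old range excludes an lsql 0.14.0 release, the new range admits it.
old_range_ok = permits("0.14.0", "0.5", "0.14")
new_range_ok = permits("0.14.0", "0.5", "0.15")
```

With these inputs `old_range_ok` is `False` and `new_range_ok` is `True`, which is the reason the upper bound had to move from `<0.14` to `<0.15`.
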
nfx mentioned this pull request Nov 18, 2024
nfx added a commit that referenced this pull request Nov 18, 2024
* Added `pytesseract` to known list
([#3235](#3235)). A new
addition has been made to the `known.json` file, which tracks packages
with native code, to include `pytesseract`, an Optical Character
Recognition (OCR) tool for Python. This change improves the handling of
`pytesseract` within the codebase and addresses part of issue
[#1931](#1931).
* Added hyperlink to database names in database summary dashboard
([#3310](#3310)). The recent
change to the `Database Summary` dashboard includes the addition of
clickable database names, opening a new tab with the corresponding
database page. This has been accomplished by adding a `linkUrlTemplate`
property to the `database` field in the `encodings` object within the
`overrides` property of the dashboard configuration. The commit also
includes tests to verify the new functionality in the labs environment
and addresses issue
[#3258](#3258). Furthermore,
the display of various other statistics, such as the number of tables,
views, and grants, has been improved by converting them to links,
enhancing the overall usability and navigation of the dashboard.
* Bump codecov/codecov-action from 4 to 5
([#3316](#3316)). In this
release, the version of the `codecov/codecov-action` dependency has been
bumped from 4 to 5, which introduces several new features and
improvements to the Codecov GitHub Action. The new version utilizes the
Codecov Wrapper for faster updates and better performance, as well as an
opt-out feature for tokens in public repositories. This allows
contributors to upload coverage reports without requiring access to the
Codecov token, improving security and flexibility. Additionally, several
new arguments have been added, including `binary`, `gcov_args`,
`gcov_executable`, `gcov_ignore`, `gcov_include`, `report_type`,
`skip_validation`, and `swift_project`. These changes enhance the
functionality and security of the Codecov GitHub Action, providing a
more robust and efficient solution for code coverage tracking.
* Depend on a Databricks SDK release compatible with 0.31.0
([#3273](#3273)). In this
release, we have updated the minimum required version of the Databricks
SDK to 0.31.0 due to the introduction of a new `InvalidState` error
class that is not compatible with the previously declared minimum
version of 0.30.0. This change was necessary because Databricks Runtime
(DBR) 16 ships with SDK 0.30.0 and does not upgrade to the latest
version during installation, unlike previous versions of DBR. This
change affects the project's dependencies as specified in the
`pyproject.toml` file. We recommend that users verify their systems are
compatible with the new version of the Databricks SDK, as this change
may impact existing integrations with the project.