Update databricks-labs-lsql requirement from <0.8,>=0.5 to >=0.5,<0.9 #2416

Merged
merged 1 commit into main from dependabot/pip/databricks-labs-lsql-gte-0.5-and-lt-0.9 on Aug 13, 2024

Conversation

dependabot[bot]
Contributor

@dependabot dependabot bot commented on behalf of github Aug 13, 2024

Updates the requirements on databricks-labs-lsql to permit the latest version.

Changelog

Sourced from databricks-labs-lsql's changelog.

0.8.0

  • Removed deploy_dashboard method (#240). In this release, the deploy_dashboard method has been removed from the dashboards.py file and the legacy deployment method has been deprecated. The deploy_dashboard method was previously used to deploy a dashboard to a workspace, but it has been replaced with the create method of the lakeview attribute of the WorkspaceClient object. Additionally, the test_dashboards_creates_dashboard_via_legacy_method method has been removed. A new test has been added to ensure that the deploy_dashboard method is no longer being used, utilizing the deprecated_call function from pytest to verify that calling the method raises a deprecation warning. This change simplifies the code and improves the overall design of the system, resolving issue #232. The _with_better_names method and create_dashboard method remain unchanged.
  • Skip test that fails due to insufficient permission to create schema (#248). A new test function, test_dashboards_creates_dashboard_with_replace_database, has been added to the open-source library, but it is currently marked to be skipped due to missing permissions to create a schema. This function creates an instance of the Dashboards class with the ws parameter, creates a dashboard using the make_dashboard function, and exercises the created dashboard together with fixtures such as tmp_path and sql_backend. This test function aims to ensure that the Dashboards class functions as expected when creating a dashboard with a replaced database. Once the necessary permissions for creating a schema are acquired, this test function can be enabled for further testing and validation.
  • Updates to use the Databricks Python sdk 0.30.0 (#247). In this release, we have updated the project to use Databricks Python SDK version 0.30.0. This update includes changes to the execute and fetch_value functions, which now use the new StatementResponse type instead of ExecuteStatementResponse. A conditional import statement has been added to maintain compatibility with both Databricks SDK versions 0.30.0 and below. The execute function now raises TimeoutError when the specified timeout is greater than 50 seconds and the statement execution hasn't finished. Additionally, the fetch_value function has been updated to handle the case when the execute function returns None. The unit test file test_backends.py has also been updated to reflect these changes, with multiple test functions now using the StatementResponse class instead of ExecuteStatementResponse. These changes improve the system's compatibility with the latest version of the Databricks SDK, ensuring that the core functionality of the SDK continues to work as expected.
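The conditional import described in the last bullet is the usual pattern for straddling two SDK versions. A minimal sketch, assuming both response types are exposed from databricks.sdk.service.sql (this is not the exact lsql code):

```python
# Sketch of a version-compatibility shim for the statement-execution response type.
try:
    # databricks-sdk >= 0.30.0 renamed the response dataclass
    from databricks.sdk.service.sql import StatementResponse
except ImportError:
    # older SDKs only expose the previous name; alias it so downstream code is unchanged
    from databricks.sdk.service.sql import ExecuteStatementResponse as StatementResponse
```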

0.7.5

  • Fixed missing widget name suffixes (#243). In this release, we have addressed an issue related to missing widget name suffixes (#243) by adding a _widget suffix to the name of the widget object in the dashboards.py file. This change ensures consistency between the widget name and the id of the query, facilitating user understanding of the relationship between the two. A new method, _get_query_widget_spec, has also been added, although its specific functionality requires further investigation. Additionally, the unit tests in the tests/unit/test_dashboards.py file have been updated to check for the presence of the _widget suffix in widget names, ensuring that the tests accurately reflect the desired behavior. These changes improve the consistency of dashboard widget naming, thus benefiting software engineers utilizing or extending the project's widget-ordering functionalities.

0.7.4

  • Fixed dataset/widget name uniqueness requirement that was preventing dashboards being deployed (#241). A fix has been implemented to address a uniqueness requirement issue with the dataset/widget name that was preventing dashboard deployment. A new widget instance is now created with a unique name, generated by appending _widget to the metadata ID, in the get_layouts method. This ensures that multiple widgets with the same ID but different content can exist in a single dashboard, thereby meeting the name uniqueness requirement. In the save_to_folder method, the widget name is modified by removing the _widget suffix before writing the textbox specification to a markdown file, maintaining consistency between the widget ID and file name. These changes are localized to the get_layouts and save_to_folder methods, and no new methods have been added. The existing functionality related to the creation, validation, and saving of dashboard layouts remains unaltered.
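The naming scheme above amounts to adding and later stripping a fixed suffix; a tiny illustrative sketch (identifiers are hypothetical, not taken from the lsql source):

```python
# Illustrative only: widget names are made unique by suffixing the metadata id,
# and the suffix is removed again when the tile is written back to a folder.
tile_metadata_id = "counter"                      # hypothetical metadata id
widget_name = f"{tile_metadata_id}_widget"        # unique widget name inside the dashboard
file_stem = widget_name.removesuffix("_widget")   # file name matches the original id again
assert file_stem == tile_metadata_id
```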

0.7.3

  • Added publish flag to Dashboards.create_dashboard (#233). In this release, we have added a publish flag to the Dashboards.create_dashboard method, allowing users to publish the dashboard upon creation, thereby resolving issue #219. This flag is included in the labs.yml file with a description of its functionality. Additionally, the no-open flag's description has been updated to specify that it prevents the dashboard from opening in the browser after creation. The create_dashboard function in the cli.py and dashboards.py files has been updated to include the new publish flag, allowing for more flexibility in how users create and manage their dashboards. The Dashboards.create_dashboard method now calls the WorkspaceClient.lakeview.publish method when the publish flag is set to True, which publishes the created dashboard. This behavior is covered in the updated tests for the method.
  • Fixed boolean cli flags (#235). In this release, we have improved the handling of command-line interface (CLI) flags in the databricks labs command. Specifically, we have addressed the limitation that pure boolean flags are not supported. Now, when using boolean flags, the user will be prompted to confirm with a y or 'yes'. We have modified the create_dashboard command to accept string inputs for the publish and no_open flags, which are then converted to boolean values for internal use. Additionally, we have introduced a new open-browser command, which will open the dashboard in the browser after creation when set to y or 'yes'. These changes have been tested manually to ensure correct behavior. This improvement provides a more flexible input experience and better handling of boolean flags in the CLI command for software engineers using the open-source library.
  • Fixed format breaks widget (#238). In this release, we've made significant changes to the 'databricks/labs/lsql' directory's 'dashboards.py' file to address formatting breaks in the widget that could occur with Call to Action (CTA) presence in a query. These changes include the addition of new class variables, including _SQL_DIALECT and _DIALECT, and their integration into existing methods such as _parse_header, validate, format, _get_abstract_syntax_tree, and replace_catalog_and_database_in_query. Furthermore, we have developed new methods for creating and deleting schemas and getting the current test purge time. We have also implemented new integration tests to demonstrate the fix for the formatting issue and added new test cases for the query handler's header-splitting functionality, query formatting, and CTE handling. These enhancements improve the library's handling of SQL queries and query tiles in the context of dashboard creation, ensuring proper parsing, formatting, and metadata extraction for a wide range of query scenarios.
  • Fixed replace database when catalog or database is None (#237). In this release, we have addressed an issue where system tables disappeared in ucx dashboards when replacing the placeholder database. To rectify this, we have developed a new method, replace_catalog_and_database_in_query, in the dashboards.py file's replace_database function. This method checks if the catalog or database in a query match the ones to be replaced and replaces them with new ones, ensuring that system tables are not lost during the replacement process. Additionally, we have introduced new unit tests in test_dashboards.py to verify that queries are correctly transformed when replacing the database or catalog in the query. These tests include various scenarios, using two parametrized test functions, to ensure the correct functioning of the feature. This change provides a more robust and reliable dashboard display when replacing the placeholder database in the system.
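The catalog/database replacement in the last bullet can be approximated with sqlglot. This is a simplified sketch under the assumption that only table references already carrying a catalog or database are rewritten; it is not the actual replace_catalog_and_database_in_query implementation:

```python
import sqlglot
from sqlglot import expressions as exp


def replace_catalog_and_database(query: str, catalog: str | None, database: str | None) -> str:
    """Rewrite catalog/database parts of table references, leaving None parameters untouched."""
    tree = sqlglot.parse_one(query, dialect="databricks")
    for table in tree.find_all(exp.Table):
        if catalog and table.catalog:   # only replace when the query names a catalog
            table.set("catalog", exp.to_identifier(catalog))
        if database and table.db:       # only replace when the query names a database
            table.set("db", exp.to_identifier(database))
    return tree.sql(dialect="databricks")


print(replace_catalog_and_database("SELECT * FROM inventory.objects", None, "ucx"))
```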

0.7.2

  • Fixed dashboard deployment/creation (#230). The recent changes to our open-source library address issues related to dashboard deployment and creation, enhancing their reliability and consistency. The deploy_dashboard function has been deprecated in favor of the more accurate create_dashboard function, which now includes a publish flag. A validate method has been added to the Tile, MarkdownTile, and QueryTile classes to raise an error if the dashboard is invalid. The test_dashboards.py file has been updated to reflect these changes. These enhancements address issues #222, #229, and partially resolve #220. The commit includes an image of a dashboard created through the deprecated deploy_dashboard method. These improvements ensure better dashboard creation, validation, and deployment, while also maintaining backward compatibility through the deprecation of deploy_dashboard.

0.7.1

  • Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 (#224). In version 3.0.0 of sigstore/gh-action-sigstore-python, several changes, additions, and removals have been implemented. Notably, certain settings such as fulcio-url, rekor-url, ctfe, and rekor-root-pubkey have been removed. Additionally, the output settings signature, certificate, and bundle have also been removed. The inputs are now parsed according to POSIX shell lexing rules for better consistency. The release-signing-artifacts setting no longer causes a hard error when used under the incorrect event. Furthermore, various deprecations present in sigstore-python's 2.x series have been resolved. The default suffix has been changed from .sigstore to .sigstore.json, in line with Sigstore's client specification. The release-signing-artifacts setting now defaults to true. This version also includes several bug fixes and improvements to support CI runners that use PEP 668 to constrain global package prefixes.
  • Use default factory to create Tile._position (#226). In this change, the default value creation for the _position field in various classes including Tile, MarkdownTile, TableTile, and CounterTile has been updated. Previously, a new Position object was explicitly created for the default value. With this update, the default_factory argument of the dataclasses.field function is now used to create a new Position object. This change is made in anticipation of the Python 3.11 release, which modifies the field default mutability check behavior. By utilizing the default_factory approach, we ensure that a new Position object is generated during each instance creation, rather than reusing a single default instance. This guarantees the immutability of default values and aligns with best practices for forward-compatibility with future Python versions. It is important to note that this modification does not affect the functionality of the classes but enhances their initialization process.
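The default_factory change above follows the standard dataclasses idiom; a minimal sketch with illustrative field names (not the exact lsql classes):

```python
from dataclasses import dataclass, field


@dataclass
class Position:
    x: int = 0
    y: int = 0
    width: int = 1
    height: int = 1


@dataclass
class Tile:
    # default_factory builds a fresh Position per instance instead of sharing one default object
    _position: Position = field(default_factory=Position)


a, b = Tile(), Tile()
assert a._position is not b._position  # each tile owns its own Position
```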

Dependency updates:

  • Bump sigstore/gh-action-sigstore-python from 2.1.1 to 3.0.0 (#224).

0.7.0

  • Added databricks labs lsql fmt command (#221). The commit introduces a new command, databricks labs lsql fmt, to the open-source library, which formats SQL files in a given folder using the Databricks SDK. This command can be used without authentication and accepts a folder flag, which specifies the directory containing SQL files to format. The change also updates the labs.yml file and includes a new method, format, in the QueryTile class, which formats SQL queries using the sqlglot library. This commit enhances the functionality of the CLI for SQL file formatting and improves the readability and consistency of SQL files, making it easier for developers to understand and maintain the code. Additionally, the commit includes changes to various SQL files to demonstrate the improved formatting, such as converting SQL keywords to uppercase, adding appropriate spacing around keywords and operators, and aligning column names in the VALUES clause. The purpose of this change is to ensure that the formatting method works correctly and does not introduce any issues in the existing functionality.
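The formatting pass described above is built on sqlglot; a hedged approximation of what such a fmt step does (not the exact QueryTile.format implementation):

```python
import sqlglot


def format_sql(query: str) -> str:
    # Round-trip through sqlglot's Databricks dialect with pretty-printing enabled:
    # keywords are upper-cased and clauses are laid out consistently.
    return sqlglot.transpile(query, read="databricks", write="databricks", pretty=True)[0]


print(format_sql("select a,b from values (1,2), (3,4) as t(a,b)"))
```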

0.6.0

  • Added method to dashboards to get dashboard url (#211). In this release, we have added a new method get_url to the lakeview_dashboards object in the lsql library. This method utilizes the Databricks SDK to retrieve the dashboard URL, simplifying the code and making it more maintainable. Previously, the dashboard URL was constructed by concatenating the host and dashboard ID, but this new method ensures that the URL is obtained correctly, even if the format changes in the future. Additionally, a new unit test has been added for a method that gets the dashboard URL using the workspace client. This new functionality allows users to easily retrieve the URL for a dashboard using its ID and the workspace client.
  • Extend replace database in query (#210). This commit extends the database replacement functionality in the DashboardMetadata class, allowing users to specify which database and catalog to replace. The enhancement includes support for catalog replacement and a new replace_database method in the DashboardMetadata class, which replaces the catalog and/or database in the query based on provided parameters. These changes enhance the flexibility and customization of the database replacement feature in queries, making it easier for users to control how their data is displayed in the dashboard. The create_dashboard function has also been updated to use the new method for replacing the database and catalog. Additionally, the TileMetadata update method has been replaced with a new merge method, and the QueryTile and Tile classes have new properties and methods for handling content, width, height, and position. The commit also includes several unit tests to ensure the new functionality works as expected.
  • Improve object oriented dashboard-as-code implementation (#208). In this release, the object-oriented implementation of the dashboard-as-code feature has been significantly improved, addressing previous pull request comments (#201). The TileMetadata dataclass now includes methods for updating and comparing tile metadata, and the DashboardMetadata class has been removed and its functionality incorporated into the Dashboards class. The Dashboards class now generates tiles, datasets, and layouts for dashboards using the provided query_transformer. The code's readability and maintainability have been further enhanced by replacing the use of the copy module with dataclasses.replace for creating object copies. Additionally, updates have been made to the unit tests for dashboard functionality in the project, with new methods and attributes added to check for valid dashboard metadata and handle duplicate query or widget IDs, as well as to specify the order in which tiles and widgets should be displayed in the dashboard.
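The switch from the copy module to dataclasses.replace mentioned in the last bullet boils down to this pattern (the TileMetadata fields shown are illustrative):

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class TileMetadata:
    id: str
    order: int = 0
    width: int = 0
    height: int = 0


base = TileMetadata(id="counter", width=1, height=3)
wider = dataclasses.replace(base, width=6)  # copy of `base` with a single field overridden
assert base.width == 1 and wider.width == 6
```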

... (truncated)


Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Updates the requirements on [databricks-labs-lsql](https://github.com/databrickslabs/lsql) to permit the latest version.
- [Release notes](https://github.com/databrickslabs/lsql/releases)
- [Changelog](https://github.com/databrickslabs/lsql/blob/main/CHANGELOG.md)
- [Commits](databrickslabs/lsql@v0.5.0...v0.8.0)

---
updated-dependencies:
- dependency-name: databricks-labs-lsql
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
@dependabot dependabot bot requested a review from a team August 13, 2024 15:22
@dependabot dependabot bot added the dependencies and python labels Aug 13, 2024
@dependabot dependabot bot requested a review from qziyuan August 13, 2024 15:22
@nfx nfx merged commit 882dbc6 into main Aug 13, 2024
4 of 6 checks passed
@nfx nfx deleted the dependabot/pip/databricks-labs-lsql-gte-0.5-and-lt-0.9 branch August 13, 2024 15:54
nfx added a commit that referenced this pull request Aug 15, 2024
* Added `validate-table-locations` command for checking overlapping tables across workspaces ([#2341](#2341)). A new command, `validate-table-locations`, has been added to check for overlapping table locations across workspaces before migrating tables. This command is intended to ensure that tables can be migrated across workspaces without issues. The new command is part of the table migration workflows and uses a `LocationTrie` data structure to efficiently search for overlapping table locations. If any overlaps are found, the command logs a warning message and adds the conflicting tables to a list of all conflicts. This list is returned at the end of the command. The `validate-table-locations` command is intended to be run before migrating tables to ensure that the tables can be migrated without conflicts. The command includes a `workspace-ids` flag, which allows users to specify a list of workspace IDs to include in the validation. If this flag is not provided, the command will include all workspaces present in the account. This new command resolves issue [#673](#673). The `validate_table_locations` method is added to the `AccountAggregate` class and the `ExternalLocations` class has been updated to use the new `LocationTrie` class. The import section has also been updated to include new modules such as `LocationTrie` and `Table` from `databricks.labs.ucx.hive_metastore.locations` and `databricks.labs.ucx.hive_metastore.tables` respectively. Additionally, test cases have been added to ensure the correct functioning of the `LocationTrie` class.
* Added references to hive_metastore catalog in all table references an… ([#2419](#2419)). In this release, we have updated various methods and functions across multiple files to include explicit references to the `hive_metastore` catalog in table references. This change aims to improve the accuracy and consistency of table references in the codebase, enhancing reliability and maintainability. Affected files include `azure.py`, `init_scripts.py`, `pipelines.py`, and others in the `databricks/labs/ucx/assessment` module, as well as test files in the `tests/unit/assessment` and `tests/unit/azure` directories. The `_try_fetch` method has been updated to include the catalog name in table references in all instances, ensuring the correct catalog is referenced in all queries. Additionally, various test functions in affected files have been updated to reference the `hive_metastore` catalog in SQL queries. This update is part of the resolution of issue [#2207](#2207) and promotes robust handling of catalog, schema, and table naming scenarios in hive metastore migration status management.
* Added support for skipping views when migrating tables and views ([#2343](#2343)). In this release, we've added support for skipping both tables and views during the migration process in the `databricks labs ucx` command, addressing issue [#1937](#1937). The `skip` command has been enhanced to support skipping views, and new functions `skip_table_or_view` and `load_one` have been introduced to the `Table` class. Appropriate error handling and tests, including unit tests and integration tests, have been implemented to ensure the functionality works as expected. With these changes, users can now skip views during migration and have more flexibility when working with tables in the Unity Catalog.
* Avoid false positives when linting for pyspark patterns ([#2381](#2381)). This release includes enhancements to the PySpark linter aimed at reducing false positives during linting. The linter has been updated to check the originating module when detecting PySpark calls, ensuring that warnings are triggered only for relevant nodes from the pyspark or dbutils modules. Specifically, the `ReturnValueMatcher` and `DirectFilesystemAccessMatcher` classes have been modified to include this new check. These changes improve the overall accuracy of the PySpark linter, ensuring that only pertinent warnings are surfaced during linting. Additionally, the commit includes updated unit tests to verify the correct behavior of the modified linter. Specific improvements have been made to avoid false positives when detecting the `listTables` function in the PySpark catalog, ensuring that the warning is only triggered for the actual PySpark `listTables` method call.
* Bug: Generate custom warning when doing table size check and encountering DELTA_INVALID_FORMAT exception ([#2426](#2426)). A modification has been implemented in the `_safe_get_table_size` method within the `table_size.py` file of the `hive_metastore` package. This change addresses an issue ([#1913](#1913)) concerning the occurrence of a `DELTA_INVALID_FORMAT` exception while determining the size of a Delta table. Instead of raising an error, the exception is now converted into a warning, and the function proceeds to process the rest of the table. A corresponding warning message has been added to inform users about the issue and suggest checking the table structure. No new methods have been introduced, and existing functionalities have been updated to handle this specific exception more gracefully. The changes have been thoroughly tested with unit tests for the table size check when encountering a `DELTA_INVALID_FORMAT` error, employing a mock backend and a mock Spark session to simulate the error conversion. This change does not affect user documentation, CLI commands, workflows, or tables, and is solely intended for software engineers adopting the project.
* Clean up left over uber principal resources for Azure ([#2370](#2370)). This commit includes modifications to the Azure access module of the UCX project to clean up resources if the creation of the uber principal fails midway. It addresses issues [#2360](#2360) (Azure part) and [#2363](#2363), and modifies the command `databricks labs ucx create-uber-principal` to include this functionality. The changes include adding new methods and modifying existing ones for working with Azure resources, such as `StorageAccount`, `AccessConnector`, and `AzureRoleAssignment`. Additionally, new unit and integration tests have been added and manually tested to ensure that the changes work as intended. The commit also includes new fixtures for testing storage accounts and access connectors, and a test case for getting, applying, and deleting storage permissions. The `azure_api_client` function has been updated to handle different input argument lengths and methods such as "get", "put", and "post". A new managed identity, "appIduser1", has been added to the Azure mappings file, and the corresponding role assignments have been updated. The changes include error handling mechanisms for certain scenarios that may arise during the creation of the uber service principal.
* Crawlers: Use `TRUNCATE TABLE` instead of `DELETE FROM` when resetting crawler tables ([#2392](#2392)). In this release, the `.reset()` method for crawlers has been updated to use `TRUNCATE TABLE` instead of `DELETE FROM` when clearing out crawler tables, resulting in more efficient and idiomatic code. This change affects the existing `migrate-data-reconciliation` workflow and is accompanied by updated unit and integration tests to ensure correct functionality. The `reset()` method now accepts a table name argument, which is passed to the newly introduced `escape_sql_identifier()` utility function from the `databricks.labs.ucx.framework.utils` module for added safety. The migration status is now refreshed using the `TRUNCATE TABLE` command, which removes all records from the table, providing improved performance compared to the previous implementation. The `SHOW DATABASES` and `TRUNCATE TABLE` queries are validated in the `refresh_migration_status` workflow test, which now checks that the `TRUNCATE TABLE` query is used instead of `DELETE FROM` when resetting crawler tables. A sketch of this reset pattern appears after this list.
* Detect tables that are not present in the mapping file ([#2205](#2205)). In this release, we have introduced a new method `get_remaining_tables()` that returns a list of tables in the Hive metastore that have not been processed by the migration tool. This method performs a full refresh of the index and checks each table in the Hive metastore against the index to determine if it has been migrated. We have also added a new private method `_is_migrated()` to check if a given table has already been migrated. Additionally, we have replaced the `refresh_migration_status` method with `update_migration_status` in several workflows to present a more accurate representation of the migration process in the dashboard. A new SQL script, 04_1_remaining_hms_tables.sql, has been added to list the remaining tables in Hive Metastore which are not present in the mapping file. We have also added a new test for the table migration job that verifies that tables not present in the mapping file are detected and reported. A new test function `test_refresh_migration_status_published_remained_tables` has been added to ensure that the migration process correctly handles the case where tables have been published to the target metadata store but still remain in the source metadata store. These changes are intended to improve the functionality of the migration tool for Hive metastore tables and resolve issue [#1221](#1221).
* Fixed ConcurrentDeleteReadException in migrate-view task during table migration ([#2282](#2282)). In this release, we have implemented a fix for the ConcurrentDeleteReadException that occurred during the `migrate-view` command's migration task. The solution involved moving the refreshing of migration status between batches from within the batches. Along with the fix, we added a new method `index()` to the `TableMigrate` class, which checks if a table has been migrated or not. This method is utilized in the `_view_can_be_migrated` method to ensure that all dependencies of a view have been migrated before migrating the view. The `index_full_refresh()` method, which earlier performed this check, has been modified to refresh the index between batches instead of within batches. It is worth noting that the changes made have been manually tested, but no unit tests, integration tests, or verification on staging environments have been added. The target audience for this release is software engineers who adopt this project. No new documentation, commands, workflows, or tables have been added or modified in this release.
* Fixed documentation typos: `create-missing-pricipals` -> `create-missing-principals` ([#2357](#2357)). This pull request resolves typographical errors in the `create-missing-principals` command documentation, correcting the mistaken usage of `create-missing-pricipals` throughout the project documentation. The changes encompass the command description, the UCX command section, and the manual process documentation for AWS storage credentials. The `create-missing-principals` command, utilized by Cloud Admins to create and configure new AWS roles for Unity Catalog access, remains functionally unaltered.
* Fixed linting for Spark Python workflow tasks ([#2349](#2349)). This commit updates the linter to support PySpark tasks in workflows by modifying the existing `experimental-workflow-linter` to correctly handle these tasks. Previously, the linter assumed Python files were Jupyter notebooks, but PySpark tasks are top-level Python files run as `__main__`. This change introduces a new `ImportFileResolver` class to resolve imports for PySpark tasks, and updates to the `DependencyResolver` class to properly handle them. Additionally, unit and integration tests have been updated and added to ensure the correct behavior of the linter. The `DependencyResolver` constructor now accepts an additional `import_resolver` argument in some instances. This commit resolves issue [#2213](#2213) and improves the accuracy and versatility of the linter for different types of Python files.
* Fixed missing `security_policy` when updating SQL warehouse config ([#2409](#2409)). In this release, we have added new methods `GetWorkspaceWarehouseConfigResponseSecurityPolicy` and `SetWorkspaceWarehouseConfigRequestSecurityPolicy` to improve handling of SQL warehouse config security policy. We have introduced a new variable `security_policy` to store the security policy value, which is used when updating the SQL warehouse configuration, ensuring that the required security policy is set and fixing the `InvalidParameterValue: Endpoint security policy is required and must be one of NONE, DATA_ACCESS_CONTROL, PASSTHROUGH` error. Additionally, when the `enable_serverless_compute` error occurs, the new SQL warehouse data access config is printed in the log, allowing users to manually configure the uber principal in the UI. We have also updated the `create_uber_principal` method to set the security policy correctly and added parameterized tests to test the setting of the warehouse configuration security policy. The `test_create_global_spn` method has been updated to include the `security_policy` parameter in the `create_global_spn` method call, and new test cases have been added to verify the warehouse config's security policy is correctly updated. These enhancements will help make the system more robust and user-friendly.
* Fixed raise logs `ResourceDoesNotExists` when iterating the log paths ([#2382](#2382)). In this commit, we have improved the handling of the `ResourceDoesNotExist` exception when iterating through log paths in the open-source library. Previously, the exception was not being properly raised or handled, resulting in unreliable code behavior. To address this, we have added unit tests in the `test_install.py` file that accurately reflect the actual behavior when iterating log paths. We have also modified the test to raise the `ResourceDoesNotExist` exception when the result is iterated over, rather than when the method is called. Additionally, we have introduced the `ResourceDoesNotExistIter` class to make it easier to simulate the error during testing. These changes ensure that the code can gracefully handle cases where the specified log path does not exist, improving the overall reliability and robustness of the library. Co-authored by Andrew Snare.
* Generate custom error during installation due to external metastore connectivity issues ([#2425](#2425)). In this release, we have added a new custom error `OperationFailed` to the `InstallUcxError` enumeration in the `databricks/labs/ucx/install.py` file. This change is accompanied by an exception handler that checks if the error message contains a specific string related to AWS credentials, indicating an issue with external metastore connectivity. If this condition is met, a new `OperationFailed` error is raised with a custom message, providing instructions for resolving the external metastore connectivity issue and re-running the UCX installation. This enhancement aims to provide a more user-friendly error message for users encountering issues during UCX installation due to external metastore connectivity. The functionality of the `_create_database` method has been modified to include this new error handling mechanism, with no alterations made to the existing functionality. Although the changes have not been tested using unit tests, integration tests, or staging environments, they have been manually tested on a workspace with incorrect external metastore connectivity.
* Improve logging when waiting for workflows to complete ([#2364](#2364)). This pull request enhances the logging and error handling of the databricks labs ucx project, specifically when executing workflows. It corrects a bug where `skip_job_wait` was not handled correctly, and now allows for a specified timeout period instead of the default 20 minutes. Job logs are replicated into the local logfile if a workflow times out. The existing `databricks labs ucx ensure-assessment-run` and `databricks labs ucx migrate-tables` commands have been updated, and unit and integration tests have been added. These improvements provide more detailed and informative logs, and ensure the quality of the code through added tests. The behavior of the `wait_get_run_job_terminated_or_skipped` method has been modified in the tests, and the `assign_metastore` function has also been updated, but the specific changes are not specified in the commit message.
* Lint dependencies consistently ([#2400](#2400)). In this release, the `files.py` module in the `databricks/labs/ucx/source_code/linters` package has been updated to ensure consistent linting of jobs. Previously, the linting sequence was not consistent and did not provide inherited context when linting jobs. These issues have been resolved by modifying the `_lint_one` function to build an inherited tree when linting files or notebooks, and by adding a `FileLinter` object to determine which file/notebook linter to use for linting. The `_lint_task` function has also been updated to yield a `LocatedAdvice` object instead of a tuple, and takes an additional argument, `linted_paths`, which is a set of paths that have been previously linted. Additionally, the `_lint_notebook` and `_lint_file` functions have been removed as their functionality is now encompassed by the `_lint_one` function. These changes ensure consistent linting of jobs and inherited context. Unit tests were run and passed. Co-authored by Eric Vergnaud.
* Make Lakeview names unique ([#2354](#2354)). A change has been implemented to guarantee the uniqueness of dataset names in Lakeview dashboards, addressing issue [#2345](#2345) where non-unique names caused system errors. This change includes renaming the `count` dataset to `fourtytwo` in the `datasets` field and updating the `name` field for a widget query from `count` to 'counter'. These internal adjustments streamline the Lakeview system's functionality while ensuring consistent and unique dataset naming.
* Optimisation: when detecting if a file is a notebook only read the start instead of the whole file ([#2390](#2390)). In this release, we have optimized the detection of notebook files during linting in a non-Workspace path in our open-source library. This has been achieved by modifying the `is_a_notebook` function in the `base.py` file to improve efficiency. The function now checks only the start of the file instead of loading the entire file, which is more resource-intensive. If the content of the file is not already available, it reads just the file header when opening the file rather than returning False on error. The `magic_header` is used to determine whether the file is a notebook: if the content is available, the function checks whether it starts with the `magic_header`, without having to read the entire file. This change improves linting performance and reduces resource usage, making the library more efficient and user-friendly for software engineers. A sketch of this header check appears after this list.
* Retry dashboard install on DeadlineExceeded ([#2379](#2379)). In this release, we have added the DeadlineExceeded exception to the `@retried` decorator on the `_create_dashboard` method, which is used to create a lakeview dashboard from SQL queries in a folder. This modification to the existing functionality is intended to improve the reliability of dashboard installation by retrying the operation for up to four minutes in case of a DeadlineExceeded error. This change resolves issues [#2376](#2376), [#2377](#2377), and [#2389](#2389), which were related to dashboard installation timeouts. Software engineers will benefit from this update as it ensures successful installation of dashboards, even in scenarios where timeouts were previously encountered. A sketch of this retry pattern appears after this list.
* Updated databricks-labs-lsql requirement from <0.8,>=0.5 to >=0.5,<0.9 ([#2416](#2416)). In this update, we have modified the version requirement for the `databricks-labs-lsql` library to allow version 0.8, which includes bug fixes and improvements in dashboard creation and deployment. We have changed the version constraint from '<0.8,>=0.5' to '>=0.5,<0.9' to accommodate the latest version while preventing future major version upgrades. This change enhances the overall design of the system and simplifies the code for managing dashboard deployment. Additionally, we have introduced a new test that verifies the `deploy_dashboard` method is no longer being used, utilizing the `deprecated_call` function from pytest to ensure that calling the method raises a deprecation warning. This ensures that the system remains maintainable and up-to-date with the latest version of the `databricks-labs-lsql` library.
* Updated ext hms detection to include more conf attributes ([#2414](#2414)). In this enhancement, the code for installing UCX has been updated to include additional configuration attributes for detecting external Hive Metastore (HMS). The previous implementation failed to recognize specific attributes like "spark.hadoop.hive.metastore.uris". This update introduces a new list of recognized prefixes for external HMS spark attributes, namely "spark_conf.spark.sql.hive.metastore", "spark_conf.spark.hadoop.hive.metastore", "spark_conf.spark.hadoop.javax.jdo.option", and "spark_conf.spark.databricks.hive.metastore". This change enables the extraction of a broader set of configuration attributes during installation, thereby improving the overall detection and configuration of external HMS attributes.
* Updated sqlglot requirement from <25.11,>=25.5.0 to >=25.5.0,<25.12 ([#2415](#2415)). In this pull request, we have updated the `sqlglot` dependency to a new version range, `>=25.5.0,<25.12`, to allow for the latest version of `sqlglot` to be used while ensuring that the version does not exceed 25.12. This change includes several breaking changes, new features, and bug fixes. New features include support for ALTER VIEW AS SELECT, UNLOAD, and other SQL commands, as well as improvements in performance. Bug fixes include resolutions for issues related to OUTER/CROSS APPLY parsing, GENERATE_TIMESTAMP_ARRAY, and other features. Additionally, there are changes that improve the handling of various SQL dialects, including BigQuery, DuckDB, Snowflake, Oracle, and ClickHouse. The changelog and commits also reveal fixes for bugs and improvements in the library since the last version used in the project. By updating the `sqlglot` dependency to the new version range, the project can leverage the newly added features, bug fixes, and performance improvements while ensuring that the version does not exceed 25.12.
* Updated sqlglot requirement from <25.9,>=25.5.0 to >=25.5.0,<25.11 ([#2403](#2403)). In this release, we have updated the required version range for the `sqlglot` library from '[25.5.0, 25.9)' to '[25.5.0, 25.11)'. This update allows us to utilize newer versions of `sqlglot` while maintaining compatibility with previous versions. The `sqlglot` team's latest release, version 25.10.0, includes several breaking changes, new features, bug fixes, and refactors. Notable changes include support for STREAMING tables in Databricks, transpilation of Snowflake's CONVERT_TIMEZONE function for DuckDB, support for GENERATE_TIMESTAMP_ARRAY in BigQuery, and the ability to parse RENAME TABLE as a Command in Teradata. Due to these updates, we strongly advise conducting thorough testing before deploying to production, as these changes may impact the functionality of the project.
* Use `load_table` instead of deprecated `is_view` in failing integration test `test_mapping_skips_tables_databases` ([#2412](#2412)). In this release, the `is_view` parameter of the `skip_table_or_view` method in the `test_mapping_skips_tables_databases` integration test has been deprecated and replaced with a more flexible `load_table` parameter. This change allows for greater control and customization when specifying how a table should be loaded during the test. The `load_table` parameter is a callable that returns a `Table` object, which contains information about the table's schema, name, object type, and table format. This improvement removes the use of a deprecated parameter, enhancing the maintainability of the test code.
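For the crawler-reset change (`TRUNCATE TABLE` instead of `DELETE FROM`) above, a minimal sketch of the pattern, assuming a SqlBackend-style object exposing execute(); only escape_sql_identifier is named in the release note itself:

```python
from databricks.labs.ucx.framework.utils import escape_sql_identifier


def reset_crawler_table(sql_backend, full_name: str) -> None:
    # TRUNCATE TABLE clears all records from the crawler table (the release note cites
    # better performance than DELETE FROM); the identifier is escaped before being
    # interpolated into the statement.
    sql_backend.execute(f"TRUNCATE TABLE {escape_sql_identifier(full_name)}")
```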
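For the notebook-detection optimisation above, a sketch of reading only the start of a file; the magic header value and function shape are assumptions rather than code copied from ucx:

```python
from pathlib import Path

MAGIC_HEADER = "# Databricks notebook source"  # assumed first-line marker of exported notebooks


def is_a_notebook(path: Path, content: str | None = None) -> bool:
    if content is not None:
        # Content already in memory: only the prefix needs to be checked.
        return content.startswith(MAGIC_HEADER)
    try:
        # Otherwise read just enough characters for the header instead of the whole file.
        with path.open("r", encoding="utf-8") as f:
            return f.read(len(MAGIC_HEADER)) == MAGIC_HEADER
    except OSError:
        return False
```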
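For the dashboard-install retry above, a sketch using the SDK's retried decorator with the four-minute window mentioned in the note; the decorated function body is a placeholder, not the real _create_dashboard:

```python
from datetime import timedelta

from databricks.sdk.errors import DeadlineExceeded
from databricks.sdk.retries import retried


@retried(on=[DeadlineExceeded], timeout=timedelta(minutes=4))
def create_dashboard_with_retry() -> None:
    # Placeholder for the dashboard-creation call that can raise DeadlineExceeded;
    # the decorator re-invokes it until it succeeds or the timeout window elapses.
    ...
```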

Dependency updates:

 * Updated sqlglot requirement from <25.9,>=25.5.0 to >=25.5.0,<25.11 ([#2403](#2403)).
 * Updated databricks-labs-lsql requirement from <0.8,>=0.5 to >=0.5,<0.9 ([#2416](#2416)).
 * Updated sqlglot requirement from <25.11,>=25.5.0 to >=25.5.0,<25.12 ([#2415](#2415)).
@nfx nfx mentioned this pull request Aug 15, 2024
nfx added a commit that referenced this pull request Aug 15, 2024
* Added `validate-table-locations` command for checking overlapping
tables across workspaces
([#2341](#2341)). A new
command, `validate-table-locations`, has been added to check for
overlapping table locations across workspaces before migrating tables.
This command is intended to ensure that tables can be migrated across
workspaces without issues. The new command is part of the table
migration workflows and uses a `LocationTrie` data structure to
efficiently search for overlapping table locations. If any overlaps are
found, the command logs a warning message and adds the conflicting
tables to a list of all conflicts. This list is returned at the end of
the command. The `validate-table-locations` command is intended to be
run before migrating tables to ensure that the tables can be migrated
without conflicts. The command includes a `workspace-ids` flag, which
allows users to specify a list of workspace IDs to include in the
validation. If this flag is not provided, the command will include all
workspaces present in the account. This new command resolves issue
[#673](#673). The
`validate_table_locations` method is added to the `AccountAggregate`
class and the `ExternalLocations` class has been updated to use the new
`LocationTrie` class. The import section has also been updated to
include new modules such as `LocationTrie` and `Table` from
`databricks.labs.ucx.hive_metastore.locations` and
`databricks.labs.ucx.hive_metastore.tables` respectively. Additionally,
test cases have been added to ensure the correct functioning of the
`LocationTrie` class.
* Added references to hive_metastore catalog in all table references an…
([#2419](#2419)). In this
release, we have updated various methods and functions across multiple
files to include explicit references to the `hive_metastore` catalog in
table references. This change aims to improve the accuracy and
consistency of table references in the codebase, enhancing reliability
and maintainability. Affected files include `azure.py`,
`init_scripts.py`, `pipelines.py`, and others in the
`databricks/labs/ucx/assessment` module, as well as test files in the
`tests/unit/assessment` and `tests/unit/azure` directories. The
`_try_fetch` method has been updated to include the catalog name in
table references in all instances, ensuring the correct catalog is
referenced in all queries. Additionally, various test functions in
affected files have been updated to reference the `hive_metastore`
catalog in SQL queries. This update is part of the resolution of issue
[#2207](#2207) and promotes
robust handling of catalog, schema, and table naming scenarios in hive
metastore migration status management.
* Added support for skipping views when migrating tables and views
([#2343](#2343)). In this
release, we've added support for skipping both tables and views during
the migration process in the `databricks labs ucx` command, addressing
issue [#1937](#1937). The
`skip` command has been enhanced to support skipping views, and new
functions `skip_table_or_view` and `load_one` have been introduced to
the `Table` class. Appropriate error handling and tests, including unit
tests and integration tests, have been implemented to ensure the
functionality works as expected. With these changes, users can now skip
views during migration and have more flexibility when working with
tables in the Unity Catalog.
* Avoid false positives when linting for pyspark patterns
([#2381](#2381)). This
release includes enhancements to the PySpark linter aimed at reducing
false positives during linting. The linter has been updated to check the
originating module when detecting PySpark calls, ensuring that warnings
are triggered only for relevant nodes from the pyspark or dbutils
modules. Specifically, the `ReturnValueMatcher` and
`DirectFilesystemAccessMatcher` classes have been modified to include
this new check. These changes improve the overall accuracy of the
PySpark linter, ensuring that only pertinent warnings are surfaced
during linting. Additionally, the commit includes updated unit tests to
verify the correct behavior of the modified linter. Specific
improvements have been made to avoid false positives when detecting the
`listTables` function in the PySpark catalog, ensuring that the warning
is only triggered for the actual PySpark `listTables` method call.
* Bug: Generate custom warning when doing table size check and
encountering DELTA_INVALID_FORMAT exception
([#2426](#2426)). A
modification has been implemented in the `_safe_get_table_size` method
within the `table_size.py` file of the `hive_metastore` package. This
change addresses an issue
([#1913](#1913)) concerning
the occurrence of a `DELTA_INVALID_FORMAT` exception while determining
the size of a Delta table. Instead of raising an error, the exception is
now converted into a warning, and the function proceeds to process the
rest of the table. A corresponding warning message has been added to
inform users about the issue and suggest checking the table structure.
No new methods have been introduced, and existing functionalities have
been updated to handle this specific exception more gracefully. The
changes have been thoroughly tested with unit tests for the table size
check when encountering a `DELTA_INVALID_FORMAT` error, employing a mock
backend and a mock Spark session to simulate the error conversion. This
change does not affect user documentation, CLI commands, workflows, or
tables, and is solely intended for software engineers adopting the
project.
* Clean up left over uber principal resources for Azure
([#2370](#2370)). This
commit includes modifications to the Azure access module of the UCX
project to clean up resources if the creation of the uber principal
fails midway. It addresses issues
[#2360](#2360) (Azure part)
and [#2363](#2363), and
modifies the command `databricks labs ucx create-uber-principal` to
include this functionality. The changes include adding new methods and
modifying existing ones for working with Azure resources, such as
`StorageAccount`, `AccessConnector`, and `AzureRoleAssignment`.
Additionally, new unit and integration tests have been added and
manually tested to ensure that the changes work as intended. The commit
also includes new fixtures for testing storage accounts and access
connectors, and a test case for getting, applying, and deleting storage
permissions. The `azure_api_client` function has been updated to handle
different input argument lengths and methods such as "get", "put", and
"post". A new managed identity, "appIduser1", has been added to the
Azure mappings file, and the corresponding role assignments have been
updated. The changes include error handling mechanisms for certain
scenarios that may arise during the creation of the uber service
principal.
* Crawlers: Use `TRUNCATE TABLE` instead of `DELETE FROM` when resetting
crawler tables
([#2392](#2392)). In this
release, the `.reset()` method for crawlers has been updated to use
`TRUNCATE TABLE` instead of `DELETE FROM` when clearing out crawler
tables, resulting in more efficient and idiomatic code. This change
affects the existing `migrate-data-reconciliation` workflow and is
accompanied by updated unit and integration tests to ensure correct
functionality. The `reset()` method now accepts a table name argument,
which is passed to the newly introduced `escape_sql_identifier()`
utility function from the `databricks.labs.ucx.framework.utils` module
for added safety. The migration status is now refreshed using the
`TRUNCATE TABLE` command, which removes all records from the table,
providing improved performance compared to the previous implementation.
The `SHOW DATABASES` and `TRUNCATE TABLE` queries are validated in the
`refresh_migration_status` workflow test, which now checks if the
`TRUNCATE TABLE` query is used instead of `DELETE FROM` when resetting
crawler tables.
* Detect tables that are not present in the mapping file
([#2205](#2205)). In this
release, we have introduced a new method `get_remaining_tables()` that
returns a list of tables in the Hive metastore that have not been
processed by the migration tool. This method performs a full refresh of
the index and checks each table in the Hive metastore against the index
to determine if it has been migrated. We have also added a new private
method `_is_migrated()` to check if a given table has already been
migrated. Additionally, we have replaced the `refresh_migration_status`
method with `update_migration_status` in several workflows to present a
more accurate representation of the migration process in the dashboard.
A new SQL script, 04_1_remaining_hms_tables.sql, has been added to list
the remaining tables in Hive Metastore which are not present in the
mapping file. We have also added a new test for the table migration job
that verifies that tables not present in the mapping file are detected
and reported. A new test function
`test_refresh_migration_status_published_remained_tables` has been added
to ensure that the migration process correctly handles the case where
tables have been published to the target metadata store but still remain
in the source metadata store. These changes are intended to improve the
functionality of the migration tool for Hive metastore tables and
resolve issue
[#1221](#1221).
* Fixed ConcurrentDeleteReadException in migrate-view task during table
migration ([#2282](#2282)).
In this release, we have implemented a fix for the
ConcurrentDeleteReadException that occurred during the `migrate-view`
command's migration task. The solution involved moving the refreshing of
migration status between batches from within the batches. Along with the
fix, we added a new method `index()` to the `TableMigrate` class, which
checks if a table has been migrated or not. This method is utilized in
the `_view_can_be_migrated` method to ensure that all dependencies of a
view have been migrated before migrating the view. The
`index_full_refresh()` method, which earlier performed this check, has
been modified to refresh the index between batches instead of within
batches. It is worth noting that the changes made have been manually
tested, but no unit tests, integration tests, or verification on staging
environments have been added. The target audience for this release is
software engineers who adopt this project. No new documentation,
commands, workflows, or tables have been added or modified in this
release.
* Fixed documentation typos: `create-missing-pricipals` ->
`create-missing-principals`
([#2357](#2357)). This pull
request resolves typographical errors in the `create-missing-principals`
command documentation, correcting the mistaken usage of
`create-missing-pricipals` throughout the project documentation. The
changes encompass the command description, the UCX command section, and
the manual process documentation for AWS storage credentials. The
`create-missing-principals` command, utilized by Cloud Admins to create
and configure new AWS roles for Unity Catalog access, remains
functionally unaltered.
* Fixed linting for Spark Python workflow tasks
([#2349](#2349)). This
commit updates the linter to support PySpark tasks in workflows by
modifying the existing `experimental-workflow-linter` to correctly
handle these tasks. Previously, the linter assumed Python files were
Jupyter notebooks, but PySpark tasks are top-level Python files run as
`__main__`. This change introduces a new `ImportFileResolver` class to
resolve imports for PySpark tasks, and updates to the
`DependencyResolver` class to properly handle them. Additionally, unit
and integration tests have been updated and added to ensure the correct
behavior of the linter. The `DependencyResolver` constructor now accepts
an additional `import_resolver` argument in some instances. This commit
resolves issue
[#2213](#2213) and improves
the accuracy and versatility of the linter for different types of Python
files.
* Fixed missing `security_policy` when updating SQL warehouse config
([#2409](#2409)). In this
release, we have added new methods
`GetWorkspaceWarehouseConfigResponseSecurityPolicy` and
`SetWorkspaceWarehouseConfigRequestSecurityPolicy` to improve handling
of SQL warehouse config security policy. We have introduced a new
variable `security_policy` to store the security policy value, which is
used when updating the SQL warehouse configuration, ensuring that the
required security policy is set and fixing the `InvalidParameterValue:
Endpoint security policy is required and must be one of NONE,
DATA_ACCESS_CONTROL, PASSTHROUGH` error. Additionally, when the
`enable_serverless_compute` error occurs, the new SQL warehouse data
access config is printed in the log, allowing users to manually
configure the uber principal in the UI. We have also updated the
`create_uber_principal` method to set the security policy correctly and
added parameterized tests to test the setting of the warehouse
configuration security policy. The `test_create_global_spn` method has
been updated to include the `security_policy` parameter in the
`create_global_spn` method call, and new test cases have been added to
verify the warehouse config's security policy is correctly updated.
These enhancements will help make the system more robust and
user-friendly.
* Fixed raise logs `ResourceDoesNotExists` when iterating the log paths
([#2382](#2382)). In this
commit, we have improved the handling of the `ResourceDoesNotExist`
exception when iterating through log paths in the open-source library.
Previously, the exception was not being properly raised or handled,
resulting in unreliable code behavior. To address this, we have added
unit tests in the `test_install.py` file that accurately reflect the
actual behavior when iterating log paths. We have also modified the test
to raise the `ResourceDoesNotExist` exception when the result is
iterated over, rather than when the method is called. Additionally, we
have introduced the `ResourceDoesNotExistIter` class to make it easier
to simulate the error during testing. These changes ensure that the code
can gracefully handle cases where the specified log path does not exist,
improving the overall reliability and robustness of the library.
Co-authored by Andrew Snare.
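
The crux of the fix is that `workspace.list` returns a lazy iterator, so a missing path only surfaces once the results are consumed. A minimal sketch of handling this with the Databricks SDK client (the log path below is illustrative, not the exact one used by the installer):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import ResourceDoesNotExist

ws = WorkspaceClient()
log_path = "/Applications/ucx/logs"  # illustrative path

try:
    for entry in ws.workspace.list(log_path):  # the exception is raised here, on iteration
        print(entry.path)
except ResourceDoesNotExist:
    print(f"Log path does not exist yet: {log_path}")
```
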
* Generate custom error during installation due to external metastore
connectivity issues
([#2425](#2425)). In this
release, we have added a new custom error `OperationFailed` to the
`InstallUcxError` enumeration in the `databricks/labs/ucx/install.py`
file. This change is accompanied by an exception handler that checks if
the error message contains a specific string related to AWS credentials,
indicating an issue with external metastore connectivity. If this
condition is met, a new `OperationFailed` error is raised with a custom
message, providing instructions for resolving the external metastore
connectivity issue and re-running the UCX installation. This enhancement
aims to provide a more user-friendly error message for users
encountering issues during UCX installation due to external metastore
connectivity. The functionality of the `_create_database` method has
been modified to include this new error handling mechanism, with no
alterations made to the existing functionality. Although the changes
have not been tested using unit tests, integration tests, or staging
environments, they have been manually tested on a workspace with
incorrect external metastore connectivity.
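
A hedged sketch of the error-wrapping pattern described above; the substring matched against the error message and the helper names are illustrative rather than the exact ucx code:

```python
from databricks.sdk.errors import OperationFailed


def _create_database(sql_backend, schema: str) -> None:
    try:
        sql_backend.execute(f"CREATE SCHEMA IF NOT EXISTS {schema}")
    except Exception as err:
        # The substring checked here is an assumption; the real check looks for an
        # AWS-credentials-related failure coming from the external metastore.
        if "credentials" in str(err).lower():
            raise OperationFailed(
                "Failed to create the UCX schema: the workspace cannot reach its external "
                "Hive metastore. Fix the external metastore connectivity and re-run "
                "`databricks labs ucx install`."
            ) from err
        raise
```
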
* Improve logging when waiting for workflows to complete
([#2364](#2364)). This pull
request enhances the logging and error handling of the databricks labs
ucx project, specifically when executing workflows. It corrects a bug
where `skip_job_wait` was not handled correctly, and now allows for a
specified timeout period instead of the default 20 minutes. Job logs are
replicated into the local logfile if a workflow times out. The existing
`databricks labs ucx ensure-assessment-run` and `databricks labs ucx
migrate-tables` commands have been updated, and unit and integration
tests have been added. These improvements provide more detailed and
informative logs, and ensure the quality of the code through added
tests. The behavior of the `wait_get_run_job_terminated_or_skipped`
method has been modified in the tests, and the `assign_metastore`
function has also been updated, but the specific changes are not
specified in the commit message.
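
As a sketch of the waiting pattern just described, assuming the Databricks SDK jobs waiter (the run id and timeout are placeholders):

```python
import logging
from datetime import timedelta

from databricks.sdk import WorkspaceClient

logger = logging.getLogger(__name__)
ws = WorkspaceClient()
run_id = 1234  # placeholder: the run id of the workflow being awaited

try:
    # Wait with an explicit timeout instead of the waiter's default of 20 minutes.
    ws.jobs.wait_get_run_job_terminated_or_skipped(run_id=run_id, timeout=timedelta(hours=1))
except TimeoutError:
    # On timeout, the job's own logs can be fetched and copied into the local logfile.
    logger.warning("Workflow run %d did not finish in time; replicating its logs locally", run_id)
```
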
* Lint dependencies consistently
([#2400](#2400)). In this
release, the `files.py` module in the
`databricks/labs/ucx/source_code/linters` package has been updated to
ensure consistent linting of jobs. Previously, the linting sequence was
not consistent and did not provide inherited context when linting jobs.
These issues have been resolved by modifying the `_lint_one` function to
build an inherited tree when linting files or notebooks, and by adding a
`FileLinter` object to determine which file/notebook linter to use for
linting. The `_lint_task` function has also been updated to yield a
`LocatedAdvice` object instead of a tuple, and takes an additional
argument, `linted_paths`, which is a set of paths that have been
previously linted. Additionally, the `_lint_notebook` and `_lint_file`
functions have been removed as their functionality is now encompassed by
the `_lint_one` function. These changes ensure consistent linting of
jobs and inherited context. Unit tests were run and passed. Co-authored
by Eric Vergnaud.
* Make Lakeview names unique
([#2354](#2354)). A change
has been implemented to guarantee the uniqueness of dataset names in
Lakeview dashboards, addressing issue
[#2345](#2345) where
non-unique names caused system errors. This change includes renaming the
`count` dataset to `fourtytwo` in the `datasets` field and updating the
`name` field for a widget query from `count` to `counter`. These
internal adjustments streamline the Lakeview system's functionality
while ensuring consistent and unique dataset naming.
* Optimisation: when detecting if a file is a notebook, only read the
  start instead of the whole file
([#2390](#2390)). In this
release, we have optimized how notebook files are detected while linting
non-Workspace paths. The `is_a_notebook` function in `base.py` now
inspects only the start of a file rather than loading it entirely: if the
content is already available, it checks whether it begins with the
`magic_header`; otherwise it attempts to read just the file header when
the file is opened, instead of immediately returning False on an error.
Avoiding a full read of every candidate file improves linting performance
and reduces resource usage; a sketch of the header check follows below.
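
A simplified sketch of the header check (the real `is_a_notebook` also handles other languages and workspace paths; the constant below mirrors the header Databricks prepends to exported Python notebooks):

```python
from pathlib import Path

MAGIC_HEADER = "# Databricks notebook source"


def is_a_notebook(path: Path, content: str | None = None) -> bool:
    """Decide whether a file is a notebook by looking only at its first bytes."""
    if content is not None:
        return content.startswith(MAGIC_HEADER)
    try:
        with path.open("r", encoding="utf-8") as file:
            return file.read(len(MAGIC_HEADER)) == MAGIC_HEADER
    except OSError:
        return False
```
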
* Retry dashboard install on DeadlineExceeded
([#2379](#2379)). In this
release, we have added the `DeadlineExceeded` exception to the `@retried`
decorator on the `_create_dashboard` method, which creates a Lakeview
dashboard from the SQL queries in a folder. This change improves the
reliability of dashboard installation by retrying the operation for up to
four minutes when a `DeadlineExceeded` error occurs, as sketched below. It
resolves issues
[#2376](#2376),
[#2377](#2377), and
[#2389](#2389), which were
related to dashboard installation timeouts. Software engineers will
benefit from this update as it will ensure successful installation of
dashboards, even in scenarios where timeouts were previously
encountered.
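
A minimal sketch of the retry decoration, assuming the `retried` helper from the Databricks SDK (the dashboard-building body is elided):

```python
from datetime import timedelta

from databricks.sdk.errors import DeadlineExceeded
from databricks.sdk.retries import retried


@retried(on=[DeadlineExceeded], timeout=timedelta(minutes=4))
def _create_dashboard() -> None:
    # Build the Lakeview dashboard from the SQL queries in the folder; any
    # DeadlineExceeded raised here is retried for up to four minutes.
    ...
```
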
* Updated databricks-labs-lsql requirement from <0.8,>=0.5 to >=0.5,<0.9
([#2416](#2416)). In this
update, we have modified the version requirement for the
`databricks-labs-lsql` library to allow version 0.8, which includes bug
fixes and improvements in dashboard creation and deployment. We have
changed the version constraint from '<0.8,>=0.5' to '>=0.5,<0.9' to
accommodate the latest version while preventing future major version
upgrades. This change enhances the overall design of the system and
simplifies the code for managing dashboard deployment. Additionally, we
have introduced a new test that verifies the `deploy_dashboard` method
is no longer being used, utilizing the `deprecated_call` function from
pytest to ensure that calling the method raises a deprecation warning.
This ensures that the system remains maintainable and up-to-date with
the latest version of the `databricks-labs-lsql` library.
* Updated ext hms detection to include more conf attributes
([#2414](#2414)). In this
enhancement, the code for installing UCX has been updated to include
additional configuration attributes for detecting external Hive
Metastore (HMS). The previous implementation failed to recognize
specific attributes like "spark.hadoop.hive.metastore.uris". This update
introduces a new list of recognized prefixes for external HMS spark
attributes, namely "spark_conf.spark.sql.hive.metastore",
"spark_conf.spark.hadoop.hive.metastore",
"spark_conf.spark.hadoop.javax.jdo.option", and
"spark_conf.spark.databricks.hive.metastore". This change enables the
extraction of a broader set of configuration attributes during
installation, thereby improving the overall detection and configuration
of external HMS attributes.
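
A simplified sketch of the prefix matching this enables (the helper name is illustrative; the prefixes are the ones listed above):

```python
_EXTERNAL_HMS_PREFIXES = (
    "spark_conf.spark.sql.hive.metastore",
    "spark_conf.spark.hadoop.hive.metastore",
    "spark_conf.spark.hadoop.javax.jdo.option",
    "spark_conf.spark.databricks.hive.metastore",
)


def is_external_hms_attribute(key: str) -> bool:
    """Return True when a cluster policy attribute belongs to an external Hive metastore setup."""
    return any(key.startswith(prefix) for prefix in _EXTERNAL_HMS_PREFIXES)


# Previously missed attributes such as this one are now recognized:
assert is_external_hms_attribute("spark_conf.spark.hadoop.hive.metastore.uris")
```
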
* Updated sqlglot requirement from <25.11,>=25.5.0 to >=25.5.0,<25.12
([#2415](#2415)). In this
pull request, we have updated the `sqlglot` dependency to a new version
range, `>=25.5.0,<25.12`, to allow for the latest version of `sqlglot`
to be used while ensuring that the version does not exceed 25.12. This
change includes several breaking changes, new features, and bug fixes.
New features include support for ALTER VIEW AS SELECT, UNLOAD, and other
SQL commands, as well as improvements in performance. Bug fixes include
resolutions for issues related to OUTER/CROSS APPLY parsing,
GENERATE_TIMESTAMP_ARRAY, and other features. Additionally, there are
changes that improve the handling of various SQL dialects, including
BigQuery, DuckDB, Snowflake, Oracle, and ClickHouse. Updating the
`sqlglot` dependency to this range lets the project pick up these fixes
and improvements while keeping the version below 25.12.
* Updated sqlglot requirement from <25.9,>=25.5.0 to >=25.5.0,<25.11
([#2403](#2403)). In this
release, we have updated the required version range for the `sqlglot`
library from '[25.5.0, 25.9)' to '[25.5.0, 25.11)'. This update allows
us to utilize newer versions of `sqlglot` while maintaining
compatibility with previous versions. The `sqlglot` team's latest
release, version 25.10.0, includes several breaking changes, new
features, bug fixes, and refactors. Notable changes include support for
STREAMING tables in Databricks, transpilation of Snowflake's
CONVERT_TIMEZONE function for DuckDB, support for
GENERATE_TIMESTAMP_ARRAY in BigQuery, and the ability to parse RENAME
TABLE as a Command in Teradata. Due to these updates, we strongly advise
conducting thorough testing before deploying to production, as these
changes may impact the functionality of the project.
* Use `load_table` instead of deprecated `is_view` in failing
integration test `test_mapping_skips_tables_databases`
([#2412](#2412)). In this
release, the `is_view` parameter of the `skip_table_or_view` method in
the `test_mapping_skips_tables_databases` integration test has been
deprecated and replaced with a more flexible `load_table` parameter.
This change allows for greater control and customization when specifying
how a table should be loaded during the test. The `load_table` parameter
is a callable that returns a `Table` object, which contains information
about the table's schema, name, object type, and table format. This
improvement removes the use of a deprecated parameter, enhancing the
maintainability of the test code.

Dependency updates:

* Updated sqlglot requirement from <25.9,>=25.5.0 to >=25.5.0,<25.11
([#2403](#2403)).
* Updated databricks-labs-lsql requirement from <0.8,>=0.5 to >=0.5,<0.9
([#2416](#2416)).
* Updated sqlglot requirement from <25.11,>=25.5.0 to >=25.5.0,<25.12
([#2415](#2415)).