Complete support for pip install command #1853

Merged
merged 54 commits into main on Jun 11, 2024

Conversation

JCZuurmond
Member

@JCZuurmond JCZuurmond commented Jun 6, 2024

Changes

Complete support for the pip install command by passing all pip install arguments through when installing the libraries.

Linked issues

Resolves #1203

Functionality

  • modified existing command: databricks labs ucx lint-local-code
  • added a new workflow
  • modified existing workflow: experimental-workflow-linter

Tests

  • manually tested
  • added unit tests
  • added integration tests
  • Add integration test for pip install with --index-url


codecov bot commented Jun 6, 2024

Codecov Report

Attention: Patch coverage is 93.33333% with 4 lines in your changes missing coverage. Please review.

Project coverage is 89.46%. Comparing base (64961fd) to head (b23516a).

Files Patch % Lines
...atabricks/labs/ucx/source_code/python_libraries.py 89.47% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1853      +/-   ##
==========================================
+ Coverage   89.43%   89.46%   +0.02%     
==========================================
  Files          95       95              
  Lines       12138    12166      +28     
  Branches     2127     2134       +7     
==========================================
+ Hits        10856    10884      +28     
  Misses        873      873              
  Partials      409      409              

☔ View full report in Codecov by Sentry.

@JCZuurmond JCZuurmond force-pushed the feat/detect-usage-of-external-oss-and-private-libraries branch from 0ac873c to e9c5df7 Compare June 7, 2024 08:53
@JCZuurmond JCZuurmond self-assigned this Jun 7, 2024
@JCZuurmond JCZuurmond requested review from nfx, asnare and ericvergnaud June 7, 2024 08:54
@@ -261,7 +265,9 @@ def __repr__(self):

class LibraryResolver(abc.ABC):
@abc.abstractmethod
def register_library(self, path_lookup: PathLookup, library: Path) -> list[DependencyProblem]:
def register_library(
self, path_lookup: PathLookup, *libraries: str, installation_arguments: list[str] | None = None
Member Author

@JCZuurmond JCZuurmond Jun 7, 2024


There are three changes in this API:

  1. Multiple libraries can be installed at once. Motivation: a %pip magic can install multiple libraries in a single command. To stay as close as possible to the installation command as the user defined it, and thus minimize the chance of mimicking the install incorrectly, we support installing multiple libraries at once together with the installation arguments.
  2. Libraries are strings. Motivation: i) a practical reason: when a library is defined as a relative path "./path/to/lib.whl", converting it to a Path drops the leading ./, which means we can no longer replace the relative reference when resolving it with the path lookup. ii) at this point libraries are just strings, e.g. pytest in pip install pytest; only later, when resolved with the path lookup, may they become a path.
  3. installation_arguments. Motivation: to pass on additional installation flags defined by the user (a usage sketch follows this list).
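
For illustration, a minimal sketch of how a caller would invoke the new signature; the resolver instance, library names, and flags below are made up and not taken from the PR:

problems = resolver.register_library(
    path_lookup,
    "pytest",
    "./dist/my_lib.whl",
    installation_arguments=["pytest", "./dist/my_lib.whl", "--index-url", "https://pypi.org/simple"],
)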

Contributor

@asnare asnare Jun 10, 2024


I think it's possible to make these changes while retaining API compatibility for callers:

def register_library(self, path_lookup: PathLookup, library: Path, *more_libraries: Path, installation_arguments: list[str] | None = None) -> list[DependencyProblem]:
    ...

This type signature ensures that at least one argument is provided.

Can you elaborate a bit more on why there's a problem with the leading ./ being stripped? I'm a little confused because anything without a leading / is considered relative, and if it's a problem why wasn't it a problem before?

Member Author


With the current changes, I am replacing the resolved path in the installation arguments, which relies on an equality check to find the argument to replace:

args = [maybe_library.as_posix() if arg == library else arg for arg in args]

Before, the library path was "just" resolved, and the path lookup can handle relative paths both with and without the ./; the equality check above cannot.
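
As a standalone illustration (plain Python, not project code), converting the relative string to a Path drops the leading ./, so the equality check above no longer matches the original argument:

from pathlib import Path

arg = "./path/to/lib.whl"
converted = Path(arg).as_posix()
print(converted)         # path/to/lib.whl - the leading ./ is lost
print(converted == arg)  # False, so the equality-based replacement misses the argument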

Member Author


I could split *libraries into a required library plus the remaining libraries to enforce defining at least one library. However, that adds complexity to the code; the current code works and lets callers unpack a possibly empty list of libraries. On the other hand, it nudges callers to define at least one library, which can be a good thing. IMO this change alone does not add much value.

Collaborator


this gets too complicated - just pass list[str] down to pip and let it handle all the work. also shlex them and download any pathlib.Path-like files
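
A rough sketch of that direction, assuming the resolver already holds the raw argument list; build_pip_command is a made-up helper name, and only shlex.join and shlex.quote from the standard library are used:

import shlex

def build_pip_command(installation_arguments: list[str], venv: str) -> str:
    # Hand the user's arguments to pip verbatim; shlex takes care of quoting.
    # Appending -t last works because pip uses the last target directory it sees.
    return f"pip install {shlex.join(installation_arguments)} -t {shlex.quote(venv)}"

# Example: the arguments as they appeared after 'pip install' in the notebook cell.
print(build_pip_command(["pytest", "--index-url", "https://pypi.org/simple"], "/tmp/ucx-venv"))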

Member Author

@JCZuurmond JCZuurmond left a comment


Please have a look at the PR and my comments. This PR introduces some API changes; I would like feedback on whether the motivation is sufficient to justify them.

    ],
)
def test_pip_cell_split(code, split):
    assert PipCell._split(code) == split  # pylint: disable=protected-access
Member Author


@nfx: We discussed this. Here the unit test for a protected method was useful to unit test many different pip install variants. It breaks the "no cheat" CI check.

@JCZuurmond JCZuurmond marked this pull request as ready for review June 7, 2024 09:20
@JCZuurmond JCZuurmond requested a review from a team June 7, 2024 09:20
@JCZuurmond JCZuurmond force-pushed the feat/detect-usage-of-external-oss-and-private-libraries branch from e9c5df7 to 46cb2bc Compare June 7, 2024 09:25

github-actions bot commented Jun 7, 2024

❌ 192/193 passed, 3 flaky, 1 failed, 24 skipped, 3h4m29s total

❌ test_build_notebook_dependency_graphs_installs_pytest_from_index_url: assert not [DependencyProblem(code='library-install-failed', message="'pip install pytest --index-url http://pypi.python.org/simp...n'", source_path=PosixPath('pip_install_pytest_with_index_url'), start_line=-1, start_col=-1, end_line=-1, end_col=-1)] (15.65s)
assert not [DependencyProblem(code='library-install-failed', message="'pip install pytest --index-url http://pypi.python.org/simp...n'", source_path=PosixPath('pip_install_pytest_with_index_url'), start_line=-1, start_col=-1, end_line=-1, end_col=-1)]
 +  where [DependencyProblem(code='library-install-failed', message="'pip install pytest --index-url http://pypi.python.org/simp...n'", source_path=PosixPath('pip_install_pytest_with_index_url'), start_line=-1, start_col=-1, end_line=-1, end_col=-1)] = MaybeGraph(graph=<DependencyGraph /home/runner/work/ucx/ucx/tests/unit/source_code/samples/pip_install_pytest_with_ind...'", source_path=PosixPath('pip_install_pytest_with_index_url'), start_line=-1, start_col=-1, end_line=-1, end_col=-1)]).problems
10:18 INFO [databricks.sdk] Using Databricks Metadata Service authentication
10:18 INFO [databricks.sdk] Using Databricks Metadata Service authentication
[gw6] linux -- Python 3.10.14 /home/runner/work/ucx/ucx/.venv/bin/python
10:18 INFO [databricks.sdk] Using Databricks Metadata Service authentication
10:18 INFO [databricks.sdk] Using Databricks Metadata Service authentication
10:18 INFO [databricks.labs.ucx.framework.utils] Invoking command: pip install pytest --index-url http://pypi.python.org/simple -t /tmp/ucx-n0o_ddat
10:18 DEBUG [databricks.labs.ucx.source_code.python_libraries] pip output:
Looking in indexes: http://pypi.python.org/simple

WARNING: The repository located at pypi.python.org is not a trusted or secure host and is being ignored. If this repository is available via HTTPS we recommend you use HTTPS instead, otherwise you may silence this warning and allow it anyway with '--trusted-host pypi.python.org'.
ERROR: Could not find a version that satisfies the requirement pytest (from versions: none)
ERROR: No matching distribution found for pytest
10:18 INFO [databricks.sdk] Using Databricks Metadata Service authentication
10:18 INFO [databricks.sdk] Using Databricks Metadata Service authentication
10:18 INFO [databricks.labs.ucx.framework.utils] Invoking command: pip install pytest --index-url http://pypi.python.org/simple -t /tmp/ucx-n0o_ddat
10:18 DEBUG [databricks.labs.ucx.source_code.python_libraries] pip output:
Looking in indexes: http://pypi.python.org/simple

WARNING: The repository located at pypi.python.org is not a trusted or secure host and is being ignored. If this repository is available via HTTPS we recommend you use HTTPS instead, otherwise you may silence this warning and allow it anyway with '--trusted-host pypi.python.org'.
ERROR: Could not find a version that satisfies the requirement pytest (from versions: none)
ERROR: No matching distribution found for pytest
10:18 INFO [databricks.labs.ucx.mixins.fixtures] Schema hive_metastore.ucx_sy3ny: https://DATABRICKS_HOST/explore/data/hive_metastore/ucx_sy3ny
10:18 DEBUG [databricks.labs.ucx.mixins.fixtures] added schema fixture: SchemaInfo(browse_only=None, catalog_name='hive_metastore', catalog_type=None, comment=None, created_at=None, created_by=None, effective_predictive_optimization_flag=None, enable_predictive_optimization=None, full_name='hive_metastore.ucx_sy3ny', metastore_id=None, name='ucx_sy3ny', owner=None, properties=None, schema_id=None, storage_location=None, storage_root=None, updated_at=None, updated_by=None)
10:18 DEBUG [databricks.labs.ucx.install] Cannot find previous installation: Path (/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.WHSc/config.yml) doesn't exist.
10:18 INFO [databricks.labs.ucx.install] Please answer a couple of questions to configure Unity Catalog migration
10:18 INFO [databricks.labs.blueprint.tui] Asking prompt: Inventory Database stored in hive_metastore
10:18 INFO [databricks.labs.blueprint.tui] Asking prompt: Log level
10:18 INFO [databricks.labs.blueprint.tui] Asking prompt: Number of threads
10:18 INFO [databricks.labs.blueprint.tui] Asking prompt: Backup prefix
10:18 INFO [databricks.labs.blueprint.tui] Asking prompt: Choose how to map the workspace groups:
[0] Match by Name
[1] Apply a Prefix
[2] Apply a Suffix
[3] Match by External ID
[4] Regex Substitution
[5] Regex Matching
Enter a number between 0 and 5
10:18 INFO [databricks.labs.blueprint.tui] Asking prompt: Enter a prefix to add to the workspace group name
10:18 INFO [databricks.labs.blueprint.tui] Asking prompt: Comma-separated list of workspace group names to migrate. If not specified, we'll use all account-level groups with matching names to workspace-level groups
10:18 INFO [databricks.labs.blueprint.tui] Asking prompt: Comma-separated list of databases to migrate. If not specified, we'll use all databases in hive_metastore
10:18 INFO [databricks.labs.blueprint.tui] Asking prompt: Does given workspace 7342989205138882 block Internet access?
10:18 INFO [databricks.labs.blueprint.tui] Asking prompt: Do you want to trigger assessment job after installation?
10:18 INFO [databricks.labs.blueprint.tui] Asking prompt: Reconciliation threshold, in percentage
10:18 INFO [databricks.labs.ucx.installer.hms_lineage] HMS Lineage feature creates one system table named system.hms_to_uc_migration.table_access and helps in your migration process from HMS to UC by allowing you to programmatically query HMS lineage data.
10:18 INFO [databricks.labs.blueprint.tui] Asking prompt: No HMS lineage collection init script exists, do you want to create one?
10:18 INFO [databricks.labs.ucx.install] Fetching installations...
10:18 INFO [databricks.labs.blueprint.parallel] finding WHSc installations 100/101, rps: 260.975/sec
10:18 INFO [databricks.labs.blueprint.parallel] finding WHSc installations 101/101, rps: 257.418/sec
10:18 INFO [databricks.labs.blueprint.parallel] Finished 'finding WHSc installations' tasks: 0% results available (0/101). Took 0:00:00.392992
10:18 INFO [databricks.labs.blueprint.tui] Asking prompt: Select PRO or SERVERLESS SQL warehouse to run assessment dashboards on
[0] [Create new PRO SQL warehouse]
[1] DEFAULT Test Warehouse (TEST_DEFAULT_WAREHOUSE_ID, SERVERLESS, RUNNING)
Enter a number between 0 and 1
10:18 INFO [databricks.labs.blueprint.tui] Asking prompt: Instance pool id to be set in cluster policy for all workflow clusters
10:18 INFO [databricks.labs.blueprint.tui] Asking prompt: We have identified one or more cluster policies set up for an external metastore. Would you like to set UCX to connect to the external metastore?
10:18 INFO [databricks.labs.ucx.installer.policy] Creating UCX cluster policy.
10:18 INFO [databricks.labs.blueprint.tui] Asking prompt: Parallelism for migrating dbfs root delta tables with deep clone
10:18 INFO [databricks.labs.blueprint.tui] Asking prompt: Min workers for auto-scale job cluster for table migration
10:18 INFO [databricks.labs.blueprint.tui] Asking prompt: Max workers for auto-scale job cluster for table migration
10:18 INFO [databricks.labs.blueprint.tui] Asking prompt: Open config file in the browser and continue installing? https://DATABRICKS_HOST/#workspace/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.WHSc/config.yml
10:18 DEBUG [tests.integration.conftest] Waiting for clusters to start...
10:18 INFO [databricks.labs.blueprint.parallel] ensure clusters running 3/3, rps: 83.100/sec
10:18 INFO [databricks.labs.blueprint.parallel] Finished 'ensure clusters running' tasks: 0% results available (0/3). Took 0:00:00.036319
10:18 DEBUG [tests.integration.conftest] Waiting for clusters to start...
10:18 INFO [databricks.labs.blueprint.tui] Asking prompt: Do you want to uninstall ucx from the workspace too, this would remove ucx project folder, dashboards, queries and jobs
10:18 INFO [databricks.labs.ucx.install] Deleting UCX v0.26.1+6420240611101852 from https://DATABRICKS_HOST
10:18 INFO [databricks.labs.blueprint.tui] Asking prompt: Do you want to delete the inventory database ucx_sy3ny too?
10:18 INFO [databricks.labs.ucx.install] Deleting inventory database ucx_sy3ny
10:18 INFO [databricks.labs.lsql.deployment] deleting ucx_sy3ny database
10:18 INFO [databricks.labs.ucx.install] Deleting jobs
10:18 ERROR [databricks.labs.ucx.install] No jobs present or jobs already deleted
10:18 INFO [databricks.labs.ucx.install] Deleting cluster policy
10:18 INFO [databricks.labs.ucx.install] Deleting secret scope
10:18 INFO [databricks.labs.ucx.install] UnInstalling UCX complete
10:18 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 0 workspace user fixtures
10:18 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 0 account group fixtures
10:18 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 0 workspace group fixtures
10:18 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 0 table fixtures
10:18 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 0 table fixtures
10:18 DEBUG [databricks.labs.ucx.mixins.fixtures] clearing 1 schema fixtures
10:18 DEBUG [databricks.labs.ucx.mixins.fixtures] removing schema fixture: SchemaInfo(browse_only=None, catalog_name='hive_metastore', catalog_type=None, comment=None, created_at=None, created_by=None, effective_predictive_optimization_flag=None, enable_predictive_optimization=None, full_name='hive_metastore.ucx_sy3ny', metastore_id=None, name='ucx_sy3ny', owner=None, properties=None, schema_id=None, storage_location=None, storage_root=None, updated_at=None, updated_by=None)
[gw6] linux -- Python 3.10.14 /home/runner/work/ucx/ucx/.venv/bin/python

Flaky tests:

  • 🤪 test_migrate_table_in_mount (33.622s)
  • 🤪 test_create_catalog_schema_with_principal_acl_CLOUD_ENV (1m19.579s)
  • 🤪 test_repair_run_workflow_job (3m54.808s)

Running from acceptance #3867

Comment on lines 94 to 95
install_command = f"pip install {shlex.quote(library)} -t {venv}"
install_commands.append(install_command)
else:
# pip allows multiple target directories in its call, it uses the last one, thus the one added here
install_command = f"pip install {shlex.join(installation_arguments)} -t {venv}"
Collaborator


install_command = f"pip install {shlex.quote(library)} -t {venv}" and install_command = f"pip install {shlex.join(installation_arguments)} -t {venv}" are the same

Member Author


They aren't always the same: i) the installation arguments are empty for pip libraries defined on a job, and ii) the libraries don't contain the installation flags when those are defined in a %pip command.
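
To make the two cases concrete, a hedged illustration of the inputs each branch receives (values are made up):

# Case i: pip libraries defined on a job - only names, no extra flags.
libraries = ["pytest"]
installation_arguments: list[str] = []

# Case ii: a %pip install with flags - the raw arguments carry the flags,
# while the libraries hold just the package or path tokens.
libraries = ["pytest"]
installation_arguments = ["pytest", "--index-url", "https://pypi.org/simple"]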

@JCZuurmond JCZuurmond force-pushed the feat/detect-usage-of-external-oss-and-private-libraries branch from ca3e950 to 9be7b05 Compare June 10, 2024 07:31
@JCZuurmond JCZuurmond force-pushed the feat/detect-usage-of-external-oss-and-private-libraries branch from 6df5e1c to b23516a Compare June 11, 2024 10:13
@JCZuurmond JCZuurmond requested a review from nfx June 11, 2024 10:13
Collaborator

@nfx nfx left a comment


Lgtm

@nfx nfx merged commit a638a03 into main Jun 11, 2024
6 of 8 checks passed
@nfx nfx deleted the feat/detect-usage-of-external-oss-and-private-libraries branch June 11, 2024 15:06
nfx added a commit that referenced this pull request Jun 12, 2024
* Added `mlflow` to known packages ([#1895](#1895)). The `mlflow` package has been incorporated into the project and is now recognized as a known package. This integration includes modifications to the use of `mlflow` in the context of UC Shared Clusters, providing recommendations to modify or rewrite certain functionalities related to `sparkContext`, `_conf`, and `RDD` APIs. Additionally, the artifact storage system of `mlflow` in Databricks and DBFS has undergone changes. The `known.json` file has also been updated with several new packages, such as `alembic`, `aniso8601`, `cloudpickle`, `docker`, `entrypoints`, `flask`, `graphene`, `graphql-core`, `graphql-relay`, `gunicorn`, `html5lib`, `isort`, `jinja2`, `markdown`, `markupsafe`, `mccabe`, `opentelemetry-api`, `opentelemetry-sdk`, `opentelemetry-semantic-conventions`, `packaging`, `pyarrow`, `pyasn1`, `pygments`, `pyrsistent`, `python-dateutil`, `pytz`, `pyyaml`, `regex`, `requests`, and more. These packages are now acknowledged and incorporated into the project's functionality.
* Added `tensorflow` to known packages ([#1897](#1897)). In this release, we are excited to announce the addition of the `tensorflow` package to our known packages list. Tensorflow is a popular open-source library for machine learning and artificial intelligence applications. This package includes several components such as `tensorflow`, `tensorboard`, `tensorboard-data-server`, and `tensorflow-io-gcs-filesystem`, which enable training, evaluation, and deployment of machine learning models, visualization of machine learning model metrics and logs, and access to Google Cloud Storage filesystems. Additionally, we have included other packages such as `gast`, `grpcio`, `h5py`, `keras`, `libclang`, `mdurl`, `namex`, `opt-einsum`, `optree`, `pygments`, `rich`, `rsa`, `termcolor`, `pyasn1_modules`, `sympy`, and `threadpoolctl`. These packages provide various functionalities required for different use cases, such as parsing Abstract Syntax Trees, efficient serial communication, handling HDF5 files, and managing threads. This release aims to enhance the functionality and capabilities of our platform by incorporating these powerful libraries and tools.
* Added `torch` to known packages ([#1896](#1896)). In this release, the "known.json" file has been updated to include several new packages and their respective modules for a specific project or environment. These packages include "torch", "functorch", "mpmath", "networkx", "sympy", "isympy". The addition of these packages and modules ensures that they are recognized and available for use, preventing issues with missing dependencies or version conflicts. Furthermore, the `_analyze_dist_info` method in the `known.py` file has been improved to handle recursion errors during package analysis. A try-except block has been added to the loop that analyzes the distribution info folder, which logs the error and moves on to the next file if a `RecursionError` occurs. This enhancement increases the robustness of the package analysis process.
* Added more known libraries ([#1894](#1894)). In this release, the `known` library has been enhanced with the addition of several new packages, bringing improved functionality and versatility to the software. Key additions include contourpy for drawing contours on 2D grids, cycler for creating cyclic iterators, docker-pycreds for managing Docker credentials, filelock for platform-independent file locking, fonttools for manipulating fonts, and frozendict for providing immutable dictionaries. Additional libraries like fsspec for accessing various file systems, gitdb and gitpython for working with git repositories, google-auth for Google authentication, html5lib for parsing and rendering HTML documents, and huggingface-hub for working with the Hugging Face model hub have been incorporated. Furthermore, the release includes idna, kiwisolver, lxml, matplotlib, mypy, peewee, protobuf, psutil, pyparsing, regex, requests, safetensors, sniffio, smmap, tokenizers, tomli, tqdm, transformers, types-pyyaml, types-requests, typing_extensions, tzdata, umap, unicorn, unidecode, urllib3, wandb, waterbear, wordcloud, xgboost, and yfinance for expanded capabilities. The zipp and zingg libraries have also been included for module name transformations and data mastering, respectively. Overall, these additions are expected to significantly enhance the software's functionality.
* Added more value inference for `dbutils.notebook.run(...)` ([#1860](#1860)). In this release, the `dbutils.notebook.run(...)` functionality in `graph.py` has been significantly updated to enhance value inference. The change includes the introduction of new methods for handling `NotebookRunCall` and `SysPathChange` objects, as well as the refactoring of the `get_notebook_path` method into `get_notebook_paths`. This new method now returns a tuple of a boolean and a list of strings, indicating whether any nodes could not be resolved and providing a list of inferred paths. A new private method, `_get_notebook_paths`, has also been added to retrieve notebook paths from a list of nodes. Furthermore, the `load_dependency` method in `loaders.py` has been updated to detect the language of a notebook based on the file path, in addition to its content. The `Notebook` class now includes a new parameter, `SUPPORTED_EXTENSION_LANGUAGES`, which maps file extensions to their corresponding languages. In the `databricks.labs.ucx` project, more value inference has been added to the linter, including new methods and enhanced functionality for `dbutils.notebook.run(...)`. Several tests have been added or updated to demonstrate various scenarios and ensure the linter handles dynamic values appropriately. A new test file for the `NotebookLoader` class in the `databricks.labs.ucx.source_code.notebooks.loaders` module has been added, with a new class, `NotebookLoaderForTesting`, that overrides the `detect_language` method to make it a class method. This allows for more robust testing of the `NotebookLoader` class. Overall, these changes improve the accuracy and reliability of value inference for `dbutils.notebook.run(...)` and enhance the testing and usability of the related classes and methods.
* Added nightly workflow to use industry solution accelerators for parser validation ([#1883](#1883)). A nightly workflow has been added to validate the parser using industry solution accelerators, which can be triggered locally with the `make solacc` command. This workflow involves a new Makefile target, 'solacc', which runs a Python script located at 'tests/integration/source_code/solacc.py'. The workflow is designed to run on the latest Ubuntu, installing Python 3.10 and hatch 1.9.4 using pip, and checking out the code with a fetch depth of 0. It runs on a daily basis at 7am using a cron schedule, and can also be triggered locally. The purpose of this workflow is to ensure parser compatibility with various industry solutions, improving overall software quality and robustness.
* Complete support for pip install command ([#1853](#1853)). In this release, we've made significant enhancements to support the `pip install` command in our open-source library. The `register_library` method in the `DependencyResolver`, `NotebookResolver`, and `LocalFileResolver` classes has been modified to accept variable numbers of libraries instead of just one, allowing for more efficient dependency management. Additionally, the `resolve_import` method has been introduced in the `NotebookResolver` and `LocalFileResolver` classes for improved import resolution. Moreover, the `_split` static method has been implemented for better handling of pip command code and egg packages. The library now also supports the resolution of imports in notebooks and local files. These changes provide a solid foundation for full `pip install` command support, improving overall robustness and functionality. Furthermore, extensive updates to tests, including workflow linter and job dlt task linter modifications, ensure the reliability of the library when working with Jupyter notebooks and pip-installable libraries.
* Infer simple f-string values when computing values during linting ([#1876](#1876)). This commit enhances the open-source library by adding support for inferring simple f-string values during linting, addressing issue [#1871](#1871) and progressing [#1205](#1205). The new functionality works for simple f-strings but currently does not support nested f-strings. It introduces the InferredValue class and updates the visit_call, visit_const, and _check_str_constant methods for better linter feedback. Additionally, it includes modifications to a unit test file and adjustments to error location in code. The commit also presents an example of simple f-string handling, emphasizing the limitations yet providing a solid foundation for future development. Co-authored by Eric Vergnaud.
* Propagate widget parameters and data security mode to `CurrentSessionState` ([#1872](#1872)). In this release, the `spark_version_compatibility` function in `crawlers.py` has been refactored to `runtime_version_tuple`, returning a tuple of integers instead of a string. The function now handles custom runtimes and DLT, and raises a ValueError if the version components cannot be converted to integers. Additionally, the `CurrentSessionState` class has been updated to propagate named parameters from jobs and check for DBFS paths as both named and positional parameters. New attributes, including `spark_conf`, `named_parameters`, and `data_security_mode`, have been added to the class, all with default values of `None`. The `WorkflowTaskContainer` class has also been modified to include an additional `job` parameter in its constructor and new attributes for `named_parameters`, `spark_conf`, `runtime_version`, and `data_security_mode`. The `_register_cluster_info` method and `_lint_task` method in `WorkflowLinter` have also been updated to use the new `CurrentSessionState` attributes when linting a task. A new method `Job()` has been added to the `WorkflowTaskContainer` class, used in multiple unit tests to create a `Job` object and pass it as an argument to the `WorkflowTaskContainer` constructor. The tests cover various scenarios for library types, such as jar files, PyPI libraries, Python wheels, and requirements files, and ensure that the `WorkflowTaskContainer` object can extract the relevant information from a `Job` object and store it for later use.
* Support inferred values when linting DBFS mounts ([#1868](#1868)). This commit adds value inference and enhances the consistency of advice messages in the context of linting Databricks File System (DBFS) mounts, addressing issue [#1205](#1205). It improves the precision of deprecated file system path calls and updates the handling of default DBFS references, making the code more robust and future-proof. The linter's behavior has been enhanced to detect DBFS paths in various formats, including string constants and variables. The test suite has been updated to include new cases and provide clearer deprecation warnings. This commit also refines the way advice is generated for deprecated file system path calls and renames `Advisory` to `Deprecation` in some places, providing more accurate and helpful feedback to developers.
* Support inferred values when linting spark.sql ([#1870](#1870)). In this release, we have added support for inferring the values of table names when linting PySpark code, improving the accuracy and usefulness of the PySpark linter. This feature includes the ability to handle inferred values in Spark SQL code and updates to the test suite to reflect the updated linting behavior. The `QueryMatcher` class in `pyspark.py` has been updated to infer the value of the table name argument in a `Call` node, and an advisory message is generated if the value cannot be inferred. Additionally, the use of direct filesystem references, such as "s3://bucket/path", will be deprecated in favor of more dynamic and flexible querying. For example, the table "old.things" has been migrated to "brand.new.stuff" in the Unity Catalog. Furthermore, a loop has been introduced to demonstrate the ability to compute table names programmatically within SQL queries, enhancing the system's flexibility and adaptability.
* Support inferred values when linting sys path ([#1866](#1866)). In this release, the library's linting system has been enhanced with added support for inferring values in the system path. The `DependencyGraph` class in `graph.py` has been updated to handle new node types, including `SysPathChange`, `NotebookRunCall`, `ImportSource`, and `UnresolvedPath`. The `UnresolvedPath` node is added for unresolved paths during linting, and new methods have been introduced in `conftest.py` for testing, such as `DependencyResolver`, `Whitelist`, `PythonLibraryResolver`, `NotebookResolver`, and `ImportFileResolver`. Additionally, the library now recognizes inferred values, including absolute paths added to the system path via `sys.path.append`. New tests have been added to ensure the correct behavior of the `DependencyResolver` class. This release also introduces a new file, `sys-path-with-fstring.py`, which demonstrates the use of Python's f-string syntax to append values to the system path, and a new method, `BaseImportResolver`, has been added to the `DependencyResolver` class to resolve imports more flexibly and robustly.
@nfx nfx mentioned this pull request Jun 12, 2024
Successfully merging this pull request may close these issues.

[FEATURE]: Detect the usage of external OSS and private libraries and analyse them for UC compatibility