PyLint is a valuable tool for developers that performs a range of code quality checks. It checks line length, enforces naming conventions for variables, validates the usage of imported modules, verifies the implementation of declared interfaces, identifies duplicated code, and much more. This plugin extends PyLint with checks for common mistakes and issues in Python code, specifically in the Databricks environment.
- PyLint Plugin for Databricks
- Installation as PyLint plugin
- Integration with Databricks CLI
- PyLint Ecosystem
- Why not (just) Ruff?
- Automated code analysis
- Project Support
You can install this project via pip:
pip install databricks-labs-pylint
and then use it with pylint:
pylint --load-plugins=databricks.labs.pylint.all <your-python-file>.py
You can also add databricks.labs.pylint.all to the load-plugins configuration in your pylintrc or pyproject.toml file.
You can use this plugin with Databricks CLI to check individual notebooks or entire directories.
First, you need to install this plugin locally:
databricks labs install pylint-plugin
Then, you can call the nbcheck command without any arguments to lint all Python notebooks in your home folder:
databricks labs pylint-plugin nbcheck
Or you can specify a --path flag to lint a specific notebook or folder:
databricks labs pylint-plugin nbcheck --path /Users/[email protected]/PrepareData
More than 400k repositories use PyLint, and it is one of the most popular static code analysis tools in the Python ecosystem. This plugin allows you to work with PyLint in the same way you are used to, but with additional checks for Databricks-specific issues. It is also compatible with the following PyLint integrations:
- VSCode PyLint extension (MIT License)
- IntelliJ/PyCharm PyLint plugin (Apache License 2.0)
- Airflow Plugin (MIT License)
- GitHub Action (MIT License)
- Azure DevOps Task (MIT License)
- GitLab CodeClimate (GPLv3 License)
Even though Ruff is 10x+ faster than PyLint, it doesn't have a plugin system yet, nor does it have feature parity with PyLint. Other projects use MyPy, Ruff, and PyLint together to achieve the most comprehensive code analysis. You can try using Ruff together with just the checkers from this plugin in the same CI pipeline and pre-commit hook.
Every check has a code that follows an existing convention: {I,C,R,W,E,F}89{0-9}{0-9}, where 89 is the base ID for this plugin. {I,C,R,W,E,F} stand for Info, Convention, Refactor, Warning, Error, and Fatal.
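The convention above can be checked mechanically. The following is a minimal sketch and not part of the plugin: the helper name parse_check_code and the sample code W8901 are hypothetical and used only for illustration.

```python
import re

# Category letters used in the check codes, as listed above.
CATEGORIES = {
    "I": "Info",
    "C": "Convention",
    "R": "Refactor",
    "W": "Warning",
    "E": "Error",
    "F": "Fatal",
}

# One category letter, the plugin base ID 89, then two digits.
CODE_PATTERN = re.compile(r"([ICRWEF])89(\d{2})")


def parse_check_code(code: str):
    """Return (category name, two-digit suffix), or None if the code does not match."""
    match = CODE_PATTERN.fullmatch(code)
    if match is None:
        return None
    return CATEGORIES[match.group(1)], match.group(2)


# Hypothetical code, used only to exercise the pattern.
parsed = parse_check_code("W8901")
```

Codes with a different base ID or an unknown category letter are rejected, so a helper like this can separate checks of this plugin from other PyLint messages.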
To use this checker, add databricks.labs.pylint.airflow to the load-plugins configuration in your pylintrc or pyproject.toml file.
XXX cluster missing data_security_mode required for Unity Catalog compatibility. Before you enable Unity Catalog, you must set the data_security_mode to 'NONE', so that your existing jobs keep the same behavior. Failure to do so may cause your jobs to fail with unexpected errors.
To disable this check on a specific line, add # pylint: disable=missing-data-security-mode at the end of it.
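As a rough illustration, a cluster definition that sets the field explicitly could look like the sketch below. The variable name new_cluster and all values other than data_security_mode are assumptions for illustration; how the checker locates cluster definitions is not described here.

```python
# Hypothetical cluster definition; only data_security_mode comes from the text above.
new_cluster = {
    "spark_version": "13.3.x-scala2.12",  # illustrative value
    "num_workers": 2,  # illustrative value
    # Set explicitly so existing jobs keep the same behavior.
    "data_security_mode": "NONE",
}
```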
XXX cluster has unsupported runtime: XXX. The runtime version is not supported by Unity Catalog. Please upgrade to a runtime greater than or equal to 11.3.
To disable this check on a specific line, add # pylint: disable=unsupported-runtime at the end of it.
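The version requirement can be expressed as a simple comparison. This is a sketch under assumptions, not the plugin's own logic: the helper name runtime_supported is hypothetical, and it assumes runtime strings start with a major.minor pair, as in 11.3.x-scala2.12.

```python
import re

# Minimum runtime stated in the message above.
MIN_RUNTIME = (11, 3)


def runtime_supported(spark_version: str) -> bool:
    """Return True if the leading major.minor of the version string is at least 11.3."""
    match = re.match(r"(\d+)\.(\d+)", spark_version)
    if match is None:
        # Unrecognized format: treated as unsupported in this sketch.
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison orders by major first, then minor.
    return (major, minor) >= MIN_RUNTIME
```

Comparing tuples of integers avoids the pitfall of comparing version strings lexicographically, where "9.1" would sort after "11.3".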
To use this checker, add databricks.labs.pylint.dbutils to the load-plugins configuration in your pylintrc or pyproject.toml file.
Use Databricks SDK instead: w.dbfs.copy(XXX, XXX). Migrate all usage of dbutils to Databricks SDK. See the more detailed documentation at https://databricks-sdk-py.readthedocs.io/en/latest/workspace/files/dbfs.html
To disable this check on a specific line, add # pylint: disable=dbutils-fs-cp at the end of it.
Use Databricks SDK instead: with w.dbfs.download(XXX) as f: f.read(). Migrate all usage of dbutils to Databricks SDK. See the more detailed documentation at https://databricks-sdk-py.readthedocs.io/en/latest/workspace/files/dbfs.html
To disable this check on a specific line, add # pylint: disable=dbutils-fs-head at the end of it.
Use Databricks SDK instead: w.dbfs.list(XXX). Migrate all usage of dbutils to Databricks SDK. See the more detailed documentation at https://databricks-sdk-py.readthedocs.io/en/latest/workspace/files/dbfs.html
To disable this check on a specific line, add # pylint: disable=dbutils-fs-ls at the end of it.
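The three file system replacements suggested above can be wrapped in small functions that receive the client as an argument. This is a sketch: the helper names are hypothetical, w is assumed to be a databricks.sdk.WorkspaceClient, and the path attribute on listed entries is an assumption about the SDK's return type. The w.dbfs.copy, w.dbfs.download, and w.dbfs.list calls are the ones named in the messages.

```python
def copy_file(w, source: str, target: str) -> None:
    """Replacement suggested for dbutils-fs-cp."""
    w.dbfs.copy(source, target)


def read_file(w, path: str) -> bytes:
    """Replacement suggested for dbutils-fs-head."""
    with w.dbfs.download(path) as f:
        return f.read()


def list_paths(w, path: str) -> list:
    """Replacement suggested for dbutils-fs-ls; assumes entries expose .path."""
    return [entry.path for entry in w.dbfs.list(path)]
```

Because the client is passed in rather than created inside the functions, they can be exercised in unit tests with a stand-in object.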
Mounts are not supported with Unity Catalog, switch to using Unity Catalog Volumes instead. Migrate all usage to Unity Catalog.
To disable this check on a specific line, add # pylint: disable=dbutils-fs-mount at the end of it.
Credentials utility is not supported with Unity Catalog. Migrate all usage to Unity Catalog.
To disable this check on a specific line, add # pylint: disable=dbutils-credentials at the end of it.
Use Databricks SDK instead: w.jobs.submit( tasks=[jobs.SubmitTask(existing_cluster_id=..., notebook_task=jobs.NotebookTask(notebook_path=XXX), task_key=...) ]).result(timeout=timedelta(minutes=XXX)). Migrate all usage of dbutils to Databricks SDK. See the more detailed documentation at https://databricks-sdk-py.readthedocs.io/en/latest/workspace/jobs/jobs.html
To disable this check on a specific line, add # pylint: disable=dbutils-notebook-run at the end of it.
Use Databricks SDK instead: from databricks.sdk import WorkspaceClient(); w = WorkspaceClient(). Do not hardcode secrets in code, use Databricks SDK instead, which natively authenticates in Databricks Notebooks. See more at https://databricks-sdk-py.readthedocs.io/en/latest/authentication.html
To disable this check on a specific line, add # pylint: disable=pat-token-leaked at the end of it.
Do not use internal APIs, rewrite using Databricks SDK: XXX. Do not use internal APIs. Use Databricks SDK for Python: https://databricks-sdk-py.readthedocs.io/en/latest/index.html
To disable this check on a specific line, add # pylint: disable=internal-api at the end of it.
To use this checker, add databricks.labs.pylint.legacy to the load-plugins configuration in your pylintrc or pyproject.toml file.
Don't use databricks_cli, use databricks.sdk instead: pip install databricks-sdk. Migrate all usage of Legacy CLI to Databricks SDK. See the more detailed documentation at https://databricks-sdk-py.readthedocs.io/en/latest/index.html
To disable this check on a specific line, add # pylint: disable=legacy-cli at the end of it.
Incompatible with Unity Catalog: XXX. Migrate all usage to Databricks Unity Catalog. See https://github.com/databrickslabs/ucx for more details.
To disable this check on a specific line, add # pylint: disable=incompatible-with-uc at the end of it.
To use this checker, add databricks.labs.pylint.notebooks to the load-plugins configuration in your pylintrc or pyproject.toml file.
Notebooks should not have more than 75 cells. Otherwise, the notebook is hard to maintain and understand, both for other people and for the future you.
To disable this check on a specific line, add # pylint: disable=notebooks-too-many-cells at the end of it.
Using %run is not allowed. Use functions instead of %run to avoid side effects and make the code more testable. If you need to share code between notebooks, consider creating a library. If you still need to call other code as a separate job, use Databricks SDK for Python: https://databricks-sdk-py.readthedocs.io/en/latest/index.html
To disable this check on a specific line, add # pylint: disable=notebooks-percent-run at the end of it.
To use this checker, add databricks.labs.pylint.spark to the load-plugins configuration in your pylintrc or pyproject.toml file.
Using spark outside the function is leading to untestable code. Do not use the global spark object; pass it as an argument to the function instead, so that the function becomes testable in CI/CD pipelines.
To disable this check on a specific line, add # pylint: disable=spark-outside-function at the end of it.
Rewrite to display in a notebook: display(XXX). Use display() instead of show() to visualize the data in a notebook.
To disable this check on a specific line, add # pylint: disable=use-display-instead-of-show at the end of it.
Function XXX is missing a 'spark' argument. The function refers to a global spark variable, which may not always be available. Pass the spark object as an argument to the function instead, so that the function becomes testable in CI/CD pipelines.
To disable this check on a specific line, add # pylint: disable=no-spark-argument-in-function at the end of it.
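The recommendation to pass spark as an argument can be illustrated as follows. This is a sketch: count_rows, FakeSpark, and FakeTable are hypothetical names, and the fakes implement only the two methods the function calls so that it runs without a cluster.

```python
def count_rows(spark, table_name: str) -> int:
    """Count rows using the session passed in by the caller, not a global."""
    return spark.table(table_name).count()


class FakeTable:
    """Hypothetical test double exposing only count()."""

    def __init__(self, rows):
        self._rows = rows

    def count(self) -> int:
        return len(self._rows)


class FakeSpark:
    """Hypothetical test double exposing only table()."""

    def __init__(self, tables):
        self._tables = tables

    def table(self, name):
        return FakeTable(self._tables[name])


# In a notebook the real session would be passed instead of the fake.
fake = FakeSpark({"sales": [1, 2, 3]})
row_count = count_rows(fake, "sales")
```

Since the function never touches a module-level spark variable, the same code runs in a notebook and in a CI job.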
To use this checker, add databricks.labs.pylint.readability to the load-plugins configuration in your pylintrc or pyproject.toml file.
List comprehension spans multiple lines, rewrite as for loop. List comprehensions in Python are typically used to create new lists by iterating over an existing iterable in a concise, one-line syntax. However, when a list comprehension becomes too complex or spans multiple lines, it may lose its readability and clarity, which are key advantages of Python's syntax.
To disable this check on a specific line, add # pylint: disable=rewrite-as-for-loop at the end of it.
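A small before-and-after sketch of the rewrite this check asks for. The data and variable names are made up for illustration.

```python
rows = [
    {"name": "a", "size": 10},
    {"name": "b", "size": 0},
    {"name": "c", "size": 7},
]

# Multi-line list comprehension of the kind this check reports.
labels_comprehension = [
    f"{row['name']}:{row['size']}"
    for row in rows
    if row["size"] > 0
]

# Equivalent for loop, with the filter and the transformation on separate steps.
labels_loop = []
for row in rows:
    if row["size"] <= 0:
        continue
    labels_loop.append(f"{row['name']}:{row['size']}")
```

Both forms produce the same list; the loop leaves room for comments, logging, or extra conditions without growing a single expression.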
To use this checker, add databricks.labs.pylint.mocking to the load-plugins configuration in your pylintrc or pyproject.toml file.
Obscure implicit test dependency with mock.patch(XXX). Rewrite to inject dependencies through constructor. Using patch to mock dependencies in unit tests can introduce implicit dependencies within a class, making it unclear to other developers. Constructor arguments, on the other hand, explicitly declare dependencies, enhancing code readability and maintainability. Reliance on patch for testing may also lead to issues during refactoring, as updates to underlying implementations would necessitate changes across multiple unrelated unit tests. Moreover, the use of hard-coded strings in patch can obscure which unit tests require modification, as they lack strongly typed references. This coupling of the class under test to concrete classes signifies a code smell, and such code is not easily portable to statically typed languages where monkey patching isn't feasible without significant effort. In essence, extensive patching of external clients suggests a need for refactoring, with experienced engineers recognizing the potential for dependency inversion in such scenarios.
To address this issue, refactor the code to inject dependencies through the constructor. This approach explicitly declares dependencies, enhancing code readability and maintainability. Moreover, it allows for dependency inversion, enabling the use of interfaces to decouple the class under test from concrete classes. This decoupling facilitates unit testing, as it allows for the substitution of mock objects for concrete implementations, ensuring that the class under test behaves as expected. By following this approach, you can create more robust and maintainable unit tests, improving the overall quality of your codebase.
Use the require-explicit-dependency option to specify the package names that contain code for your project.
To disable this check on a specific line, add # pylint: disable=explicit-dependency-required at the end of it.
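Constructor injection, as recommended above, might look like the following sketch. BillingClient and CostReport are hypothetical classes invented for this example; only create_autospec from the standard library is a real API.

```python
from unittest.mock import create_autospec


class BillingClient:
    """Hypothetical external client."""

    def total_cost(self, workspace_id: str) -> float:
        raise NotImplementedError("calls a remote service in real code")


class CostReport:
    """Class under test; its dependency is declared in the constructor."""

    def __init__(self, billing: BillingClient):
        self._billing = billing

    def summary(self, workspace_id: str) -> str:
        return f"{workspace_id}: {self._billing.total_cost(workspace_id):.2f}"


def test_summary():
    # No patch() with a hard-coded import path: the mock is handed in directly.
    billing = create_autospec(BillingClient)
    billing.total_cost.return_value = 12.5
    report = CostReport(billing)
    assert report.summary("ws-1") == "ws-1: 12.50"
    billing.total_cost.assert_called_once_with("ws-1")
```

The test depends only on the constructor signature, so moving BillingClient to another module does not require touching it.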
Obscure implicit test dependency with MagicMock(). Rewrite with create_autospec(ConcreteType). Using MagicMock to mock dependencies in unit tests can introduce implicit dependencies within a class, making it unclear to other developers. create_autospec(ConcreteType) is a better alternative, as it automatically creates a mock object with the same attributes and methods as the concrete class. This approach ensures that the mock object behaves like the concrete class, allowing for more robust and maintainable unit tests. Moreover, reliance on MagicMock for testing leads to issues during refactoring, as updates to underlying implementations would necessitate changes across multiple unrelated unit tests.
To disable this check on a specific line, add # pylint: disable=obscure-mock at the end of it.
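The difference between the two kinds of mock can be seen in a short sketch. The Storage class is hypothetical; MagicMock and create_autospec come from the standard library's unittest.mock.

```python
from unittest.mock import MagicMock, create_autospec


class Storage:
    """Hypothetical concrete dependency."""

    def save(self, key: str, value: bytes) -> None:
        raise NotImplementedError


# A bare MagicMock accepts any attribute, so a misspelled method goes unnoticed.
loose = MagicMock()
loose.sav("k", b"v")

# An autospecced mock only exposes what the concrete class defines.
strict = create_autospec(Storage)
try:
    strict.sav("k", b"v")
    typo_detected = False
except AttributeError:
    typo_detected = True

# It also checks call signatures against the real method.
try:
    strict.save("k")  # the value argument is missing
    signature_checked = False
except TypeError:
    signature_checked = True
```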
Mock not assigned to a variable: XXX. Every mocked object should be assigned to a variable to allow for assertions.
To disable this check on a specific line, add # pylint: disable=mock-no-assign at the end of it.
Missing usage of mock for XXX. Usually this check means a hidden bug, where an object is mocked, but we don't check if it was used correctly. Every mock should have at least one assertion, return value, or side effect specified.
To disable this check on a specific line, add # pylint: disable=mock-no-usage at the end of it.
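A test that would plausibly satisfy the two mock checks above assigns the mock to a variable and asserts on it. This is a sketch with hypothetical Notifier and Job classes; the exact conditions the checkers apply are not spelled out here.

```python
from unittest.mock import create_autospec


class Notifier:
    """Hypothetical dependency."""

    def send(self, message: str) -> None:
        raise NotImplementedError


class Job:
    def __init__(self, notifier: Notifier):
        self._notifier = notifier

    def run(self) -> None:
        self._notifier.send("done")


def test_job_notifies():
    # Writing Job(create_autospec(Notifier)).run() would leave nothing to assert on.
    notifier = create_autospec(Notifier)
    Job(notifier).run()
    # The assertion is what verifies the mock was used as intended.
    notifier.send.assert_called_once_with("done")
```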
To use this checker, add databricks.labs.pylint.eradicate to the load-plugins configuration in your pylintrc or pyproject.toml file.
Remove commented out code: XXX. Version control helps with keeping track of code changes. There is no need to keep commented out code in the codebase. Remove it to keep the codebase clean.
To disable this check on a specific line, add # pylint: disable=dead-code at the end of it.
To test this plugin in isolation, you can use the following command:
pylint --load-plugins=databricks.labs.pylint.all --disable=all --enable=missing-data-security-mode,unsupported-runtime,dbutils-fs-cp,dbutils-fs-head,dbutils-fs-ls,dbutils-fs-mount,dbutils-credentials,dbutils-notebook-run,pat-token-leaked,internal-api,legacy-cli,incompatible-with-uc,notebooks-too-many-cells,notebooks-percent-run,spark-outside-function,use-display-instead-of-show,no-spark-argument-in-function,rewrite-as-for-loop,explicit-dependency-required,obscure-mock,mock-no-assign,mock-no-usage,dead-code .
Please note that this project is provided for your exploration only and is not formally supported by Databricks with Service Level Agreements (SLAs). It is provided AS-IS, and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of this project.
Any issues discovered through the use of this project should be filed as GitHub Issues on this repository. They will be reviewed as time permits, but no formal SLAs for support exist.