Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extras not added as dependencies for mixed-case packages #720

Closed
mattoberle opened this issue Jun 6, 2022 · 1 comment
Closed

Extras not added as dependencies for mixed-case packages #720

mattoberle opened this issue Jun 6, 2022 · 1 comment
Assignees

Comments

@mattoberle
Copy link
Contributor

mattoberle commented Jun 6, 2022

🐞 bug report

Affected Rule

pip_parse

Is this a regression?

Not to my knowledge.

Description

  • PyPI distributions are case-insensitive and treat hyphens and underscores as equivalent (per PEP 426).
  • pip-compile, the recommended tool for building requirements_lock.txt, always converts package names to lowercase.
  • The whl.name value in bazel.extract_wheel derives from metadata in the package.

Given the above, dependencies are not linked in certain scenarios.

whl.name  # SQLAlchemy
extras  # {"sqlalchemy": {'postgresql_psycopg2binary'}}

extras_requested = extras[whl.name] if whl.name in extras else set()
whl_deps = sorted(whl.dependencies(extras_requested))

extras_requested  # set()

We need to apply some degree of sanitization to the keys of the extras dict and the lookup value to avoid the problem.
Or have extras be a dict class with a custom __getitem__ and __contains__ (not sure what would have the smaller impact).
I've applied a simple .lower() to whl.name as a patch in our WORKSPACE.
That works for us because our requirements_lock.txt is all lowercase, but that isn't a universal solution.

🔬 Minimal Reproduction

Running bazel test :test will fail with:

exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //:test
-----------------------------------------------------------------------------
F.
======================================================================
FAIL: test_import_psycopg2 (__main__.ExampleTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "bazel-out/k8-fastbuild/bin/test.runfiles/example/test.py", line 13, in test_import_psycopg2
    import psycopg2
ModuleNotFoundError: No module named 'psycopg2'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "bazel-out/k8-fastbuild/bin/test.runfiles/example/test.py", line 15, in test_import_psycopg2
    self.fail("Package 'psycopg2' not found.")
AssertionError: Package 'psycopg2' not found.

----------------------------------------------------------------------
Ran 2 tests in 0.412s

FAILED (failures=1)

Running bazel query 'kind(py_library, deps(@pip_sqlalchemy//:pkg)) will output:

@pip_greenlet//:pkg
@pip_sqlalchemy//:pkg

If you switch to the patched branch the test will pass and the query will output:

@pip_greenlet//:pkg
@pip_psycopg2_binary//:pkg
@pip_sqlalchemy//:pkg

🌍 Your Environment

Operating System:

  
Ubuntu 20.04.4 LTS
  

Output of bazel version:

  
5.1.1
  

Rules_python version:

  
0.8.1
  

Anything else relevant?

I'm happy to open a corresponding PR for this issue.

mattoberle added a commit to mattoberle/rules_python that referenced this issue Jun 6, 2022
This commit addresses issue bazelbuild#720.

[PEP 426](https://peps.python.org/pep-0426/#name) states that distribution
names are case-insensitive and "-" is interchangeable with "_".

The `pip-compile` command creates a lockfile where all package names are lowercase.
The tool may also modify interchangeable characters.

The following examples are all valid `requirements.txt` or `requirements_lock.txt` entries:

```
SQLAlchemy[postgresql_psycopg2binary]==1.4.36
sqlalchemy[postgresql_psycopg2binary]==1.4.36
sentry_sdk[flask]==1.5.8
sentry-sdk[flask]==1.5.8
```

A distribution's `METADATA` file contains the stylization chosen by the publisher.
By applying a "sanitise" function when building the `extras` dict and when performing lookups
we can eliminate this difference as a concern.
mattoberle added a commit to mattoberle/rules_python that referenced this issue Jun 6, 2022
This commit addresses issue bazelbuild#720.

[PEP 426](https://peps.python.org/pep-0426/#name) states that distribution
names are case-insensitive and "-" is interchangeable with "_".

The `pip-compile` command creates a lockfile where all package names are lowercase.
The tool may also modify interchangeable characters.

The following examples are all valid `requirements.txt` or `requirements_lock.txt` entries:

```
SQLAlchemy[postgresql_psycopg2binary]==1.4.36
sqlalchemy[postgresql_psycopg2binary]==1.4.36
sentry_sdk[flask]==1.5.8
sentry-sdk[flask]==1.5.8
```

A distribution's `METADATA` file contains the stylization chosen by the publisher.
By applying a "sanitise" function when building the `extras` dict and when performing lookups
we can eliminate this difference as a concern.
@thundergolfer thundergolfer self-assigned this Jun 14, 2022
alexeagle added a commit that referenced this issue Jun 21, 2022
* Use PEP 426 rules when setting deps from extras

This commit addresses issue #720.

[PEP 426](https://peps.python.org/pep-0426/#name) states that distribution
names are case-insensitive and "-" is interchangeable with "_".

The `pip-compile` command creates a lockfile where all package names are lowercase.
The tool may also modify interchangeable characters.

The following examples are all valid `requirements.txt` or `requirements_lock.txt` entries:

```
SQLAlchemy[postgresql_psycopg2binary]==1.4.36
sqlalchemy[postgresql_psycopg2binary]==1.4.36
sentry_sdk[flask]==1.5.8
sentry-sdk[flask]==1.5.8
```

A distribution's `METADATA` file contains the stylization chosen by the publisher.
By applying a "sanitise" function when building the `extras` dict and when performing lookups
we can eliminate this difference as a concern.

* Use PEP 503 rules when sanitising extras

* Normalize distribution name with pkg_resources

`pypa/installer` is used to parse Wheel metadata, but does not currently
provide a method for normalizing distribution names:

- pypa/installer#97

`pypa/pkg_resources` provides `Requirement.parse` which returns an instance
of `Requirement` where `.key` is the canonical distribution name per PEP 503.

The `Requirement` class can also parse `extras`, but it returns a normalized
form that I believe could break the installation of the extras.

* Use Requirement.parse to populate extra reqs

* Revert "Use Requirement.parse to populate extra reqs"

This reverts commit f0faa97.

* Test for distribution name normalization in extras

* Replace pkg_resources with packaging.utils

This replaces `pkg_resources.Requirement.parse` with
`packaging.utils.canonicalize_name`. Doing this pulls in a vendored
requirement from `pip`, which may be undesirable.

The code we want is just:

```
re.sub(r"[-_.]+", "-", name).lower()
```

This commit also leaves a reference to `pkg_resources` in `wheel.py` which
does not canonicalize the name.

Co-authored-by: Jonathon Belotti <[email protected]>
Co-authored-by: Alex Eagle <[email protected]>
@thundergolfer
Copy link

Closing this as fixed in #724

mattem pushed a commit to mattem/rules_python that referenced this issue Jul 7, 2022
* Use PEP 426 rules when setting deps from extras

This commit addresses issue bazelbuild#720.

[PEP 426](https://peps.python.org/pep-0426/#name) states that distribution
names are case-insensitive and "-" is interchangeable with "_".

The `pip-compile` command creates a lockfile where all package names are lowercase.
The tool may also modify interchangeable characters.

The following examples are all valid `requirements.txt` or `requirements_lock.txt` entries:

```
SQLAlchemy[postgresql_psycopg2binary]==1.4.36
sqlalchemy[postgresql_psycopg2binary]==1.4.36
sentry_sdk[flask]==1.5.8
sentry-sdk[flask]==1.5.8
```

A distribution's `METADATA` file contains the stylization chosen by the publisher.
By applying a "sanitise" function when building the `extras` dict and when performing lookups
we can eliminate this difference as a concern.

* Use PEP 503 rules when sanitising extras

* Normalize distribution name with pkg_resources

`pypa/installer` is used to parse Wheel metadata, but does not currently
provide a method for normalizing distribution names:

- pypa/installer#97

`pypa/pkg_resources` provides `Requirement.parse` which returns an instance
of `Requirement` where `.key` is the canonical distribution name per PEP 503.

The `Requirement` class can also parse `extras`, but it returns a normalized
form that I believe could break the installation of the extras.

* Use Requirement.parse to populate extra reqs

* Revert "Use Requirement.parse to populate extra reqs"

This reverts commit f0faa97.

* Test for distribution name normalization in extras

* Replace pkg_resources with packaging.utils

This replaces `pkg_resources.Requirement.parse` with
`packaging.utils.canonicalize_name`. Doing this pulls in a vendored
requirement from `pip`, which may be undesirable.

The code we want is just:

```
re.sub(r"[-_.]+", "-", name).lower()
```

This commit also leaves a reference to `pkg_resources` in `wheel.py` which
does not canonicalize the name.

Co-authored-by: Jonathon Belotti <[email protected]>
Co-authored-by: Alex Eagle <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants