Is there any chance of discovering package dependencies without building wheels? #40

Closed
petergardfjall opened this issue Aug 21, 2020 · 12 comments · Fixed by #113
Labels
question Further information is requested

Comments

@petergardfjall

petergardfjall commented Aug 21, 2020

Description

I'm trying to understand Python dependency management in general, and pipgrip's approach specifically. In particular, I'd like to understand whether it is possible at all to determine the dependencies of a package without ever having to build wheels.

For example, the result of running pipgrip to see the dependency tree of a package containing non-Python code differs depending on the version I analyze:

pipgrip  --tree numpy==1.9.2

fails (clang compilation fails with a massive stacktrace), whereas

pipgrip --tree numpy==1.19.1

succeeds. The difference appears to be that 1.19.1 has a readily available wheel on PyPI that matches my Python interpreter (version, ABI, platform) [1], whereas 1.9.2 has no pre-built bdist wheel matching my interpreter and therefore needs to be built from the sdist (which requires a ton of packages to be installed on my computer).
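A minimal sketch of the tag matching described above, assuming the standard wheel filename layout `{name}-{version}(-{build})?-{python}-{abi}-{platform}.whl`. The helper names `parse_wheel_filename` and `is_compatible` are hypothetical and are not part of pip or pipgrip; the set of supported tags is passed in by the caller (in practice it could come from something like `packaging.tags.sys_tags()`).

```python
from typing import Iterable, NamedTuple, Set, Tuple

Tag = Tuple[str, str, str]  # (python tag, abi tag, platform tag)


class WheelTags(NamedTuple):
    name: str
    version: str
    tags: Set[Tag]


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into name, version and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 parts when an optional build tag is present
        raise ValueError(f"unexpected wheel filename: {filename}")
    name, version = parts[0], parts[1]
    python_tags, abi_tags, platform_tags = parts[-3:]
    # Each field may be a dot-separated compressed set, e.g. "py2.py3".
    tags = {
        (py, abi, plat)
        for py in python_tags.split(".")
        for abi in abi_tags.split(".")
        for plat in platform_tags.split(".")
    }
    return WheelTags(name, version, tags)


def is_compatible(filename: str, supported: Iterable[Tag]) -> bool:
    """True if the wheel shares at least one tag with the interpreter's tags."""
    return bool(parse_wheel_filename(filename).tags & set(supported))
```

If no file on the index passes a check like this, the only remaining artifact is the sdist, which is where the build step comes in.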

Now, my question is whether it's strictly necessary for pipgrip to build wheels to determine dependencies.
Would it be possible to determine the tree of dependencies by only looking at sdist distributions (and hence never having to build wheels)? I'm probably missing something in my understanding of Python's dependency management, so feel free to enlighten me! :)

[1] {'implementation_name': 'cpython', 'implementation_version': '3.8.2', 'os_name': 'posix', 'platform_machine': 'x86_64', 'platform_release': '5.4.0-42-generic', 'platform_system': 'Linux', 'platform_version': '#46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020', 'python_full_version': '3.8.2', 'platform_python_implementation': 'CPython', 'python_version': '3.8', 'sys_platform': 'linux'}
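For reference, a dictionary like [1] can be reproduced with the standard library alone. This is a sketch that follows the environment marker definitions in PEP 508; the function names `marker_environment` and `format_full_version` are chosen here for illustration, and the values will of course differ per machine.

```python
import os
import platform
import sys


def format_full_version(info) -> str:
    """Render sys.implementation.version the way PEP 508 describes."""
    version = "{0.major}.{0.minor}.{0.micro}".format(info)
    if info.releaselevel != "final":
        version += info.releaselevel[0] + str(info.serial)
    return version


def marker_environment() -> dict:
    """Collect the PEP 508 environment markers for the running interpreter."""
    return {
        "implementation_name": sys.implementation.name,
        "implementation_version": format_full_version(sys.implementation.version),
        "os_name": os.name,
        "platform_machine": platform.machine(),
        "platform_release": platform.release(),
        "platform_system": platform.system(),
        "platform_version": platform.version(),
        "python_full_version": platform.python_version(),
        "platform_python_implementation": platform.python_implementation(),
        "python_version": ".".join(platform.python_version_tuple()[:2]),
        "sys_platform": sys.platform,
    }


if __name__ == "__main__":
    for key, value in marker_environment().items():
        print(f"{key}: {value}")
```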

@petergardfjall petergardfjall added the question Further information is requested label Aug 21, 2020
@ddelange
Owner

ddelange commented Aug 21, 2020

Hi @petergardfjall,

Interesting question which has also interested me for a while.

On using sdist:

I think in the end it all depends on how the developers set up their packages. If they have a standardized setup.py, it would be relatively easy (preferably using a requirements.txt). But there are a lot of difficult cases in the wild that will make custom sdist introspection very error prone. I also seem to remember older versions of matplotlib, where the final dependencies are only known for your env/system after actually running e.g. setup.py develop (or building the wheel).

As the potential speedup of avoiding wheel building is of course huge, a first step could be to use the PyPI API as mentioned here: #38 (comment)

I think the PyPI API will give enough info to replace the whole return value of this function (and save two calls to pip including wheel building):

def discover_dependencies_and_versions(

However, I think it'll be hard to get rid of wheels in pipgrip completely, as the PyPI API will not always have the info (or may have incomplete info for weird packages!), and support for custom index-urls will be quite a hassle to get to play nice. And then there are of course the env/implementation specifics that pip works hard to support (and indirectly propagates into pipgrip), so it would be kinda cutting corners I guess. Example: running pipgrip 'tensorflow<2' on py3.8.

On the other hand, of course, there are packages that require non-Python libs and will fail to build (see the issue I linked above). The PyPI API could be used as an (imperfect) fallback in such cases.

For the numpy error you're facing, try using gcc instead of clang. It's my standard c++ compiler and I think often avoids such problems :) (e.g. #36)

Hope that helps!

@petergardfjall
Author

@ddelange thanks for sharing your thoughts.

From a very unscientific investigation, I believe that the PyPI API approach (looking for requires_dist) might not give good results in many cases, as the PyPI metadata quite often does not include any dependency information whatsoever.
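A small sketch of the lookup being discussed, assuming PyPI's JSON endpoint at `https://pypi.org/pypi/<name>/<version>/json` and its `info.requires_dist` field. The helper names are made up for this example. Note that the field can be null, and a null value does not by itself prove that the package has no dependencies.

```python
import json
from typing import List, Optional
from urllib.request import urlopen


def pypi_json_url(name: str, version: str) -> str:
    """URL of the per-release JSON document on pypi.org."""
    return f"https://pypi.org/pypi/{name}/{version}/json"


def requires_dist(payload: dict) -> Optional[List[str]]:
    """Return the declared requirements, or None when none are recorded."""
    return payload.get("info", {}).get("requires_dist")


def fetch_requires_dist(name: str, version: str, timeout: float = 10.0) -> Optional[List[str]]:
    """Download the release JSON and extract requires_dist (needs network access)."""
    with urlopen(pypi_json_url(name, version), timeout=timeout) as response:
        return requires_dist(json.load(response))
```

Calling `fetch_requires_dist("numpy", "1.9.2")` would show what PyPI has on record for the release that failed to build above, without running any build.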

I was thinking that one could perhaps use pip download to just download the sdist archives and then inspect them to build up the dependency tree from the setup.py files (and their install_requires). One thing that appealed to me was that with such an approach it might even be possible to examine dependencies for a different Python interpreter/OS (--python-version, --abi, --platform, --implementation). However, I quickly realized that pip download in some cases actually builds native code, since it needs to run setup.py in order to extract dependency metadata [1].
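To illustrate both the idea and its limits, here is a rough sketch of reading install_requires from a setup.py without executing it, using the standard `ast` module. The function name `static_install_requires` is hypothetical. It only works when the value is a plain literal list of strings; anything computed at run time comes back as None, which is exactly the case that forces code execution.

```python
import ast
from typing import List, Optional


def static_install_requires(setup_py_source: str) -> Optional[List[str]]:
    """Return install_requires if it is a literal list of strings, else None."""
    tree = ast.parse(setup_py_source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Matches both `setup(...)` and `setuptools.setup(...)`.
        called = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if called != "setup":
            continue
        for keyword in node.keywords:
            if keyword.arg != "install_requires":
                continue
            try:
                value = ast.literal_eval(keyword.value)
            except ValueError:
                return None  # a variable or expression, only known at run time
            if isinstance(value, (list, tuple)) and all(isinstance(v, str) for v in value):
                return list(value)
            return None
    return None
```

A setup.py that appends requirements depending on `sys.platform`, for instance, yields None here, so a purely static approach would silently miss dependencies unless it falls back to a build.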

As mentioned here pypa/pip#1884 (comment):

It's an unfortunate fact of the Python packaging ecosystem that anything
related to packaging always involves arbitrary code execution (referring to
setup.py).

So I've come to realize that a "build-free" dependency resolution procedure is a lot more difficult to achieve than I first, naively, thought.

I get the feeling that we won't get further on this topic. Build-free dependency resolution appears to be a dead-end as far as I can see, so I'm gonna close the issue.

[1] https://discuss.python.org/t/pip-download-just-the-source-packages-no-building-no-metadata-etc/4651/3

@ddelange
Owner

Another interesting discussion for future reference: pypi/warehouse#8254

(found via https://github.com/uranusjr/warehouse-filebrowser)

@ddelange
Owner

It seems there is movement on .whl.METADATA potentially becoming available on PyPI :)

pypi/warehouse#474 (comment)

@abitrolly

Is there any news on this front? https://www.python.org/dev/peps/pep-0643/ is Accepted. Does that help in any way?

@ddelange
Owner

ddelange commented Mar 9, 2021

Since PyPA's funding period (mainly used to get resolvelib into production) has ended, I don't think there is currently a timeline available for this big overhaul. But since they're committed to PEP compliance, it will come sooner or later :)

Until then, it probably even makes sense to reopen this issue in the hope that the overhaul can consequently be used to speed up this lib

@ddelange ddelange reopened this Mar 9, 2021
@ddelange ddelange pinned this issue Mar 9, 2021
@ddelange
Owner

Update: PEP-658 draft has been merged and implementation details are being discussed ref pypi/warehouse#8254 (comment)

@abitrolly

abitrolly commented Sep 3, 2021

I've submitted pypi/warehouse#9972 so hopefully it will be possible to just download METADATA file from the wheel instead of the complete wheel.
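As a sketch of what consuming such a file could look like: PEP 658 specifies that the metadata is served at the distribution file's URL with `.metadata` appended, and the METADATA file itself uses email-style headers, so the standard library can read it. The function names below are illustrative only and do not reflect pip's or pipgrip's actual implementation.

```python
from email.parser import HeaderParser
from typing import List


def metadata_url(wheel_url: str) -> str:
    """Location of the standalone metadata file as described in PEP 658."""
    return wheel_url + ".metadata"


def requires_dist_from_metadata(metadata_text: str) -> List[str]:
    """Extract all Requires-Dist entries from the text of a METADATA file."""
    message = HeaderParser().parsestr(metadata_text)
    return message.get_all("Requires-Dist") or []
```

The returned strings still carry environment markers (such as `; sys_platform == "win32"`), so a resolver would have to evaluate them for the target interpreter.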

But if there is no wheel, then pipgrip will still try to build it, so no help there. For solving that problem, though, it would be very useful to get some user-friendly output about the problem, in the form of a dashboard or table. That could also be used in tests or CI/CD pipelines. First, there could be several modes:

  • offline (do not even try to fetch anything)
  • meta (only fetch wheel metadata, do not try to build anything)
  • wheel (fetch metadata and wheels, but not anything else and don't build anything)
  • maxgrip (fetch, build, do whatever it takes to)

Then some real-time representations for errors.

numpy  | 1.9.2 | + | W  |
cython | 3.8.2 |   |    | *pending*
tflite | 2.0.2 | E |    | no wheel for platform:linux_x86_64

+ - means package info successfully parsed
E - error, parse description

W - parsed from the wheel
WM - parsed from downloaded wheel metadata
BW - built wheel and parsed

The format can be expressed in CSV. What for? To share data on which dependency trees have problems resolving, with an explanation of why.
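A rough sketch of how the proposed table could be serialized to CSV. Nothing here exists in pipgrip: the `ResolveStatus` record, its column names and the `to_csv` helper are all assumptions made up for this example, mirroring the columns and legend of the table above.

```python
import csv
import io
from typing import Iterable, NamedTuple


class ResolveStatus(NamedTuple):
    package: str
    version: str
    status: str  # "+" parsed, "E" error, "" still pending
    source: str  # "W", "WM", "BW", or "" when unknown
    detail: str  # free-text explanation, e.g. the error message


def to_csv(rows: Iterable[ResolveStatus]) -> str:
    """Serialize status rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(ResolveStatus._fields)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv([
        ResolveStatus("numpy", "1.9.2", "+", "W", ""),
        ResolveStatus("tflite", "2.0.2", "E", "", "no wheel for platform:linux_x86_64"),
    ]))
```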

@ddelange
Owner

ddelange commented Sep 7, 2021

Let's hope these PEPs get adopted by PyPA soon! It will not only be a huge resolving speedup for the users of pip, it will theoretically allow pipgrip to resolve much faster too.

However, unless pip exposes a new command to fetch this metadata via CLI, pipgrip would have to move away from using pip as its sole gateway to the internet in order to exploit the METADATA availability. For now, I'm hoping this can be avoided and pip will expose this metadata download functionality.

Mainly because implementing the logic in-house to talk to warehouse directly and correctly would be a major increase in complexity and development effort, for instance:

  • Maintaining support for --index-url and --extra-index-url (now simply propagated to pip)
  • Handling platform/processor specifics identically to how pip will implement handling them (now mostly left to pip)
  • Respecting other pip-related configuration files and environment variables set by the user (now left to pip to respect)

Luckily in the best case there's no need to think about all that and we can vendor pip's upcoming metadata logic. I hope this does not extinguish your enthusiasm! Let's revisit once there is movement on PyPA's side 💥

@ddelange
Owner

ddelange commented May 22, 2023

Update:

now let's see if pipgrip can leverage it with the current pip api :)

@ddelange
Owner

ddelange commented May 22, 2023

It looks like there is one blocker left to avoid downloading wheels: pypa/pip#11512 (comment)

When there are no compatible wheels available, the corresponding metadata files won't help either, and so pip will be forced to attempt a build from source. So strictly speaking, this ticket will not be fixed even after pipgrip leverages PEP 658. With the adoption, however, I think most has been said and done on this ticket and so I will close it with a PR soon 👍

@github-actions

github-actions bot commented Aug 9, 2023

Released 0.10.5
