Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pip install --dry-run shouldn't download full wheels when metadata file available #12603

Open
1 task done
notatallshaw opened this issue Mar 28, 2024 · 18 comments · May be fixed by #12186
Open
1 task done

Pip install --dry-run shouldn't download full wheels when metadata file available #12603

notatallshaw opened this issue Mar 28, 2024 · 18 comments · May be fixed by #12186
Labels
C: install report pip install --report type: enhancement Improvements to functionality type: performance Commands take too long to run

Comments

@notatallshaw
Copy link
Member

Description

When running pip install --dry-run {package} pip downloads the metadata file and then the full wheel

Expected behavior

Dry run installs don't need to download the full wheels

pip version

24.0

Python version

3.11

OS

Linux

How to Reproduce

pip install --dry-run kaleido==0.2.1

Output

$ pip install --dry-run kaleido==0.2.1
Collecting kaleido==0.2.1
  Downloading kaleido-0.2.1-py2.py3-none-manylinux1_x86_64.whl.metadata (15 kB)
Downloading kaleido-0.2.1-py2.py3-none-manylinux1_x86_64.whl (79.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 79.9/79.9 MB 34.6 MB/s eta 0:00:00
Would install kaleido-0.2.1

Code of Conduct

@notatallshaw notatallshaw added S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior labels Mar 28, 2024
@pfmoore
Copy link
Member

pfmoore commented Mar 28, 2024

PRs welcome. However, one potential upside of downloading the wheels is that if the dry run is followed by an actual install, the install itself will be faster as everything needed will be cached. So it's not immediately obvious that this change would be beneficial overall.

@notatallshaw
Copy link
Member Author

notatallshaw commented Mar 28, 2024

PRs welcome. However, one potential upside of downloading the wheels is that if the dry run is followed by an actual install, the install itself will be faster as everything needed will be cached. So it's not immediately obvious that this change would be beneficial overall.

I think there are two main use cases for dry run, one where you're validating what would install and then installing, and one where you're collecting the output of dry run as part of a larger environment management process, in this case you may not be installing locally, you could be preparing files for a docker build, or many other things.

I think you're thinking of the first case. Even in that case you may decide that the versions are wrong and you want to update the requirements, you may need to do this several times. If the wheels are very large, it's a lot of wasted download and time as you iterate finding the right requirements.

@rootsmusic
Copy link

rootsmusic commented Apr 11, 2024

I've inserted the --dry-run option, but it seems to be ignored by pip. Is that unexpected behavior related to this issue?

@notatallshaw
Copy link
Member Author

notatallshaw commented Apr 11, 2024

I've inserted the --dry-run option, but it seems to be ignored by pip. Is that unexpected behavior related to this issue?

Dry run works fine for me, in the sense that it doesn't install any packages, if you have an issue that appears to be a bug create a new issue with steps to reproduce, including your environment such as your pip version etc.

@rootsmusic
Copy link

Dry run works fine for me

@notatallshaw, you're right. I mistakenly assumed that it wasn't, because the output says "Downloading" (which I quickly interrupted). At the end, the output does say "would install" the downloaded package.

@notatallshaw
Copy link
Member Author

notatallshaw commented Apr 13, 2024

I started working on a PR and I think it's going to be a smaller change once #12300 lands because there's some specific legacy version and specifier warnings which assume the full wheel file is there. So I'm going to wait until that is no longer a concern rather than trying to fix that code.

Although this is going to be bigger than a 10 line change, because InstallRequirement needs to be changed so it can use the metadata file if it is there, instead of the wheel file. This seems like a positive change though, should be less IO.

@pradyunsg pradyunsg removed the type: bug A confirmed bug or unintended behavior label Apr 14, 2024
@pradyunsg
Copy link
Member

x-ref #12186

@notatallshaw
Copy link
Member Author

That PR seems effectively dead, hopefully I can touch this code in a much simpler manner.

@ichard26 ichard26 added type: enhancement Improvements to functionality C: install report pip install --report type: performance Commands take too long to run and removed S: needs triage Issues/PRs that need to be triaged labels Apr 19, 2024
@sfc-gh-nsharma
Copy link

sfc-gh-nsharma commented May 31, 2024

where you're collecting the output of dry run as part of a larger environment management process

We have this use case as well. We use --dry-run with --report to figure out dependencies. The wheels downloads slow down the process significantly.

@pelson
Copy link
Contributor

pelson commented Jun 22, 2024

@notatallshaw - just checking how are you getting along with the PR? I was surprised when I discovered that dry-run does the full download, and personally will be very happy when dry-run is simply using repository metadata to figure out what it would install.

@notatallshaw
Copy link
Member Author

Not had time to work on it, might revisit it in a few weeks, and I was only going to submit it if I could make a relatively simple change. If anyone else wants to give it a go, by all means don't wait for me.

@ddelange
Copy link
Contributor

ddelange commented Jul 2, 2024

fwiw, for the time being, I've been happily running #12186 on my machine:

pip install 'pip @ https://github.com/cosmicexplorer/pip/archive/refs/heads/metadata_only_resolve_no_whl.zip'

@webknjaz
Copy link
Member

webknjaz commented Aug 5, 2024

I was just about to file the same issue :)

P.S. @notatallshaw I can't edit the issue title. Could you fix the typo in the word “shouldn't”?

@notatallshaw notatallshaw changed the title Pip install --dry-run shouln't download full wheels when metadata file available Pip install --dry-run shouldn't download full wheels when metadata file available Aug 5, 2024
@notatallshaw
Copy link
Member Author

notatallshaw commented Aug 5, 2024

Done.

Also there was some recent progress in #12863

Personally, I'd quite like the work done by @cosmicexplorer to land.

@mmartial
Copy link

mmartial commented Aug 5, 2024

Just sharing a concrete example that brought me here.
I have a container with PyTorch 2.2 and CUDA 12.3
I used --dry-run to see what adding xformers would modify.
The pip install --dry-run xformers downloaded Torch 2.3 and all the Nvidia-related content, which is over 1GB.

@cosmicexplorer
Copy link
Contributor

@notatallshaw please keep pinging me if you perceive any blocking on my part! I have extreme confidence in my approach for all of my open PRs, which has been refined and honed over years starting from #7819 ever since I realized pip was the right place for the optimization work I was doing for Twitter Cortex ML in pantsbuild/pants#8793 (which produced install --report and --use-feature=fast-deps). I have the time and energy to make sure these all land, and I have the utmost respect for the pip maintainers, who are some of my favorite people to interact with and learn from. I know we're all busy and doing several things at once, and I left these PRs hanging for a few months earlier this year, which definitely slowed things down (sorry!).

If you apply all my current diffs (https://github.com/pypa/pip/pulls/cosmicexplorer) in series (the last one is #12258, sorry I need to rebase this, I'll do that now), you will get a truly fantastic performance improvement with minimal complexity, making use of the metadata resolution framework introduced in prior PRs to read metadata from cache (and even further). In addition to performance, it will also drastically reduce the number and magnitude of HTTP requests made against pypi: see #12256. Each of these subsequent PRs demonstrates a robust performance improvement, especially when resolving large binary wheels for ML frameworks like @mmartial discussed. The overall performance improvement is quite drastic (especially with --dry-run for large wheels), and after I rebase them all I can show clear benchmarks. Please let me know if I can improve my presentation/proposal of any of these changes to make them more convincing.

On a personal level, I really appreciate you advocating for my work, and I would love if you could continue to help nudge me to make sure this gets done. I'm @[email protected] on mastodon and if you DM me there or on twitter I will be more likely to respond. I think this code is good, I think it's right for pip, and I will do my part to keep iterating on these PRs until they're pip quality.

@rsxdalv
Copy link

rsxdalv commented Oct 24, 2024

PRs welcome. However, one potential upside of downloading the wheels is that if the dry run is followed by an actual install, the install itself will be faster as everything needed will be cached. So it's not immediately obvious that this change would be beneficial overall.

I think it's an upside for regular python apps, but with pytorch, xformers, accelerate etc it looks like this:

(C:\Users\admin\Desktop\one-click-installers-tts-main\tts-generation-webui\installer_files\env) C:\Users\admin\Desktop\one-click-installers-tts-main\tts-generation-webui>pip install --dry-run torch==2.3.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Looking in indexes: https://download.pytorch.org/whl/cu118
Collecting torch==2.3.0
  Downloading https://download.pytorch.org/whl/cu118/torch-2.3.0%2Bcu118-cp310-cp310-win_amd64.whl (2673.0 MB)
     ━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.1/2.7 GB 40.9 MB/s eta 0:01:02
ERROR: Operation cancelled by user

(C:\Users\admin\Desktop\one-click-installers-tts-main\tts-generation-webui\installer_files\env) C:\Users\admin\Desktop\one-click-installers-tts-main\tts-generation-webui>pip install --dry-run torch==2.5.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Looking in indexes: https://download.pytorch.org/whl/cu118
Collecting torch==2.5.0
  Downloading https://download.pytorch.org/whl/cu118/torch-2.5.0%2Bcu118-cp310-cp310-win_amd64.whl (2700.2 MB)
     ━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.1/2.7 GB 40.9 MB/s eta 0:01:04
ERROR: Operation cancelled by user

@ddelange
Copy link
Contributor

you can use one of @cosmicexplorer's branches:

pip install 'pip @ https://github.com/cosmicexplorer/pip/archive/refs/heads/metadata_only_resolve_no_whl.zip'

they are being tracked here: #12921

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C: install report pip install --report type: enhancement Improvements to functionality type: performance Commands take too long to run
Projects
None yet