Pipenv PEP 503 Improvement: Pipenv downloads PyTorch for all versions of Python, grabbing 16GB of data instead of just 1.7GB. #4963
@Bananaman I believe the issue here is that the private server
I think I see now: the reason you are using the other package server is that you are looking for a CUDA-specific version of torch that is not on PyPI?
Yeah, my card requires PyTorch built for CUDA Toolkit 11.x, which can only be found at the PyTorch repository.
Well, there are two issues here:
The best fix would be: "if running under CPython, look for a matching identifier in package filenames such as 'cp39', and only download that/those files if such an identifier is found." There's lots of room for improvement in Pipenv's PEP 503 support. Phase 1 could be "skip every wheel whose cpXY tag doesn't match the running interpreter". The most important thing would be to skip the wheels built for other Python versions. How feasible is it that Pipenv can be extended to filter out these useless downloads? Hopefully the internal code isn't too rigid.
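The cp39-style filename check suggested here can be sketched in a few lines. This is an illustrative filter over a PEP 503 file listing, not pipenv internals; the function name and example listing are made up for the demonstration:

```python
import sys

def matching_wheels(filenames, tag=None):
    """Keep only wheels whose cpXY tag matches the given CPython tag
    (default: the running interpreter). Pure-Python wheels and sdists
    carry no cpXY tag in their names, so they are kept as well."""
    if tag is None:
        tag = f"cp{sys.version_info.major}{sys.version_info.minor}"  # e.g. "cp39"
    keep = []
    for name in filenames:
        if not name.endswith(".whl"):
            keep.append(name)                       # sdist: no tag to check
        elif f"-{tag}-" in name or "-py3-none-" in name:
            keep.append(name)                       # matches, or pure Python
    return keep

# Shortened example of the kind of listing served by download.pytorch.org/whl/cu113/
listing = [
    "torch-1.10.1+cu113-cp36-cp36m-linux_x86_64.whl",
    "torch-1.10.1+cu113-cp37-cp37m-linux_x86_64.whl",
    "torch-1.10.1+cu113-cp39-cp39-linux_x86_64.whl",
]
print(matching_wheels(listing, tag="cp39"))
# ['torch-1.10.1+cu113-cp39-cp39-linux_x86_64.whl']
```

A filter like this would drop the cp36/cp37 wheels before any download starts, which is exactly the bandwidth saving being asked for.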
@Bananaman Thanks for your feedback. I am still pretty new to this code base, but from what I gather about the dependency resolution, this may require an upstream change somewhere. I think this is a good discussion and could lead to some improvements.
Ahh, I see. If someone knows what dependency resolver pipenv uses, we'll know where to file the issue. :)
@Bananaman I've learned a lot recently: it uses pip's dependency resolver. I've done work to get pipenv's vendored pip up to 22.0.4 (it's currently on 21.x) here: #4969. However, I just tried your example on this branch, and I think you now have new issues with the install instructions under the newer pip resolver:
EDIT: Actually, is it possible that a new version has replaced the older one at that URL? Because it seems to have this: Though that led to it failing to lock:
DOUBLE EDIT: Oh man, my last "Installation Failed" was a result of it using my system Python 3.10, and the pre-built wheels there only go up to Python 3.9. Trying again now with Python 3.9.
@Bananaman there was a ticket I worked on yesterday/today that shed some light on indexes. Plus, I realized I have nvidia hardware on my laptop, just not on the VM, so I am doing some experiments within Windows now. I had a test run that was very quick and generated this file, but there was no hash for the package:
So then I tried an experimental branch locally: the pip 22.0.4 resolver updates combined with my other PR for index-resolving fixes. This time I watched my network router and saw it download the 16+ GB over several minutes of waiting, which definitely took 8x longer than whatever generated the above Pipfile.lock. One difference, however, is that the new Pipfile.lock does have all of the hashes; there are 8 of them, which explains why it took so much longer.
But then I re-ran it the way I had it prior, specifying the --index, based on my learnings and the fixes in my newer branches, and this was fast; it actually didn't download anything (maybe the wheels are already cached?).
Moving the lockfile out of the way and then regenerating it was very quick and downloaded nothing; I am thinking it's still in a cache somewhere. I haven't had great luck finding these large files on my Windows file system, however.
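For anyone else hunting for those large cached files, here is a sketch of where they usually end up. The `PIPENV_CACHE_DIR` override is a real pipenv environment variable, but the per-platform defaults below follow the usual appdirs conventions and are an assumption rather than something verified against pipenv's source:

```python
import os
import sys
from pathlib import Path

def pipenv_cache_dir():
    """Best guess at pipenv's download-cache location.
    PIPENV_CACHE_DIR overrides everything; the fallbacks are the
    conventional per-user cache directories (assumed, not verified)."""
    override = os.environ.get("PIPENV_CACHE_DIR")
    if override:
        return Path(override)
    if sys.platform == "win32":
        # Conventional Windows per-user cache location (assumption)
        local = os.environ.get("LOCALAPPDATA", str(Path.home() / "AppData" / "Local"))
        return Path(local) / "pipenv" / "pipenv" / "Cache"
    # Linux/macOS default mentioned earlier in this thread
    return Path.home() / ".cache" / "pipenv"
```

Pointing `PIPENV_CACHE_DIR` at a scratch disk is also a handy way to keep these multi-GB downloads off a small system drive.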
I also removed the virtualenv with `pipenv --rm`. Here is what my generated Pipfile looks like from the initial command:
If you are interested in trying out this branch to see if it has improvements for your issue, I have pushed it out.
Also noting that I did a followup where I had hoped adding the markers for python_version would restrict which wheels get downloaded, but it did not appear to help.
@Bananaman Also noting that it may be a reasonable workaround to target a very specific wheel file in this case. For example, I tried:
Then the Pipfile.lock contains just the hash for the wheel I installed:
I am not sure what the level of effort would be to get the markers (python_version and system) to restrict which wheels get downloaded, but I suspect the level of effort is high. The net outcome would be worth it, but without more analysis of the code, I am not sure if this is another case where patching the pip resolver itself would be required.
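The workaround mentioned above, targeting one specific wheel file, can be sketched as follows. The filename layout mirrors what the PyTorch cu113 index serves, but treat both the layout and the path segment as assumptions; the helper itself is hypothetical:

```python
import sys

def torch_wheel_url(version="1.10.1+cu113", platform_tag="linux_x86_64", py_tag=None):
    """Build a direct URL for one specific torch wheel on the cu113 index
    (assumed layout). Note: CPython <= 3.7 used an 'm' ABI suffix
    (e.g. cp37-cp37m), which this simple sketch does not handle."""
    if py_tag is None:
        py_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    # The '+' in the local version segment is percent-encoded in the file name
    fname = f"torch-{version.replace('+', '%2B')}-{py_tag}-{py_tag}-{platform_tag}.whl"
    return f"https://download.pytorch.org/whl/cu113/{fname}"

print(torch_wheel_url(py_tag="cp39"))
# https://download.pytorch.org/whl/cu113/torch-1.10.1%2Bcu113-cp39-cp39-linux_x86_64.whl
```

Handing a direct wheel URL to `pipenv install` pins that single file (it lands as a `file = ...` entry in the Pipfile), which sidesteps the index scan entirely.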
Two updates on this front. As a result, I think we can close this as completed.
Happy to report that I worked on a PR to help pytorch add hashes to the pytorch indexes, and they finished the back-population today; locking pytorch is now much faster on the latest pipenv versions.
I recently posted the correct way to install PyTorch as a PEP 503 repository in Pipenv:
#4961 (comment)
There's just one annoying issue in Pipenv: It downloads PyTorch for every version of CPython.
So let's say my project is based on `pipenv install --python=3.9`. And I then run the command to install PyTorch (see guide above for details): `pipenv install --extra-index-url https://download.pytorch.org/whl/cu113/ "torch==1.10.1+cu113"`.

Well, Pipenv then downloads all versions of PyTorch into `~/.cache/pipenv`: cp36, cp37, cp38, cp39 and probably a few more. And then it finally installs the intended architecture (`torch-1.10.1+cu113-cp39`).

This means that the download took 16 GB and 30 minutes, instead of 1.7 GB and 4 minutes, wasting a ton of disk space and time on downloading extra copies of the library for old Python versions that I'll never use.
I confirmed that the extra downloaded data is versions for old Python releases, because I went into the Pipenv cache and looked inside the hashed archives to check their WHEEL metadata. It was stuff like the "Python 3.6" torch version etc.
I'm using `pipenv 2022.1.8`.

My guess is that Pipenv's current algorithm just searches PEP 503 repos for packages whose names start with `torch-*`, downloads them ALL, and then looks at the embedded wheel metadata in all downloaded archives to figure out which one matches the installed Python version.

Can Pipenv be improved to detect the "cp39" filename hints in PEP 503 repos and only download the version that matches the installed Python version?
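The "cp39 filename hint" detection asked for here is close to what pip's vendored `packaging` library already exposes. A sketch of picking the single best-matching wheel from an index listing; the function name and listings are illustrative, and this is not how pipenv currently resolves anything:

```python
from packaging.tags import sys_tags
from packaging.utils import parse_wheel_filename

def best_wheel(filenames):
    """Return the wheel whose tags rank highest for the running
    interpreter, or None if nothing is compatible."""
    # sys_tags() yields tags from most to least specific for this interpreter
    priority = {tag: rank for rank, tag in enumerate(sys_tags())}
    best, best_rank = None, None
    for name in filenames:
        _, _, _, tags = parse_wheel_filename(name)
        ranks = [priority[t] for t in tags if t in priority]
        if ranks and (best_rank is None or min(ranks) < best_rank):
            best, best_rank = name, min(ranks)
    return best
```

Because the decision uses only the filename's tags, a resolver built this way could pick the one compatible wheel before downloading anything, rather than fetching every archive to read its embedded metadata.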