Distributing large binary payloads as separate downloads #7852
Comments
Just searching around, and I think that a potential alternative is to use a private PyPI deployment (maybe using …). Another potential alternative is to provide a custom post-install script, but I just realized that the wheel format officially doesn't support pre- or post-install scripts, and I also migrated to …
I have the same problem and I like that you want to solve this entirely within pip if possible.
I've seen (but am currently forgetting) lots of neural network projects that have written their own …; pytorch (pytorch/pytorch#26340 (comment)) has put up this page: https://download.pytorch.org/whl/torch_stable.html (but they also distribute via PyPI; they have 800 MB behemoths on there); and in my own current project we solved this by hacking together our own package manager parallel to pip. I don't like any of these, because they break the packaging system. Unfortunately, I don't think PyPA is going to want to change anything: they want PyPI to be standalone. They said as much in pypa/pip#5898 (comment).
Their position doesn't make a lot of sense to me, because you can still pick a specific server if you make your users install from source.
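For example, something along these lines (just an illustrative command; I'm using scikit-image here since it comes up in the next sentence):

```
# Illustration: install from a source location of your choosing instead of from PyPI
pip install git+https://github.com/scikit-image/scikit-image
```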
(This is not good if your package actually has source to build, so skimage isn't a great demo here, but if your package is mostly data files this should be alright.) You already mentioned another workaround of just putting up your own repo (like pytorch), perhaps using https://github.com/chriskuehl/dumb-pypi. It's not totally seamless, but you can improve on it.
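One improvement, as a rough illustration of my own (the index URL below is a placeholder, not something from the original suggestion), is to have users point pip at the self-hosted index in addition to PyPI:

```
# Placeholder URL: e.g. a static index generated with dumb-pypi and served anywhere
pip install mypackage --extra-index-url https://my-static-index.example.com/simple/
```

Dependencies that live on PyPI still resolve normally; only your own packages come from the extra index.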
But with all three, nothing else can depend on your package, since other projects' requirements have no way to pull it in from the normal index. None of these workarounds are very good. It seems like either: …
What's the problem this feature will solve?
Many people, including me, are requesting an increase of the size limit for wheel packages uploaded to PyPI. In particular, distributing whole OS-specific prebuilt binaries and GPU code binaries often takes hundreds of megabytes.
I think a way to distribute large binary payloads separately, similar to Git LFS, would be good both for reducing network traffic and for easing PyPI maintenance.
#474 is also related to the idea.
Describe the solution you'd like
This is my rough idea; there are probably many edge cases to work out.

- `MANIFEST.in` can associate specific files and file patterns with paths prefixed with external resource identifiers, e.g., `assets/mydata.bin` -> `mybinary/mydata.bin`.
- `setup.py` or `setup.cfg` can define external resource identifiers as a mapping from slug names to URL prefixes, e.g., `mybinary` -> `https://mys3bucket.s3.amazonaws.com/mypackage`.
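As a very rough sketch of what those two pieces could look like (none of this syntax exists today; the section name and directive below are made up purely for illustration):

```ini
# setup.cfg -- hypothetical section mapping an external resource slug to a URL prefix
[external_resources]
mybinary = https://mys3bucket.s3.amazonaws.com/mypackage
```

```
# MANIFEST.in -- hypothetical directive associating a file with the "mybinary" slug
external-include mybinary assets/mydata.bin
```

At install time, the slug-prefixed path `mybinary/mydata.bin` would then resolve to `https://mys3bucket.s3.amazonaws.com/mypackage/mydata.bin`.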
Additional context
I'd like to avoid overloading PyPI and network resources, but I still want a seamless way to distribute large binaries through Python's standard packaging mechanism.
The disadvantage of this approach is that wheels are no longer self-contained, and the versioning of external resources may be broken by package maintainers' mistakes. (Maybe PyPI could provide a fallback repository for external resources, since it already hosts very large packages today upon request.)
We could mitigate human errors by enforcing specific rules for naming the external resource directories, such as requiring them to match the wheel file names, and by using checksums. Moreover, we could extend `wheel` and `twine` to automatically split out files that exceed a certain size limit and to use user-provided credentials to upload them to specific locations (e.g., S3), with PyPI as a fallback. I just want to put the idea out there and see what people think.
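To give a flavor of the checksum part, here is a minimal installer-side sketch, assuming the expected digest is recorded in the wheel's metadata at build time (the URL, file name, and function are hypothetical placeholders):

```python
# Minimal sketch: download an external payload and verify it against a recorded digest.
# PAYLOAD_URL and EXPECTED_SHA256 are placeholders; in the proposed scheme they
# would come from the wheel's metadata rather than being hard-coded.
import hashlib
import urllib.request

PAYLOAD_URL = "https://mys3bucket.s3.amazonaws.com/mypackage/mydata.bin"
EXPECTED_SHA256 = "<digest recorded at build time>"

def fetch_and_verify(url: str, expected_sha256: str) -> bytes:
    """Download the payload and refuse to use it if the checksum does not match."""
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    digest = hashlib.sha256(data).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"checksum mismatch for {url}: {digest} != {expected_sha256}")
    return data
```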
On the other hand, considering the significant technical effort required to implement and maintain the above idea, it might be more feasible simply to allow larger uploads to PyPI.
There may already be a past discussion about this topic; please forgive me if this is a duplicate, and point me to that thread.