-
Notifications
You must be signed in to change notification settings - Fork 150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add --exclude option to auditwheel repair #368
Conversation
Codecov ReportBase: 92.42% // Head: 92.45% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## main #368 +/- ##
==========================================
+ Coverage 92.42% 92.45% +0.02%
==========================================
Files 23 23
Lines 1268 1272 +4
Branches 311 312 +1
==========================================
+ Hits 1172 1176 +4
Misses 55 55
Partials 41 41
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution,
I left a couple remarks inlined.
The PR is failing CI:
- we do not have pre-commit.ci setup yet so please make sure to run
nox -s lint
in order to check/repair for linter issues. if you don't know it, nox is a command-line tool that automates testing in multiple Python environments. - codecov is complaining of a drop in coverage because there's no test associated with this new feature. Can you please add some tests for this ?
Thanks for reviewing! I'll try to find the time to add tests and fix linting |
27939cf
to
82ef32c
Compare
I think this is a nice addition. Any chance we can get this in sooner? 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please increase the test coverage
26c1bb8
to
ebd5098
Compare
@martinRenou I made a few changes to the implementation, mainly to simplify it and added tests for it. I hope you don't mind that I pushed the changes directly into your branch. |
This allows excluding libraries from the resulting wheels, like OpenGL or Vulkan, which are provided by the OS.
We have fairly complex code in a test to build the numpy wheels. We can extract this and reuse it in other tests in the future which need a cleanly build wheel before testing `auditwheel repair` functionality.
ebd5098
to
093bd1c
Compare
Build numpy from source and when repairing, exclude the gfortran library. This is only tested in a single manylinux image to avoid increasing the test time too much, as the functionality is not dependent on the platform.
093bd1c
to
b5f3907
Compare
I don't mind at all. Thanks a lot! |
I haven't merged this yet, as I'm still unsure if it is a good idea to exclude arbitrary libraries. On one hand, it solves a real issue, and I don't see how else we could achieve this, but it also ends up violating the manylinux spec. There is a risk that this will cause confusion and bad user experience, and it makes it more likely that users will run into ABI incompatibilities due to version mismatches between the extension and its excluded dependencies. Another concern is that if PyPI wanted to block uploads of non-conforming wheels, this feature would make it more difficult to implement it (see pypi/warehouse#5420). I wonder if this needs a larger discussion with the PyPA community, since the scope is bigger here than just auditwheel. If we want to properly support wheels which require some special external libraries, that could require changes in tooling in other areas as well. I don't have the bandwidth to take this on, but if someone wanted to raise this on the Packaging Discourse, that would be a good start. Any thoughts @mayeut? |
I think this would allow abusing the purpose of auditwheel. I commented on the original issue about not pulling |
There was PEP 668, which AIUI was also trying to help this use case of having some libraries externally managed. This is the result of a few discussions in different threads over the years (including past PyCons): |
This would require a new ABI compatibility analysis to be done to ensure whitelisting is allowed per pep-600
Yes, it would allow abusing the purpose but allowing package maintainers to use this (responsibly) is a way not to have auditwheel maintainers on the critical path of many projects. As can be seen by the activity in the repo, I do not have that much time to spend on this project, I'm not sure about @lkollar, but I guess that's about the same. see also my other comment in #310 (comment) There are already many packages on PyPI abusing PEP600, this is no reason to do so but, IMHO, having an escape hatch is always a good thing to have. It's probably better to have this escape hatch and have the rest of auditwheel run its process than to have a bunch of hand-tagged manylinux1 wheels on PyPI. One example of abuse (or let's say at least gray area) are GPU dependent packages. Those raised some discussion but I'm not sure there's a real conclusion yet (& the discussion starter was package size rather than pep600 compliance).
Agreed but my comment above applies: I think the pros outweigh the cons.
I think that if PyPI is to implement that feature, they should not block on missing shared objects. They can't know for sure they're not compliant unless doing a manual review. We can't even know for sure that a manylinux1 tagged wheel that embeds some manylinux2014+ only symbol versions is not in fact manylinux1 compliant although in 99.9% of cases, we'd be right to assume it should be tagged manylinux2014.
While I personally think this should be merged, EDIT: @rgommers is probably right about a large community-wide discussion 2 comments below this one. |
Speaking from my personal experience, the whitelisting discussion here is intimately related to the package size discussion. They are the different facets of the same problem. For GPU packages (or any packages that rely on vender-provided, proprietary prebuilt binaries) without whitelisting they only have the following options:
Option 1 is technically still in violation of the auditwheel spirit, because it's still dynamic linking and could allow a non-self-contained wheel. It also requires a careful runtime check to ensure the correct shared library is dlopen'd. Option 2, on the other hand, leads to significantly increased package sizes and, in particular for proprietary packages, potential violation of the vendor's EULA (some vendors put very strict redistribution limitation). Finally, to adopt either option requires the downstream package providers to invest significant effort. I am +1 for supporting whitelisting or any kind of escape hatch. The only (minor) disagreement I have with @mayeut's above comment is I see no cons, only pros 🙂 |
This has now been proposed independently multiple times:
So that's a good fraction of the most popular scientific computing, data science and ML/AI libraries - plus some visualization mixed in for good measure.
I would agree with the latter part. For the former I'd say: this is not abuse, there are good technical reasons for wanting to exclude a shared library. At least four separate ones have been given:
I am all for discouraging stuffing native dependencies in wheels when it is not needed, because it doesn't fit well with the design of PyPI and Python packaging. There's lots of ways to shoot ourselves in the foot, and the author-led model of uploading new releases of versions one by one, built without a common infrastructure, will make it hard to avoid problems or correct them later. That said, it is very clear that there is a need. So there should be an escape hatch. The users of that escape hatch are responsible for ensuring that they do the preloading correctly in order to not end up with end users with broken systems. But in the end it's also not that hard to get it right, once the correct pattern is established (which is checking the location of the Python package providing the shared library first, then preloading it - and not simply using RPATH, as discussed in several of the other issues).
I agree with @mayeut here. This is a little nontrivial, but it's a dependency management problem. The limited set of projects that really need this do have maintainers that have some understanding of the issues. And I'm sure it'll go wrong once - but that's why there's a "yank" button on PyPI.
Unless there are very concrete issues for the projects listed above that actually want to use this (did you have any in mind @lkollar?), it'd be great to just move this forward. In the end this is a ~10 line patch + tests, and if there really is a huge showstopper that turns up later - because there is none presented so far - then it can be reconsidered or reverted at that point. Requiring someone to start a large community-wide discussion, which may then veer into "please write a PEP" or "please consider all of the original pynativelib proposal" does not seem like a fair ask. Nor a necessary one. Not even desirable probably, since we don't want to encourage everyone to start doing this - only when one really needs it, and there's no other way out. |
Thanks for your feedback @rgommers. I edited my comment about a community-wide discussion. I think you're absolutely right on that one. I guess @mattip changed his view about not wanting this PR to be merged. |
Yes, I changed my view as I learned more about the pain points mentioned above: practicality sometimes beats purity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll let another week to @lkollar to comment. I'll merge this next week otherwise.
I think this PR also requires an option to update the did. I.e., add the relative path of the skipped libraries in My first thought is to change the value of
if the value ends with For example, my own use case (exclude auditwheel repair --exclude \
'libtorch_python.so:$ORIGIN/../torch/lib' \
...
# very long list of sonames
...
package.whl EDIT: Even further, I think it would be better to support glob patterns rather than specify the sonames one by one. Because the dependency libraries grow over time while new For example, $ /usr/bin/ldd _C.cpython-38-x86_64-linux-gnu.so
linux-vdso.so.1 (0x00007fff6d2f3000)
libtorch_python.so => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/libtorch_python.so (0x00007f59db972000)
libgomp.so.1 => /home/PanXuehai/Miniconda3/envs/torchopt/lib/libgomp.so.1 (0x00007f59db939000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f59db922000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f59db8ff000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f59db8f9000)
libstdc++.so.6 => /home/PanXuehai/Miniconda3/envs/torchopt/lib/libstdc++.so.6 (0x00007f59db743000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f59db5f4000)
libgcc_s.so.1 => /home/PanXuehai/Miniconda3/envs/torchopt/lib/libgcc_s.so.1 (0x00007f59db5db000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f59db3e9000)
/lib64/ld-linux-x86-64.so.2 (0x00007f59dc82e000)
libshm.so => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/libshm.so (0x00007f59db3df000)
libtorch.so => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/libtorch.so (0x00007f59db3d8000)
libtorch_cuda.so => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so (0x00007f59db3b9000)
libtorch_cuda_cpp.so => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so (0x00007f59d5786000)
libnvToolsExt.so.1 => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/../../../../libnvToolsExt.so.1 (0x00007f59d557b000)
libtorch_cpu.so => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so (0x00007f59c8e47000)
libc10_cuda.so => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/libc10_cuda.so (0x00007f59c8df1000)
libcudart.so.11.0 => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/../../../../libcudart.so.11.0 (0x00007f59c8b49000)
libcudnn.so.8 => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/libcudnn.so.8 (0x00007f59c891e000)
libtorch_cuda_cu.so => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cu.so (0x00007f599b37d000)
libc10.so => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/libc10.so (0x00007f599b2e0000)
libcusparse.so.11 => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/../../../../libcusparse.so.11 (0x00007f598d14b000)
libcurand.so.10 => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/../../../../libcurand.so.10 (0x00007f598755a000)
libcufft.so.10 => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/../../../../libcufft.so.10 (0x00007f597eacb000)
libmkl_intel_lp64.so => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/../../../../libmkl_intel_lp64.so (0x00007f597df2c000)
libmkl_gnu_thread.so => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/../../../../libmkl_gnu_thread.so (0x00007f597c3a1000)
libmkl_core.so => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/../../../../libmkl_core.so (0x00007f5977f31000)
libcupti-ea0c9f68.so.11.6 => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/libcupti-ea0c9f68.so.11.6 (0x00007f5977682000)
libcublas.so.11 => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/../../../../libcublas.so.11 (0x00007f596dc0a000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f596dc05000)
libcublasLt.so.11 => /home/PanXuehai/Miniconda3/envs/torchopt/lib/python3.8/site-packages/torch/lib/../../../../libcublasLt.so.11 (0x00007f5958676000) I think it would be nice to have:
|
No, please do not do that, it is not going to work. See #391 (comment) for why, and further down in that discussion for a more robust solution: #391 (comment). |
The |
@rgommers @leofang Thanks for the clarification. It's reasonable to skip only for the system dependencies. Waiting for a more robust approach for excluding third-party libraries. Adding a wrapping layer looks better than |
Let's get this merged. |
Sorry for not answering earlier @mayeut. I was going to agree with merging this though. Thanks! |
This is a follow-up of #310 by @rossant, only keeping the
exclude
option as suggested by @mayeutCloses #76
Closes #241
Closes #310
Fixes #391
Original PR description:
This is a quick fix to solve an issue I have with my project and for which auditwheel seemed to help. I don't know if this approach is suitable to other users of auditwheel.
I develop a C graphics library (datoviz) that depends on Vulkan and comes with Cython bindings. datoviz/datoviz#13 for Linux (at least Ubuntu 20.04, that would be a start).
Any help would be appreciated, regarding either this pull request, and/or a way to solve the issue I'm facing. Thanks!