llvmlite rebuild and publishing to conda-forge channel (to include SVML patch changes) #72
Comments
Thanks for the report, Alexander! 🙏 Would you like to start a PR?
I don't know exactly how the conda-forge infra works, but just retriggering the llvmlite rebuild in the presence of the patched LLVM should work. @jakirkham, should we just bump the version or build number somewhere?
Bumping the build number is the usual approach. What I'm less sure about (and maybe you can help answer this question) is what happens if an [...]
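For context, a conda-forge rebuild of an unchanged package version is typically triggered by incrementing the build number in the feedstock's recipe/meta.yaml. A sketch (the field names follow conda-build's meta.yaml schema; the version shown is illustrative):

```yaml
package:
  name: llvmlite
  version: "0.40.1"   # unchanged; only the build number is bumped

build:
  number: 1           # was 0; incrementing forces a rebuild and republish
```

Once merged, the feedstock CI rebuilds against the current llvmdev/libllvm pinning, so a patched LLVM is picked up automatically.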
Actually, I'm somewhat confused to see LLVM as a runtime dependency of llvmlite; I always thought the native part of llvmlite links statically against LLVM.
Ok, after some checking, it seems llvmlite from conda-forge indeed depends on LLVM dynamically (it imports its symbols). @jakirkham, do you happen to know why it is not linking statically? (I will iterate with the Numba developers too.) So to answer the original question: yes, such a configuration WILL cause problems; Numba users will get wrong results and/or weird crashes.
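As an aside, whether a given llvmlite build binds libLLVM dynamically can be checked by inspecting its extension module with `ldd` (Linux) or `otool -L` (macOS) and looking for a libLLVM entry. A minimal sketch of that check; the sample linker output below is illustrative, not taken from an actual build:

```python
def links_llvm_dynamically(linker_output: str) -> bool:
    """Return True if ldd/otool-style output lists a shared libLLVM."""
    return any("libLLVM" in line for line in linker_output.splitlines())

# Illustrative ldd-style output for a dynamically linked build
# (paths and versions are made up for the example).
sample = """\
    linux-vdso.so.1 (0x00007ffd5b5fe000)
    libLLVM-14.so.1 => /opt/conda/lib/libLLVM-14.so.1 (0x00007f2a40000000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2a3fc00000)
"""

print(links_llvm_dynamically(sample))  # a statically linked build would yield False
```

In practice one would feed this the output of `ldd $(python -c "import llvmlite.binding as b; print(b.ffi.__file__)")` or the macOS equivalent.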
So we probably should set [...]
In PR ( #21 ) we discussed and decided to switch to dynamic linking; it is much easier for us to maintain that way. Unfortunately I don't think it makes sense to switch back to static linking. I think we will need to figure out another solution that maintains dynamic linking.
For Numba, we have seen problems arising from the use of dynamically linked LLVM. These problems are usually caused by multiple libLLVM versions in the same environment, as needed by other libraries that use LLVM (e.g. julia, pytorch). We usually advise users to switch to the statically linked llvmlite from our channel. Below, I tried to gather all known issues for reference:
Yes, there was a lengthy discussion about this in the PR linked above. Unfortunately static linking deviates from the conda-forge policy of preferring dynamic linking, and adds an extra maintenance burden that hasn't been worth it. So far we haven't seen issues with the shared-library approach, and it has been in use for ~4 years.
We generally separate libraries well (e.g. by package version, or, where co-installable, by SOVER), so that the problems that @sklam describes generally do not affect users in conda-forge. This cannot be expected of regular user installs, but as John mentions, this is intentionally the standard in conda-forge, and we're willing to do the work necessary to keep using shared libraries while (obviously) avoiding user crashes and the like. If you are able to provoke a problem with our setup, it'll be a bug that we'll fix on the side of the LLVM packaging. But [...]
You can potentially find a linked LLVM in places completely out of conda-forge's control (e.g. a system GPU runtime), so it doesn't sound very robust to me :), but anyway, it worked like this before, so I guess it is fine for now. We will do a PR bumping the build number to rebuild llvmlite with the patched LLVM.
Yes, but those places out of conda-forge's control should not be linking to conda-forge's LLVM, and things built on top of conda-forge should be linking to conda-forge's artefacts first and foremost. The example of the system GPU runtime is tricky in the sense that someone might conceivably compile a package simultaneously against a conda environment and the system (as we cannot provide the latter). That should hopefully be a pretty rare case already, and mostly be solvable by prioritizing the system location in the linker invocation. Perhaps @jakirkham or @isuruf want to correct me if I'm saying something stupid, but this is the overall approach.
From what I have seen, all the other places that use a non-standard LLVM link it statically. This works around the issues one could possibly see. All crashes I have seen in the past were cases where two libs linked against the same SOVERSION of LLVM, but one of them had used a custom (patched or dev) version of LLVM.
What are you suggesting, @xhochy? That we add a [...]
We dropped that output as it was way too much effort to maintain. I don't think it is worth it.
If it's just a question of effort, then I'm pretty sure I could handle that easily with the new installation script setup. I've been doing ~most of the llvmdev maintenance since about 11.1, and I don't remember seeing that output, so it must've been a while ago. Would be worth a try at least.
So I've looked at this a bit more, and I think I was wrong to say that the static builds aren't there. They're just spread out into more fine-grained libraries, rather than one [...]
Those are currently part of [...]. Adding a [...]
Given that shared libraries have worked fine for a number of years here, and that shared libraries are preferred in conda-forge (and are generally easier to work with as a result), I prefer sticking with shared libraries.
The intention of my comment was that we should stick with shared linking here. Other libs that need static linking because they do non-release things with LLVM, e.g. Tensorflow, already link statically.
I think it would be helpful if the team behind the SVML patch pushed a bit harder on getting this upstreamed into LLVM latest (I understand old versions are probably not worth the effort). That would apply pressure where it is most effective and focused, instead of the diffuse and indirect approach of fixing this in various package managers. Not saying this blocks fixing the issue here (just to be clear), but it does seem like a more maintainable/effective solution.
So I remembered this discussion when looking again at the [...]
But, like 10 months ago, the test suite segfaults immediately on OSX when [...]. As a test, I built arrow against LLVM 14, and lo and behold, the tests then didn't segfault anymore (CI run). I find it hard to imagine more compelling evidence for the argument that mixing libllvm in particular is risky. It's not a common situation for us (because it needs explicitly versioned outputs to even have two different versions in the same env), and co-installability in such cases is fraught in general (here's an explanation by @isuruf why). So I don't get the opposition here. Shared libraries are serving us very well, but there are reasonable exceptions (we do this semi-regularly for other tricky libraries too, e.g. protobuf). CFEP-18 is also far from black-and-white on this: [...]
PS. To be sure, I diffed the environments for failure (before) and after (success).
Test environment diff (from failure to success):
@@ -42,13 +42,12 @@ The following NEW packages will be INSTALLED:
gflags: 2.2.2-hb1e8313_1004 conda-forge
glog: 0.6.0-h8ac2a54_0 conda-forge
hypothesis: 6.84.3-pyha770c72_0 conda-forge
- icu: 73.2-hf5e326d_0 conda-forge
idna: 3.4-pyhd8ed1ab_0 conda-forge
iniconfig: 2.0.0-pyhd8ed1ab_0 conda-forge
jmespath: 1.0.1-pyhd8ed1ab_0 conda-forge
krb5: 1.21.2-hb884880_0 conda-forge
libabseil: 20230802.0-cxx17_h048a20a_3 conda-forge
- libarrow: 14.0.0-heeec12f_4_cpu local
+ libarrow: 14.0.0-hce17ce0_4_cpu local
libblas: 3.9.0-18_osx64_openblas conda-forge
libbrotlicommon: 1.1.0-h0dc2134_0 conda-forge
libbrotlidec: 1.1.0-h0dc2134_0 conda-forge
@@ -66,10 +65,8 @@ The following NEW packages will be INSTALLED:
libgfortran5: 13.2.0-h2873a65_1 conda-forge
libgoogle-cloud: 2.12.0-hc7e40ee_2 conda-forge
libgrpc: 1.57.0-ha2534ac_1 conda-forge
- libiconv: 1.17-hac89ed1_0 conda-forge
liblapack: 3.9.0-18_osx64_openblas conda-forge
libllvm14: 14.0.6-hc8e404f_4 conda-forge
- libllvm15: 15.0.7-he4b1e75_3 conda-forge
libnghttp2: 1.52.0-he2ab024_0 conda-forge
libopenblas: 0.3.24-openmp_h48a4ad5_0 conda-forge
libprotobuf: 4.23.4-he0c2237_6 conda-forge
@@ -77,7 +74,6 @@ The following NEW packages will be INSTALLED:
libssh2: 1.11.0-hd019ec5_0 conda-forge
libthrift: 0.19.0-h88b220a_0 conda-forge
libutf8proc: 2.8.0-hb7f2c08_0 conda-forge
- libxml2: 2.11.5-h3346baf_1 conda-forge
libzlib: 1.2.13-h8a1eda9_5 conda-forge
llvm-openmp: 16.0.6-hff08bdf_0 conda-forge
llvmlite: 0.40.1-py311hcbb5c6d_0 conda-forge
@@ -92,8 +88,8 @@ The following NEW packages will be INSTALLED:
packaging: 23.1-pyhd8ed1ab_0 conda-forge
pandas: 2.1.0-py311hab14417_0 conda-forge
pluggy: 1.3.0-pyhd8ed1ab_0 conda-forge
- pyarrow: 14.0.0-py311h54e7ce8_4_cpu local
- pyarrow-tests: 14.0.0-py311h54e7ce8_4_cpu local
+ pyarrow: 14.0.0-py311hcbd9da5_4_cpu local
+ pyarrow-tests: 14.0.0-py311hcbd9da5_4_cpu local
pycparser: 2.21-pyhd8ed1ab_0 conda-forge
pyopenssl: 23.2.0-pyhd8ed1ab_1 conda-forge
pysocks: 1.7.1-pyha2e5f31_6 conda-forge
Test environment for segfaulting case: [...]
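The kind of environment diff shown above can also be produced mechanically from two conda package listings. A small sketch (the helper and the package mappings below are illustrative, not part of any conda API):

```python
def diff_packages(before: dict, after: dict) -> list[str]:
    """Render a conda-style +/- diff of {name: version-build} mappings."""
    lines = []
    for name in sorted(before.keys() | after.keys()):
        old, new = before.get(name), after.get(name)
        if old == new:
            continue  # unchanged packages are omitted, like in the diff above
        if old is not None:
            lines.append(f"- {name}: {old}")
        if new is not None:
            lines.append(f"+ {name}: {new}")
    return lines

# Trimmed-down versions of the two environments shown in the diff.
before = {"libllvm14": "14.0.6-hc8e404f_4", "libllvm15": "15.0.7-he4b1e75_3",
          "libarrow": "14.0.0-heeec12f_4_cpu"}
after = {"libllvm14": "14.0.6-hc8e404f_4",
         "libarrow": "14.0.0-hce17ce0_4_cpu"}

print("\n".join(diff_packages(before, after)))
```

This makes the key observation easy to spot: the failing environment pulled in both libllvm14 and libllvm15, while the passing one contains only libllvm14.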
Given the segfaults with pyarrow + sparse + numba + llvmlite when using shared libllvm, I think we should proceed with linking LLVM statically here, as strongly suggested by @Hardcode84. I feel we're not affording sufficient weight to the opinion of the upstream maintainers here; it's fine to insist on our way as long as things work, but here they clearly don't. Those objecting to linking statically would need to at least propose a solution that avoids those segfaults for arrow (link statically there?), but overall I find these segfaults to be pretty compelling evidence for making a CFEP-18 exception, because of course they could be happening in other circumstances as well.
Do you have a reproducer I can take a look at?
Uncomment this line in the arrow recipe (resp. co-install [...])
@isuruf, have you had a chance to look at the llvmlite crashes with arrow+sparse?
How can we move this topic forward? AFAICT llvmlite still hasn't been rebuilt against the SVML-enabled libLLVM. Given the lack of progress here for ~9 months, I will urge that we build against static libLLVM, unless & until someone debugs the issues that are being surfaced in arrow+sparse. If someone still disagrees, let's discuss it in the next core call.
Feel free to add it to the agenda for tomorrow. Regardless of what we do, I think it would help to have a current reproducer. I attempted to follow this step ( #72 (comment) ) above: conda-forge/pyarrow-feedstock#121. Please feel free to push other changes there. Ideally we would reproduce this issue on CI so we can share a bit more (I would like to ask others to take a look).
Comment:
Hello, I see the PR (conda-forge/llvmdev-feedstock#192), which enabled the SVML patch for LLVM 14. But per my understanding, finalizing the SVML story requires an llvmlite rebuild and publication to the conda-forge channel (https://anaconda.org/conda-forge/llvmlite/).
Is it possible to do the rebuild ASAP, or are there any problems/blockers?