Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

llvmlite rebuild and publishing to conda-forge channel (to include SVML patch changes) #72

Open
xaleryb opened this issue Aug 3, 2023 · 27 comments
Labels

Comments

@xaleryb
Copy link

xaleryb commented Aug 3, 2023

Comment:

Hello, I see the PR (conda-forge/llvmdev-feedstock#192) which is enabled SVML patch for LLVM 14. But per my understanding to finalize the story with SVML it's required llvmlite rebuild and publication to conda-forge channel (https://anaconda.org/conda-forge/llvmlite/).

Is it possible to make rebuild ASAP or there are any problems/blockers?

@jakirkham
Copy link
Member

Thanks for the report, Alexander! 🙏

Would you like to start a PR?

@Hardcode84
Copy link

I don't know how conda-forge infra works exactly, but just retriggering llvmlite rebuild with a presence of patched llvm should work.

@jakirkham should we just bump version or build number somewhere?

@jakirkham
Copy link
Member

Bumping the build/number seems like a good first step

What I'm less sure about (and maybe you can help answer this question), is what happens if an llvmlite build with SVML enabled picks up LLVM libraries that don't have SVML support built? Could that installation combination run into issues?

@Hardcode84
Copy link

Hardcode84 commented Aug 3, 2023

Actually I'm somewhat confused to see LLVM as runtime dependency of llvmlite, I always thought native llvmlite part is linking statically with LLVM.

@Hardcode84
Copy link

Ok, after some checking, it seems llvmlite from conda-forge indeed depends on llvm dynamically (import symbols), @jakirkham do happens to know why it is not linking statically (I will iterate with Numba developers too)? So to answer original question

What I'm less sure about (and maybe you can help answer this question), is what happens if an llvmlite build with SVML enabled picks up LLVM libraries that don't have SVML support built?

Yes, such configuration WILL cause problems, Numba users will get wrong results and/or weird crashes.

@Hardcode84
Copy link

So we probably should set export LLVMLITE_SHARED=0 to force static linking and drop runtime llvm dependency. I know it was causing problems in the past, but they should be handled now, as llvmlite now sets -Wl,--exclude-libs,ALL so there won't be any conflicts with potential shared llvm libs loaded in same process.

@jakirkham
Copy link
Member

In PR ( #21 ) we discussed and decided to switch to dynamic linking. This is much easier for us to maintain that way

Unfortunately don't think it makes sense to switch back to static linking

Think we will need to figure out another solution that maintains dynamic linking

@sklam
Copy link

sklam commented Aug 3, 2023

For Numba, we have seen problems arising from the use of dynamically linked LLVM. These problems are usually caused by multiple libLLVM versions in the same environment as needed by other libraries that uses LLVM (e.g. julia, pytorch). We usually advice users to switch to the statically linked llvmlite from our channel. Below, I tried to gather all known issues for references:

@jakirkham
Copy link
Member

Yes there was a lengthy discussion about this in the PR linked above

Unfortunately it deviates from conda-forge policy of preferring dynamic linking and adds extra maintenance burden that hasn't been worth it

So far we haven't seen issues with this approach and it has been in use for ~4yrs

@h-vetinari
Copy link
Member

We generally separate libraries well (e.g. by package version, or where co-installable, by SOVER), so that the problems that @sklam describes generally do not affect users in conda-forge. This cannot be expected of regular user installs, but as John mentions, this is intentionally the standard in conda-forge, and we're willing to do the work necessary to keep using shared libraries, while -- obviously -- avoiding user crashes and the like.

If you are able to provoke a problem with our setup, it'll be a bug that we'll fix on the side of the LLVM packaging. But LLVMLITE_SHARED=0 will most likely not have any effect because there are no static builds of libLLVM in conda-forge available.

@Hardcode84
Copy link

Hardcode84 commented Aug 4, 2023

We generally separate libraries well (e.g. by package version, or where co-installable, by SOVER), so that the problems that @sklam describes generally do not affect users in conda-forge. This cannot be expected of regular user installs, but as John mentions, this is intentionally the standard in conda-forge, and we're willing to do the work necessary to keep using shared libraries, while -- obviously -- avoiding user crashes and the like.

If you are able to provoke a problem with our setup, it'll be a bug that we'll fix on the side of the LLVM packaging. But LLVMLITE_SHARED=0 will most likely not have any effect because there are no static builds of libLLVM in conda-forge available.

You can potentially find linked LLVM in places completely out of conda-forge control (e.g. system GPU runtime) so it doesn't sound very robust to me :), but anyways it worked like this before so I guess it is fine for now. We will do PR bumping build version to rebuild llvmlite with patched llvm.

@h-vetinari
Copy link
Member

You can potentially find linked LLVM in places completely out of conda-forge control (e.g. system GPU runtime) so it doesn't sound very robust to me :)

Yes, but those places out of conda-forge's control should not be linking to conda-forge's LLVM, and thing built on top of conda-forge should be linking to conda-forge's artefacts first and foremost.

The example of the system GPU runtime is tricky in the sense that someone might conceivably compile a package simultaneously against a conda environment and the system (as we cannot provide the latter). That should hopefully be a pretty rare case already, and mostly be solvable by prioritizing the system location in the linker invocation.

Perhaps @jakirkham or @isuruf want to correct me if I'm saying something stupid, but this is the overall approach.

@xhochy
Copy link
Member

xhochy commented Aug 9, 2023

From what I have seen, all the other places that use a non-standard LLVM statically link it. This works around the issues one could possibly see. All crashes I have seen in the past were related where two libs linked against the same SOVERSION of LLVM but one of them had used custom (pacthed or dev) version of LLVM.

@h-vetinari
Copy link
Member

What are you suggesting @xhochy? That we add a libllvm-static?

@jakirkham
Copy link
Member

We dropped that output as it was way too much effort maintain. Don't think it is worth it

@h-vetinari
Copy link
Member

We dropped that output as it was way too much effort maintain. Don't think it is worth it

If it's just a question of effort, then I'm pretty sure I could handle that pretty easily with the new installation script setup. I've been doing ~most of llvmdev maintenance since about 11.1, and I don't remember seeing that output, so it must've been a while ago. Would be worth a try at least.

@h-vetinari
Copy link
Member

So I've looked at this a bit more, and I think I was wrong to say that the static builds aren't there. They're just spread out into more finegrained libraries, rather than one libLLVM.a. I believe that upstream llmvlite already takes this into account:

if ($ENV{LLVMLITE_SHARED})
    set(llvm_libs LLVM)
else()
    set(llvm_libs ${LLVM_AVAILABLE_LIBS})
endif()

Those are currently part of llvmdev, so I have to correct my previous statement, and if it's important for this feedstock, you can indeed set LLVMLITE_SHARED=0 and things should work.

Adding a libllvm-static output would be trivial, but I'm not even sure that we need it - llvmdev is already there, and it's not like we'd need an extra run-export for a static lib anyway.

@jakirkham
Copy link
Member

Given shared libraries have worked fine for a number of years here and shared libraries are preferred in conda-forge (and are generally easier to work with as a result), prefer sticking with shared libraries

@xhochy
Copy link
Member

xhochy commented Aug 12, 2023

The intention of my comment was that we should stick with shared here. Other libs that need static because they do non-release things with LLVM, e.g. Tensorflow, already link statically.

@jakirkham
Copy link
Member

Think it would be helpful if the team behind the SVML patch here pushed a bit harder on getting this upstreamed into LLVM latest (understand old versions are probably not worth the effort). That would apply pressure where it is most effective and focused (instead of the diffuse and indirect approach of fixing this in various package managers)

Not saying this blocks fixing the issue here (just to be clear), but it does seem like a more maintainable/effective solution

@h-vetinari
Copy link
Member

So I remembered this discussion when looking again at the sparse integration of pyarrow after quite a while.

pyarrow builds on libarrow, which requires the llvm headers for a subcomponent, and as of next version, also the libllvm library. Obviously, I wanted to use the shared build there. Currently we're using LLVM 15 there, in line with our compiler pins.

But like 10 months ago, the test suite segfaults immediately on osx when sparse (and thus numba/llvmlite/libllvm14) is present in the environment, in addition to the libllvm15 from libarrow (CI run)

As a test, I built arrow against LLVM 14, and lo and behold, the tests then didn't segfault anymore (CI run).

I find it hard to imagine more compelling evidence for the argument that mixing libllvm in particular is risky. It's not a common situation for us (because it needs explicitly versioned outputs to even have two different versions in the same env), and co-installability in such cases is fraught in general (here's an explanation by @isuruf why).

So I don't get the opposition here. Shared libraries are serving us very well, but there are reasonable exceptions (we do this semi-regularly for other tricky libraries too, e.g. protobuf) - CFEP-18 is also far from black-and-white on this:

In general, we prefer our packages to work completely without static linkage between different packages but are aware that static archives are sometimes needed / useful.

PS. To be sure, I diffed the environments for failure (before) and after (success).

Test environment diff (from failure to success)
@@ -42,13 +42,12 @@ The following NEW packages will be INSTALLED:
     gflags:              2.2.2-hb1e8313_1004            conda-forge
     glog:                0.6.0-h8ac2a54_0               conda-forge
     hypothesis:          6.84.3-pyha770c72_0            conda-forge
-    icu:                 73.2-hf5e326d_0                conda-forge
     idna:                3.4-pyhd8ed1ab_0               conda-forge
     iniconfig:           2.0.0-pyhd8ed1ab_0             conda-forge
     jmespath:            1.0.1-pyhd8ed1ab_0             conda-forge
     krb5:                1.21.2-hb884880_0              conda-forge
     libabseil:           20230802.0-cxx17_h048a20a_3    conda-forge
-    libarrow:            14.0.0-heeec12f_4_cpu          local
+    libarrow:            14.0.0-hce17ce0_4_cpu          local
     libblas:             3.9.0-18_osx64_openblas        conda-forge
     libbrotlicommon:     1.1.0-h0dc2134_0               conda-forge
     libbrotlidec:        1.1.0-h0dc2134_0               conda-forge
@@ -66,10 +65,8 @@ The following NEW packages will be INSTALLED:
     libgfortran5:        13.2.0-h2873a65_1              conda-forge
     libgoogle-cloud:     2.12.0-hc7e40ee_2              conda-forge
     libgrpc:             1.57.0-ha2534ac_1              conda-forge
-    libiconv:            1.17-hac89ed1_0                conda-forge
     liblapack:           3.9.0-18_osx64_openblas        conda-forge
     libllvm14:           14.0.6-hc8e404f_4              conda-forge
-    libllvm15:           15.0.7-he4b1e75_3              conda-forge
     libnghttp2:          1.52.0-he2ab024_0              conda-forge
     libopenblas:         0.3.24-openmp_h48a4ad5_0       conda-forge
     libprotobuf:         4.23.4-he0c2237_6              conda-forge
@@ -77,7 +74,6 @@ The following NEW packages will be INSTALLED:
     libssh2:             1.11.0-hd019ec5_0              conda-forge
     libthrift:           0.19.0-h88b220a_0              conda-forge
     libutf8proc:         2.8.0-hb7f2c08_0               conda-forge
-    libxml2:             2.11.5-h3346baf_1              conda-forge
     libzlib:             1.2.13-h8a1eda9_5              conda-forge
     llvm-openmp:         16.0.6-hff08bdf_0              conda-forge
     llvmlite:            0.40.1-py311hcbb5c6d_0         conda-forge
@@ -92,8 +88,8 @@ The following NEW packages will be INSTALLED:
     packaging:           23.1-pyhd8ed1ab_0              conda-forge
     pandas:              2.1.0-py311hab14417_0          conda-forge
     pluggy:              1.3.0-pyhd8ed1ab_0             conda-forge
-    pyarrow:             14.0.0-py311h54e7ce8_4_cpu     local
-    pyarrow-tests:       14.0.0-py311h54e7ce8_4_cpu     local
+    pyarrow:             14.0.0-py311hcbd9da5_4_cpu     local
+    pyarrow-tests:       14.0.0-py311hcbd9da5_4_cpu     local
     pycparser:           2.21-pyhd8ed1ab_0              conda-forge
     pyopenssl:           23.2.0-pyhd8ed1ab_1            conda-forge
     pysocks:             1.7.1-pyha2e5f31_6             conda-forge
Test environment for segfaulting case
The following NEW packages will be INSTALLED:

    aiobotocore:         2.5.4-pyhd8ed1ab_0             conda-forge
    aiohttp:             3.8.5-py311h2725bcf_0          conda-forge
    aioitertools:        0.11.0-pyhd8ed1ab_0            conda-forge
    aiosignal:           1.3.1-pyhd8ed1ab_0             conda-forge
    async-timeout:       4.0.3-pyhd8ed1ab_0             conda-forge
    attrs:               23.1.0-pyh71513ae_1            conda-forge
    aws-c-auth:          0.7.3-h1fca4dd_3               conda-forge
    aws-c-cal:           0.6.2-h03343b3_0               conda-forge
    aws-c-common:        0.9.0-h0dc2134_0               conda-forge
    aws-c-compression:   0.2.17-hc1c6d78_2              conda-forge
    aws-c-event-stream:  0.3.2-hf265e0f_0               conda-forge
    aws-c-http:          0.7.12-h572c275_1              conda-forge
    aws-c-io:            0.13.32-hc0caee9_3             conda-forge
    aws-c-mqtt:          0.9.5-h54c5ab7_1               conda-forge
    aws-c-s3:            0.3.17-hab1dffe_0              conda-forge
    aws-c-sdkutils:      0.1.12-hc1c6d78_1              conda-forge
    aws-checksums:       0.1.17-hc1c6d78_1              conda-forge
    aws-crt-cpp:         0.23.1-hd14e152_1              conda-forge
    aws-sdk-cpp:         1.11.156-h706e9e7_1            conda-forge
    backports.zoneinfo:  0.2.1-py311h6eed73b_7          conda-forge
    boto3:               1.28.17-pyhd8ed1ab_0           conda-forge
    botocore:            1.31.17-pyhd8ed1ab_3           conda-forge
    brotlipy:            0.7.0-py311h5547dcb_1005       conda-forge
    bzip2:               1.0.8-h0d85af4_4               conda-forge
    c-ares:              1.19.1-h0dc2134_0              conda-forge
    ca-certificates:     2023.7.22-h8857fd0_0           conda-forge
    certifi:             2023.7.22-pyhd8ed1ab_0         conda-forge
    cffi:                1.15.1-py311ha86e640_3         conda-forge
    charset-normalizer:  3.2.0-pyhd8ed1ab_0             conda-forge
    click:               8.1.7-unix_pyh707e725_0        conda-forge
    cloudpickle:         2.2.1-pyhd8ed1ab_0             conda-forge
    colorama:            0.4.6-pyhd8ed1ab_0             conda-forge
    cramjam:             2.7.0-py311h299eb51_0          conda-forge
    cryptography:        41.0.3-py311h892b619_0         conda-forge
    cython:              0.29.36-py311hdf8f085_0        conda-forge
    exceptiongroup:      1.1.3-pyhd8ed1ab_0             conda-forge
    fastparquet:         2023.8.0-py311h4a70a88_0       conda-forge
    frozenlist:          1.4.0-py311h2725bcf_0          conda-forge
    fsspec:              2023.9.0-pyh1a96a4e_0          conda-forge
    gflags:              2.2.2-hb1e8313_1004            conda-forge
    glog:                0.6.0-h8ac2a54_0               conda-forge
    hypothesis:          6.84.3-pyha770c72_0            conda-forge
    idna:                3.4-pyhd8ed1ab_0               conda-forge
    iniconfig:           2.0.0-pyhd8ed1ab_0             conda-forge
    jmespath:            1.0.1-pyhd8ed1ab_0             conda-forge
    krb5:                1.21.2-hb884880_0              conda-forge
    libabseil:           20230802.0-cxx17_h048a20a_3    conda-forge
    libarrow:            14.0.0-hce17ce0_4_cpu          local      
    libblas:             3.9.0-18_osx64_openblas        conda-forge
    libbrotlicommon:     1.1.0-h0dc2134_0               conda-forge
    libbrotlidec:        1.1.0-h0dc2134_0               conda-forge
    libbrotlienc:        1.1.0-h0dc2134_0               conda-forge
    libcblas:            3.9.0-18_osx64_openblas        conda-forge
    libcrc32c:           1.1.2-he49afe7_0               conda-forge
    libcurl:             8.3.0-h5f667d7_0               conda-forge
    libcxx:              16.0.6-hd57cbcb_0              conda-forge
    libedit:             3.1.20191231-h0678c8f_2        conda-forge
    libev:               4.33-haf1e3a3_1                conda-forge
    libevent:            2.1.12-ha90c15b_1              conda-forge
    libexpat:            2.5.0-hf0c8a7f_1               conda-forge
    libffi:              3.4.2-h0d85af4_5               conda-forge
    libgfortran:         5.0.0-13_2_0_h97931a8_1        conda-forge
    libgfortran5:        13.2.0-h2873a65_1              conda-forge
    libgoogle-cloud:     2.12.0-hc7e40ee_2              conda-forge
    libgrpc:             1.57.0-ha2534ac_1              conda-forge
    liblapack:           3.9.0-18_osx64_openblas        conda-forge
    libllvm14:           14.0.6-hc8e404f_4              conda-forge
    libnghttp2:          1.52.0-he2ab024_0              conda-forge
    libopenblas:         0.3.24-openmp_h48a4ad5_0       conda-forge
    libprotobuf:         4.23.4-he0c2237_6              conda-forge
    libsqlite:           3.43.0-h58db7d2_0              conda-forge
    libssh2:             1.11.0-hd019ec5_0              conda-forge
    libthrift:           0.19.0-h88b220a_0              conda-forge
    libutf8proc:         2.8.0-hb7f2c08_0               conda-forge
    libzlib:             1.2.13-h8a1eda9_5              conda-forge
    llvm-openmp:         16.0.6-hff08bdf_0              conda-forge
    llvmlite:            0.40.1-py311hcbb5c6d_0         conda-forge
    lz4-c:               1.9.4-hf0c8a7f_0               conda-forge
    minio-server:        2023.08.23.10.07.06-h8857fd0_3 conda-forge
    multidict:           6.0.4-py311h5547dcb_0          conda-forge
    ncurses:             6.4-hf0c8a7f_0                 conda-forge
    numba:               0.57.1-py311h5a8220d_0         conda-forge
    numpy:               1.24.4-py311hc44ba51_0         conda-forge
    openssl:             3.1.2-h8a1eda9_0               conda-forge
    orc:                 1.9.0-ha4ae40d_2               conda-forge
    packaging:           23.1-pyhd8ed1ab_0              conda-forge
    pandas:              2.1.0-py311hab14417_0          conda-forge
    pluggy:              1.3.0-pyhd8ed1ab_0             conda-forge
    pyarrow:             14.0.0-py311hcbd9da5_4_cpu     local      
    pyarrow-tests:       14.0.0-py311hcbd9da5_4_cpu     local      
    pycparser:           2.21-pyhd8ed1ab_0              conda-forge
    pyopenssl:           23.2.0-pyhd8ed1ab_1            conda-forge
    pysocks:             1.7.1-pyha2e5f31_6             conda-forge
    pytest:              7.4.2-pyhd8ed1ab_0             conda-forge
    pytest-lazy-fixture: 0.6.3-py_0                     conda-forge
    python:              3.11.5-h30d4d87_0_cpython      conda-forge
    python-dateutil:     2.8.2-pyhd8ed1ab_0             conda-forge
    python-tzdata:       2023.3-pyhd8ed1ab_0            conda-forge
    python_abi:          3.11-3_cp311                   conda-forge
    pytz:                2023.3.post1-pyhd8ed1ab_0      conda-forge
    re2:                 2023.03.02-h096449b_0          conda-forge
    readline:            8.2-h9e318b2_1                 conda-forge
    s3fs:                2023.9.0-pyhd8ed1ab_0          conda-forge
    s3transfer:          0.6.2-pyhd8ed1ab_0             conda-forge
    scipy:               1.11.2-py311h16c3c4d_1         conda-forge
    setuptools:          68.2.2-pyhd8ed1ab_0            conda-forge
    six:                 1.16.0-pyh6c4a22f_0            conda-forge
    snappy:              1.1.10-h225ccf5_0              conda-forge
    sortedcontainers:    2.4.0-pyhd8ed1ab_0             conda-forge
    sparse:              0.14.0-pyhd8ed1ab_0            conda-forge
    tk:                  8.6.12-h5dbffcc_0              conda-forge
    tomli:               2.0.1-pyhd8ed1ab_0             conda-forge
    typing-extensions:   4.7.1-hd8ed1ab_0               conda-forge
    typing_extensions:   4.7.1-pyha770c72_0             conda-forge
    tzdata:              2023c-h71feb2d_0               conda-forge
    urllib3:             1.26.15-pyhd8ed1ab_0           conda-forge
    wrapt:               1.15.0-py311h5547dcb_0         conda-forge
    xz:                  5.2.6-h775f41a_0               conda-forge
    yarl:                1.9.2-py311h2725bcf_0          conda-forge
    zstd:                1.5.5-h829000d_0               conda-forge

@h-vetinari
Copy link
Member

Given the segfaults with pyarrow + sparse + numba + llvmlite when using shared libllvm, I think we should proceed with linking llvm statically here, as strongly suggested by @Hardcode84 - I feel we're not affording sufficient weight to the opinion of upstream maintainers here; it's fine to insist on our way as long as things work, but here they clearly don't.

Those objecting to linking statically would need to at least propose a solution that avoids those segfaults for arrow (link statically there?), but overall I find these segfaults to be pretty compelling evidence for making a CFEP-18 exception, because of course they could be happening in other circumstances as well.

@isuruf
Copy link
Member

isuruf commented Jan 5, 2024

Do you have a reproducer I can take a look at?

@h-vetinari
Copy link
Member

h-vetinari commented Jan 5, 2024

Do you have a reproducer I can take a look at?

Uncomment this line in the arrow recipe (resp. co-install pyarrow-tests with sparse and run the test suite). This should segfault, at least on osx-64. If you then downgrade the llvm-version used to build arrow to match the one from llvmlite, things should pass again (this was pretty much the test I did in September, described in the comment above).

@h-vetinari
Copy link
Member

@isuruf, have you had a chance to look at the llvmlite crashes with arrow+sparse?

@h-vetinari
Copy link
Member

How can we move this topic forward? AFAICT llvmlite still hasn't been rebuilt against the SVML-enabled libllvm.

Given the lack of progress here for ~9 month, I will urge that we build against static libllvm, unless & until someone debugs the issues that are being surfaced in arrow+sparse. If someone still disagrees, let's discuss it in the next core call.

@jakirkham
Copy link
Member

Feel free to add to the agenda for tomorrow

Regardless of what we do, think it would help to have a current reproducer

Attempted to follow this step ( #72 (comment) ) above: conda-forge/pyarrow-feedstock#121

Though please feel free to push other changes there. Ideally we would reproduce this issue on CI so we can share a bit more (would like to ask others to take a look)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants