cuDF: Need to canonicalize dlopen'd library names #12708
Comments
What is the right approach to do this when we want to build libcudf artifacts for both CUDA 11 and 12? Will we need to guard the dlopens with preprocessor checks of |
The right thing to do is to dlopen |
Right, but say I am compiling libcudf twice, once for CUDA 11 and once for CUDA 12. Is the expectation that we do something like
in the code? |
This seems to defeat the purposes of dlopen where we want to loosely couple to the shared library at build time. |
It's a bit tricky because we want to couple loosely in some ways (maybe it's OK for the library not to exist, or maybe we're OK with using any minor version available) but we still need to couple tightly in other ways (if our main binary is compiled against CUDA 12 then we probably have to use CUDA 12 versions of the dlopened libraries to guarantee ABI compatibility). But I'm not sure if my snippet is the best solution, or if you would still include additional runtime checks for different versions to provide more useful error messages, or something else. |
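The snippet under discussion did not survive in this thread, so the following is only a sketch of what tying the dlopen'd name to the build's CUDA major version could look like. `cufft_soname` and `EXAMPLE_CUDA_MAJOR` are hypothetical names, not libcudf code, and the two SOVERSION values are the libcufft ones quoted later in this thread (11.8 ships `.10`, 12.0 ships `.11`).

```cpp
#include <string_view>

// Hypothetical helper, not part of libcudf: maps the CUDA major version the
// binary is built for to the versioned libcufft name to dlopen. Only the two
// values quoted in this thread are covered.
constexpr std::string_view cufft_soname(int cuda_major)
{
  switch (cuda_major) {
    case 11: return "libcufft.so.10";
    case 12: return "libcufft.so.11";
    default: return {};  // unknown toolkit: no name we can trust
  }
}

// EXAMPLE_CUDA_MAJOR is an assumed build-system define
// (for example -DEXAMPLE_CUDA_MAJOR=11); it defaults to 12 here.
#ifndef EXAMPLE_CUDA_MAJOR
#define EXAMPLE_CUDA_MAJOR 12
#endif

inline constexpr std::string_view kCufftSoname = cufft_soname(EXAMPLE_CUDA_MAJOR);

// Fail the build instead of silently dlopen'ing an untested version.
static_assert(!kCufftSoname.empty(),
              "no known libcufft SOVERSION for this CUDA major version");
```

The `static_assert` keeps the tight coupling explicit: a build for a toolkit with no recorded SOVERSION stops at compile time, which is where a manual update would be needed anyway.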
cc @vuule (who may have more thoughts on this) |
Pretty much on the same page as Vyas here. If we generate the .so name at compile time based on the runtime version, we can use it in |
This isn't the approach the CTK takes. The SOVERSION values of CTK libraries are not tightly coupled to the CTK major version. For example libcufile in CTK 12.0 ships If we had to encode this we need to parse the |
Suppose we could create a shim library that links against the right This might strike the balance Vyas described above. Namely linking to the version that we built and tested against (so we have some confidence it has the feature set we need to run) while simultaneously handling the case where Should also handle the linking to the right SOVERSION as Rob mentioned (hopefully without needing to bake too much logic about CTK versioning into cuDF). Thoughts? 🙂 |
I suppose that a shim library could work. I kind of dislike the idea of having a bunch of shim libraries for all the dlopens in RAPIDS… but it does seem that it would solve the problem. I think I’d rather handle the problem directly with hardcoded versions even if it requires manual updates and fallback logic. Would it be possible to overcome the problem that @vuule mentioned (below) by explicitly testing that the dlopen succeeds? I don’t think we can be guaranteed that new versions would work out of the box, so having manual/hardcoded logic to enable new versions doesn’t sound too awful to me. The net LOC should be lower than the shim approach and it would be more self-contained in libcudf source code.
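A rough sketch of the hard-coded-versions-with-fallback idea, assuming a POSIX `dlopen`. The helper names (`open_first_available`, `dlopen_lazy`, `opener_fn`) are made up for illustration and are not libcudf APIs. The opener is injectable only so the fallback order can be exercised without the real libraries installed.

```cpp
#include <dlfcn.h>

#include <functional>
#include <string>
#include <vector>

// Signature of the function used to load a library by name.
using opener_fn = std::function<void*(const char*)>;

// Default opener: plain dlopen with lazy symbol resolution.
inline void* dlopen_lazy(const char* name)
{
  return ::dlopen(name, RTLD_LAZY | RTLD_LOCAL);
}

// Tries each hard-coded versioned name in order and returns the first handle
// that loads, or nullptr if none does. On success, the name that worked is
// written to *opened_name when that pointer is non-null.
inline void* open_first_available(const std::vector<std::string>& candidates,
                                  std::string* opened_name = nullptr,
                                  const opener_fn& opener  = dlopen_lazy)
{
  for (const auto& name : candidates) {
    if (void* handle = opener(name.c_str())) {
      if (opened_name != nullptr) { *opened_name = name; }
      return handle;
    }
  }
  return nullptr;  // caller decides: disable the feature or report an error
}
```

A caller would list only the versions it has been tested against, newest first, and treat a `nullptr` result as "library not available" rather than falling back to the unversioned name.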
|
I think we would always have to dlopen/dlsym so that we don't get a |
If the libraries are not lying about their ABI compatibility, then hard-coded versions "just work", no? e.g. in 11.8, libcufft is major versioned as .10, whereas in 12.0 it is versioned as .11. This says to me that these two libraries are not ABI compatible, and so it would be unsafe to dlopen the unversioned In cudf, the dlopen of |
Yeah we discussed this today and the consensus was to just hard-code for now. There is only one version to track atm. So that seems like the simplest fix. We can revisit when there is a SOVERSION bump, which will likely correspond to a new CUDA 12.x that we try to support. |
Closes #12708
Authors:
- Ashwin Srinath (https://github.com/shwina)
- Lawrence Mitchell (https://github.com/wence-)
Approvers:
- Bradley Dice (https://github.com/bdice)
- Divye Gala (https://github.com/divyegala)
- Vukasin Milovanovic (https://github.com/vuule)
URL: #13210
Documenting an offline discussion with @jakirkham in preparation for the CUDA 12 bring-up on conda-forge.
Currently, there are some places where the bare libXXXXX.so, without any SONAME version, is being dlopen'd, for example cudf/cpp/src/io/utilities/file_io_utilities.cpp, line 120 in c7db81a.
This will become problematic because, by the standard practice of Linux distros, the libXXXXX.so symlink is supposed to exist only in the libXXXXX-dev package, not in libXXXXX, which only contains libXXXXX.so.{major}. The dlopen'd names need to be canonicalized.
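To make the distinction concrete, here is a small sketch of a check that a name passed to dlopen carries a major version, in the libXXXXX.so.{major} form described above. `has_major_soversion` is a hypothetical helper, for instance for a unit test or lint step, and is not something that exists in cuDF.

```cpp
#include <cctype>
#include <string_view>

// Returns true if `name` ends in ".so.<digits>", e.g. "libfoo.so.1".
// The bare development symlink name "libfoo.so" is rejected, as are
// fully qualified names such as "libfoo.so.1.2.3".
inline bool has_major_soversion(std::string_view name)
{
  constexpr std::string_view marker = ".so.";
  const auto pos = name.rfind(marker);
  if (pos == std::string_view::npos) { return false; }

  const auto version = name.substr(pos + marker.size());
  if (version.empty()) { return false; }

  for (unsigned char c : version) {
    if (!std::isdigit(c)) { return false; }
  }
  return true;
}
```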