Fix for CUDA Toolkit packages containing incorrect RPATH #10

jakirkham · 2023-11-30T22:09:03Z

Introduction

We recently became aware of an issue in the cuda-nvtx-feedstock where the RPATHs of the libraries in the package were incorrect (conda-forge/cuda-nvtx-feedstock#2). These incorrect RPATHs are a result of the directory layout used for CUDA packages. All distributions of CUDA place their contents in a top-level targets directory, with subdirectories for different architectures, to better support cross-compilation. The CUDA packages on conda-forge mimic this structure, but to support standard runtime library use cases, the libraries in CUDA packages are also symlinked into the top-level lib directory. The problem lies in how $ORIGIN is handled for symlinks: the RPATHs are set at build time relative to the true library location, but at runtime $ORIGIN is the location of the symlink rather than the true library location. As a result, the RPATHs cause library searches outside of the environment at runtime.

We would like to maintain the targets layout because it matches how CUDA is provided in other distributions. This also means we want to keep the real libraries in the targets directory rather than placing them directly in lib. We would also like to avoid ballooning the package size or adding any RPATHs that point outside the environment since that is broken at best and dangerous at worst. To satisfy all of these constraints, our proposed solution is to manually set the RPATH to $ORIGIN with patchelf during the conda package build step on all the libraries in the targets directory. At runtime, the RPATH setting of $ORIGIN will resolve to $PREFIX/lib, producing the desired behavior. There are some potential caveats to how this may work within the context of conda-build, as we discuss below, but we have verified that this produces the desired runtime results.

Problem Statement

The CTK packages are structured to have a runtime package and a -dev package.
- Example: cuda-nvtx-dev & cuda-nvtx. The runtime package, cuda-nvtx, contains the libraries.
- The -dev package has a dependency on the runtime package so that these libraries are available at build time.
Library files are in paths like $PREFIX/targets/<arch>/lib/*.so*.
- These are used to link against at build time.
- This is the preferred location, because it communicates that we have a cross-compiler-friendly library location and matches CUDA packages in other distribution forms.
Library files are symlinked into $PREFIX/lib/*.so*
Conda-build detects the .so files in the deeper folder, $PREFIX/targets/<arch>/lib.
- It sets the RPATH to be $ORIGIN/../../../lib.
At runtime, the symlink is found in $PREFIX/lib.
1. The library at $PREFIX/targets/<arch>/lib is loaded.
2. Its RPATH is $ORIGIN/../../../lib
3. $ORIGIN is considered to be $PREFIX/lib, the location of the symlink.
4. Due to this, the RPATH resolves to $PREFIX/../../lib, so the library search path goes outside of the environment.
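
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the prefix `/opt/conda/envs/myenv` and the architecture name `x86_64-linux` are example values, `expand_rpath` is a hypothetical helper, and the `..` components are collapsed lexically, which approximates what the loader does when no directory on the path is itself a symlink.

```python
import posixpath


def expand_rpath(rpath: str, origin: str) -> str:
    """Substitute $ORIGIN, then collapse '..' lexically (an approximation)."""
    return posixpath.normpath(rpath.replace("$ORIGIN", origin))


# Example values, not taken from any particular package.
PREFIX = "/opt/conda/envs/myenv"
ARCH = "x86_64-linux"
RPATH = "$ORIGIN/../../../lib"

# Build-time view: $ORIGIN is the directory holding the real file.
build_time = expand_rpath(RPATH, f"{PREFIX}/targets/{ARCH}/lib")
# Runtime view: the library was found through its symlink in $PREFIX/lib.
run_time = expand_rpath(RPATH, f"{PREFIX}/lib")

print(build_time)  # /opt/conda/envs/myenv/lib
print(run_time)    # /opt/conda/lib
```

The same RPATH string therefore points inside the environment when evaluated from the real location and two levels above the environment when evaluated from the symlink.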

This can result in a functioning environment if either of the following cases holds.

All of these are true:

- The environment is not the base environment
- The environment is contained within the base environment’s envs folder
- The base environment contains compatible libraries

Or:

- The compatible libraries are accessible via LD_LIBRARY_PATH or standard ld.so search paths.

If neither of those cases is met, the environment will not be functional.
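
To make the first case concrete, here is a small sketch of where the escaped search path lands for a few environment layouts. The prefixes are made-up examples and `leaked_dir` is a hypothetical helper; it assumes the RPATH is `$ORIGIN/../../../lib` and that `$ORIGIN` is `$PREFIX/lib`, as described above.

```python
import posixpath


def leaked_dir(prefix: str) -> str:
    """Directory searched when $ORIGIN/../../../lib is expanded from $PREFIX/lib."""
    return posixpath.normpath(
        posixpath.join(prefix, "lib", "..", "..", "..", "lib")
    )


# Illustrative prefixes only.
examples = [
    "/opt/conda/envs/myenv",   # named environment under the base envs folder
    "/opt/conda",              # the base environment itself
    "/home/user/project/env",  # environment created at an arbitrary path
]

results = {prefix: leaked_dir(prefix) for prefix in examples}
for prefix, searched in results.items():
    print(f"{prefix} -> {searched}")
```

Only the first layout ends up in the base environment's lib directory (`/opt/conda/lib`). The base environment itself resolves to `/lib`, and the arbitrary path resolves to `/home/user/lib`, neither of which is under conda's control.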

Our Solution

Keep existing file locations and symlink direction
- Actual library files in targets/…/lib
- Symlink in lib for each CUDA library that points to the library in ../targets/<arch>/…
Use patchelf to set RPATH to $ORIGIN for libraries in $PREFIX/targets/<arch>/lib/*.so*
Set build: binary_relocation: false so that conda-build doesn’t otherwise change the RPATHs of these libraries
At runtime, loading libraries from their symlinks in $PREFIX/lib will look for libraries adjacent to the symlink in $PREFIX/lib.
- This is key to the NVIDIA libraries loading Conda’s libstdc++ instead of the system libstdc++.
- This relies on the assumption that libraries are never loaded at runtime from the targets/…/lib folder.
- The only functional runtime approach is to load them from $PREFIX/lib.
Conda-build’s missing DSO detection needs to be disabled by setting error_overlinking to false
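
As a rough illustration of the patchelf step, the sketch below collects the real library files under `targets/<arch>/lib` and builds one `patchelf --set-rpath '$ORIGIN'` command per file. It is not the script used in the feedstocks: the helper names are hypothetical, and it assumes that symlinks (for example `libfoo.so` pointing at a versioned file) should be skipped so each real file is patched once. Note that patchelf writes DT_RUNPATH by default and only writes DT_RPATH when `--force-rpath` is given; which of the two the feedstocks use is not stated here.

```python
import subprocess
from pathlib import Path


def real_shared_libs(prefix: Path) -> list[Path]:
    """Regular (non-symlink) files matching targets/<arch>/lib/*.so*."""
    return sorted(
        p
        for p in prefix.glob("targets/*/lib/*.so*")
        if p.is_file() and not p.is_symlink()
    )


def patchelf_commands(prefix: Path) -> list[list[str]]:
    """One patchelf invocation per real library; '$ORIGIN' is passed literally."""
    return [
        ["patchelf", "--set-rpath", "$ORIGIN", str(lib)]
        for lib in real_shared_libs(prefix)
    ]


def patch_all(prefix: Path) -> None:
    """Run the commands; requires patchelf on PATH."""
    for cmd in patchelf_commands(prefix):
        subprocess.run(cmd, check=True)


# Hypothetical usage inside a build script:
#   import os
#   patch_all(Path(os.environ["PREFIX"]))
```

Passing the arguments as a list avoids shell expansion of `$ORIGIN`, which must reach patchelf as a literal string.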

Justification

This approach aligns more closely with how the CUDA Toolkit is distributed outside of conda than the alternatives we considered below. It also avoids unnecessarily bloating the package.

Considered Alternatives

Reverse symlink direction

This was originally proposed by @isuruf in "reverse symlinks" (cuda-nvtx-feedstock#3).
Instead of having the library files reside in targets/…/lib, the actual library file would be placed in $PREFIX/lib, and the symlink would be created in the targets/…/lib folder.
Conda-build will detect the library’s location and set RPATH to $ORIGIN/../lib.
As with the proposed solution, loading the library from the targets/…/lib folder is broken. The library can only be loaded correctly from the $PREFIX/lib location.
Conda-build’s missing DSO detection should work correctly under this scheme.
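
A short sketch of why only the $PREFIX/lib location works under this alternative. The prefix and architecture name are example values and `expand_rpath` is a hypothetical helper that collapses `..` lexically; the reading that the targets folder lacks the environment's other libraries (such as libstdc++) is an assumption drawn from the description of the proposed solution.

```python
import posixpath


def expand_rpath(rpath: str, origin: str) -> str:
    """Substitute $ORIGIN, then collapse '..' lexically (an approximation)."""
    return posixpath.normpath(rpath.replace("$ORIGIN", origin))


# Example values, for illustration only.
PREFIX = "/opt/conda/envs/myenv"
ARCH = "x86_64-linux"
RPATH = "$ORIGIN/../lib"

# Loaded from the real file in $PREFIX/lib: searches $PREFIX/lib itself.
via_lib = expand_rpath(RPATH, f"{PREFIX}/lib")
# Loaded through the symlink in targets/<arch>/lib: searches only that folder.
via_targets = expand_rpath(RPATH, f"{PREFIX}/targets/{ARCH}/lib")

print(via_lib)
print(via_targets)
```

When the library is reached through the targets symlink, the search stays inside `targets/<arch>/lib` and never visits `$PREFIX/lib`.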

Comments

This approach would result in a different CUDA Toolkit layout in Conda compared to other distributions. Alignment across CUDA Toolkit distributions is important for libraries using CUDA to have similar expectations and behaviors both inside and outside of conda environments.

Duplicating library in both locations

Instead of symlinking, the library files would be contained in both the *-dev and the runtime packages. They would exist in the targets/…/lib location in the *-dev package, and in $PREFIX/lib in the runtime package.
Conda-build would detect and correctly set RPATH in both instances.
- The library in the *-dev package would have an RPATH of $ORIGIN/../../../lib, which evaluates to $PREFIX/lib. Loading of sibling libraries in the targets folder would rely on fallback to RUNPATH, which is $ORIGIN.
- The library in the runtime package would have an RPATH of $ORIGIN/../lib, which again evaluates to $PREFIX/lib. Sibling libraries are present in this same folder in this package, so the fallback to RUNPATH doesn’t come into play.
Conda-build’s missing DSO detection should work correctly under this scheme.

Comments

The cuda metapackage assumes that both build-time and runtime components are provided. Because the libraries would be duplicated between the -dev and runtime packages, the effective size of the cuda metapackage would roughly double, which is prohibitive. Additionally, having -dev and -runtime variants of the metapackage is not favorable, because it would differ from other ways of distributing CUDA.


jakirkham · 2023-12-07T17:43:15Z

All fixes have been merged. Closing as completed

jakirkham · 2024-07-23T19:42:13Z

Reopening to look at bin where it appears similar work may be needed

jakirkham · 2024-08-13T19:51:21Z

cc @billysuh7 (to look at doing the same thing for binaries in a couple weeks)

jakirkham · 2024-10-15T21:35:46Z

billysuh7 · 2024-11-13T02:36:12Z

nsight-compute may not require any action. All the 'not found' warnings are against 32-bit ELF DSO files which are false positives. See bug conda-forge/nsight-compute-feedstock#33

jakirkham mentioned this issue Nov 30, 2023

RPATHs are wrong. conda-forge/cuda-nvtx-feedstock#2

Closed

1 task

jakirkham closed this as completed Dec 7, 2023

jakirkham mentioned this issue Dec 14, 2023

libnppc.so.12.0.0.30: ELF load command address/offset not properly aligned conda-forge/libnpp-feedstock#2

Closed

jakirkham mentioned this issue Jan 25, 2024

Update for CUDA 12.3.2 conda-forge/cuda-sanitizer-api-feedstock#5

Merged

jakirkham mentioned this issue Feb 8, 2024

Question about rpath patching conda-forge/cuda-cudart-feedstock#21

Closed

JeanChristopheMorinPerso mentioned this issue Feb 8, 2024

Pkg 3792: initial feedstock, v12.3 AnacondaRecipes/cuda-cudart-feedstock#1

Merged

robertmaynard mentioned this issue Feb 22, 2024

Continued incorrect rpath fixups for libraries symlinked into $PREFIX/lib conda/conda-build#5198

Open

2 tasks

jakirkham reopened this Jul 23, 2024

This was referenced Aug 6, 2024

Patch nvprof so it doesn't link outside the conda environment conda-forge/cuda-nvprof-feedstock#12

Merged

Use rdma-core package (instead of CDT) & add to linux_aarch64 conda-forge/libcufile-feedstock#14

Merged

sisodia1701 assigned billysuh7 Aug 20, 2024

jakirkham unassigned billysuh7 Oct 1, 2024

jakirkham assigned billysuh7 Oct 15, 2024

jakirkham mentioned this issue Oct 15, 2024

Add target libraries to executable's RPATH conda-forge/cuda-cuobjdump-feedstock#15

Merged

5 tasks

jakirkham mentioned this issue Oct 29, 2024

Add lib folders to executable's RPATH conda-forge/cuda-cuxxfilt-feedstock#14

Merged

5 tasks

carterbox mentioned this issue Nov 12, 2024

NEW: Add libnvjpeg2k and libnvtiff conda-forge/staged-recipes#28142

Merged

10 tasks
