
Add cuda-nvcc-impl recipe #22802

Merged
merged 19 commits into conda-forge:main
May 16, 2023
Conversation


@adibbley adibbley commented May 12, 2023

xref: conda-forge/cuda-nvcc-feedstock#12

Following this PR cuda-nvcc-feedstock will need to be updated as:

- cuda-nvcc_{{ cross_target_platform }}
  - Meta-package that has the activation script
  - requirements:
     - cuda-nvcc-tools
     - cuda-nvcc-dev_{{ cross_target_platform }}
     - cuda-nvcc-impl      (if native; needed for CMake)
  - files:
      - etc/conda/activate.d/
      - etc/conda/deactivate.d/

- cuda-nvcc
  - requirements:
     - cuda-nvcc-impl
  - files:
      - etc/conda/activate.d/
      - etc/conda/deactivate.d/
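As a rough illustration of what the activation scripts in etc/conda/activate.d/ might do, the sketch below exports compiler-related environment variables. The variable names and fallback prefix are assumptions for illustration, not taken from this PR:

```shell
# Hypothetical sketch of an etc/conda/activate.d/ script for the cuda-nvcc
# meta-package; variable names are illustrative, not from this recipe.
: "${CONDA_PREFIX:=/opt/conda}"           # fall back when not inside a conda env
export CUDA_HOME="${CONDA_PREFIX}"
export CUDACXX="${CONDA_PREFIX}/bin/nvcc"
echo "CUDACXX=${CUDACXX}"
```

A matching deactivate.d/ script would unset the same variables on environment deactivation.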

Checklist

  • Title of this PR is meaningful: e.g. "Adding my_nifty_package", not "updated meta.yaml".
  • License file is packaged (see here for an example).
  • Source is from official source.
  • Package does not vendor other packages. (If a package uses the source of another package, they should be separate packages or the licenses of all packages need to be packaged).
  • If static libraries are linked in, the license of the static library is packaged.
  • Package does not ship static libraries. If static libraries are needed, follow CFEP-18.
  • Build number is 0.
  • A tarball (url) rather than a repo (e.g. git_url) is used in your recipe (see here for more details).
  • GitHub users listed in the maintainer section have posted a comment confirming they are willing to be listed there.
  • When in trouble, please check our knowledge base documentation before pinging a team.

@conda-forge-webservices

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipes/cuda-nvcc-impl) and found it was in an excellent condition.

@adibbley adibbley marked this pull request as ready for review May 12, 2023 15:17
@adibbley

cc @isuruf @jakirkham

Resolved review threads on recipes/cuda-nvcc-impl/meta.yaml
@isuruf commented May 12, 2023

@adibbley, pushed a couple of commits. Let me know what you think

@adibbley

@robertmaynard can you take a look at aaba0ec as well please? The cmake tests are passing, but not sure if there is something else that should be checked.

@robertmaynard

> @robertmaynard can you take a look at aaba0ec as well please? The cmake tests are passing, but not sure if there is something else that should be checked.

Sure I will look at the latest commit and report back

@robertmaynard

There looks to be a real regression with this layout where device linking (-dlink) plus device LTO (-dlto) fails. Taking the input of the linking command and running it with a local CUDA 12.0 linker works, as does using the existing cuda-nvcc package.

Looking at the verbose output of the link step we see:

#$ _NVVM_BRANCH_=nvvm
#$ _SPACE_= 
#$ _CUDART_=cudart
#$ _HERE_=/home/rmaynard/miniconda3/envs/cuda_tk12/bin
#$ _THERE_=/home/rmaynard/miniconda3/envs/cuda_tk12/bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_DIR_=targets/x86_64-linux
#$ TOP=/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux
#$ NVVMIR_LIBRARY_DIR=/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux/nvvm/libdevice
#$ LD_LIBRARY_PATH=/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux/lib:/usr/local/cuda/lib64
#$ PATH=/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux/bin:/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux/nvvm/bin:/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux/../../bin:/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux/../../nvvm/bin:/home/rmaynard/miniconda3/envs/cuda_tk12/bin:/home/rmaynard/miniconda3/condabin:/home/rmaynard/.local/bin:/usr/local/cuda/bin:/home/rmaynard/.cargo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/home/rmaynard/.cargo/bin
#$ INCLUDES="-I/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux/include"  
#$ LIBRARIES=  "-L/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux/lib/stubs" "-L/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux/lib"
#$ CUDAFE_FLAGS=
#$ PTXAS_FLAGS=
#$ nvlink -m64  --shared --arch=sm_50 --register-link-binaries="/tmp/tmpxft_00013268_00000000-3_cmake_device_link.reg.c"  -w -lcudadevrt -lcudart_static -lrt -lpthread -ldl   "-L/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux/lib/stubs" "-L/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux/lib" -cpu-arch=X86_64 -report-arch -dlto -nvvmpath="/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux/nvvm" "CMakeFiles/CudaOnlyDeviceLTO.dir/main.cu.o" "libCUDA_dlto.a"  -lcudadevrt  -o "/tmp/tmpxft_00013268_00000000-5_cmake_device_link.compute_50.cubin" --host-ccbin "/home/rmaynard/miniconda3/envs/cuda_tk12/bin/x86_64-conda-linux-gnu-c++"
nvlink fatal   : elfLink linker library load error (target: sm_50)
# --error 0x1 --

It is still unclear to me what has changed in the environment compared to the existing cuda-nvcc package that caused this failure. Currently looking at the strace output of the nvlink execution.

@robertmaynard

The issue seems to be that nvvm/lib64/libnvvm.so.4.0.0 differs in size (and is not considered a shared library) after I built this locally, versus the existing cuda-nvcc. I built for the generic linux64 platform.

Working cuda-nvcc info:

$ ldd /home/rmaynard/miniconda3/envs/old_cuda12_working/bin/../targets/x86_64-linux//nvvm/lib64/libnvvm.so.4.0.0
   linux-vdso.so.1 (0x00007ffe0055b000)
   libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f13c0ff3000)
   librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f13c0fe9000)
   libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f13c0fe3000)
   libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f13c0e94000)
   libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f13c0ca2000)
   /lib64/ld-linux-x86-64.so.2 (0x00007f13c2cc3000)
   
$ ll /home/rmaynard/miniconda3/envs/old_cuda12_working/bin/../targets/x86_64-linux//nvvm/lib64/
total 26808
drwxrwxr-x 2 rmaynard rmaynard     4096 May 15 11:05 ./
drwxrwxr-x 6 rmaynard rmaynard     4096 May 15 11:05 ../
lrwxrwxrwx 1 rmaynard rmaynard       16 May 15 11:05 libnvvm.so -> libnvvm.so.4.0.0*
lrwxrwxrwx 1 rmaynard rmaynard       16 May 15 11:05 libnvvm.so.4 -> libnvvm.so.4.0.0*
-rwxrwxr-x 3 rmaynard rmaynard 27440865 Apr 21 12:57 libnvvm.so.4.0.0*

Now on the failing conda env that uses this pr.

$ ldd /home/rmaynard/miniconda3/envs/cuda_tk12/nvvm/lib64/libnvvm.so.4.0.0 
	not a dynamic executable
$ ll /home/rmaynard/miniconda3/envs/cuda_tk12/nvvm/lib64/
total 26808
drwxrwxr-x 2 rmaynard rmaynard     4096 May 15 15:35 ./
drwxrwxr-x 6 rmaynard rmaynard     4096 May 15 15:35 ../
lrwxrwxrwx 1 rmaynard rmaynard       16 May 15 15:35 libnvvm.so -> libnvvm.so.4.0.0*
lrwxrwxrwx 1 rmaynard rmaynard       16 May 15 15:35 libnvvm.so.4 -> libnvvm.so.4.0.0*
-rwxrwxr-x 2 rmaynard rmaynard 27440857 May 15 10:12 libnvvm.so.4.0.0*
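The "not a dynamic executable" message from ldd generally means the file no longer has a valid ELF header. One quick way to check is to compare the first four bytes against the ELF magic number; the sketch below uses stand-in files, since the real libnvvm path is environment-specific:

```shell
# Sketch of the magic-byte check behind ldd's "not a dynamic executable"
# diagnosis: a valid ELF file must start with the bytes 0x7f 'E' 'L' 'F'.
# The stand-in files below are illustrative, not the real libnvvm.so.
check_elf_magic() {
  printf '\177ELF' > /tmp/elf_magic.ref          # reference ELF magic bytes
  head -c 4 "$1" | cmp -s - /tmp/elf_magic.ref && echo ELF || echo "not ELF"
}
printf '\177ELFxxxx' > /tmp/good.bin             # minimal stand-in ELF header
printf 'garbage'     > /tmp/bad.bin
check_elf_magic /tmp/good.bin                    # prints: ELF
check_elf_magic /tmp/bad.bin                     # prints: not ELF
```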

@robertmaynard

Removing the patchelf step on libnvvm has allowed all the CMake CUDA tests to pass locally.

I have a minor concern that we should symlink everything (from the CUDA toolkit) from $PREFIX/bin into $PREFIX/<target>/bin so that CMake-based projects that use CUDAToolkit_BIN_DIR will still work (e.g. CUDAToolkit_BIN_DIR/bin2c). But we can spin that request out into an issue if needed.
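The symlinking idea above could look roughly like the following; the prefix, target triple, and the bin2c stand-in are illustrative assumptions, not taken from this recipe:

```shell
# Illustrative sketch: mirror toolkit binaries from $PREFIX/bin into the
# splayed $PREFIX/targets/<triple>/bin so CUDAToolkit_BIN_DIR keeps working.
PREFIX="${PREFIX:-/tmp/demo-prefix}"
triple="x86_64-linux"
mkdir -p "$PREFIX/bin" "$PREFIX/targets/$triple/bin"
touch "$PREFIX/bin/bin2c"                 # stand-in for a real toolkit binary
for f in "$PREFIX"/bin/*; do
  ln -sf "$f" "$PREFIX/targets/$triple/bin/$(basename "$f")"
done
ls "$PREFIX/targets/$triple/bin"          # prints: bin2c
```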

@isuruf commented May 15, 2023

Can you also open an issue in CMake to fix the layout issue, so that we can remove this hack sometime in the future?

@robertmaynard

> Can you also open an issue in CMake to fix the layout issue, so that we can remove this hack sometime in the future?

Sure can. Would you consider this an accurate summary of the issue:

CMake should only infer the location of `nvcc` from the `#$ TOP` output of the compiler; it should instead infer the include directory from `#$ INCLUDES` and libraries from `#$ LIBRARIES`. This aligns them with the existing behavior for `nvvm` (see `_CUDA_NVVMIR_LIBRARY_DIR`).

This will allow fully splayed layouts of the CUDA toolkit to be supported, like the ones that conda-forge provides going forward with CUDA 12.

Compiler-detection variables that are currently wrong with conda's desired CUDA 12+ layout:

- CMAKE_CUDA_COMPILER_TOOLKIT_LIBRARY_ROOT is currently "$CONDA_PREFIX", should be "$CONDA_PREFIX/<target>/lib"
- CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES is currently "$CONDA_PREFIX/include", should be "$CONDA_PREFIX/<target>/include"


@isuruf commented May 15, 2023

I would argue that TOP should not be used by CMake at all. It's just a variable without any meaning.

@robertmaynard

> I would argue that TOP should not be used by CMake at all. It's just a variable without any meaning.

We need to parse some compiler output from nvcc to determine the proper 'root' for CUDA toolkit executables (nvcc, fatbinary, etc.).

We can't use something like `which nvcc` since that doesn't punch through compiler wrapper symlinks (ccache, gcc colors, etc.). In addition, things like NVHPC have two different nvcc executables (not symlinks, no nvcc.profiles) but only a single location for the other executables such as fatbinary.

root@bfe5e65a3f4a:/host_pwd# which nvcc
/opt/nvidia/hpc_sdk/Linux_x86_64/23.1/compilers/bin/nvcc
root@bfe5e65a3f4a:/host_pwd# ll /opt/nvidia/hpc_sdk/Linux_x86_64/23.1/compilers/bin/ | grep fat
root@bfe5e65a3f4a:/host_pwd# /opt/nvidia/hpc_sdk/Linux_x86_64/23.1/compilers/bin/nvcc -v test.cu
#$ _NVVM_BRANCH_=nvvm
#$ _SPACE_= 
#$ _CUDART_=cudart
#$ _HERE_=/opt/nvidia/hpc_sdk/Linux_x86_64/23.1/cuda/12.0/bin
#$ _THERE_=/opt/nvidia/hpc_sdk/Linux_x86_64/23.1/cuda/12.0/bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_DIR_=targets/x86_64-linux
#$ TOP=/opt/nvidia/hpc_sdk/Linux_x86_64/23.1/cuda/12.0/bin/..

_HERE_ and _THERE_ don't work since they don't see through symlinks and can't be influenced by nvcc.profile (we used to use _HERE_). So our options are TOP or PATH, and we went with TOP since it looked easier (no need to iterate).
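The kind of parsing being discussed can be sketched as follows; the sample line is copied from the NVHPC log above, and the extraction itself is only illustrative, not CMake's actual implementation:

```shell
# Sketch of extracting TOP from the `#$`-prefixed lines of `nvcc -v` output.
sample='#$ TOP=/opt/nvidia/hpc_sdk/Linux_x86_64/23.1/cuda/12.0/bin/..'
top=$(printf '%s\n' "$sample" | sed -n 's/^#\$ TOP=//p')
echo "$top"    # prints: /opt/nvidia/hpc_sdk/Linux_x86_64/23.1/cuda/12.0/bin/..
```

The same pattern works for the `#$ INCLUDES=` and `#$ LIBRARIES=` lines, which carry flags rather than a bare path.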

@isuruf commented May 15, 2023

Yeah, I was thinking PATH would be a better option for bin/ just like INCLUDES is a better option for include/.

@jakirkham left a comment


Thanks all! 🙏

Sounds like we are happy with this change

Next step is to update PR ( conda-forge/cuda-nvcc-feedstock#13 ) to use these packages once they are produced

Comment on lines 121 to 123
- include/crt # [linux]
- include/fatbinary_section.h # [linux]
- include/nvPTXCompiler.h # [linux]
Member

Are these three symlinks needed?

Contributor Author

They shouldn't be needed based on testing that happened against prior iterations without them.
Removed them here now.

@jakirkham jakirkham merged commit 7827c45 into conda-forge:main May 16, 2023
@jakirkham

Thanks all! 🙏

Let's follow up on anything else in the feedstock

Next step is to update PR ( conda-forge/cuda-nvcc-feedstock#13 ) based on these changes
