
Add cuda-nvcc-impl recipe #22802

Merged
merged 19 commits into conda-forge:main
May 16, 2023
Conversation


@adibbley adibbley commented May 12, 2023

xref: conda-forge/cuda-nvcc-feedstock#12

Following this PR cuda-nvcc-feedstock will need to be updated as:

- cuda-nvcc_{{ cross_target_platform }}
  - Meta-package that has the activation script
  - requirements:
     - cuda-nvcc-tools
     - cuda-nvcc-dev_{{ cross_target_platform }}
     - cuda-nvcc-impl      (if native; needed for CMake)
  - files:
      - etc/conda/activate.d/
      - etc/conda/deactivate.d/

- cuda-nvcc
  - requirements:
     - cuda-nvcc-impl
  - files:
      - etc/conda/activate.d/
      - etc/conda/deactivate.d/
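As a rough illustration of what the activation scripts in etc/conda/activate.d/ might do, the sketch below exports compiler-related environment variables. The variable names and fallback prefix are assumptions for illustration, not taken from this PR:

```shell
# Hypothetical sketch of an etc/conda/activate.d/ script for the cuda-nvcc
# meta-package; variable names are illustrative, not from this recipe.
: "${CONDA_PREFIX:=/opt/conda}"           # fall back when not inside a conda env
export CUDA_HOME="${CONDA_PREFIX}"
export CUDACXX="${CONDA_PREFIX}/bin/nvcc"
echo "CUDACXX=${CUDACXX}"
```

A matching deactivate.d/ script would unset the same variables on environment deactivation.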

Checklist

  • Title of this PR is meaningful: e.g. "Adding my_nifty_package", not "updated meta.yaml".
  • License file is packaged (see here for an example).
  • Source is from official source.
  • Package does not vendor other packages. (If a package uses the source of another package, they should be separate packages or the licenses of all packages need to be packaged).
  • If static libraries are linked in, the license of the static library is packaged.
  • Package does not ship static libraries. If static libraries are needed, follow CFEP-18.
  • Build number is 0.
  • A tarball (url) rather than a repo (e.g. git_url) is used in your recipe (see here for more details).
  • GitHub users listed in the maintainer section have posted a comment confirming they are willing to be listed there.
  • When in trouble, please check our knowledge base documentation before pinging a team.

@conda-forge-webservices

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipes/cuda-nvcc-impl) and found it was in an excellent condition.

@adibbley adibbley marked this pull request as ready for review May 12, 2023 15:17
@adibbley

cc @isuruf @jakirkham

Resolved review threads on recipes/cuda-nvcc-impl/meta.yaml
@isuruf commented May 12, 2023

@adibbley, pushed a couple of commits. Let me know what you think

@adibbley

@robertmaynard can you take a look at aaba0ec as well please? The cmake tests are passing, but not sure if there is something else that should be checked.

@robertmaynard

> @robertmaynard can you take a look at aaba0ec as well please? The cmake tests are passing, but not sure if there is something else that should be checked.

Sure I will look at the latest commit and report back

@robertmaynard

There looks to be a real regression with this layout where device linking (-dlink) plus device LTO (-dlto) fails. Taking the input of the linking command and running it with a local CUDA 12.0 linker works, as does using the existing cuda-nvcc package.

Looking at the verbose output of the link step we see:

#$ _NVVM_BRANCH_=nvvm
#$ _SPACE_= 
#$ _CUDART_=cudart
#$ _HERE_=/home/rmaynard/miniconda3/envs/cuda_tk12/bin
#$ _THERE_=/home/rmaynard/miniconda3/envs/cuda_tk12/bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_DIR_=targets/x86_64-linux
#$ TOP=/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux
#$ NVVMIR_LIBRARY_DIR=/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux/nvvm/libdevice
#$ LD_LIBRARY_PATH=/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux/lib:/usr/local/cuda/lib64
#$ PATH=/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux/bin:/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux/nvvm/bin:/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux/../../bin:/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux/../../nvvm/bin:/home/rmaynard/miniconda3/envs/cuda_tk12/bin:/home/rmaynard/miniconda3/condabin:/home/rmaynard/.local/bin:/usr/local/cuda/bin:/home/rmaynard/.cargo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/home/rmaynard/.cargo/bin
#$ INCLUDES="-I/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux/include"  
#$ LIBRARIES=  "-L/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux/lib/stubs" "-L/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux/lib"
#$ CUDAFE_FLAGS=
#$ PTXAS_FLAGS=
#$ nvlink -m64  --shared --arch=sm_50 --register-link-binaries="/tmp/tmpxft_00013268_00000000-3_cmake_device_link.reg.c"  -w -lcudadevrt -lcudart_static -lrt -lpthread -ldl   "-L/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux/lib/stubs" "-L/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux/lib" -cpu-arch=X86_64 -report-arch -dlto -nvvmpath="/home/rmaynard/miniconda3/envs/cuda_tk12/bin/../targets/x86_64-linux/nvvm" "CMakeFiles/CudaOnlyDeviceLTO.dir/main.cu.o" "libCUDA_dlto.a"  -lcudadevrt  -o "/tmp/tmpxft_00013268_00000000-5_cmake_device_link.compute_50.cubin" --host-ccbin "/home/rmaynard/miniconda3/envs/cuda_tk12/bin/x86_64-conda-linux-gnu-c++"
nvlink fatal   : elfLink linker library load error (target: sm_50)
# --error 0x1 --

It is still unclear to me what has changed in the environment compared to the existing cuda-nvcc package that caused this failure. Currently looking at the strace output of the nvlink execution.

@robertmaynard

The issue seems to be that nvvm/lib64/libnvvm.so.4.0.0 differs in size (and is not considered a shared library) after I built this locally, versus the existing cuda-nvcc. I built for the generic linux64 platform.

Working cuda-nvcc info:

$ ldd /home/rmaynard/miniconda3/envs/old_cuda12_working/bin/../targets/x86_64-linux//nvvm/lib64/libnvvm.so.4.0.0
   linux-vdso.so.1 (0x00007ffe0055b000)
   libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f13c0ff3000)
   librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f13c0fe9000)
   libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f13c0fe3000)
   libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f13c0e94000)
   libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f13c0ca2000)
   /lib64/ld-linux-x86-64.so.2 (0x00007f13c2cc3000)
   
$ ll /home/rmaynard/miniconda3/envs/old_cuda12_working/bin/../targets/x86_64-linux//nvvm/lib64/
total 26808
drwxrwxr-x 2 rmaynard rmaynard     4096 May 15 11:05 ./
drwxrwxr-x 6 rmaynard rmaynard     4096 May 15 11:05 ../
lrwxrwxrwx 1 rmaynard rmaynard       16 May 15 11:05 libnvvm.so -> libnvvm.so.4.0.0*
lrwxrwxrwx 1 rmaynard rmaynard       16 May 15 11:05 libnvvm.so.4 -> libnvvm.so.4.0.0*
-rwxrwxr-x 3 rmaynard rmaynard 27440865 Apr 21 12:57 libnvvm.so.4.0.0*

Now on the failing conda env that uses this pr.

$ ldd /home/rmaynard/miniconda3/envs/cuda_tk12/nvvm/lib64/libnvvm.so.4.0.0 
	not a dynamic executable
$ ll /home/rmaynard/miniconda3/envs/cuda_tk12/nvvm/lib64/
total 26808
drwxrwxr-x 2 rmaynard rmaynard     4096 May 15 15:35 ./
drwxrwxr-x 6 rmaynard rmaynard     4096 May 15 15:35 ../
lrwxrwxrwx 1 rmaynard rmaynard       16 May 15 15:35 libnvvm.so -> libnvvm.so.4.0.0*
lrwxrwxrwx 1 rmaynard rmaynard       16 May 15 15:35 libnvvm.so.4 -> libnvvm.so.4.0.0*
-rwxrwxr-x 2 rmaynard rmaynard 27440857 May 15 10:12 libnvvm.so.4.0.0*
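The "not a dynamic executable" message from ldd generally means the file no longer has a valid ELF header. One quick way to check is to compare the first four bytes against the ELF magic number; the sketch below uses stand-in files, since the real libnvvm path is environment-specific:

```shell
# Sketch of the magic-byte check behind ldd's "not a dynamic executable"
# diagnosis: a valid ELF file must start with the bytes 0x7f 'E' 'L' 'F'.
# The stand-in files below are illustrative, not the real libnvvm.so.
check_elf_magic() {
  printf '\177ELF' > /tmp/elf_magic.ref          # reference ELF magic bytes
  head -c 4 "$1" | cmp -s - /tmp/elf_magic.ref && echo ELF || echo "not ELF"
}
printf '\177ELFxxxx' > /tmp/good.bin             # minimal stand-in ELF header
printf 'garbage'     > /tmp/bad.bin
check_elf_magic /tmp/good.bin                    # prints: ELF
check_elf_magic /tmp/bad.bin                     # prints: not ELF
```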

@robertmaynard

Removing the patchelf step on libnvvm has allowed all the CMake CUDA tests to pass locally.

I have a minor concern that we should symlink everything (from the CUDA toolkit) from $PREFIX/bin into $PREFIX/<target>/bin so that CMake-based projects that use CUDAToolkit_BIN_DIR will still work (e.g. CUDAToolkit_BIN_DIR/bin2c). But we can spin that request out into an issue if needed.
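The symlinking idea above could look roughly like the following; the prefix, target triple, and the bin2c stand-in are illustrative assumptions, not taken from this recipe:

```shell
# Illustrative sketch: mirror toolkit binaries from $PREFIX/bin into the
# splayed $PREFIX/targets/<triple>/bin so CUDAToolkit_BIN_DIR keeps working.
PREFIX="${PREFIX:-/tmp/demo-prefix}"
triple="x86_64-linux"
mkdir -p "$PREFIX/bin" "$PREFIX/targets/$triple/bin"
touch "$PREFIX/bin/bin2c"                 # stand-in for a real toolkit binary
for f in "$PREFIX"/bin/*; do
  ln -sf "$f" "$PREFIX/targets/$triple/bin/$(basename "$f")"
done
ls "$PREFIX/targets/$triple/bin"          # prints: bin2c
```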

@isuruf commented May 15, 2023

Can you also open an issue in CMake to fix the layout issue, so that we can remove this hack sometime in the future?

@robertmaynard

> Can you also open an issue in CMake to fix the layout issue, so that we can remove this hack sometime in the future?

Sure can. Would you consider this an accurate summary of the issue:

CMake should only infer the location of `nvcc` from the `#$ TOP` output of the compiler; it should instead infer the include directory from `#$ INCLUDES` and libraries from `#$ LIBRARIES`. This aligns them with the existing behavior for `nvvm` (see `_CUDA_NVVMIR_LIBRARY_DIR`).

This will allow fully splayed layouts of the CUDA toolkit to be supported, like the ones that conda-forge provides going forward with CUDA 12.

Compiler-detection variables that are currently wrong with conda's desired CUDA 12+ layout:

- CMAKE_CUDA_COMPILER_TOOLKIT_LIBRARY_ROOT is currently "$CONDA_PREFIX", should be "$CONDA_PREFIX/<target>/lib"
- CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES is currently "$CONDA_PREFIX/include", should be "$CONDA_PREFIX/<target>/include"


@isuruf commented May 15, 2023

I would argue that TOP should not be used by CMake at all. It's just a variable without any meaning.

@robertmaynard

> I would argue that TOP should not be used by CMake at all. It's just a variable without any meaning.

We need to parse some compiler output from nvcc to determine the proper 'root' for CUDA toolkit executables (nvcc, fatbinary, etc.).

We can't use something like `which nvcc` since that doesn't punch through compiler wrapper symlinks (ccache, gcc colors, etc.). In addition, things like NVHPC have two different nvcc executables (not symlinks, no nvcc.profiles) but only a single location for the other executables such as fatbinary.

root@bfe5e65a3f4a:/host_pwd# which nvcc
/opt/nvidia/hpc_sdk/Linux_x86_64/23.1/compilers/bin/nvcc
root@bfe5e65a3f4a:/host_pwd# ll /opt/nvidia/hpc_sdk/Linux_x86_64/23.1/compilers/bin/ | grep fat
root@bfe5e65a3f4a:/host_pwd# /opt/nvidia/hpc_sdk/Linux_x86_64/23.1/compilers/bin/nvcc -v test.cu
#$ _NVVM_BRANCH_=nvvm
#$ _SPACE_= 
#$ _CUDART_=cudart
#$ _HERE_=/opt/nvidia/hpc_sdk/Linux_x86_64/23.1/cuda/12.0/bin
#$ _THERE_=/opt/nvidia/hpc_sdk/Linux_x86_64/23.1/cuda/12.0/bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_DIR_=targets/x86_64-linux
#$ TOP=/opt/nvidia/hpc_sdk/Linux_x86_64/23.1/cuda/12.0/bin/..

_HERE_ and _THERE_ don't work since they don't see through symlinks and can't be influenced by nvcc.profile (we used to use _HERE_). So our options are TOP or PATH, and we went with TOP since it looked easier (no need to iterate).
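The kind of parsing being discussed can be sketched as follows; the sample line is copied from the NVHPC log above, and the extraction itself is only illustrative, not CMake's actual implementation:

```shell
# Sketch of extracting TOP from the `#$`-prefixed lines of `nvcc -v` output.
sample='#$ TOP=/opt/nvidia/hpc_sdk/Linux_x86_64/23.1/cuda/12.0/bin/..'
top=$(printf '%s\n' "$sample" | sed -n 's/^#\$ TOP=//p')
echo "$top"    # prints: /opt/nvidia/hpc_sdk/Linux_x86_64/23.1/cuda/12.0/bin/..
```

The same pattern works for the `#$ INCLUDES=` and `#$ LIBRARIES=` lines, which carry flags rather than a bare path.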

@isuruf commented May 15, 2023

Yeah, I was thinking PATH would be a better option for bin/ just like INCLUDES is a better option for include/.

@jakirkham left a comment


Thanks all! 🙏

Sounds like we are happy with this change

Next step is to update PR ( conda-forge/cuda-nvcc-feedstock#13 ) to use these packages once they are produced

Comment on lines 121 to 123
- include/crt # [linux]
- include/fatbinary_section.h # [linux]
- include/nvPTXCompiler.h # [linux]
Member

Are these three symlinks needed?

Contributor Author

They shouldn't be needed based on testing that happened against prior iterations without them.
Removed them here now.

@jakirkham jakirkham merged commit 7827c45 into conda-forge:main May 16, 2023
@jakirkham

Thanks all! 🙏

Let's follow up on anything else in the feedstock

Next step is to update PR ( conda-forge/cuda-nvcc-feedstock#13 ) based on these changes
