Default value of `CMAKE_CUDA_ARCHITECTURES` of `NATIVE` breaks cmake CMakeDetermineCUDACompiler #84

csadorf · 2022-10-20T09:34:40Z

The default value of NATIVE for CMAKE_CUDA_ARCHITECTURES defined here

Line 1571 in d9f617e

export CUDAARCHS="${CUDAARCHS:-${CMAKE_CUDA_ARCHITECTURES:-NATIVE}}"

breaks CMakeDetermineCUDACompiler with error message

  CMAKE_CUDA_ARCHITECTURES:

    NATIVE

  is not one of the following:

    * a semicolon-separated list of integers, each optionally
      followed by '-real' or '-virtual'
    * a special value: all, all-major, native

See also: d9f617e#r87361335

The text was updated successfully, but these errors were encountered:

csadorf · 2022-10-20T10:07:33Z

The support for the native argument for CUDA_ARCHITECTURES was added to cmake here.

It appears the magic value was always native, not NATIVE, so I am a bit puzzled why this has not been an issue before. Especially, because the default value of NATIVE is found in multiple places within rapids:

rg 'ARCHITECTURE.*NATIVE'
cugraph/build.sh
193:        CUGRAPH_CMAKE_CUDA_ARCHITECTURES="NATIVE"
215:        CUGRAPH_CMAKE_CUDA_ARCHITECTURES="NATIVE"

cuml/cpp/README.md
47:| CMAKE_CUDA_ARCHITECTURES |  List of GPU architectures, semicolon-separated | Empty  | List the GPU architectures to compile the GPU targets for. Set to "NATIVE" to auto detect GPU architecture of the system, set to "ALL" to compile for all RAPIDS supported archs: ["60" "62" "70" "72" "75" "80" "86"].  |

cuml/build.sh
237:        CUML_CMAKE_CUDA_ARCHITECTURES="NATIVE"

compose/etc/bash-utils.sh
1571:    export CUDAARCHS="${CUDAARCHS:-${CMAKE_CUDA_ARCHITECTURES:-NATIVE}}"

cuspatial/build.sh
144:    CUSPATIAL_CMAKE_CUDA_ARCHITECTURES="-DCMAKE_CUDA_ARCHITECTURES=NATIVE"

raft/build.sh
295:        RAFT_CMAKE_CUDA_ARCHITECTURES="NATIVE"

cudf/build.sh
262:        CUDF_CMAKE_CUDA_ARCHITECTURES="${CUDF_CMAKE_CUDA_ARCHITECTURES:-NATIVE}"
263:        if [[ "$CUDF_CMAKE_CUDA_ARCHITECTURES" == "NATIVE" ]]; then

csadorf · 2022-10-20T15:07:00Z

After some digging I found that the NATIVE value is a valid argument for the rapids_cuda_set_architectures() function defined in rapids-cmake/cuda/set_architectures.cmake, but is not a valid argument for CMAKE_CUDA_ARCHITECTURES. The latter only supports native as of version 3.24.

However, even when rapids_cuda_set_architectures() is not directly called, it will be invoked by rapids_init_architectures(..) with rapids_cuda_set_architectures(NATIVE) in case that the CMAKE_CUDA_ARCHITECTURES variable is set to that value. See here.

This means the bug is only triggered if CMakeDetermineCUDACompiler is run before rapids_cuda_init_architectures(...) is called. I am working on determining the MRE.

Fixes trxcllnt#84.

trxcllnt · 2022-11-14T22:51:25Z

It looks like cuML does call rapids_cuda_init_architectures() before calling project(CUML), so using NATIVE here should be OK.

It seems rapids_cuda_set_architectures doesn't accept native (it does accept NATIVE), therefore using the lower-case form breaks building cugraph (because they enumerate the CMAKE_CUDA_ARCHITECTURES list manually).

Can we change this back to NATIVE?

csadorf · 2022-11-15T09:45:43Z

Can we change this back to NATIVE?

~~Looks like it was already changed?~~ It seems like even just adding CMAKE_CUDA_ARCHITECTURES=NATIVE is causing some issues for cuML. The python builds fail now with:

-- Detecting CXX compile features - done
CMake Error at /raid/sadorf/dev/rapids/rapids-compose/etc/conda/cuda_11.5/envs/rapids/share/cmake-3.24/Modules/CMakeDetermineCUDACompiler.cmake:279 (message):
  CMAKE_CUDA_ARCHITECTURES:

    NATIVE

  is not one of the following:

    * a semicolon-separated list of integers, each optionally
      followed by '-real' or '-virtual'
    * a special value: all, all-major, native

Call Stack (most recent call first):
  CMakeLists.txt:54 (enable_language)

I ~~can~~ will try to find a fix for this. ~~that does not require changing the value back.~~

trxcllnt · 2022-11-15T20:27:46Z

I think the issue was with setting the CUDAARCHS envvar, which CMake says is used to initialize the value of CMAKE_CUDA_ARCHITECTURES. For a while CMake had a bug which made this not true, but now it seems that's been fixed.

I pushed a fix yesterday that sets CUDAARCHS=native, but also explicitly sets CMAKE_CUDA_ARCHITECTURES=NATIVE, which will take precedence over the CUDAARCHS envvar in C++ builds.

We don't explicitly pass CMAKE_CUDA_ARCHITECTURES to Python builds, so cuML's scikit-build will still initialize CMAKE_CUDA_ARCHITECTURES from the CUDAARCHS envvar (i.e. lowercase native).

I tested via build-cuml-python and it seemed to be good, but let me know if you're still seeing issues.

csadorf · 2022-11-16T11:13:12Z

I tested via build-cuml-python and it seemed to be good, but let me know if you're still seeing issues.

Not sure what's different in my setup, but I still needed to initiate build-cuml-python twice, it fails with the above quoted error on the first attempt.

trxcllnt · 2022-11-17T02:05:20Z

Do you have CMAKE_CUDA_ARCHITECTURES set in your .env file?

trxcllnt · 2022-11-17T02:06:44Z

One option could be to set CUDAARCHS=native in your .env file. That would cause it to always use native for both CMAKE_CUDA_ARCHITECTURES and CUDAARCHS.

csadorf · 2022-11-17T08:33:44Z

One option could be to set CUDAARCHS=native in your .env file. That would cause it to always use native for both CMAKE_CUDA_ARCHITECTURES and CUDAARCHS.

That doe solve the problem for me, but I think we should try to get this sorted out so that things just work-out-of-the box. I assume there is something we would need to adjust in the cuml python build process?

vyasr · 2022-11-26T17:55:53Z

I would agree that we should sort this out for all users. I wonder if there is a simpler path of least resistance here. rapids-cmake adds support for two special flags, ALL (build for all architectures) and NATIVE as discussed on this thread. Both of these improvements have been upstreamed to CMake itself (in lowercase form ALL->all and NATIVE->native), but we have not yet updated our minimum required CMake version enough to use both. We currently require 3.23, and while support for all was added in 3.23, support native is only in 3.24. Once we bump our required CMake version across RAPIDS I think we should be able to remove the specialized functionality in rapids-cmake unless there are subtle differences that I am not aware of. @robertmaynard is that something that you planned for when we transition to 3.24? That migration was on my todo list, but not something I got around to in 22.12 because of wheels-related work. Will this problem simply be fixed if we bump our minimum required CMake version to 3.24 and make the associated rapids-cmake changes? If so, I can front-load that in 23.02 since we also have a number of other improvements to the build process that I want to make around Python/CMake builds and wheels that will be facilitated by the move.

robertmaynard · 2022-11-28T15:57:29Z

So there are differences between how NATIVE from rapids-cmake work and how native in CMake 3.24. These differences will have an impact on projects like cugraph.

The biggest difference is that NATIVE is transformed into an explicit list of architectures after the project call. This allows projects like cugraph to iterate the values to subset further. Cmake 3.24+ native will either stay as the string native or become a set of architecutres based on the nvcc compiler version ( will stay native when the compiler supports -arch=native ).

Once we bump our required CMake version across RAPIDS I think we should be able to remove the specialized functionality in rapids-cmake

I agree, while rapids-cmake version of these functions has side-effects that projects like cugraph use this is needed as the long term goal is to remove features from rapids-cmake as they arrive in CMake proper.

Will this problem simply be fixed if we bump our minimum required CMake version to 3.24 and make the associated rapids-cmake changes

No. To me we have a secondary issue that compose is setting CUDAARCHS to NATIVE with no DCMAKE_CUDA_ARCHITECTURES value which breaks projects.

The issue is that rapids-cmake rapids_init_architectures presuming only valid values for CUDAARCHS and doesn't expect it would contain ALL or NATIVE. I think we should extend rapids-cmake to also look for all/native/ALL/NATIVE and if used with CMake < 3.24 we map it to the current NATIVE / ALL behavior in both the env var and the cmake var.

I think the issue was with setting the CUDAARCHS envvar, which CMake says is used to initialize the value of CMAKE_CUDA_ARCHITECTURES. For a while CMake had a bug which made this not true, but now it seems that's been fixed.

I believe that was this rapids-cmake bug: rapidsai/rapids-cmake#268 which means that rapids-cmake 22.10+ will consider CUDAARCHS

It seems like even just adding CMAKE_CUDA_ARCHITECTURES=NATIVE is causing some issues for cuML. The python builds fail now with....

As outline above there is a rapids-cmake bug where it doesn't transform CUDAARCHS values when no CMAKE_CUDA_ARCHITECTURES is also provided.

But that won't unblock you as python cuml is incorrectly setting up rapids_cuda_init_architectures here: https://github.com/rapidsai/cuml/blob/branch-22.12/python/CMakeLists.txt#L54 and https://github.com/rapidsai/cuml/blob/branch-22.12/python/CMakeLists.txt#L88. As noted in the docs for the function this requires a call to project with the same name for the activation to occur. In cuml pythons case it will need todo the following to have the NATIVE hooks actiaved:

  rapids_cuda_init_architectures(cuml-python)
  enable_language(CUDA)
  include("${CMAKE_PROJECT_cuml-python_INCLUDE}")

vyasr · 2022-11-28T18:16:23Z

But that won't unblock you as python cuml is incorrectly setting up rapids_cuda_init_architectures here: https://github.com/rapidsai/cuml/blob/branch-22.12/python/CMakeLists.txt#L54 and https://github.com/rapidsai/cuml/blob/branch-22.12/python/CMakeLists.txt#L88. As noted in the docs for the function this requires a call to project with the same name for the activation to occur. In cuml pythons case it will need todo the following to have the NATIVE hooks actiaved:

I've already fixed this bug in the wheels PR for cuML. I believe it was introduced as part of the patch that added the extra get_raft.cmake invocation to handle the lack of COMPONENTS support in rapids-cmake (now fixed). However, that uncovered an additional problem (which manifests in the form of the same error), which is that when cuML finds a preexisting raft package with the distance COMPONENT it conditionally enables the CUDA language before rapids-cmake has a chance to parse out the NATIVE. Is it worth trying to embed that requirement into libcuml's build system, or should we just modify the Python build?

robertmaynard · 2022-11-28T18:21:15Z

We should modify the python build to manage CMAKE_CUDA_ARCHITECTURES ( via rapids-cmake ) before we call find_package(cuml)

vyasr · 2022-11-28T18:51:19Z

That's specifically necessary here because of the transitive call to find raft that occurs, correct? We normally only need to do this if we're compiling the C++ libraries (not if we find them preinstalled).

robertmaynard · 2022-11-28T18:58:44Z

Yes it is due to the fact that some header dependency ( thrust or cuco ) is enabling the CUDA language

CMake 3.23+ now offers support for `all` and `all-major` as special keywords to `CMAKE_CUDA_ARCHITECTURES`. This means that rapids-cmake overload on `ALL` now creates confusion and breaks expectations ( trxcllnt/rapids-compose#84 https://gitlab.kitware.com/cmake/cmake/-/issues/23739, #104 ) This PR introduces the new magic keyword of `RAPIDS` which behaves the same as `ALL`, and will become the new apporved approach for projects that want to build for RAPIDS supported platforms. Fixes #314 More details on the plans of deprecation for `ALL` can be found at: #318 Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Bradley Dice (https://github.com/bdice) URL: #327

csadorf added a commit to csadorf/rapids-compose that referenced this issue Oct 21, 2022

Use native as default value for CMAKE_CUDA_ARCHITECTURES.

02d2c42

Fixes trxcllnt#84.

trxcllnt closed this as completed in 984bdcf Oct 21, 2022

robertmaynard mentioned this issue Dec 1, 2022

Expand CMAKE_CUDA_ARCHITECTURE to support RAPIDS ( equal to ALL )( rapidsai/rapids-cmake#314

Closed

robertmaynard mentioned this issue Dec 13, 2022

Fea/support cmake cuda architectures rapids value rapidsai/rapids-cmake#327

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Default value of `CMAKE_CUDA_ARCHITECTURES` of `NATIVE` breaks cmake CMakeDetermineCUDACompiler #84

Default value of `CMAKE_CUDA_ARCHITECTURES` of `NATIVE` breaks cmake CMakeDetermineCUDACompiler #84

csadorf commented Oct 20, 2022

csadorf commented Oct 20, 2022

csadorf commented Oct 20, 2022

trxcllnt commented Nov 14, 2022

csadorf commented Nov 15, 2022 •

edited

Loading

trxcllnt commented Nov 15, 2022

csadorf commented Nov 16, 2022

trxcllnt commented Nov 17, 2022

trxcllnt commented Nov 17, 2022

csadorf commented Nov 17, 2022

vyasr commented Nov 26, 2022

robertmaynard commented Nov 28, 2022 •

edited

Loading

vyasr commented Nov 28, 2022

robertmaynard commented Nov 28, 2022

vyasr commented Nov 28, 2022

robertmaynard commented Nov 28, 2022

Default value of CMAKE_CUDA_ARCHITECTURES of NATIVE breaks cmake CMakeDetermineCUDACompiler #84

Default value of CMAKE_CUDA_ARCHITECTURES of NATIVE breaks cmake CMakeDetermineCUDACompiler #84

Comments

csadorf commented Oct 20, 2022

csadorf commented Oct 20, 2022

csadorf commented Oct 20, 2022

trxcllnt commented Nov 14, 2022

csadorf commented Nov 15, 2022 • edited Loading

trxcllnt commented Nov 15, 2022

csadorf commented Nov 16, 2022

trxcllnt commented Nov 17, 2022

trxcllnt commented Nov 17, 2022

csadorf commented Nov 17, 2022

vyasr commented Nov 26, 2022

robertmaynard commented Nov 28, 2022 • edited Loading

vyasr commented Nov 28, 2022

robertmaynard commented Nov 28, 2022

vyasr commented Nov 28, 2022

robertmaynard commented Nov 28, 2022

Default value of `CMAKE_CUDA_ARCHITECTURES` of `NATIVE` breaks cmake CMakeDetermineCUDACompiler #84

Default value of `CMAKE_CUDA_ARCHITECTURES` of `NATIVE` breaks cmake CMakeDetermineCUDACompiler #84

csadorf commented Nov 15, 2022 •

edited

Loading

robertmaynard commented Nov 28, 2022 •

edited

Loading