[BUG] compile error tdigest_aggregation.cu on cuda 12.2 on arm64 #14610

Closed
revans2 opened this issue Dec 11, 2023 · 8 comments · Fixed by #14607
Labels
bug Something isn't working Spark Functionality that helps Spark RAPIDS

Comments

@revans2
Contributor

revans2 commented Dec 11, 2023

Describe the bug
Recently our nightly CI failed for CUDA 12.2 on an arm64 server with the following errors.

[2023-12-11T05:15:56.047Z] [INFO]      [exec] /usr/local/cuda/include/cuda/std/detail/libcxx/include/__functional/invoke.h(402): error: calling a __device__ function("cudf::tdigest::detail::_NV_ANON_NAMESPACE::build_output_column(int,     ::std::unique_ptr<   ::cudf::column,     ::std::default_delete<   ::cudf::column> >  &&,     ::std::unique_ptr<   ::cudf::column,     ::std::default_delete<   ::cudf::column> >  &&,     ::std::unique_ptr<   ::cudf::column,     ::std::default_delete<   ::cudf::column> >  &&,     ::std::unique_ptr<   ::cudf::column,     ::std::default_delete<   ::cudf::column> >  &&,     ::std::unique_ptr<   ::cudf::column,     ::std::default_delete<   ::cudf::column> >  &&, bool,  ::rmm::cuda_stream_view,  ::rmm::mr::device_memory_resource *)::[lambda(int) (instance 2)]::operator ()(int) const") from a __host__ __device__ function("__invoke") is not allowed
[2023-12-11T05:15:56.047Z] [INFO]      [exec]       { return static_cast<_Fp&&>(__f)(static_cast<_Args&&>(__args)...); }
[2023-12-11T05:15:56.047Z] [INFO]      [exec]                                       ^
[2023-12-11T05:15:56.047Z] [INFO]      [exec]           detected during:
[2023-12-11T05:15:56.047Z] [INFO]      [exec]             instantiation of "decltype((<expression>)) cuda::std::__4::__invoke(_Fp &&, _Args &&...) [with _Fp=const lambda [](cudf::size_type)->int &, _Args=<cudf::size_type &>]" at line 126 of /usr/local/cuda/include/cuda/functional
[2023-12-11T05:15:56.047Z] [INFO]      [exec]             instantiation of "_Ret cuda::__4::__detail::__return_type_wrapper<_Ret, _DecayFn>::operator()(_As &&...) const & noexcept [with _Ret=cudf::size_type, _DecayFn=lambda [](cudf::size_type)->int, _As=<cudf::size_type &>]" at line 643 of /home/jenkins/agent/workspace/spark-rapids-jni_nightly-dev/thirdparty/cudf/cpp/src/quantiles/tdigest/tdigest_aggregation.cu
[2023-12-11T05:15:56.047Z] [INFO]      [exec] 
[2023-12-11T05:15:56.047Z] [INFO]      [exec] /usr/local/cuda/include/cuda/std/detail/libcxx/include/__functional/invoke.h(402): error: calling a __device__ function("_ZZN58_INTERNAL_3d18e25e_22_tdigest_aggregation_cu_1c22b31c_95514cudf7tdigest6detail60_GLOBAL__N__3d18e25e_22_tdigest_aggregation_cu_1c22b31c_955119build_output_columnE1?1?1?1?1?1?1?1?1?ENKUl1?E0_clE1?") from a __host__ __device__ function("__invoke") is not allowed
[2023-12-11T05:15:56.047Z] [INFO]      [exec]       { return static_cast<_Fp&&>(__f)(static_cast<_Args&&>(__args)...); }
[2023-12-11T05:15:56.047Z] [INFO]      [exec]                                       ^
[2023-12-11T05:15:56.047Z] [INFO]      [exec]           detected during:
[2023-12-11T05:15:56.047Z] [INFO]      [exec]             instantiation of "decltype((<expression>)) cuda::std::__4::__invoke(_Fp &&, _Args &&...) [with _Fp=const lambda [](cudf::size_type)->int &, _Args=<cudf::size_type &>]" at line 126 of /usr/local/cuda/include/cuda/functional
[2023-12-11T05:15:56.047Z] [INFO]      [exec]             instantiation of "_Ret cuda::__4::__detail::__return_type_wrapper<_Ret, _DecayFn>::operator()(_As &&...) const & noexcept [with _Ret=cudf::size_type, _DecayFn=lambda [](cudf::size_type)->int, _As=<cudf::size_type &>]" at line 643 of /home/jenkins/agent/workspace/spark-rapids-jni_nightly-dev/thirdparty/cudf/cpp/src/quantiles/tdigest/tdigest_aggregation.cu
[2023-12-11T05:15:56.047Z] [INFO]      [exec] 
[2023-12-11T05:15:56.318Z] [INFO]      [exec] 2 errors detected in the compilation of "/home/jenkins/agent/workspace/spark-rapids-jni_nightly-dev/thirdparty/cudf/cpp/src/quantiles/tdigest/tdigest_aggregation.cu".
[2023-12-11T05:15:56.579Z] [INFO]      [exec] gmake[2]: *** [CMakeFiles/cudf.dir/build.make:3497: CMakeFiles/cudf.dir/src/quantiles/tdigest/tdigest_aggregation.cu.o] Error 2
[2023-12-11T05:15:56.580Z] [INFO]      [exec] gmake[2]: *** Waiting for unfinished jobs....
[2023-12-11T05:37:03.812Z] [INFO]      [exec] gmake[1]: *** [CMakeFiles/Makefile2:819: CMakeFiles/cudf.dir/all] Error 2
[2023-12-11T05:37:03.812Z] [INFO]      [exec] gmake: *** [Makefile:156: all] Error 2
[2023-12-11T05:11:21.506Z] + scl enable gcc-toolset-11 ci/nightly-build.sh
[2023-12-11T05:11:21.506Z] + nvidia-smi
[2023-12-11T05:11:21.506Z] Mon Dec 11 05:11:21 2023       
[2023-12-11T05:11:21.506Z] +---------------------------------------------------------------------------------------+
[2023-12-11T05:11:21.506Z] | NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
[2023-12-11T05:11:21.506Z] |-----------------------------------------+----------------------+----------------------+
[2023-12-11T05:11:21.506Z] | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
[2023-12-11T05:11:21.506Z] | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
[2023-12-11T05:11:21.506Z] |                                         |                      |               MIG M. |
[2023-12-11T05:11:21.506Z] |=========================================+======================+======================|
[2023-12-11T05:11:21.506Z] |   0  Tesla T4                       Off | 00000000:01:00.0 Off |                    0 |
[2023-12-11T05:11:21.506Z] | N/A   41C    P8              15W /  70W |      2MiB / 15360MiB |      0%      Default |
[2023-12-11T05:11:21.506Z] |                                         |                      |                  N/A |
[2023-12-11T05:11:21.506Z] +-----------------------------------------+----------------------+----------------------+
[2023-12-11T05:11:21.506Z]                                                                                          
[2023-12-11T05:11:21.506Z] +---------------------------------------------------------------------------------------+
[2023-12-11T05:11:21.506Z] | Processes:                                                                            |
[2023-12-11T05:11:21.506Z] |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
[2023-12-11T05:11:21.506Z] |        ID   ID                                                             Usage      |
[2023-12-11T05:11:21.506Z] |=======================================================================================|
[2023-12-11T05:11:21.506Z] |  No running processes found                                                           |
[2023-12-11T05:11:21.506Z] +---------------------------------------------------------------------------------------+
@revans2 revans2 added bug Something isn't working Needs Triage Need team to review and classify Spark Functionality that helps Spark RAPIDS labels Dec 11, 2023
@bdice
Contributor

bdice commented Dec 11, 2023

@revans2 Thanks for the additional information from #14607 -- can you try to make a minimal reproducer to file with CCCL for this?

@revans2
Contributor Author

revans2 commented Dec 11, 2023

I am no expert on any of this. All I did was look at the nightly build, see that it failed, and report the failure. I can possibly rerun/retest things in that environment, but I am not sure I even have direct access to it myself right now. Trying to get a minimal repro case is going to take me a very long time. @sameerz is there someone on our team that can help out with this?

@bdice
Contributor

bdice commented Dec 11, 2023

Don’t worry @revans2. I will see what I can do to reproduce and work with the CCCL team.

@bdice
Contributor

bdice commented Dec 11, 2023

I think this comes down to an issue with a particular host compiler as well. Do you know which C++ host compiler was used?

@ttnghia
Contributor

ttnghia commented Dec 11, 2023

I can't reproduce it locally at all, so I suspect that it is due to a bug in a specific compiler on ARM.

rapids-bot bot pushed a commit that referenced this issue Dec 11, 2023
This removes `cuda::proclaim_return_type` from a device lambda because that lambda is going to be nested inside another device lambda, which is in turn enclosed by `cuda::proclaim_return_type`. 

This PR is to fix a compile issue that we encountered:
```
/usr/local/cuda/include/cuda/std/detail/libcxx/include/__functional/invoke.h(402): error: 
calling a __device__ function("cudf::tdigest::detail::_NV_ANON_NAMESPACE::build_output_column(int,   
  ::std::unique_ptr<   ::cudf::column,     ::std::default_delete<   ::cudf::column> >  &&,     ::std::unique_ptr<   ::cudf::column,     ::std::default_delete<   ::cudf::column> >  &&,     ::std::unique_ptr<   ::cudf::column,     ::std::default_delete<   ::cudf::column> >  &&,     ::std::unique_ptr<   ::cudf::column,     ::std::default_delete<   ::cudf::column> >  &&,     ::std::unique_ptr<   ::cudf::column,     ::std::default_delete<   ::cudf::column> >  &&, bool,  ::rmm::cuda_stream_view,  ::rmm::mr::device_memory_resource *)
::[lambda(int) (instance 2)]::operator ()(int) const") from a __host__ __device__ function("__invoke") is not allowed
```

Note: The issue is reproducible only in our build environment: ARM architecture, cuda 12 + rockylinux8.

Closes #14610.

Authors:
  - Nghia Truong (https://github.com/ttnghia)

Approvers:
  - Michael Schellenberger Costa (https://github.com/miscco)
  - Karthikeyan (https://github.com/karthikeyann)
  - https://github.com/nvdbaranec
  - Bradley Dice (https://github.com/bdice)

URL: #14607
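
For illustration, here is a minimal sketch of the shape of that fix (hypothetical names such as `make_size_iterator` and `offsets`; this is not the actual tdigest code): the inner computation stays a plain `__device__` lambda, and only the lambda handed to the iterator factory is wrapped in `cuda::proclaim_return_type`.

```
// Hypothetical sketch of the fix pattern; illustrative names, not the real tdigest code.
#include <cuda/functional>
#include <cudf/detail/iterator.cuh>
#include <cudf/types.hpp>

auto make_size_iterator(cudf::size_type const* offsets)
{
  // Before the fix, the lambda below captured another lambda that was itself wrapped in
  // cuda::proclaim_return_type; removing that inner wrapper and keeping only this outer
  // one is the change described in the commit message above.
  return cudf::detail::make_counting_transform_iterator(
    0,
    cuda::proclaim_return_type<cudf::size_type>(
      [offsets] __device__ (cudf::size_type i) { return offsets[i + 1] - offsets[i]; }));
}
```
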
@ttnghia
Contributor

ttnghia commented Dec 11, 2023

According to the build log, the host compiler is GNU 11.2.1 and NVCC is 12.2, so it is almost the same as my local machine. But I can't reproduce the issue on my local machine.

FYI: The issue showed up in a build system using this docker image: https://github.com/NVIDIA/spark-rapids-jni/blob/branch-24.02/ci/Dockerfile.multi (but with the CUDA version changed to 12).

@ttnghia
Contributor

ttnghia commented Dec 11, 2023

To clarify, the bug was triggered when the result of cuda::proclaim_return_type is captured in the lambda capture list of another cuda::proclaim_return_type-wrapped lambda, which is in turn used in a __host__ __device__ function:

auto func1 = cuda::proclaim_return_type<int>([] __device__ () { /* ... */ });
auto iter  = cudf::detail::make_counting_transform_iterator(0,
    cuda::proclaim_return_type<int>([func1] __device__ (cudf::size_type i) { /* ... */ }));

// make_counting_transform_iterator is __host__ __device__
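
A minimal standalone sketch of that pattern, which could serve as a starting point for the CCCL reproducer requested above (all names here are illustrative; `make_iter` stands in for `make_counting_transform_iterator`, and this is an assumption about the shape of the trigger, not the actual cudf code):

```
// Hypothetical reproducer sketch: a proclaimed device lambda captured inside another
// proclaimed device lambda that is passed through a __host__ __device__ factory.
#include <cuda/functional>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>

template <typename F>
__host__ __device__ auto make_iter(F f)  // stands in for make_counting_transform_iterator
{
  return thrust::make_transform_iterator(thrust::make_counting_iterator(0), f);
}

void build()
{
  auto inner = cuda::proclaim_return_type<int>([] __device__ (int i) { return i + 1; });
  // Capturing `inner` inside another proclaimed lambda is the nesting that triggered
  // the error with nvcc 12.2 + gcc 11 on arm64; dropping the inner wrapper avoided it.
  auto iter = make_iter(
    cuda::proclaim_return_type<int>([inner] __device__ (int i) { return inner(i) * 2; }));
  (void)iter;
}
```

(Compiling device lambdas in host code like this requires nvcc's --extended-lambda flag.)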

@bdice
Contributor

bdice commented Dec 11, 2023

@miscco confirmed that the root cause is fixed in CCCL 2.2.0. We won't need to do anything except finish and merge #14576.

karthikeyann pushed a commit to karthikeyann/cudf that referenced this issue Dec 12, 2023
@bdice bdice removed the Needs Triage Need team to review and classify label Mar 4, 2024