Workaround for offload compilation using clang with glibc 12 include files #4761

markdewing · 2023-10-09T22:33:56Z

Shows up when doing offload compilation with Clang on a system with gcc/glibc 12 (or newer) include files.

Clang was fixed to recognize __noinline__ as an attribute, but the CUDA include files still have a define for __noinline__. The attribute started to be used in glibc 12 include files, which expands to __attribute__((__attribute__((noinline)))), which the compiler rejects. Clang includes a workaround, but it applies to CUDA code. This is C++ code using the CUDA include files.
See llvm/llvm-project#57544 (and NVIDIA/thrust#1703 ).

The __clang__ and _GLIBCXX_RELEASE guards were added because I'm not sure of the effects on other compilers.
The _GLIBCXX_RELEASE macro is documented here (item 8): https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html

The error message that indicates the problem:

In file included from /home/mdewing/nvme/physics/qmcpack/main/qmcpack/src/Platforms/CUDA/CUDADeviceManager.cpp:26:
In file included from /home/mdewing/nvme/physics/qmcpack/main/qmcpack/src/Platforms/Host/OutputManager.h:19:
In file included from /home/mdewing/nvme/physics/qmcpack/main/qmcpack/src/Platforms/Host/InfoStream.h:21:
In file included from /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/memory:76:
In file included from /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr.h:53:
/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:196:22: error: use of undeclared identifier 'noinline'; did you mean 'inline'?
  196 |       __attribute__((__noinline__))
      |                      ^
/usr/local/cuda-12.2/include/crt/host_defines.h:83:24: note: expanded from macro '__noinline__'
   83 |         __attribute__((noinline))
      |                        ^

What type(s) of changes does this code introduce?

Delete the items that do not apply

Bugfix

Does this introduce a breaking change?

No

What systems has this change been tested on?

Local server

Checklist

Update the following with a yes where the items apply. If you're unsure about any of them, don't hesitate to ask. This is
simply a reminder of what we are going to look for before merging your code.

Yes. This PR is up to date with current the current state of 'develop'
Yes. Code added or changed in the PR has been clang-formatted
N/A. This PR adds tests to cover any new code, or to catch a bug that is being fixed
N/A. Documentation has been added (if appropriate)

Deals with clang as the compiler in CUDA mode with gcc/glibc 12 include files. Clang was fixed to recognize __noinline__ as an attribute, but the CUDA include files still have a define for __noinline__. The attribute started to be used in gcc 12 include files, which expands to __attribute__((__attribute__((noinline)))), which the compiler rejects. See NVIDIA/thrust#1703 and llvm/llvm-project#57544

ye-luo · 2023-10-09T22:57:59Z

src/Platforms/CUDA/CUDAruntime.hpp

+//     https://github.com/llvm/llvm-project/issues/57544
+#if defined(__clang__) && (_GLIBCXX_RELEASE >= 12)
+#undef __noinline__
+#endif


Is this workaround recommended by someone? Do you have a link?

It's mentioned in the LLVM issue link in the text. Now that I read the LLVM issue link further, it looks like there's a workaround in clang itself. I don't know why that workaround isn't working with my clang install.

I think the reason the llvm workaround isn't working is because CUDADeviceManager.cpp is C++ file, not a CUDA file.

ye-luo · 2023-10-09T23:01:36Z

src/Platforms/CUDA/CUDAruntime.hpp

@@ -25,6 +25,17 @@
 #include "Platforms/ROCm/cuda2hip.h"
 #endif

+// Work around for clang using gcc/glibc 12 include files in CUDA mode.


In my understanding, CUDA mode refers compiling CUDA source code by Clang. Here it seems to be a conflict between CUDA host library header files and libstdc++ header files.

Yes, the noinline attribute was added to the libstdc++ header files here: gcc-mirror/gcc@dbf8bd3#diff-b358f609a31a4af8af72cc3197566abaa157bb7f8681b45580f1e5477540457cR192-R193

Prior to that there were no uses of the noinline attribute in the libstdc++ headers to cause a problem

ye-luo · 2023-10-09T23:02:14Z

src/Platforms/CUDA/CUDAruntime.hpp

+// to __attribute__((__attribute__((noinline)))), which the compiler rejects.
+// See https://github.com/NVIDIA/thrust/issues/1703 and
+//     https://github.com/llvm/llvm-project/issues/57544
+#if defined(__clang__) && (_GLIBCXX_RELEASE >= 12)


Is there a released version of clang with the fix? Should we restrict clang version?

It's not a bug in clang. If anything, it's caused by a feature getting added to clang.

ye-luo

It doesn't seem to be offload related but only CUDA related. Could you clarify?

Could you provide full reproducer? Namely
What version of clang, what version of gcc toolchain, what version of CUDA. Full CMake command line.

markdewing · 2023-10-10T00:00:38Z

Ah yes, it's not offload, but CUDA. But there's no reason to build CUDA anymore except as an adjunct to offload, right?

CMake configuration is

cmake \
-D CMAKE_BUILD_TYPE=Debug \
-D QMC_MPI=0 \
-D QMC_DATA=/home/mdewing/physics/codes/qmcpack/test_data \
-D BUILD_LMYENGINE_INTERFACE=0 \
-D CMAKE_C_COMPILER=clang \
-D CMAKE_CXX_COMPILER=clang++ \
-D ENABLE_OFFLOAD=1 \
-D ENABLE_CUDA=1 \
../qmcpack

clang is 18
System is Ubuntu 22.04, but gcc-12 is also installed. GCC 11 is the default compiler on this system, but clang must pick up the most recent version.
CUDA is 12.2

markdewing · 2023-10-10T16:15:13Z

This problem is currently not seen in the wild mostly because environments are slow to update their version of GCC.

On Polaris, a sufficiently new GCC is not available.
On Perlmutter, the llvm enviroment and the GCC environment containing a newer GCC don't mix. I expect it would need to use the --gcc-toolchain argument to clang to get the newer gcc. (Though I never finished testing on Perlmutter, because boost (!?) was not available).
On a JLSE machine, I did manage to reproduce it.

I expect this problem will become more visible as sites update their GCC compiler to 12 or newer.

prckent · 2023-10-10T18:09:21Z

I suggest that any regular user building this way is probably making the wrong build by accident. Ideally it would work though, since we do advertise it.

Is it relevant to offload at all? e.g. When latest clang is used for offload with latest gcc libs and latest CUDA?

markdewing · 2023-10-10T18:37:43Z

For the CMake options, -D ENABLE_OFFLOAD=1 -D ENABLE_CUDA=1 is the recommended settings for offload with CUDA, correct?
The reason this hasn't shown up before is that apparently no one (including our CI) is running up-to-date gcc libs.

I don't think this depends on CUDA version (CUDA 12.0 and CUDA 12.1 have other compile errors, CUDA 12.2 is the only 12.x version I can get to work with clang). The compile fails with CUDA 11.8, and I didn't test older versions.

markdewing · 2023-10-10T18:56:36Z

It appears the file must also be compiled for openmp offload for this to be triggered (-fopenmp --offload-arch=sm_80). Another (probably better) solution would be to turn off openmp offload for the Platform/CUDA files.

ye-luo · 2023-10-10T23:34:06Z

I can reproduce the issue

cmake -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_FLAGS=--gcc-toolchain=/soft/gcc/gcc-12.1.0  -DCMAKE_CXX_FLAGS=--gcc-toolchain=/soft/gcc/gcc-12.1.0 -DENABLE_CUDA=ON -DENABLE_OFFLOAD=ON ..

OpenMP offload compilation triggers clang CUDA mode and thus hit the issue.
However, OpenMP include

lib/clang/18/include/openmp_wrappers

instead of

lib/clang/18/include/cuda_wrappers

thus, the fix llvm/llvm-project@a50e54f is not effective.
I copied cuda_wrappers/bits/shared_ptr_base.h to openmp_wrappers/bits/shared_ptr_base.h and the compilation went through.

Option 1. adopt the fix in this PR. unclear if there is any down side.
Option 2. Fix LLVM add the workaround in openmp_wrappers
Option 3. Make OpenMP flags a CMake target instead of a global compiler option. It requires some changes in our CMake but probably will be the most robust fix.

PDoakORNL · 2023-10-13T23:03:28Z

I favor option 3 but I don't have a great grasp of how big that effort is going to be seems like too much for an annoying issue but seems likely to go away as the 2 year window moves on. This fix looks like it would be fine if we're really missing something it wouldn't be hard to revert.

markdewing · 2023-11-03T15:04:19Z

This issue is going to "age in", not "age out". As GCC versions that don't use __noinline__ in their library fall out of the two-year window.

I agree that option 3 - changing the CMake - would be the most robust.

ye-luo · 2023-11-03T16:45:05Z

@markdewing could you introduce a test in cmake and add macro to activate this workaround if the test fails.

ye-luo · 2023-11-03T17:02:22Z

I found a much simpler fix by interchange include order see #4814

prckent · 2023-11-03T17:42:31Z

Lets try the brittle fix in #4814 . If it reoccurs or proves problematic in future we can adopt a more robust solution

prckent · 2023-11-03T17:43:32Z

@markdewing Are you able to check this works for you? I need to reinstall the test setup with gcc12 so it will be a few hours before I can check.

ye-luo reviewed Oct 9, 2023

View reviewed changes

ye-luo mentioned this pull request Nov 3, 2023

Resolve noinline definition conflict between CUDA headerfiles and GCC 12 libraries #4814

Merged

ye-luo closed this Nov 3, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Workaround for offload compilation using clang with glibc 12 include files #4761

Workaround for offload compilation using clang with glibc 12 include files #4761

markdewing commented Oct 9, 2023 •

edited

Loading

ye-luo Oct 9, 2023

markdewing Oct 10, 2023

markdewing Oct 10, 2023

ye-luo Oct 9, 2023

markdewing Oct 10, 2023

ye-luo Oct 9, 2023

markdewing Oct 10, 2023

ye-luo left a comment

markdewing commented Oct 10, 2023

markdewing commented Oct 10, 2023

prckent commented Oct 10, 2023

markdewing commented Oct 10, 2023 •

edited

Loading

markdewing commented Oct 10, 2023

ye-luo commented Oct 10, 2023

PDoakORNL commented Oct 13, 2023

markdewing commented Nov 3, 2023

ye-luo commented Nov 3, 2023

ye-luo commented Nov 3, 2023 •

edited

Loading

prckent commented Nov 3, 2023

prckent commented Nov 3, 2023

Workaround for offload compilation using clang with glibc 12 include files #4761

Workaround for offload compilation using clang with glibc 12 include files #4761

Conversation

markdewing commented Oct 9, 2023 • edited Loading

What type(s) of changes does this code introduce?

Does this introduce a breaking change?

What systems has this change been tested on?

Checklist

ye-luo Oct 9, 2023

Choose a reason for hiding this comment

markdewing Oct 10, 2023

Choose a reason for hiding this comment

markdewing Oct 10, 2023

Choose a reason for hiding this comment

ye-luo Oct 9, 2023

Choose a reason for hiding this comment

markdewing Oct 10, 2023

Choose a reason for hiding this comment

ye-luo Oct 9, 2023

Choose a reason for hiding this comment

markdewing Oct 10, 2023

Choose a reason for hiding this comment

ye-luo left a comment

Choose a reason for hiding this comment

markdewing commented Oct 10, 2023

markdewing commented Oct 10, 2023

prckent commented Oct 10, 2023

markdewing commented Oct 10, 2023 • edited Loading

markdewing commented Oct 10, 2023

ye-luo commented Oct 10, 2023

PDoakORNL commented Oct 13, 2023

markdewing commented Nov 3, 2023

ye-luo commented Nov 3, 2023

ye-luo commented Nov 3, 2023 • edited Loading

prckent commented Nov 3, 2023

prckent commented Nov 3, 2023

markdewing commented Oct 9, 2023 •

edited

Loading

markdewing commented Oct 10, 2023 •

edited

Loading

ye-luo commented Nov 3, 2023 •

edited

Loading