Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workaround for offload compilation using clang with glibc 12 include files #4761

Closed
wants to merge 1 commit into from

Conversation

markdewing
Copy link
Contributor

@markdewing markdewing commented Oct 9, 2023

Shows up when doing offload compilation with Clang on a system with gcc/glibc 12 (or newer) include files.

Clang was fixed to recognize __noinline__ as an attribute, but the CUDA include files still have a define for __noinline__. The attribute started to be used in glibc 12 include files, which expands to __attribute__((__attribute__((noinline)))), which the compiler rejects. Clang includes a workaround, but it applies to CUDA code. This is C++ code using the CUDA include files.
See llvm/llvm-project#57544 (and NVIDIA/thrust#1703 ).

The __clang__ and _GLIBCXX_RELEASE guards were added because I'm not sure of the effects on other compilers.
The _GLIBCXX_RELEASE macro is documented here (item 8): https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html

The error message that indicates the problem:

In file included from /home/mdewing/nvme/physics/qmcpack/main/qmcpack/src/Platforms/CUDA/CUDADeviceManager.cpp:26:
In file included from /home/mdewing/nvme/physics/qmcpack/main/qmcpack/src/Platforms/Host/OutputManager.h:19:
In file included from /home/mdewing/nvme/physics/qmcpack/main/qmcpack/src/Platforms/Host/InfoStream.h:21:
In file included from /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/memory:76:
In file included from /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr.h:53:
/usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/shared_ptr_base.h:196:22: error: use of undeclared identifier 'noinline'; did you mean 'inline'?
  196 |       __attribute__((__noinline__))
      |                      ^
/usr/local/cuda-12.2/include/crt/host_defines.h:83:24: note: expanded from macro '__noinline__'
   83 |         __attribute__((noinline))
      |                        ^

What type(s) of changes does this code introduce?

Delete the items that do not apply

  • Bugfix

Does this introduce a breaking change?

  • No

What systems has this change been tested on?

Local server

Checklist

Update the following with a yes where the items apply. If you're unsure about any of them, don't hesitate to ask. This is
simply a reminder of what we are going to look for before merging your code.

  • Yes. This PR is up to date with current the current state of 'develop'
  • Yes. Code added or changed in the PR has been clang-formatted
  • N/A. This PR adds tests to cover any new code, or to catch a bug that is being fixed
  • N/A. Documentation has been added (if appropriate)

Deals with clang as the compiler in CUDA mode with gcc/glibc 12 include files.
Clang was fixed to recognize __noinline__ as an attribute,
but the CUDA include files still have a define for __noinline__.
The attribute started to be used in gcc 12 include files, which expands
to __attribute__((__attribute__((noinline)))), which the compiler rejects.
See NVIDIA/thrust#1703 and
    llvm/llvm-project#57544
// https://github.com/llvm/llvm-project/issues/57544
#if defined(__clang__) && (_GLIBCXX_RELEASE >= 12)
#undef __noinline__
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this workaround recommended by someone? Do you have a link?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's mentioned in the LLVM issue link in the text. Now that I read the LLVM issue link further, it looks like there's a workaround in clang itself. I don't know why that workaround isn't working with my clang install.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reason the llvm workaround isn't working is because CUDADeviceManager.cpp is C++ file, not a CUDA file.

@@ -25,6 +25,17 @@
#include "Platforms/ROCm/cuda2hip.h"
#endif

// Work around for clang using gcc/glibc 12 include files in CUDA mode.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my understanding, CUDA mode refers compiling CUDA source code by Clang. Here it seems to be a conflict between CUDA host library header files and libstdc++ header files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the noinline attribute was added to the libstdc++ header files here: gcc-mirror/gcc@dbf8bd3#diff-b358f609a31a4af8af72cc3197566abaa157bb7f8681b45580f1e5477540457cR192-R193

Prior to that there were no uses of the noinline attribute in the libstdc++ headers to cause a problem

// to __attribute__((__attribute__((noinline)))), which the compiler rejects.
// See https://github.com/NVIDIA/thrust/issues/1703 and
// https://github.com/llvm/llvm-project/issues/57544
#if defined(__clang__) && (_GLIBCXX_RELEASE >= 12)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a released version of clang with the fix? Should we restrict clang version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a bug in clang. If anything, it's caused by a feature getting added to clang.

Copy link
Contributor

@ye-luo ye-luo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem to be offload related but only CUDA related. Could you clarify?

Could you provide full reproducer? Namely
What version of clang, what version of gcc toolchain, what version of CUDA. Full CMake command line.

@markdewing
Copy link
Contributor Author

Ah yes, it's not offload, but CUDA. But there's no reason to build CUDA anymore except as an adjunct to offload, right?

CMake configuration is

cmake \
-D CMAKE_BUILD_TYPE=Debug \
-D QMC_MPI=0 \
-D QMC_DATA=/home/mdewing/physics/codes/qmcpack/test_data \
-D BUILD_LMYENGINE_INTERFACE=0 \
-D CMAKE_C_COMPILER=clang \
-D CMAKE_CXX_COMPILER=clang++ \
-D ENABLE_OFFLOAD=1 \
-D ENABLE_CUDA=1 \
../qmcpack

clang is 18
System is Ubuntu 22.04, but gcc-12 is also installed. GCC 11 is the default compiler on this system, but clang must pick up the most recent version.
CUDA is 12.2

@markdewing
Copy link
Contributor Author

This problem is currently not seen in the wild mostly because environments are slow to update their version of GCC.

  • On Polaris, a sufficiently new GCC is not available.
  • On Perlmutter, the llvm enviroment and the GCC environment containing a newer GCC don't mix. I expect it would need to use the --gcc-toolchain argument to clang to get the newer gcc. (Though I never finished testing on Perlmutter, because boost (!?) was not available).
  • On a JLSE machine, I did manage to reproduce it.

I expect this problem will become more visible as sites update their GCC compiler to 12 or newer.

@prckent
Copy link
Contributor

prckent commented Oct 10, 2023

I suggest that any regular user building this way is probably making the wrong build by accident. Ideally it would work though, since we do advertise it.

Is it relevant to offload at all? e.g. When latest clang is used for offload with latest gcc libs and latest CUDA?

@markdewing
Copy link
Contributor Author

markdewing commented Oct 10, 2023

For the CMake options, -D ENABLE_OFFLOAD=1 -D ENABLE_CUDA=1 is the recommended settings for offload with CUDA, correct?
The reason this hasn't shown up before is that apparently no one (including our CI) is running up-to-date gcc libs.

I don't think this depends on CUDA version (CUDA 12.0 and CUDA 12.1 have other compile errors, CUDA 12.2 is the only 12.x version I can get to work with clang). The compile fails with CUDA 11.8, and I didn't test older versions.

@markdewing
Copy link
Contributor Author

It appears the file must also be compiled for openmp offload for this to be triggered (-fopenmp --offload-arch=sm_80). Another (probably better) solution would be to turn off openmp offload for the Platform/CUDA files.

@ye-luo
Copy link
Contributor

ye-luo commented Oct 10, 2023

I can reproduce the issue

cmake -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_FLAGS=--gcc-toolchain=/soft/gcc/gcc-12.1.0  -DCMAKE_CXX_FLAGS=--gcc-toolchain=/soft/gcc/gcc-12.1.0 -DENABLE_CUDA=ON -DENABLE_OFFLOAD=ON ..

OpenMP offload compilation triggers clang CUDA mode and thus hit the issue.
However, OpenMP include

lib/clang/18/include/openmp_wrappers

instead of

lib/clang/18/include/cuda_wrappers

thus, the fix llvm/llvm-project@a50e54f is not effective.
I copied cuda_wrappers/bits/shared_ptr_base.h to openmp_wrappers/bits/shared_ptr_base.h and the compilation went through.

Option 1. adopt the fix in this PR. unclear if there is any down side.
Option 2. Fix LLVM add the workaround in openmp_wrappers
Option 3. Make OpenMP flags a CMake target instead of a global compiler option. It requires some changes in our CMake but probably will be the most robust fix.

@PDoakORNL
Copy link
Contributor

I favor option 3 but I don't have a great grasp of how big that effort is going to be seems like too much for an annoying issue but seems likely to go away as the 2 year window moves on. This fix looks like it would be fine if we're really missing something it wouldn't be hard to revert.

@markdewing
Copy link
Contributor Author

This issue is going to "age in", not "age out". As GCC versions that don't use __noinline__ in their library fall out of the two-year window.

I agree that option 3 - changing the CMake - would be the most robust.

@ye-luo
Copy link
Contributor

ye-luo commented Nov 3, 2023

@markdewing could you introduce a test in cmake and add macro to activate this workaround if the test fails.

@ye-luo
Copy link
Contributor

ye-luo commented Nov 3, 2023

I found a much simpler fix by interchange include order see #4814

@prckent
Copy link
Contributor

prckent commented Nov 3, 2023

Lets try the brittle fix in #4814 . If it reoccurs or proves problematic in future we can adopt a more robust solution

@prckent
Copy link
Contributor

prckent commented Nov 3, 2023

@markdewing Are you able to check this works for you? I need to reinstall the test setup with gcc12 so it will be a few hours before I can check.

@ye-luo ye-luo closed this Nov 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants