Workaround for offload compilation using clang with glibc 12 include files #4761
Deals with clang as the compiler in CUDA mode with gcc/glibc 12 include files. Clang was fixed to recognize __noinline__ as an attribute, but the CUDA include files still have a define for __noinline__. The attribute started to be used in gcc 12 include files, which expands to __attribute__((__attribute__((noinline)))), which the compiler rejects. See NVIDIA/thrust#1703 and llvm/llvm-project#57544
// https://github.com/llvm/llvm-project/issues/57544
#if defined(__clang__) && (_GLIBCXX_RELEASE >= 12)
#undef __noinline__
#endif
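The mechanism described above can be reproduced without CUDA installed. The following is a minimal sketch, not the code from this PR: the `#define` is a stand-in for the definition in the CUDA host headers (assumed here to be `__attribute__((noinline))`, consistent with the expansion quoted in the description), and `add_one` and `self_check` are hypothetical names used only for illustration. The `#undef` is unconditional so the sketch builds with any compiler, whereas the PR guards it with `__clang__` and `_GLIBCXX_RELEASE >= 12`.

```cpp
// Standard headers are included first so they are parsed before the macro exists.
#include <cassert>

// Stand-in (assumption) for the macro defined by the CUDA host headers.
#define __noinline__ __attribute__((noinline))

// At this point, any later use of __attribute__((__noinline__)) would expand to
// __attribute__((__attribute__((noinline)))) and be rejected by the compiler.

// The workaround: remove the macro so the attribute spelling is seen as-is.
// (Unconditional here for illustration; the PR guards it.)
#undef __noinline__

// Stand-in for code that uses the attribute spelling, as the gcc 12 headers do.
__attribute__((__noinline__)) int add_one(int x) { return x + 1; }

// Hypothetical helper that exercises the function.
void self_check() { assert(add_one(41) == 42); }
```

Removing the `#undef` line makes the `add_one` declaration fail to compile, which is the same failure mode reported here.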
Is this workaround recommended by someone? Do you have a link?
It's mentioned in the LLVM issue link in the text. Now that I read the LLVM issue link further, it looks like there's a workaround in clang itself. I don't know why that workaround isn't working with my clang install.
I think the llvm workaround isn't working because CUDADeviceManager.cpp is a C++ file, not a CUDA file.
@@ -25,6 +25,17 @@
#include "Platforms/ROCm/cuda2hip.h"
#endif

// Work around for clang using gcc/glibc 12 include files in CUDA mode.
In my understanding, CUDA mode refers to compiling CUDA source code with Clang. Here it seems to be a conflict between the CUDA host library header files and the libstdc++ header files.
Yes, the noinline attribute was added to the libstdc++ header files here: gcc-mirror/gcc@dbf8bd3#diff-b358f609a31a4af8af72cc3197566abaa157bb7f8681b45580f1e5477540457cR192-R193
Prior to that, there were no uses of the noinline attribute in the libstdc++ headers to cause a problem.
// to __attribute__((__attribute__((noinline)))), which the compiler rejects.
// See https://github.com/NVIDIA/thrust/issues/1703 and
// https://github.com/llvm/llvm-project/issues/57544
#if defined(__clang__) && (_GLIBCXX_RELEASE >= 12)
Is there a released version of clang with the fix? Should we restrict the clang version?
It's not a bug in clang. If anything, it's caused by a feature getting added to clang.
It doesn't seem to be offload related but only CUDA related. Could you clarify?
Could you provide a full reproducer? Namely:
Which version of clang, which version of the gcc toolchain, which version of CUDA, and the full CMake command line.
Ah yes, it's not offload, but CUDA. But there's no reason to build CUDA anymore except as an adjunct to offload, right? CMake configuration is
clang is 18
This problem is currently not seen in the wild mostly because environments are slow to update their version of GCC.
I expect this problem will become more visible as sites update their GCC compiler to 12 or newer.
I suggest that any regular user building this way is probably making the wrong build by accident. Ideally it would work though, since we do advertise it. Is it relevant to offload at all? E.g., when the latest clang is used for offload with the latest gcc libs and the latest CUDA?
For the CMake options, -D ENABLE_OFFLOAD=1 -D ENABLE_CUDA=1 are the recommended settings for offload with CUDA, correct? I don't think this depends on the CUDA version (CUDA 12.0 and CUDA 12.1 have other compile errors; CUDA 12.2 is the only 12.x version I can get to work with clang). The compile fails with CUDA 11.8, and I didn't test older versions.
It appears the file must also be compiled for OpenMP offload for this to be triggered (-fopenmp --offload-arch=sm_80). Another (probably better) solution would be to turn off OpenMP offload for the Platform/CUDA files.
I can reproduce the issue.
OpenMP offload compilation triggers clang CUDA mode and thus hits the issue.
instead of
Thus, the fix llvm/llvm-project@a50e54f is not effective. Option 1: adopt the fix in this PR. It is unclear if there is any downside.
I favor option 3, but I don't have a great grasp of how big that effort is going to be. It seems like too much for an annoying issue that seems likely to go away as the 2 year window moves on. This fix looks like it would be fine; if we're really missing something, it wouldn't be hard to revert.
This issue is going to "age in", not "age out". As GCC versions that don't use I agree that option 3 - changing the CMake - would be the most robust.
@markdewing could you introduce a test in CMake and add a macro to activate this workaround if the test fails?
I found a much simpler fix by interchanging the include order; see #4814
Let's try the brittle fix in #4814. If the problem recurs or the fix proves problematic in the future, we can adopt a more robust solution.
@markdewing Are you able to check that this works for you? I need to reinstall the test setup with gcc12, so it will be a few hours before I can check.
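Why include order matters can be sketched without CUDA. This is an illustration only, under the assumption that the reordering makes the libstdc++ headers get parsed before the CUDA headers define the macro; the actual change is in #4814 and is not reproduced here. The `#define` is a stand-in for the CUDA host header definition, and `early_user`, `late_user`, and `self_check` are hypothetical names.

```cpp
#include <cassert>

// Stage 1: stand-in for a gcc 12 libstdc++ header, parsed BEFORE the macro exists.
// The attribute spelling is seen as-is by the compiler and accepted.
__attribute__((__noinline__)) int early_user(int x) { return x * 2; }

// Stage 2: stand-in (assumption) for the macro defined by the CUDA host headers.
#define __noinline__ __attribute__((noinline))

// Code after this point can still use the macro the way the CUDA headers intend,
// but must not write __attribute__((__noinline__)), which would now expand to
// __attribute__((__attribute__((noinline)))).
__noinline__ int late_user(int x) { return x + 1; }

// Hypothetical helper that exercises both functions.
void self_check() {
  assert(early_user(21) == 42);
  assert(late_user(41) == 42);
}
```

The sketch also shows why this approach is brittle: any standard header that is first pulled in after Stage 2 would hit the original error again.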
Shows up when doing offload compilation with Clang on a system with gcc/glibc 12 (or newer) include files.
Clang was fixed to recognize __noinline__ as an attribute, but the CUDA include files still have a define for __noinline__. The attribute started to be used in glibc 12 include files, which expands to __attribute__((__attribute__((noinline)))), which the compiler rejects. Clang includes a workaround, but it applies to CUDA code. This is C++ code using the CUDA include files.
See llvm/llvm-project#57544 (and NVIDIA/thrust#1703).
The __clang__ and _GLIBCXX_RELEASE guards were added because I'm not sure of the effects on other compilers.
The _GLIBCXX_RELEASE macro is documented here (item 8): https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html
The error message that indicates the problem:
What type(s) of changes does this code introduce?
Delete the items that do not apply
Does this introduce a breaking change?
What systems has this change been tested on?
Local server