-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure that all CUDA kernels in cudf have hidden visibility. #14726
Ensure that all CUDA kernels in cudf have hidden visibility. #14726
Conversation
5c496d6
to
a0ee7a9
Compare
a0ee7a9
to
3e2caac
Compare
@@ -1024,7 +1024,7 @@ __device__ int parse_gzip_header(uint8_t const* src, size_t src_size) | |||
* @param parse_hdr If nonzero, indicates that the compressed bitstream includes a GZIP header | |||
*/ | |||
template <int block_size> | |||
__global__ void __launch_bounds__(block_size) | |||
CUDF_KERNEL void __launch_bounds__(block_size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we take this opportunity to normalize the order of launch bounds, CUDF_KERNEL
, and the return type across all kernels in libcudf? Some put launch bounds first, others put it last.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to make everything consistent as part of the PR, and we can always discuss offline/follow up what style we want. I don't want to hold up the entire PR over a style issue though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can defer on this. It's easy enough to change later. Just wanted to raise that question in case you had a clear preference. I don't know which one I prefer. Maybe CUDF_KERNEL __launch_bounds__(...) void
, but that doesn't align with any of the kernels that I saw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could further future proof it (and potentially enable wider compatibility by making a version that takes the bounds as parameters
CUDF_KERNEL_WITH_LAUNCH_BOUNDS(...) void foo(...)
Or even make all kernels use the same macro with varargs.
CUDF_KERNEL(...) void foo(...)
967a9e1
to
3968a89
Compare
452869e
to
6a53ceb
Compare
/** | ||
* @brief Indicates that the function is a CUDA kernel | ||
*/ | ||
#define CUDF_KERNEL __global__ static |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note (non-blocking): I don't think anyone builds libcudf with rdc=true
, but if you wanted to be extra pedantic, then CUDF_KERNEL
should expand to __attribute__ ((visibility ("hidden")))
when __CUDACC_RDC__
is defined in order to preserve the binary size improvements that come from symbol deduplication within the DLL with rdc=true
.
c8b9590
to
0fd6c03
Compare
This looks like a PR that will need to be duplicated across RAPIDS. So I think it should have a rapidsai/build-planning issue with a checklist of per-repo issues. |
You are correct, we can track the meta issue at: rapidsai/build-planning#12 |
/merge |
This marks all kernels in CUCO as `static` so that they have internal linkage and won't conflict when used by multiple DSOs. I didn't see a single shared/common header in cuco where I could place a `CUCO_KERNEL` macro so I modified each instance instead. While `cccl` went with a `__attribute__ ((visibility ("hidden")))` approach to help reduce RDC size, this approach seemed very invasive for cuco. This is due to the fact that we would need to pragma push and pop both gcc warnings and nvcc warnings in each cuco header so that we don't introduce any warnings. This is needed as the compiler incorrectly state that the `__attribute__ ((visibility ("hidden")))` has no side-effect. Context: rapidsai/cudf#14726 NVIDIA/cccl#166 rapidsai/raft#1722 --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yunsong Wang <[email protected]>
Description
To correct potential issues when using a static cuda runtime, we mark all kernels with internal linkage via the
static
keyword orhidden
visibility.Note: This doesn't fix dependencies, but focuses just on the CUDA kernels in cudf directly.
Checklist