That is actually not too bad.
@brycelelbach suggested to use something like |
cub/util_namespace.cuh
Outdated
@@ -108,6 +108,60 @@
#define CUB_NS_QUALIFIER ::cub
#endif

#define CUB_COUNT_N(_1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12, _13, _14, _15, N, ...) N
#define CUB_COUNT(...) \
  CUB_IDENTITY(CUB_COUNT_N(__VA_ARGS__, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1))
This will need to support 20. As it currently stands, compiling with -arch=all generates 14 entries on x86-64 machines. With a new generation of hardware we will easily overflow the 15-entry limit.
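For readers unfamiliar with the counting trick in the hunk under review, here is a minimal standalone sketch of how it behaves at and beyond the limit. The DEMO_* names are hypothetical and chosen so they cannot collide with CUB's macros, and the trailing 0 in the counter is an addition of this sketch, not part of the PR.

```cpp
// Hypothetical DEMO_* names; this is an illustration, not CUB's code.
// Extra expansion pass, commonly needed for MSVC's traditional preprocessor.
#define DEMO_IDENTITY(x) x

// Selects the 16th argument. The caller's arguments push the reversed
// counter to the right, so the value landing in slot N is the argument count.
#define DEMO_COUNT_N(_1, _2, _3, _4, _5, _6, _7, _8, _9, _10, _11, _12, _13, _14, _15, N, ...) N

// The trailing 0 (an addition of this sketch) keeps the variadic tail
// non-empty even when a single argument is passed.
#define DEMO_COUNT(...) \
  DEMO_IDENTITY(DEMO_COUNT_N(__VA_ARGS__, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0))

// Within the limit the macro yields the number of arguments.
static_assert(DEMO_COUNT(800) == 1, "one architecture");
static_assert(DEMO_COUNT(600, 700, 800) == 3, "three architectures");

// With 16 arguments the macro silently yields the 16th argument itself
// (99 here) instead of 16, which is the overflow discussed in this thread.
static_assert(DEMO_COUNT(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 99) == 99,
              "overflow returns the 16th argument, not a count");
```

The last assertion shows why the limit matters: exceeding it does not fail to compile, it just produces a wrong count.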
Hm, I wonder if there is some clever way to collapse all the archs down into a single number or some other condensed representation.
@jrhemstad as I understand it, keeping the namespace name readable has an advantage: if you have a binary, you can see why it contains multiple symbols. Can you share what value you see in the single-number representation?
I think this is a good goal, but would like to stick with If we had some way to automatically test for ABI breaks I'd be much more comfortable with this idea. |
What are some examples of types in CUB where we'd even be concerned with ABI? To be honest, I can't even think of anything from CUB that would be part of someone's binary interface. I guess technically someone could have a |
Looking at the impact of this change on a large library like libcudf.
Overall this change looks to have a minimal impact on binary size, which is great to see.
@jrhemstad one of the examples might be |
@robertmaynard just to make sure, does libcudf use |
No, whole compilation only. I expect the size increase comes from the increase in symbol name length.
Some changes requested, but overall I'm ok with the approach here.
Motivation

This PR is an alternative fix for the following issue. The suggested solution of making all kernels static would explode binary sizes for builds with -rdc=true. @jrhemstad suggested an alternative approach where we encode the list of architectures we are compiling against in the CUB namespace.

Solution
This PR introduces an inline namespace whose name is generated from a combination of CUB_VERSION and __CUDA_ARCH_LIST__. This solution addresses the original issue, since the dispatch layer calls kernels from the same set of architectures it was compiled against. At the same time, the solution preserves weak linkage when the code is compiled with -rdc=true.

Additionally, the CUB_DISABLE_NAMESPACE_MAGIC macro is provided to disable these changes. Defining CUB_DISABLE_NAMESPACE_MAGIC requires CUB_WRAPPED_NAMESPACE to be specified as well.

Example
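The naming scheme described under Solution can be sketched in plain C++ as follows. This is a minimal illustration, not the code from this PR: DEMO_VERSION and DEMO_ARCH_LIST are hypothetical stand-ins for CUB_VERSION and __CUDA_ARCH_LIST__, the values are made up, and the paste macro handles exactly three architectures for brevity.

```cpp
#include <type_traits>

// Stand-in values (assumptions of this sketch). In a real build the version
// would come from CUB and the list from nvcc's __CUDA_ARCH_LIST__.
#define DEMO_VERSION 101700
#define DEMO_ARCH_LIST 600, 700, 800

// Paste a version and exactly three architectures into one identifier.
// A general solution would count the list entries and dispatch on the count.
#define DEMO_PASTE_I(v, a, b, c) DEMO_##v##_SM_##a##_##b##_##c
// The variadic hop lets DEMO_VERSION and DEMO_ARCH_LIST expand before pasting.
#define DEMO_PASTE(...) DEMO_PASTE_I(__VA_ARGS__)
#define DEMO_NS DEMO_PASTE(DEMO_VERSION, DEMO_ARCH_LIST)

namespace demo {
inline namespace DEMO_NS {
// Hypothetical type standing in for anything declared inside the library.
struct Buffer {};
}  // inline namespace DEMO_NS
}  // namespace demo

// Users keep writing demo::Buffer, but the entity really lives in the
// generated namespace, so its qualified name carries the version and arch list.
static_assert(std::is_same<demo::Buffer,
                           demo::DEMO_101700_SM_600_700_800::Buffer>::value,
              "inline namespace is transparent to lookup");
```

Because the generated name becomes part of every qualified name inside it, translation units built with a different version or architecture list end up with distinct symbols rather than colliding ones. Note that this sketch relies on a conforming preprocessor for the __VA_ARGS__ hop.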
As an example I'm compiling the following code:

When compiled with -rdc=false for different architectures:

When compiled with -rdc=false for the same architectures:

When compiled with -rdc=true for the same architectures:

Issues
If someone uses cub::DoubleBuffer as a function parameter in a precompiled library, we'll break the code. CUB doesn't document its ABI guarantees, so additional discussion is needed. We might break this in 2.0. Alternatively, we might take cub::DoubleBuffer out of the inner namespace.

TODO