Normalizing offsets iterator #14234
benchmark bot, please test this PR
Enables the indexalator to be instantiated from device code. Also adds gtests for the output indexalator. This change helps enable the offset-normalizing iterator in #14234.

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Bradley Dice (https://github.com/bdice)
- Yunsong Wang (https://github.com/PointKernel)

URL: #14206
This definitely increases compile time, but I was able to minimize the impact somewhat. Here are the top 10 offenders:
Negative numbers mean those 2 files compiled faster.
*/
struct normalize_type {
  template <typename T, std::enable_if_t<cudf::is_index_type<T>()>* = nullptr>
  __device__ cudf::size_type operator()(void const* tp)
Why is this not `T const* tp`?
Because that is not the type that is being passed to the function.
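A host-only sketch of why the operator takes `void const*`, under stated assumptions: the dispatcher holds only an untyped pointer and a runtime type tag, so it picks the functor's template parameter `T`, while the pointer keeps its erased type until the cast inside each instantiation. `type_id` and `dispatch` are hypothetical stand-ins for cudf's type and `type_dispatcher`, `std::is_integral_v` replaces `cudf::is_index_type` as a simplification, and `__device__` is dropped so the code runs on the host.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <type_traits>

// Hypothetical runtime type tag; a stand-in, not cudf's type_id enum.
enum class type_id { INT32, INT64 };

// Minimal stand-in for a type dispatcher: maps the runtime tag to a
// compile-time type T and calls the functor's templated operator() with
// the same untyped pointer it was given.
template <typename Functor>
std::int64_t dispatch(type_id id, Functor f, void const* p)
{
  switch (id) {
    case type_id::INT32: return f.template operator()<std::int32_t>(p);
    case type_id::INT64: return f.template operator()<std::int64_t>(p);
  }
  throw std::invalid_argument("unsupported type");
}

// The dispatcher only has a void const*, so that is the parameter type;
// the cast to T const* happens once T is fixed by the instantiation.
struct normalize_type {
  template <typename T, std::enable_if_t<std::is_integral_v<T>>* = nullptr>
  std::int64_t operator()(void const* tp) const
  {
    return static_cast<std::int64_t>(*static_cast<T const*>(tp));
  }
};
```

Declaring the parameter as `T const*` would not compile here, because the dispatcher never has a typed pointer to pass.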
To understand the impact of `type_dispatcher` on the reworked design: it seems to me that we are still using it, but there are no cascading calls to `type_dispatcher` and it is called exactly once. Is that correct?
Yes. We only call the type-dispatcher in the factory now.
Yes, and once when setting up the non-templated class `input_indexalator::normalize_input`. If you use a normal if-else dispatch there instead of `type_dispatcher`, do you see any benefit? Especially in `src/reductions/scan/scan_inclusive.cu.o`, where there is a 6-minute compile-time increase.
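A minimal sketch of the plain if-else dispatch suggested here, assuming only int32 and int64 offsets need handling. `type_id` and `normalize` are hypothetical names, not cudf code, and the sketch does not measure compile time; it only shows that one non-template function can replace a templated functor plus dispatcher for a short, fixed list of types.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical runtime type tag; a stand-in, not cudf's type_id enum.
enum class type_id { INT32, INT64 };

// Plain if-else dispatch: a single non-template function covers both
// offset types, so no functor template is instantiated per type.
inline std::int64_t normalize(type_id id, void const* p)
{
  if (id == type_id::INT32) { return *static_cast<std::int32_t const*>(p); }
  if (id == type_id::INT64) { return *static_cast<std::int64_t const*>(p); }
  throw std::invalid_argument("unsupported offsets type");
}
```

The trade-off is that every supported type must be listed by hand, which is manageable for offsets but not for the full set of index types.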
Thanks! Looks good.
But you are correct: the base class's type-dispatcher is still called inside every `element()` call. I think that is worth considering here.
We'll have to study the assembly here. Is the `type_dispatcher` expanded only once, when the class is compiled (that is, when the header is included), or is it expanded every time `element()` is called?
I created a separate ctor that just passes the width instead of type-dispatching to resolve it.
This did improve the compile time: https://downloads.rapids.ai/ci/cudf/pull-request/14234/ab0edb7/cuda12_x86_64.ninja_log.html
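A host-side sketch of a constructor that takes the element width directly, assuming offsets are either 4 or 8 bytes wide. `offsets_reader` and its members are hypothetical illustration names, not the cudf class; the point is that setting up the object needs no type dispatch, and `element()` only branches on a stored integer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical reader configured by element width instead of by type.
class offsets_reader {
 public:
  // The caller supplies the width (4 or 8 bytes); no type dispatch here.
  offsets_reader(void const* data, int width)
    : data_{static_cast<char const*>(data)}, width_{width}
  {
    if (width_ != 4 && width_ != 8) { throw std::invalid_argument("width must be 4 or 8"); }
  }

  // Reads element i and widens it to int64; the only per-element work
  // is a branch on the stored width. memcpy avoids aliasing issues.
  std::int64_t element(std::size_t i) const
  {
    char const* p = data_ + i * static_cast<std::size_t>(width_);
    if (width_ == 4) {
      std::int32_t v;
      std::memcpy(&v, p, sizeof(v));
      return v;
    }
    std::int64_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
  }

 private:
  char const* data_;
  int width_;
};
```

Passing the width replaces compile-time type information with a cheap runtime branch, which is one plausible reason fewer template instantiations were needed.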
So we halved the compile-time increase in `scan_inclusive`? That is good!
Some doc nits. LGTM
Splits out the `strings` and `struct` specializations in `scan_inclusive.cu` into separate source files to improve compile time. Each specialization is unique code with limited aggregation types. No functional changes; just code moved around. Found while working on #14234.

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Bradley Dice (https://github.com/bdice)
- Nghia Truong (https://github.com/ttnghia)

URL: #14358
/merge
Description
Creates a normalizing offsets iterator that returns an int64 value given either int32 or int64 column data.
Depends on #14206
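A host-side sketch of the idea in the description, under stated assumptions: the iterator wraps int32 or int64 offsets and always yields int64, so consuming code can be written once against int64 values. `normalizing_offsets_iterator`, `row_sizes`, and the `is_int64` flag are hypothetical names; the real cudf iterator is device code and its interface is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical iterator yielding int64 values over int32 or int64 offsets.
class normalizing_offsets_iterator {
 public:
  normalizing_offsets_iterator(void const* data, bool is_int64, std::ptrdiff_t pos = 0)
    : data_{static_cast<unsigned char const*>(data)}, width_{is_int64 ? 8 : 4}, pos_{pos}
  {
  }

  std::int64_t operator*() const { return load(pos_); }
  std::int64_t operator[](std::ptrdiff_t n) const { return load(pos_ + n); }

  normalizing_offsets_iterator& operator++()
  {
    ++pos_;
    return *this;
  }
  normalizing_offsets_iterator operator+(std::ptrdiff_t n) const
  {
    normalizing_offsets_iterator it{*this};
    it.pos_ += n;
    return it;
  }
  bool operator==(normalizing_offsets_iterator const& rhs) const
  {
    return data_ == rhs.data_ && pos_ == rhs.pos_;
  }
  bool operator!=(normalizing_offsets_iterator const& rhs) const { return !(*this == rhs); }

 private:
  // Reads element i at the stored width and widens it to int64.
  std::int64_t load(std::ptrdiff_t i) const
  {
    unsigned char const* p = data_ + i * width_;
    if (width_ == 4) {
      std::int32_t v;
      std::memcpy(&v, p, sizeof(v));
      return v;
    }
    std::int64_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
  }

  unsigned char const* data_;
  std::ptrdiff_t width_;
  std::ptrdiff_t pos_;
};

// Usage: row sizes as adjacent offset differences. One code path serves
// both offset types because the iterator always yields int64.
inline std::vector<std::int64_t> row_sizes(normalizing_offsets_iterator offsets, std::size_t rows)
{
  std::vector<std::int64_t> sizes(rows);
  for (std::size_t i = 0; i < rows; ++i) {
    auto const idx = static_cast<std::ptrdiff_t>(i);
    sizes[i] = offsets[idx + 1] - offsets[idx];
  }
  return sizes;
}
```

For example, int32 offsets `{0, 3, 3, 8}` give row sizes `{3, 0, 5}`, and the same `row_sizes` call handles int64 offsets past the int32 range.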