Normalizing offsets iterator #14234
Why is this not `T const* tp`?
Because that is not the type that is being passed to the function.
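For illustration only, here is a minimal sketch of that pattern, assuming the function in question is a dispatch functor that receives the column's untyped data pointer (the functor name and details below are hypothetical, not the PR's actual code): the caller only holds a `void const*`, and the cast to `T const*` happens once the dispatched type `T` is known.

```cpp
#include <cstdint>
#include <type_traits>

#include <cudf/utilities/error.hpp>
#include <cudf/utilities/traits.hpp>

// Hypothetical dispatch functor: the parameter stays void const* because that
// is what the caller passes; T is only applied inside the typed overload.
struct normalize_element_fn {
  template <typename T, std::enable_if_t<cudf::is_index_type<T>()>* = nullptr>
  int64_t operator()(void const* tp) const
  {
    return static_cast<int64_t>(*static_cast<T const*>(tp));
  }
  template <typename T, std::enable_if_t<not cudf::is_index_type<T>()>* = nullptr>
  int64_t operator()(void const*) const
  {
    CUDF_FAIL("only index types are supported");
  }
};
```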
So, to understand the impact of `type_dispatcher` on the reworked design: it seems to me like we are still using it, but there are no cascading calls to `type_dispatcher` and it is only called exactly once. Is that correct?
Yes. We only call the type-dispatcher in the factory now.
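Roughly, the "dispatch once in the factory" shape looks like the sketch below. The names (`make_input_iterator`, `input_normalator`, `width_fn`) are stand-ins, not necessarily the PR's actual identifiers:

```cpp
#include <cstdint>
#include <type_traits>

#include <cudf/column/column_view.hpp>
#include <cudf/utilities/error.hpp>
#include <cudf/utilities/traits.hpp>
#include <cudf/utilities/type_dispatcher.hpp>

namespace {
// Dispatched exactly once, on the host, to resolve the element width.
struct width_fn {
  template <typename T, std::enable_if_t<cudf::is_index_type<T>()>* = nullptr>
  int32_t operator()() const { return static_cast<int32_t>(sizeof(T)); }
  template <typename T, std::enable_if_t<not cudf::is_index_type<T>()>* = nullptr>
  int32_t operator()() const { CUDF_FAIL("offsets must use an index type"); }
};

// Minimal stand-in for the non-templated normalizing iterator.
struct input_normalator {
  void const* data;
  int32_t width;
  cudf::data_type dtype;
};
}  // namespace

// The only type_dispatcher call lives here, in the factory.
input_normalator make_input_iterator(cudf::column_view const& col)
{
  auto const width = cudf::type_dispatcher(col.type(), width_fn{});
  return input_normalator{col.head(), width, col.type()};
}
```

Everything downstream then works with the stored width and the untyped pointer, so code that consumes the iterator does not need to be templated on the offset type.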
Yes, and once when setting up the non-templated class `input_indexelator::normalize_input`. If you use a normal if-else dispatch there instead of `type_dispatcher`, are you able to see any benefits? Especially in `src/reductions/scan/scan_inclusive.cu.o`, where there is a 6-minute compile-time increase.
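The alternative being suggested could look roughly like the sketch below: a plain switch (or if-else chain) on the type id rather than instantiating `type_dispatcher` at that spot. This is only an illustration under that assumption; the function name is hypothetical and unsigned index types are omitted for brevity.

```cpp
#include <cstdint>

#include <cudf/types.hpp>
#include <cudf/utilities/error.hpp>

// Hypothetical width lookup that avoids instantiating type_dispatcher.
inline int32_t index_type_width(cudf::data_type dtype)
{
  switch (dtype.id()) {
    case cudf::type_id::INT8: return 1;
    case cudf::type_id::INT16: return 2;
    case cudf::type_id::INT32: return 4;
    case cudf::type_id::INT64: return 8;
    default: CUDF_FAIL("unsupported index type");
  }
}
```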
Thanks! Looks good.
But you are correct, the base class's type-dispatcher is still called inside every `element()` call. I think that is worth considering here.
We'll have to study the assembly here. Is the `type_dispatcher` expanded only once when the class is compiled (so when the header is included), or is it expanded every time `element()` is called?
I created a separate ctor that just passes the width instead of type-dispatching to resolve it. This did improve the compile time: https://downloads.rapids.ai/ci/cudf/pull-request/14234/ab0edb7/cuda12_x86_64.ninja_log.html
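As a sketch of what a width-passing constructor can look like (the class name, members, and layout here are hypothetical, not the PR's code): callers that already know the element width construct the iterator directly, so `type_dispatcher` is not instantiated at those call sites.

```cpp
#include <cstdint>
#include <type_traits>

#include <cudf/types.hpp>
#include <cudf/utilities/error.hpp>
#include <cudf/utilities/traits.hpp>
#include <cudf/utilities/type_dispatcher.hpp>

namespace {
// Dispatched only by the convenience constructor below.
struct size_fn {
  template <typename T, std::enable_if_t<cudf::is_fixed_width<T>()>* = nullptr>
  int32_t operator()() const { return static_cast<int32_t>(sizeof(T)); }
  template <typename T, std::enable_if_t<not cudf::is_fixed_width<T>()>* = nullptr>
  int32_t operator()() const { CUDF_FAIL("expected a fixed-width type"); }
};
}  // namespace

// Stand-in for the non-templated normalizing iterator.
struct input_normalator {
 private:
  void const* data_;
  int32_t width_;
  cudf::data_type dtype_;

 public:
  // Width supplied directly: no type_dispatcher at this call site.
  input_normalator(void const* data, int32_t width, cudf::data_type dtype)
    : data_{data}, width_{width}, dtype_{dtype}
  {
  }

  // Convenience constructor that still resolves the width by dispatching.
  input_normalator(void const* data, cudf::data_type dtype)
    : input_normalator{data, cudf::type_dispatcher(dtype, size_fn{}), dtype}
  {
  }
};
```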
So we halved the compile-time increment in `scan_inclusive`? That is good!