Replace dots_along_rows with rowNorm and improve coalescedReduction performance #1011
Note to reviewers: I am aware that the reduction currently doesn't compile with non-trivial types such as cub pairs due to the shuffle-based reductions. Working on a fix. |
I have fixed support for non-trivial types, please have a detailed look at the last commit and in particular changes to |
After these changes, is the following comment still valid? raft/cpp/include/raft/linalg/norm.cuh Line 41 in 355f693
|
Removed. |
Hi Louis, it is nice to see further improvements in our prims. I see that the bulk of the changes are updates to the test cases, thanks for the thorough work!
I have just a few smaller comments on the code.
Please update the PR description:
- mention adding the general shuffle and reduction op
- move the detailed description of the performance of the different kernels into a separate comment.
If you have any measurements or notes on why this approach is better than cub segmented reduction, then please add a comment.
@tfeher |
Some notes on the performance of the thick vs medium kernel:
Visual demonstration of the performance of the medium vs thick implementations (y-axis is time in ms, lower is better): |
Thanks Louis for the update, LGTM!
rerun tests |
@Nyrio I suspect the CI checks may not be running because of the conflicts in your branch. |
@cjnolet I was waiting for my local compilation and test run to succeed before pushing, but as you can expect, compiling the neighbors test took a few hours. |
Wait, a few hours?!?! What type of environment / configuration are you using? How many cores are you using to compile? |
@gpucibot merge |
rerun tests |
@cjnolet It looks like the CI errors are unrelated to the contents of this PR. |
rerun tests |
@Nyrio yep you are right about that. @ajschmidt8 has fixed the issue so we should be able to get this in today, assuming it passes. |
rerun tests |
dots_along_rows in ann_utils.cuh was in some cases more performant than the corresponding raft primitive rowNorm, so I have improved that primitive in order to replace dots_along_rows without performance regressions.
rowNorm for a row-major matrix calls coalescedReduction, which I have modified to conditionally select one of the following code paths based on the input dimensions:
[...] main_op is applied but not final_op). In the second step, reduces the intermediate buffer using the thin kernel (this time final_op is applied but not main_op).
Other changes included in this PR:
[...] main_op such as an argmax, and only for the coalesced reduction I have added test cases with raft::KeyValuePair
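The two-step path described above (main_op in the first step only, final_op in the second step only) can be modelled on the host with a minimal sketch. This is not the RAFT implementation: `partial_reduce`, `final_reduce`, `row_l2_norm_two_step` and the `n_chunks` parameter are hypothetical names chosen for illustration, and the square root as final_op is an assumption about how a row norm might be configured.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of a two-step row reduction; not RAFT code.
// Step 1: every row is split across `n_chunks` partial accumulators.
// main_op is applied to each input element; final_op is NOT applied here.
template <typename T, typename MainOp, typename ReduceOp>
std::vector<T> partial_reduce(const std::vector<T>& data, std::size_t n_rows,
                              std::size_t n_cols, std::size_t n_chunks, T init,
                              MainOp main_op, ReduceOp reduce_op)
{
  assert(data.size() == n_rows * n_cols && n_chunks > 0);
  std::vector<T> buffer(n_rows * n_chunks, init);  // intermediate buffer
  for (std::size_t r = 0; r < n_rows; ++r) {
    for (std::size_t c = 0; c < n_cols; ++c) {
      T& acc = buffer[r * n_chunks + (c % n_chunks)];
      acc    = reduce_op(acc, main_op(data[r * n_cols + c]));
    }
  }
  return buffer;
}

// Step 2: reduce each row of the intermediate buffer.
// main_op is NOT applied again; final_op is applied once per row.
template <typename T, typename ReduceOp, typename FinalOp>
std::vector<T> final_reduce(const std::vector<T>& buffer, std::size_t n_rows,
                            std::size_t n_chunks, T init, ReduceOp reduce_op,
                            FinalOp final_op)
{
  assert(buffer.size() == n_rows * n_chunks);
  std::vector<T> out(n_rows);
  for (std::size_t r = 0; r < n_rows; ++r) {
    T acc = init;
    for (std::size_t k = 0; k < n_chunks; ++k) {
      acc = reduce_op(acc, buffer[r * n_chunks + k]);
    }
    out[r] = final_op(acc);
  }
  return out;
}

// Example: L2 norm of each row (main_op = square, final_op = sqrt).
inline std::vector<double> row_l2_norm_two_step(const std::vector<double>& data,
                                                std::size_t n_rows,
                                                std::size_t n_cols,
                                                std::size_t n_chunks)
{
  auto square = [](double x) { return x * x; };
  auto add    = [](double a, double b) { return a + b; };
  auto root   = [](double x) { return std::sqrt(x); };
  auto buffer = partial_reduce(data, n_rows, n_cols, n_chunks, 0.0, square, add);
  return final_reduce(buffer, n_rows, n_chunks, 0.0, add, root);
}
```

Splitting the ops this way matters whenever final_op does not commute with the reduction: taking the square root of each partial sum and then adding the results would not give the row norm, and squaring the partial sums again in the second step would be equally wrong.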