Fused L2 1-NN based on cutlass 3xTF32 / DMMA #1118

Merged: 80 commits, May 16, 2023

@mdoijade mdoijade commented Dec 30, 2022

-- Adds a persistent FusedL2NN kernel based on cutlass 3xTF32 and DMMA. It is loosely based on grouped GEMM but customized for a single problem size.
-- The performance benefit of this implementation grows as k increases: up to 1.3x for k == 64, up to 1.53x for k == 128, and up to 1.67x for k == 256.
-- For all sizes of k, this kernel outperforms the previous implementation.
-- Attaching the FusedL2NN benchmark results comparing the previous implementation with this cutlass version.
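
For readers unfamiliar with the operation, here is a minimal host-side sketch of what a fused L2 1-NN computes. It is not the kernel from this PR and not the RAFT API. The function name `fused_l2_nn_reference`, the row-major layout, and the reading of k as the feature dimension (the GEMM inner dimension) are assumptions made for illustration. The sketch expands the squared distance as ||x||^2 + ||y||^2 - 2 x.y, so the dot-product term is the GEMM-like part, and the per-row argmin is applied right away instead of storing the full m x n distance matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical reference (illustration only): for each row of x (m x k, row-major),
// return the index and squared L2 distance of the closest row of y (n x k, row-major).
std::vector<std::pair<int, float>> fused_l2_nn_reference(const std::vector<float>& x,
                                                         const std::vector<float>& y,
                                                         std::size_t m,
                                                         std::size_t n,
                                                         std::size_t k)
{
  assert(x.size() == m * k && y.size() == n * k);

  // Squared row norms are computed once and reused for every pair.
  auto row_norms = [k](const std::vector<float>& a, std::size_t rows) {
    std::vector<float> norms(rows, 0.0f);
    for (std::size_t r = 0; r < rows; ++r) {
      for (std::size_t d = 0; d < k; ++d) {
        norms[r] += a[r * k + d] * a[r * k + d];
      }
    }
    return norms;
  };
  const std::vector<float> xn = row_norms(x, m);
  const std::vector<float> yn = row_norms(y, n);

  std::vector<std::pair<int, float>> out(
    m, std::pair<int, float>{-1, std::numeric_limits<float>::max()});

  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      // GEMM-like part: inner product over the k dimension.
      float dot = 0.0f;
      for (std::size_t d = 0; d < k; ++d) {
        dot += x[i * k + d] * y[j * k + d];
      }
      // "Fused" part: form the distance and reduce to the running minimum
      // immediately, so the m x n distance matrix is never materialized.
      const float dist = xn[i] + yn[j] - 2.0f * dot;
      if (dist < out[i].second) { out[i] = {static_cast<int>(j), dist}; }
    }
  }
  return out;
}
```

In this formulation the inner loop over k is the only part whose cost grows with k, which is the part a tensor-core GEMM would accelerate.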

rapids-bot bot commented Dec 30, 2022

Pull requests from external contributors require approval from a rapidsai organization member with write or admin permissions before CI can begin.

@github-actions github-actions bot added the cpp label Dec 30, 2022
@mdoijade mdoijade marked this pull request as ready for review January 4, 2023 10:43
@mdoijade mdoijade requested a review from a team as a code owner January 4, 2023 10:43
mdoijade commented Jan 4, 2023

@mdoijade mdoijade changed the title WIP: Fused L2 1-NN based on cutlass 3xTF32 / DMMA Fused L2 1-NN based on cutlass 3xTF32 / DMMA Jan 4, 2023
mdoijade commented Jan 4, 2023

cjnolet commented Jan 19, 2023

/okay to test

@cjnolet cjnolet added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jan 19, 2023
cjnolet commented Jan 19, 2023

/ok to test

@mdoijade mdoijade requested review from a team as code owners February 3, 2023 09:27
@tfeher tfeher left a comment

Thanks @mdoijade for the updates! The PR has changed significantly since the previous review round to enable an impressive speedup. Here is the first batch of comments from my second review.

This PR adds customized cutlass headers to implement the fused L2 NN operation.

While the changeset seems large, it is actually much smaller if we compare the new cutlass headers to their original versions (based on the reference added to the files, and after applying raft formatting to the originals). Comparing that way makes it easier to follow how the cutlass code was adapted to our needs, and reveals a clean implementation. Great work, I have only smaller comments (so far)!

@tfeher tfeher left a comment

Thanks Mahesh for the updates so far. Here is my second batch of comments.

cjnolet commented May 15, 2023

I've lost track of where we are with this PR. Do you guys think this will make it into 23.06? (Burndown is in 3 days).

@tfeher tfeher left a comment

Thanks Mahesh for the updates!

@cjnolet, the PR is in good shape and most of the issues have been addressed; it should make it into 23.06.

@tfeher tfeher left a comment

Thanks Mahesh for resolving the issues! The PR looks good to me.

cjnolet commented May 16, 2023

/merge

@rapids-bot rapids-bot bot merged commit a1d1fd6 into rapidsai:branch-23.06 May 16, 2023
Labels: ci, CMake, cpp, improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change), python
6 participants