switch mma instruction shape to 1684 from current 1688 for 3xTF32 L2/cosine kernel #1057

Merged 2 commits on Dec 2, 2022

mdoijade
Contributor

-- Switch the MMA instruction shape from 1688 (m16n8k8) to 1684 (m16n8k4); it was faster for every input tried from DISTANCE_BENCH for the L2 and cosine distances.
-- The speedup ranges from 1.05x at minimum to 1.37x in the best case.
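
For readers unfamiliar with the shape names, the difference between the two instructions can be sketched with some compile-time arithmetic. This is a minimal illustration only: `InstructionShape` is a hypothetical stand-in (not the real CUTLASS type), and the 64x64x16 warp tile in the usage note is an assumed size, not one taken from this PR. It shows that an m16n8k4 instruction performs half the multiply-accumulates of an m16n8k8 one, so twice as many instructions cover the same tile; the speedup reported above is an empirical result and does not follow from this arithmetic.

```cpp
// Hypothetical stand-in for an M x N x K MMA instruction shape.
// Illustration only; this is not the actual CUTLASS shape type.
template <int M, int N, int K>
struct InstructionShape {
  static constexpr int kM = M;
  static constexpr int kN = N;
  static constexpr int kK = K;
  // Multiply-accumulate operations performed by one MMA instruction.
  static constexpr int kMacs = M * N * K;
};

using Shape1688 = InstructionShape<16, 8, 8>;  // m16n8k8, the previous shape
using Shape1684 = InstructionShape<16, 8, 4>;  // m16n8k4, the shape switched to

// Number of MMA instructions needed to cover one warp-level tile.
// Assumes the tile dimensions are multiples of the instruction shape.
template <typename Shape>
constexpr int mma_count(int warp_m, int warp_n, int warp_k) {
  return (warp_m / Shape::kM) * (warp_n / Shape::kN) * (warp_k / Shape::kK);
}

// Each k4 instruction does half the work of a k8 instruction.
static_assert(Shape1688::kMacs == 2 * Shape1684::kMacs,
              "m16n8k8 performs twice the MACs of m16n8k4");
```

For an assumed 64x64x16 warp tile, `mma_count<Shape1688>(64, 64, 16)` evaluates to 64 and `mma_count<Shape1684>(64, 64, 16)` to 128.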

@mdoijade mdoijade requested a review from a team as a code owner November 30, 2022 16:49
@rapids-bot

rapids-bot bot commented Nov 30, 2022

Pull requests from external contributors require approval from a rapidsai organization member with write or admin permissions before CI can begin.

@github-actions github-actions bot added the cpp label Nov 30, 2022
@mdoijade
Contributor Author

cutlass_m16n8k4_vs_m16n8k8_3xtf32_pairwise_dist_kernel.xlsx

Attaching the L2/cosine perf results from DISTANCE_BENCH for m16n8k4 vs. m16n8k8 with the 3xTF32 (fp32) kernel.
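
As background on the "3xTF32 (fp32)" wording, here is a rough host-side sketch of the general 3xTF32 idea: each fp32 operand is split into a "big" TF32 part and a "small" TF32 residual, and three TF32 products are summed to recover close to fp32 accuracy. Everything in it is an assumption for illustration: the helper names are hypothetical, TF32 conversion is modeled as plain truncation of the low 13 mantissa bits (real conversions may round), and it is not the kernel code touched by this PR.

```cpp
#include <cstdint>
#include <cstring>

// Simplified TF32 conversion (assumption: truncation, not rounding).
// TF32 keeps 10 of fp32's 23 mantissa bits, so clear the low 13 bits.
inline float to_tf32(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  bits &= 0xFFFFE000u;
  std::memcpy(&x, &bits, sizeof(bits));
  return x;
}

// Plain TF32 product: both operands lose their low mantissa bits.
inline float mul_1xtf32(float a, float b) { return to_tf32(a) * to_tf32(b); }

// 3xTF32 product: split each operand into big + small TF32 parts and
// sum three partial products; the small * small term is dropped.
inline float mul_3xtf32(float a, float b) {
  const float a_big   = to_tf32(a);
  const float b_big   = to_tf32(b);
  const float a_small = to_tf32(a - a_big);  // residual lost by a_big
  const float b_small = to_tf32(b - b_big);  // residual lost by b_big
  return a_small * b_big + a_big * b_small + a_big * b_big;
}
```

In this model, with a = 1 + 2^-12 and b = 1 + 2^-11, the single TF32 product collapses to exactly 1, while the 3xTF32 product keeps both low-order terms and misses only the dropped small * small contribution.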

@cjnolet cjnolet added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Dec 1, 2022
Contributor

@tfeher tfeher left a comment

Thanks Mahesh, LGTM!

@cjnolet
Member

cjnolet commented Dec 2, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit bee127a into rapidsai:branch-23.02 Dec 2, 2022
Labels: cpp, improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change)
3 participants