Is your feature request related to a problem? Please describe.
Grouped GEMM using CUTLASS is ~30% slower than a for-loop of cuBLAS GEMMs on SM90 (H100). Implementations of grouped GEMM using both CUTLASS and cuBLAS can be found here: https://github.com/tgale96/grouped_gemm/blob/main/csrc/grouped_gemm.cu
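For reference, a minimal sketch of the "for-loop of cuBLAS GEMMs" baseline described above (not the linked repo's code; the function name, shapes, and bf16/fp32 types are illustrative assumptions):

```cpp
#include <vector>
#include <cublas_v2.h>
#include <cuda_bf16.h>

// Runs one cuBLAS GEMM per expert: C[i] = A[i] * B[i].
// Inputs are bf16 with fp32 accumulation; a, b, c are device pointers and
// m/n/k give each expert's problem size.
void grouped_gemm_cublas_loop(cublasHandle_t handle,
                              const std::vector<const __nv_bfloat16*>& a,
                              const std::vector<const __nv_bfloat16*>& b,
                              const std::vector<__nv_bfloat16*>& c,
                              const std::vector<int>& m,
                              const std::vector<int>& n,
                              const std::vector<int>& k) {
  const float alpha = 1.0f, beta = 0.0f;
  for (size_t i = 0; i < a.size(); ++i) {
    // cuBLAS is column-major; leading dimensions assume densely packed,
    // non-transposed operands (A is m x k, B is k x n, C is m x n).
    cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                 m[i], n[i], k[i],
                 &alpha,
                 a[i], CUDA_R_16BF, m[i],
                 b[i], CUDA_R_16BF, k[i],
                 &beta,
                 c[i], CUDA_R_16BF, m[i],
                 CUBLAS_COMPUTE_32F, CUBLAS_GEMM_DEFAULT);
  }
}
```

Each GEMM here is launched separately, so the comparison in the report is between one fused CUTLASS grouped kernel (built for SM80) and many individual cuBLAS calls that are already tuned for SM90.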
Describe the solution you'd like
Consider adding SM90 support to the grouped GEMM kernel built on CUTLASS; it currently targets SM80. Grouped GEMM is important for training MoE models.