feat: modify group-gemm stage number #497

Merged
1 commit merged into flashinfer-ai:main on Sep 13, 2024

Conversation

jeejeelee (Contributor):

The current group-gemm configuration raises the following error on an NVIDIA 3090:

RuntimeError: cutlass group_gemm.initialize failed: Error Internal

This PR changes the group-gemm stage count to 4, which reduces the dynamic shared-memory footprint so the kernel can run on GPUs like the 3090.

I also ran a quick comparison on an A800: with the stage count at 4, group-gemm performance still improves slightly.

Refer to: https://github.com/NVIDIA/cutlass/blob/main/test/unit/gemm/device/gemm_grouped_sm80.cu

@@ -85,7 +85,7 @@ cudaError_t CutlassSegmentGEMMWrapper(CutlassSegmentGEMMHandler* handler, DType*
     cutlass::gemm::GemmShape<16, 8, 16>,                                  // Instruction Shape
     cutlass::epilogue::thread::LinearCombination<DType, 8, float, float>, // Epilogue
     cutlass::gemm::threadblock::GemmBatchedIdentityThreadblockSwizzle,    // Swizzling Operator
-    8                                                                     // Stages
+    4                                                                     // Stages
Member:

Rather than hard-coding the change from 8 to 4, adjust the stage count based on the available shared memory. cc @yzh119
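One way to implement this suggestion is to query the device's per-block shared-memory budget at runtime and pick the largest stage count that fits. This is a sketch, not flashinfer's actual code: the tile sizes, the 3–8 stage range, and the `PickStages` helper are all assumptions for illustration.

```cpp
#include <algorithm>

// Sketch: derive the stage count from the device's shared-memory budget.
// smem_limit_bytes would come from cudaDeviceGetAttribute with
// cudaDevAttrMaxSharedMemoryPerBlockOptin; tile sizes here are illustrative.
int PickStages(int smem_limit_bytes, int tile_m = 128, int tile_n = 128,
               int tile_k = 32, int elem_bytes = 2) {
  // Each pipeline stage buffers one A tile (M x K) and one B tile (K x N).
  int per_stage_bytes = (tile_m * tile_k + tile_k * tile_n) * elem_bytes;
  // SM80 multistage mainloops typically want at least 3 stages; cap at 8.
  return std::max(3, std::min(8, smem_limit_bytes / per_stage_bytes));
}
```

With these assumed tile sizes, a 99 KB limit (SM86) would yield 6 stages and a 163 KB limit (SM80) would yield 8, so the kernel could keep deeper pipelining on data-center GPUs instead of dropping to 4 everywhere.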

jeejeelee (Contributor, Author):

The value 4 follows the configuration in the linked cutlass test, so it should be compatible with most scenarios. 😄

Member:


The cutlass example you linked targets SM80, not SM90.

@yzh119 (Collaborator) left a comment:


LGTM, @jeejeelee thanks for the PR!

I'll merge this for now; some TODO items:

  1. Add benchmarks for group gemm.
  2. Select different configurations according to input shapes and CUDA arch.
  3. Add SM90 cutlass group gemm (I have a WIP branch but don't have time to work on it at the moment; it would be great if someone in the community could take it over).

@yzh119 yzh119 merged commit 52dab1d into flashinfer-ai:main Sep 13, 2024
@jeejeelee jeejeelee deleted the modify-group-gemm branch September 14, 2024 03:13
3 participants