feat: modify group-gemm stage number (#497)
The current group-gemm configuration raises the following error on the NVIDIA 3090:

```shell
RuntimeError: cutlass group_gemm.initialize failed: Error Internal
```

Set the group-gemm stage count to 4. This reduces the amount of dynamic shared memory (smem) the kernel requests, so that it can be called on GPUs like the 3090.

Additionally, I did a simple comparison on the A800: setting the stage count to 4 still slightly improves group-gemm performance.

Refer to: https://github.com/NVIDIA/cutlass/blob/main/test/unit/gemm/device/gemm_grouped_sm80.cu
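As a rough illustration of why the stage count matters, the sketch below estimates the shared memory a multistage GEMM mainloop needs for its A and B tiles. It is a simplified model and not the kernel's actual accounting. The tile shape (128x128x32) and fp16 element size used in the usage note are assumptions, not values taken from this commit. The helper names `smem_bytes` and `fits` are hypothetical. The per-block limits are the documented opt-in maximums for compute capability 8.6 (99 KB, e.g. the 3090) and 8.0 (163 KB, e.g. the A100/A800).

```cpp
#include <cstddef>

// Simplified estimate: each pipeline stage buffers one A tile (tile_m x tile_k)
// and one B tile (tile_n x tile_k) in shared memory.
// This ignores epilogue storage and padding, so treat it as a lower bound.
constexpr std::size_t smem_bytes(std::size_t tile_m, std::size_t tile_n,
                                 std::size_t tile_k, std::size_t elem_bytes,
                                 std::size_t stages) {
    return stages * (tile_m * tile_k + tile_n * tile_k) * elem_bytes;
}

// Maximum opt-in dynamic shared memory per thread block.
constexpr std::size_t kMaxSmemSm86 = 99 * 1024;   // compute capability 8.6 (e.g. 3090)
constexpr std::size_t kMaxSmemSm80 = 163 * 1024;  // compute capability 8.0 (e.g. A100/A800)

// True if a kernel requesting `bytes` of shared memory fits under `limit`.
constexpr bool fits(std::size_t bytes, std::size_t limit) {
    return bytes <= limit;
}
```

Under these assumed tile parameters, `smem_bytes(128, 128, 32, 2, 4)` is 64 KB and fits on both architectures, while a deeper pipeline such as 8 stages (128 KB) would fit on SM80 but exceed the SM86 limit. A request above the device limit is one way kernel initialization can fail on a smaller GPU.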