Commit
PR #7751: Support cublasLt Fp8 Approx Gelu epilogue fusion.
Imported from GitHub PR #7751

Because fast accumulation is turned on in forward mode, the cublasLt FP8 GEMM with a GELU epilogue can run efficiently as a single fused kernel. Compared against the XLA-generated GELU kernel on H100, the fused kernel is slightly faster (about 2%) for a problem of size [8192, 4096] x [4096, 16384] + gelu:

Execution time for matmul using cublasLt and gelu (XLA): 1.28ms
Execution time for matmul_gelu using cublasLt: 1.25ms

Copybara import of the project:

--
e8abce3 by Shu Wang <[email protected]>:

Support cublasLt Fp8 Approx Gelu epilogue fusion.

--
818127c by shuw <[email protected]>:

Remove F32 check

--
5ce3108 by shuw <[email protected]>:

Improve based on review #1

Merging this change closes #7751

COPYBARA_INTEGRATE_REVIEW=#7751 from wenscarl:cublaslt_fp8_gelu 5ce3108
PiperOrigin-RevId: 591236441
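For readers unfamiliar with epilogue fusion, the following is a minimal pure-Python sketch of the idea described in the commit message, not the XLA or cublasLt implementation. It assumes that "Approx Gelu" refers to the common tanh approximation of GELU. The helper names (`gelu_tanh`, `matmul_gelu_unfused`, `matmul_gelu_fused`) are hypothetical, and FP8 quantization and scaling are deliberately left out.

```python
import math


def gelu_tanh(x: float) -> float:
    # Assumption: "Approx Gelu" means the tanh approximation of GELU.
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def gelu_exact(x: float) -> float:
    # Exact (erf-based) GELU, included only for comparison.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matmul_gelu_unfused(a, b):
    # Two passes: materialize the full product, then apply GELU to it.
    rows, inner_dim, cols = len(a), len(b), len(b[0])
    c = [[sum(a[i][k] * b[k][j] for k in range(inner_dim))
          for j in range(cols)] for i in range(rows)]
    return [[gelu_tanh(v) for v in row] for row in c]


def matmul_gelu_fused(a, b):
    # One pass: GELU is applied as an "epilogue" to each accumulator
    # before it is written out, so no intermediate product is stored.
    rows, inner_dim, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for k in range(inner_dim):
                acc += a[i][k] * b[k][j]
            out[i][j] = gelu_tanh(acc)
    return out


a = [[0.5, -1.0], [2.0, 0.25]]
b = [[1.0, 0.0, -0.5], [0.5, 1.5, 1.0]]
fused = matmul_gelu_fused(a, b)
unfused = matmul_gelu_unfused(a, b)
```

Both functions compute the same values; the fused form only avoids the extra pass over the intermediate matrix, which is the saving a fused GPU kernel targets.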
1 parent 3f62ba1, commit 2724718
Showing 2 changed files with 149 additions and 3 deletions.