[NVIDIA] Use separate FP8 einsum instances in MoE #80

kaixih · 2024-07-09T21:50:54Z

We found that in the original version, the same FP8 einsum op is reused by the dispatch, FFN, and combine GEMM ops in the MoE. As a result, they all share the same FP8 meta params, which is wrong.

This PR fixes the issue by cloning the FP8 einsum for each GEMM in the MoE.
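To illustrate why sharing is a problem, here is a minimal pure-Python sketch. `Fp8EinsumSketch`, its `observe` method, and the toy scaling rule are hypothetical stand-ins, not the actual praxis layer; the only assumption carried over from FP8 practice is that each op keeps its own amax history and derives a scale from it.

```python
import copy

FP8_E4M3_MAX = 448.0  # largest finite value in the E4M3 format


class Fp8EinsumSketch:
    """Hypothetical stand-in for an FP8 einsum op; only tracks meta params."""

    def __init__(self):
        self.amax_history = []  # absolute maxima seen by this op

    def observe(self, values):
        # Record the largest magnitude of the tensor passing through this op.
        self.amax_history.append(max(abs(v) for v in values))

    @property
    def scale(self):
        # Toy scaling rule: map the largest recorded amax onto the FP8 range.
        return FP8_E4M3_MAX / max(self.amax_history)


# Old behavior: one instance reused by all three GEMMs.
shared = Fp8EinsumSketch()
for tensor in ([0.5, -0.25], [100.0, 3.0], [2.0, -1.0]):  # dispatch, FFN, combine
    shared.observe(tensor)

# New behavior: each GEMM gets its own clone of the template.
template = Fp8EinsumSketch()
dispatch_op, ffn_op, combine_op = (copy.deepcopy(template) for _ in range(3))
dispatch_op.observe([0.5, -0.25])
ffn_op.observe([100.0, 3.0])
combine_op.observe([2.0, -1.0])

print(shared.scale)  # a single scale, dominated by the FFN tensor
print(dispatch_op.scale, ffn_op.scale, combine_op.scale)  # one scale per GEMM
```

With the shared instance, the dispatch and combine tensors end up scaled by statistics that came from the FFN tensor; with clones, each GEMM keeps statistics that match its own inputs.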

cc. @zhangqiaorjc @abhinavgoel95 @hx89

kaixih · 2024-07-10T20:20:19Z

Sure. Let's move the conversation here.

> @kaixih could you add a config option in MoE transformer that preserves the old behavior by default
>
> i don't want to break existing checkpoints

I think the einsum_tpl is configured to be jnp.einsum by default, and the FP8 einsum is only configured into the MoE layer when users choose to set USE_FP8 in paxml.
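The default described above can be sketched as follows. This is only an illustration: `MoEConfigSketch`, `build_config`, `plain_dot`, and `fp8_dot_sketch` are hypothetical names, and the real wiring in paxml and praxis goes through their own layer templates rather than a dataclass.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


def plain_dot(x: Sequence[float], w: Sequence[float]) -> float:
    """Stand-in for the default full-precision einsum (jnp.einsum in the PR)."""
    return sum(a * b for a, b in zip(x, w))


def fp8_dot_sketch(x: Sequence[float], w: Sequence[float]) -> float:
    """Toy low-precision variant: snap inputs to a coarse grid before the dot."""
    grid = 16.0  # hypothetical quantization step of 1/16
    qx = [round(a * grid) / grid for a in x]
    qw = [round(b * grid) / grid for b in w]
    return sum(a * b for a, b in zip(qx, qw))


@dataclass
class MoEConfigSketch:
    # Defaults to the plain einsum, so existing setups are unchanged.
    einsum_tpl: Callable = plain_dot


def build_config(use_fp8: bool = False) -> MoEConfigSketch:
    cfg = MoEConfigSketch()
    if use_fp8:
        # Only an explicit opt-in swaps the FP8 einsum into the MoE layer.
        cfg.einsum_tpl = fp8_dot_sketch
    return cfg


default_cfg = build_config()
fp8_cfg = build_config(use_fp8=True)
```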

kaixih · 2024-07-15T16:17:17Z

@zhangqiaorjc Please let me know if I missed the point of your comment above.

kaixih · 2024-07-19T21:34:50Z

We noticed a possible issue: the activation feeding the first GEMM in the gated FFN may be quantized more than once.

I fixed it by adding a new gated_einsum, which lets us ensure the quantization is applied only once.
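A rough sketch of the idea, under the assumption that the first layer of the gated FFN runs two GEMMs (gate and up projections) over the same input activation. All names here (`QuantizeCounter`, `gated_naive`, `gated_once`) are hypothetical and the quantizer is a toy; this is not the `gated_einsum` from the PR, and the activation function and elementwise product of a real gated FFN are omitted.

```python
FP8_E4M3_MAX = 448.0  # largest finite value in the E4M3 format


class QuantizeCounter:
    """Toy quantizer that counts how many times it is invoked."""

    def __init__(self):
        self.calls = 0

    def __call__(self, xs, scale):
        self.calls += 1
        # Scale, round, clamp to the FP8 range, then scale back.
        return [
            max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, round(x * scale))) / scale
            for x in xs
        ]


def dot(xs, ws):
    return sum(x * w for x, w in zip(xs, ws))


def gated_naive(x, w_gate, w_up, quantize, scale):
    # Each GEMM quantizes the same activation on its own: two quantize calls.
    gate = dot(quantize(x, scale), w_gate)
    up = dot(quantize(x, scale), w_up)
    return gate, up


def gated_once(x, w_gate, w_up, quantize, scale):
    # Quantize the shared activation once and reuse it for both GEMMs.
    qx = quantize(x, scale)
    return dot(qx, w_gate), dot(qx, w_up)


x, w_gate, w_up = [0.5, -1.25, 3.0], [1.0, 2.0, 0.5], [0.25, 1.0, -1.0]

naive_q, once_q = QuantizeCounter(), QuantizeCounter()
naive_out = gated_naive(x, w_gate, w_up, naive_q, scale=16.0)
once_out = gated_once(x, w_gate, w_up, once_q, scale=16.0)
print(naive_q.calls, once_q.calls)
```

Both variants compute the same outputs in this toy; the difference is only that the second performs a single quantization of the shared activation.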

kaixih changed the title ~~[NVIDIA] [Draft] Use separate FP8 einsum instances in MoE~~ [NVIDIA] Use separate FP8 einsum instances in MoE Jul 9, 2024

zhangqiaorjc added the pull ready label Jul 10, 2024

zhangqiaorjc mentioned this pull request Jul 10, 2024

[NVIDIA] Add a custom layer for cudnn flash attention #53

Merged

kaixih added 2 commits July 19, 2024 20:42

Use separate einsum instances in MoE

7e6f0dd

fix duplicated quantization

cf12a06

kaixih force-pushed the fix_fp8_einsum branch from 595447c to cf12a06 Compare July 19, 2024 20:44

minor fix

9217310

Minimize duplicated code

203e9f7

copybara-service bot merged commit da4fe8d into google:main Jul 24, 2024
2 checks passed
