
[NVIDIA] Use separate FP8 einsum instances in MoE #80

Merged · 4 commits merged into google:main on Jul 24, 2024

Conversation

@kaixih (Contributor) commented on Jul 9, 2024

We found that in the original version, the same FP8 einsum op is reused by the dispatch, FFN, and combine GEMM ops in the MoE, which means they share the same FP8 meta params (scales and amax history). That is wrong, since each GEMM sees tensors with different value ranges.

This PR fixes the issue by cloning the FP8 einsum for each of the GEMMs in the MoE, as sketched below.
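For context, here is a minimal toy sketch (hypothetical class and numbers, not the actual Praxis op) of why sharing one delayed-scaling FP8 einsum instance across the MoE GEMMs is wrong: the instance's amax/scale state gets mixed across tensors with very different value ranges.

```python
# Toy delayed-scaling FP8 einsum; names and numbers are illustrative only.
import jax.numpy as jnp

E4M3_MAX = 448.0  # max magnitude representable in float8_e4m3


class ToyFp8Einsum:
    """Each instance carries its own FP8 meta param (a running amax)."""

    def __init__(self):
        self.amax = 1e-6  # running amax used to derive the quantization scale

    def __call__(self, eqn, x, w):
        self.amax = max(self.amax, float(jnp.max(jnp.abs(x))))
        scale = E4M3_MAX / self.amax
        # Fake quantization: scale, round, clip, then dequantize.
        x_q = jnp.clip(jnp.round(x * scale), -E4M3_MAX, E4M3_MAX) / scale
        return jnp.einsum(eqn, x_q, w)


dispatch_in = jnp.full((4, 8), 1e-2)  # small-magnitude routing tensor
ffn_in = jnp.full((4, 8), 50.0)       # large-magnitude hidden activations
w = jnp.ones((8, 8))

# Wrong: one shared instance. After seeing the FFN input, its amax is ~50,
# so the small dispatch values all round to zero.
shared = ToyFp8Einsum()
shared("bd,df->bf", ffn_in, w)
y_bad = shared("bd,df->bf", dispatch_in, w)   # all zeros

# Right (what this PR does): a separate instance per GEMM, so each one
# tracks meta params matching its own tensor range.
dispatch_einsum = ToyFp8Einsum()
y_good = dispatch_einsum("bd,df->bf", dispatch_in, w)  # non-zero
```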

cc. @zhangqiaorjc @abhinavgoel95 @hx89

@kaixih changed the title from "[NVIDIA] [Draft] Use separate FP8 einsum instances in MoE" to "[NVIDIA] Use separate FP8 einsum instances in MoE" on Jul 9, 2024
@kaixih (Contributor, Author) commented on Jul 10, 2024

Sure. Let's move the conversation here.

> @kaixih could you add a config option in MoE transformer that preserves the old behavior by default
>
> i don't want to break existing checkpoints

I think by default einsum_tpl is configured to be jnp.einsum, and the FP8 einsum is only configured into the MoE layer when users set USE_FP8 in paxml, so the old behavior is preserved by default.
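A hedged sketch of that default-preserving flow; USE_FP8, the *_einsum_tpl fields, and Fp8EinsumOp stand in for the real Paxml/Praxis names, and plain dicts stand in for pax_fiddle configs.

```python
# Illustrative only: dicts stand in for pax_fiddle configs.
import copy

USE_FP8 = True  # experiment flag; False keeps the old jnp.einsum behavior

fp8_einsum_tpl = {"cls": "Fp8EinsumOp"}  # stand-in FP8 einsum template

moe_config = {"einsum_tpl": None}  # None -> layer falls back to jnp.einsum
if USE_FP8:
    # The fix in this PR: clone the template per GEMM so the dispatch,
    # FFN, and combine einsums each own their FP8 meta params.
    for gemm in ("dispatch", "ffn", "combine"):
        moe_config[f"{gemm}_einsum_tpl"] = copy.deepcopy(fp8_einsum_tpl)
# With USE_FP8 unset, nothing changes and existing checkpoints load as before.
```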

@kaixih (Contributor, Author) commented on Jul 15, 2024

@zhangqiaorjc Please let me know if I've missed your point in the comment above.

@kaixih (Contributor, Author) commented on Jul 19, 2024

We noticed a potential issue of duplicated quantization of the activation for the first GEMM in the gated FFN: the gate and up projections both consume the same input, so giving each its own einsum instance would quantize that activation twice.

I fixed it by proposing a new gated_einsum in which we can ensure the quantization is applied only once.
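A minimal sketch of the gated_einsum idea (the function below is a toy, not the exact op this PR adds, and SiLU is an assumed gating activation): the gated FFN runs two GEMMs over the same activation, so the activation is quantized once and the quantized value is reused by both.

```python
# Toy gated einsum with single activation quantization; illustrative only.
import jax
import jax.numpy as jnp

E4M3_MAX = 448.0


def fake_quantize(x, amax):
    scale = E4M3_MAX / amax
    return jnp.clip(jnp.round(x * scale), -E4M3_MAX, E4M3_MAX) / scale


def gated_einsum(eqn, x, w_gate, w_up, amax):
    x_q = fake_quantize(x, amax)         # quantize the activation once...
    gate = jnp.einsum(eqn, x_q, w_gate)  # ...first GEMM of the gated FFN
    up = jnp.einsum(eqn, x_q, w_up)      # ...second GEMM reuses x_q
    return jax.nn.silu(gate) * up        # gating (activation is illustrative)


x = jnp.ones((4, 8))
w_gate = jnp.ones((8, 16))
w_up = jnp.ones((8, 16))
y = gated_einsum("bd,df->bf", x, w_gate, w_up,
                 amax=float(jnp.max(jnp.abs(x))))
```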

@copybara-service bot merged commit da4fe8d into google:main on Jul 24, 2024. 2 checks passed.