[NVIDIA] Add config option to use cudnn flash attention #73
This PR allows users to enable cuDNN flash attention. It depends on google/praxis#53.
In preliminary GPT3-5B runs, we observe a ~30% performance improvement on 8x H100 GPUs.
With this PR, users can simply set
`USE_CUDNN_FLASH_ATTENTION = True`
in their config, and the attention layers will be replaced with cuDNN flash attention. cc @nluehr @zhangqiaorjc
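For illustration, here is a minimal sketch of how the flag could be set in a Pax experiment config. The flag name `USE_CUDNN_FLASH_ATTENTION` comes from this PR; the baseline class `GPT35BBaseline` and its `my_experiments` module are placeholders for whatever GPT3-5B experiment config you already use, not real paxml classes.

```python
# Minimal sketch: enable cuDNN flash attention on an existing experiment.
# GPT35BBaseline / my_experiments are hypothetical placeholders.
from paxml import experiment_registry

from my_experiments import GPT35BBaseline  # your existing GPT3-5B config


@experiment_registry.register
class GPT35BCudnnFlashAttention(GPT35BBaseline):
  """GPT3-5B with the cuDNN flash attention path enabled."""

  # Flag introduced by this PR (together with google/praxis#53): when True,
  # the attention computation is swapped for the cuDNN flash attention kernel.
  USE_CUDNN_FLASH_ATTENTION = True
```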