[NVIDIA] Support BNTH input formats for the fused attention #20380
Conversation
Force-pushed from 6915fb6 to 25d0f6b
Just rebased the change to resolve some conflicts. Also tried to minimize the changes.
@kaixih Thanks for the PR, LGTM.
@Cjkkkk Do you know who reviewed similar changes in your previous attention PRs? Maybe we can ping them?
Peter just did the work :)
Force-pushed from 7baaeec to 18e3383
BTNH = 0
BNTH = 1

def _normalize_layout(layout_str):
How about just layout? You can use a type annotation to document the type:
def _normalize_layout(layout: str) -> AttentionLayout:
...
Yes, this is better. Fixed.
and (not is_training or q_seq_len % 64 == 0 and kv_seq_len % 64 == 0):

def check_qkv_layout(query, key, value, layout):
  def assert_eq(a, b, c, msg):
    assert a == b == c, msg + f' must be same: {a}, {b}, {c}'
I can see that the old version also used assert for argument validation, but since you are changing the validation logic slightly, I would recommend using raise instead. For example:
if q_rank != 4:
  raise ValueError(f"Q must have a rank of 4, got {q_rank}")
Done.
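A hedged sketch of how the reworked validation could look once raise replaces assert; the helper name check_eq and the exact messages are illustrative assumptions, not the merged code:

```python
import numpy as np

def check_qkv_layout(query, key, value, layout):
  # Raise ValueError instead of assert so validation still runs under python -O.
  def check_eq(a, b, c, msg):
    if not (a == b == c):
      raise ValueError(msg + f" must be same: {a}, {b}, {c}")

  q_rank, k_rank, v_rank = len(query.shape), len(key.shape), len(value.shape)
  if q_rank != 4:
    raise ValueError(f"Q must have a rank of 4, got {q_rank}")
  check_eq(q_rank, k_rank, v_rank, "QKV rank")
```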
Initially, it determines the attention weights by processing Q and K,
subsequently combining the outcomes using K. Throughout this function, we
utilize the following uppercase letters to represent specific parameters of
JTensor:
What's a JTensor? Perhaps you meant "array"?
Done.
Can you squash the commits, please?
Force-pushed from 1a6866f to 0489eee
Done. Also removed the trailing spaces pointed out by the failed lint tests.
Now, it seems all tests pass.
@superbobry Hi, any updates on this?
For enhanced performance and API flexibility, we've extended the functionality of dot_product_attention to accommodate QKV inputs in the BNTH format.

cc @Cjkkkk
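To illustrate the two layouts named in the title: BNTH is just BTNH with the sequence (T) and head (N) axes swapped. A small numpy sketch of that relationship follows; none of the names below are the PR's API, and the dimension sizes are arbitrary:

```python
import numpy as np

batch, seq_len, num_heads, head_dim = 2, 16, 4, 8

# Q in BTNH layout: (batch, seq_len, num_heads, head_dim)
q_btnh = np.arange(batch * seq_len * num_heads * head_dim, dtype=np.float32)
q_btnh = q_btnh.reshape(batch, seq_len, num_heads, head_dim)

# The same data in BNTH layout: swap the T (axis 1) and N (axis 2) axes.
q_bnth = np.transpose(q_btnh, (0, 2, 1, 3))

assert q_bnth.shape == (batch, num_heads, seq_len, head_dim)
# Head h at timestep t refers to the same slice in either layout.
assert np.array_equal(q_bnth[:, 1, 3, :], q_btnh[:, 3, 1, :])
```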