register custom op for flash attn and use from torch.ops #7536

youkaichao · 2024-08-15T00:17:18Z

No description provided.

github-actions · 2024-08-15T00:17:32Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build on the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

Comment /ready on the PR.
Add the ready label to the PR.
Enable auto-merge.

🚀

youkaichao · 2024-08-16T00:13:23Z

vllm/attention/backends/flash_attn.py

+from vllm_flash_attn import flash_attn_with_kvcache as _flash_attn_with_kvcache
+
+
+@torch.library.custom_op("vllm::flash_attn_varlen_func", mutates_args=[])


Confirmed with @WoosukKwon: these two functions do not mutate their inputs.

youkaichao · 2024-08-16T00:14:12Z

tests/kernels/test_flash_attn.py

+                              cache_seqlens=kv_lens_tensor,
+                              softcap=soft_cap if soft_cap is not None else 0,
+                          ),
+                          test_utils=("test_faketensor", ))


We can add test_schema later, after solving the OOM issue.

Currently, I get an OOM when I test the schema, even though I'm using an H100 80GB.

I think part of testing the schema involves copying all the inputs and double-checking that they are (or aren't) mutated in agreement with the op schema. I don't know if that's what is causing the OOMs here, though.

bnellnm

lgtm!

youkaichao added 7 commits August 14, 2024 16:48

register

8649a5b

use

c2c8ca6

use

679b18a

manually mutate all

94a39cc

manually mutate all tensors

bdbbe76

add tests

5b64f2d

add tests

9d97f7b

youkaichao added 5 commits August 14, 2024 17:20

change import

506eed5

update tests

d9105aa

change args

f827ad3

change import

f0fe288

rename

8c322b0

bnellnm reviewed Aug 15, 2024

vllm/attention/backends/flash_attn.py (outdated)

bnellnm reviewed Aug 15, 2024

vllm/attention/backends/flash_attn.py (outdated)

bnellnm reviewed Aug 15, 2024

vllm/attention/backends/flash_attn.py

bnellnm reviewed Aug 15, 2024

tests/kernels/test_flash_attn.py

youkaichao added 4 commits August 15, 2024 15:23

fix register fake

755dbaf

add opcheck

fc2a4c2

fix alibi_slopes

495d2f0

update mutates_args

76c5cec

youkaichao commented Aug 16, 2024

youkaichao requested a review from bnellnm August 16, 2024 00:14

bnellnm approved these changes Aug 16, 2024

youkaichao added 2 commits August 15, 2024 21:46

add schema tests

45bb131

reduce number of heads to avoid OOM

ee8d426

youkaichao merged commit 54bd9a0 into vllm-project:main Aug 16, 2024
27 of 29 checks passed

youkaichao deleted the fa_registration branch August 16, 2024 05:38

kylesayrs pushed a commit to neuralmagic/vllm that referenced this pull request Aug 17, 2024

register custom op for flash attn and use from torch.ops (vllm-projec…

518c49f

…t#7536)

zifeitong pushed a commit to zifeitong/vllm that referenced this pull request Aug 20, 2024

register custom op for flash attn and use from torch.ops (vllm-projec…

5e5e825

…t#7536)

fialhocoelho pushed a commit to opendatahub-io/vllm that referenced this pull request Aug 22, 2024

register custom op for flash attn and use from torch.ops (vllm-projec…

93678a7

…t#7536)

omrishiv pushed a commit to omrishiv/vllm that referenced this pull request Aug 26, 2024

register custom op for flash attn and use from torch.ops (vllm-projec…

8386de5

…t#7536)

youkaichao mentioned this pull request Oct 3, 2024

[misc] add forward context for attention #9029

Merged

Alvant pushed a commit to compressa-ai/vllm that referenced this pull request Oct 26, 2024

register custom op for flash attn and use from torch.ops (vllm-projec…

873b662

…t#7536) Signed-off-by: Alvant <[email protected]>

KuntaiDu pushed a commit to KuntaiDu/vllm that referenced this pull request Nov 20, 2024

register custom op for flash attn and use from torch.ops (vllm-projec…

ed1c5b3

…t#7536)
