About FlashAttention #69
Comments
Memory-efficient attention from xformers may be a good choice.
Can I ask why you are still using the external flash-attention?
Its implementation is sometimes slower than flash-attn on devices below the H100.
Probably we could monitor this.
It is integrated into torch now!
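For context, the two suggestions above roughly correspond to the snippets below: xformers' `memory_efficient_attention` and the fused kernel that PyTorch (>= 2.0) ships as `torch.nn.functional.scaled_dot_product_attention`. The tensor shapes and dtypes are illustrative assumptions, not taken from this repository.

```python
# Illustrative sketch; shapes/dtypes are assumptions, not from this repo.
import torch
import torch.nn.functional as F
import xformers.ops as xops

B, M, H, K = 2, 1024, 8, 64  # batch, sequence length, heads, head dim
q = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
k = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
v = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)

# xformers expects [batch, seq, heads, head_dim]
out_xformers = xops.memory_efficient_attention(q, k, v)

# PyTorch's fused kernel expects [batch, heads, seq, head_dim] and picks a
# backend (flash / memory-efficient / math) automatically
qt, kt, vt = (t.transpose(1, 2) for t in (q, k, v))
out_torch = F.scaled_dot_product_attention(qt, kt, vt).transpose(1, 2)
```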
Thank you for your great work :)!
Here is my question: I tried to follow the instructions but failed at the flash-attention-related steps. According to this issue, V100s are not supported.
So I wonder how much efficiency is lost without the flash-attention module, or whether there are any ways to work around the above issue and achieve comparable performance on V100s?
Thank you!
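In case it helps, here is a hypothetical micro-benchmark (function names and tensor sizes are my own, not from this repo) for measuring the gap on a V100: PyTorch's flash backend for `scaled_dot_product_attention` requires Ampere or newer, but the memory-efficient backend should still run on Volta-class GPUs, so it can serve as a drop-in fallback.

```python
# Hypothetical micro-benchmark; names and sizes are illustrative assumptions.
import time
import torch
import torch.nn.functional as F

def naive_attention(q, k, v):
    # Plain softmax(QK^T / sqrt(d)) V, materializing the full attention matrix
    scale = q.shape[-1] ** -0.5
    attn = torch.softmax((q @ k.transpose(-2, -1)) * scale, dim=-1)
    return attn @ v

B, H, M, K = 2, 8, 2048, 64  # batch, heads, sequence length, head dim
q, k, v = (torch.randn(B, H, M, K, device="cuda", dtype=torch.float16) for _ in range(3))

def bench(fn, iters=20):
    # Average wall-clock time per call, with CUDA synchronization
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        fn(q, k, v)
    torch.cuda.synchronize()
    return (time.time() - start) / iters

print("naive attention:", bench(naive_attention))
print("torch sdpa     :", bench(F.scaled_dot_product_attention))
```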