
About FlashAttention #69

Closed
ZekaiGalaxy opened this issue Feb 29, 2024 · 5 comments
Labels: question (Further information is requested)

Comments

ZekaiGalaxy commented Feb 29, 2024

Thank you for your great work :) !

Here is my question: I tried to follow the instructions but failed at the flash-attention related steps. According to this issue, V100s are not supported.

So I wonder what the efficiency gain is without the flash-attention module, or whether there is any way to work around the above issue and achieve comparable performance on V100s?

Thank you!

oahzxl (Collaborator) commented Feb 29, 2024

Memory-efficient attention from xformers may be a good choice.
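
As a rough illustration of this suggestion, here is a minimal sketch of calling xformers' memory-efficient attention directly; the shapes and dtypes are placeholder assumptions, not values from this repository:

```python
# Minimal sketch: memory-efficient attention from xformers, which runs on
# V100-class GPUs. Shapes and dtypes below are illustrative assumptions.
import torch
import xformers.ops as xops

batch, seq_len, n_heads, head_dim = 2, 1024, 16, 64

# xformers expects tensors of shape (batch, seq_len, n_heads, head_dim)
q = torch.randn(batch, seq_len, n_heads, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Dispatches to a memory-efficient kernel; the output has the same shape as q.
out = xops.memory_efficient_attention(q, k, v)
```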

KKZ20 added the question label on Mar 4, 2024
bhack commented Mar 8, 2024

Can I ask why you are still using the external flash attention?
torch.nn.functional.scaled_dot_product_attention already has a FlashAttention-2 implementation:
https://pytorch.org/docs/2.2/generated/torch.nn.functional.scaled_dot_product_attention.html
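
For reference, a minimal sketch of the built-in op (PyTorch ≥ 2.0); the shapes are illustrative and not taken from this repository:

```python
# Minimal sketch: torch's fused scaled_dot_product_attention, which picks a
# backend (FlashAttention-2, memory-efficient, or math) automatically.
import torch
import torch.nn.functional as F

batch, n_heads, seq_len, head_dim = 2, 16, 1024, 64

# SDPA expects tensors of shape (batch, n_heads, seq_len, head_dim)
q = torch.randn(batch, n_heads, seq_len, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = F.scaled_dot_product_attention(q, k, v, is_causal=False)
```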

oahzxl (Collaborator) commented Mar 21, 2024

Its implementation is sometimes slower than flash-attn on devices older than the H100.
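
One way to check this claim on a given GPU is to time the external flash-attn kernel against SDPA. The sketch below assumes a GPU that flash-attn actually supports (Ampere or newer); the shapes and iteration counts are arbitrary, so treat it as a rough comparison rather than a rigorous benchmark:

```python
# Rough timing comparison: external flash-attn vs. torch's SDPA.
import torch
import torch.nn.functional as F
from flash_attn import flash_attn_func

batch, seq_len, n_heads, head_dim = 4, 2048, 16, 64
q = torch.randn(batch, seq_len, n_heads, head_dim, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

def timeit(fn, iters=50):
    # Warm up, then measure with CUDA events to avoid counting launch overhead.
    for _ in range(5):
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # average ms per call

# flash-attn takes (batch, seq_len, n_heads, head_dim); SDPA wants heads before seq_len.
t_flash = timeit(lambda: flash_attn_func(q, k, v))
t_sdpa = timeit(lambda: F.scaled_dot_product_attention(
    q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)))
print(f"flash-attn: {t_flash:.3f} ms   sdpa: {t_sdpa:.3f} ms")
```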

bhack commented Mar 21, 2024

We could probably monitor this: pytorch/pytorch#120642

oahzxl (Collaborator) commented Sep 12, 2024

It's integrated into torch now!

oahzxl closed this as completed on Sep 12, 2024