About FlashAttention #69
Comments
Memory-efficient attention from xformers may be a good choice.
Can I ask why you are still using the external flash-attention?
Its implementation is sometimes slower than flash-attn on devices below the H100.
Probably we could monitor this.
It is integrated into torch now!
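For context, the two suggestions above roughly correspond to the snippets below: xformers' `memory_efficient_attention` and the fused kernel that PyTorch (>= 2.0) ships as `torch.nn.functional.scaled_dot_product_attention`. The tensor shapes and dtypes are illustrative assumptions, not taken from this repository.

```python
# Illustrative sketch; shapes/dtypes are assumptions, not from this repo.
import torch
import torch.nn.functional as F
import xformers.ops as xops

B, M, H, K = 2, 1024, 8, 64  # batch, sequence length, heads, head dim
q = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
k = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
v = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)

# xformers expects [batch, seq, heads, head_dim]
out_xformers = xops.memory_efficient_attention(q, k, v)

# PyTorch's fused kernel expects [batch, heads, seq, head_dim] and picks a
# backend (flash / memory-efficient / math) automatically
qt, kt, vt = (t.transpose(1, 2) for t in (q, k, v))
out_torch = F.scaled_dot_product_attention(qt, kt, vt).transpose(1, 2)
```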
Thank you for your great work :)!
Here is my question: I tried to follow the instructions but failed at the flash-attention-related steps. According to this issue, V100s are not supported.
So I wonder how much efficiency is lost without the flash-attention module, or whether there are any ways to work around the above issue and achieve comparable performance on V100s?
Thank you!
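In case it helps, here is a hypothetical micro-benchmark (function names and tensor sizes are my own, not from this repo) for measuring the gap on a V100: PyTorch's flash backend for `scaled_dot_product_attention` requires Ampere or newer, but the memory-efficient backend should still run on Volta-class GPUs, so it can serve as a drop-in fallback.

```python
# Hypothetical micro-benchmark; names and sizes are illustrative assumptions.
import time
import torch
import torch.nn.functional as F

def naive_attention(q, k, v):
    # Plain softmax(QK^T / sqrt(d)) V, materializing the full attention matrix
    scale = q.shape[-1] ** -0.5
    attn = torch.softmax((q @ k.transpose(-2, -1)) * scale, dim=-1)
    return attn @ v

B, H, M, K = 2, 8, 2048, 64  # batch, heads, sequence length, head dim
q, k, v = (torch.randn(B, H, M, K, device="cuda", dtype=torch.float16) for _ in range(3))

def bench(fn, iters=20):
    # Average wall-clock time per call, with CUDA synchronization
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        fn(q, k, v)
    torch.cuda.synchronize()
    return (time.time() - start) / iters

print("naive attention:", bench(naive_attention))
print("torch sdpa     :", bench(F.scaled_dot_product_attention))
```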