
Promote wheels as alternative to pip install flash-attn #1297

Open

wants to merge 1 commit into main
Conversation

simonw commented Oct 25, 2024

I fell into the trap of trying to run pip install flash-attn when it would have been much faster to use a wheel from the releases page.
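For example, installing directly from a release wheel looks something like this (the tag and wheel name here are just an illustration; the right one has to match your Python, torch, and CUDA versions):

pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu123torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl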

simonw (Author) commented Oct 25, 2024

The problem with https://github.com/Dao-AILab/flash-attention/releases is that it shows 83 options, and you have to be a DEEP Python/PyTorch/Linux expert to pick the right one.

So ideally the README would include instructions (or a link to instructions) on how best to decide which wheel to use. Even better would be a little Python program you can run that tells you which one to use.

simonw (Author) commented Oct 25, 2024

Here's a Python one-liner I got Claude to knock up, which is almost certainly NOT correct but at least illustrates the concept:

python -c "
import sys, torch, platform;
py_ver = f'cp{sys.version_info.major}{sys.version_info.minor}';
cuda_ver = 'cu123' if torch.version.cuda and torch.version.cuda.startswith('12.3') else 'cu118';
abi = 'TRUE' if hasattr(sys, '_emscripten_info') or platform.libc_ver()[0] == 'glibc' else 'FALSE';
print(f'flash_attn-2.6.3+{cuda_ver}torch{torch.__version__.split(\"+\")[0]}cxx11abi{abi}-{py_ver}-{py_ver}-linux_x86_64.whl')
"

Outputs something like this:

flash_attn-2.6.3+cu118torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

Someone who knows what they're doing could write a version of this that works!

simonw (Author) commented Oct 25, 2024

Huh, weird, it looks like setup.py is meant to automatically find them!

flash-attention/setup.py, lines 54 to 56 (at c1d146c):

BASE_WHEEL_URL = (
    "https://github.com/Dao-AILab/flash-attention/releases/download/{tag_name}/{wheel_name}"
)
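For reference, that template presumably gets filled in along these lines (the tag and wheel name below are my own guesses to show the shape of the final URL, not the actual setup.py logic):

BASE_WHEEL_URL = (
    "https://github.com/Dao-AILab/flash-attention/releases/download/{tag_name}/{wheel_name}"
)
# hypothetical example values, just to illustrate the resulting download URL
wheel_name = "flash_attn-2.6.3+cu123torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl"
print(BASE_WHEEL_URL.format(tag_name="v2.6.3", wheel_name=wheel_name))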

I wonder why that didn't work for me running a Google Colab notebook with an A100.

This PR is probably no good as it stands, but additional documentation to help people maximize the chance of using a pre-built wheel would definitely be useful.

bheilbrun commented
> I wonder why that didn't work for me running a Google Colab notebook with an A100.

Ah, I wonder if you caught the torch 2.5.0 upgrade. I just created a new Colab notebook and it has torch 2.5.0, a version for which flash-attn doesn't have a wheel.

Thanks for surfacing these links; they helped me power through a similar issue (CUDA 12.5 in my case).
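For anyone else landing here, a quick way to check which torch and CUDA builds you actually have before picking a wheel (these are standard torch attributes, nothing flash-attn specific):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.compiled_with_cxx11_abi())"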
