Dao-AILab / flash-attention Public

Notifications You must be signed in to change notification settings
Fork 1.4k
Star 14.5k

Code
Issues 595
Pull requests 46
Actions
Projects
Security
Insights

Additional navigation options

Code
Issues
Pull requests
Actions
Projects
Security
Insights

Issues: Dao-AILab/flash-attention

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

595 Open 545 Closed

Author

Filter by author

Label

Filter by label

Use alt + click/return to exclude labels

or ⇧ + click/return for logical OR

Projects

Filter by project

Milestones

Filter by milestone

Assignee

Filter by who’s assigned

Assigned to nobody

Sort

Sort by

Newest Oldest Most commented Least commented Recently updated Least recently updated Best match

Most reactions

Issues list

Is there any way to compile the codes with nvcc debug flag(-G)?

#1364 opened Dec 2, 2024 by Dev-Jahn

Triton Issues for Rotary flash_attn.layers.rotary.apply_rotary_emb_qkv_

#1362 opened Nov 29, 2024 by albertotono

Need tests/__init__.py for hopper/test_flash_attn.py

#1360 opened Nov 28, 2024 by hancheolcho

How to get attention score? "return_attn_probs=True" is not work.

#1357 opened Nov 25, 2024 by UnableToUseGit

How to assign ROCm architecture during pip installing

#1356 opened Nov 25, 2024 by deeptimhe

Does flash-attn support FP8 inference on L40-48G?

#1355 opened Nov 25, 2024 by LinJianping

Flashdecoding with appendKV might incorrect

#1354 opened Nov 24, 2024 by DD-DuDa

Unable to cast Python instance of type <class 'torch._subclasses.fake_tensor.FakeTensor'> to C++ type

#1351 opened Nov 21, 2024 by zwhe99

How could I use a query to calculate the attention with multiple k-v

#1350 opened Nov 21, 2024 by DongyuXu77

Question of the equation in Flash Attention 2 Paper

#1349 opened Nov 21, 2024 by jeffrey-sunh1

Issue with installing flash attention import flash_attn_2_cuda as flash_attn_cuda

#1348 opened Nov 20, 2024 by hahmad2008

FA3 Failed to initialize the TMA descriptor

#1343 opened Nov 20, 2024 by li-yi-dong

Assistance on implementing Flash Attention 2 for Turing

#1342 opened Nov 17, 2024 by samuelzxu

[Bug]: Perf slump after updating flash-attn 2.7.0 (with torch.compile using)

#1341 opened Nov 16, 2024 by Mnb66

Building a wheel for torch 2.5.0-2.5.1 with Python 3.10 and CUDA 12.4 on Windows has failed.

#1340 opened Nov 16, 2024 by lldacing

where can i download the whl for torch2.5 win10?

#1339 opened Nov 16, 2024 by czcz1024

v2.6.3's flash_attn_varlen_func runs faster than v2.7.0.post2's flash_Attn_varlen_func on H100

#1338 opened Nov 16, 2024 by complexfilter

2.6.3 is faster than 2.7.0 for flash-attn v2 CUDA fwd/bwd

#1335 opened Nov 14, 2024 by ds-kczerski

how to disable flash atten in python?

#1334 opened Nov 14, 2024 by hiyyg

Not possible to script flash_attn_2_cuda.varlen_fwd function

#1332 opened Nov 13, 2024 by ArtyoMKos

Prebuild flash attention wheels for linux!

#1330 opened Nov 12, 2024 by kunibald413

CUDA 12.1 Python 3.10 PyTorch 2.5.1 安装版本

#1327 opened Nov 11, 2024 by wuxi-dixi

Question about flashdecoding with appendKV

#1325 opened Nov 10, 2024 by DD-DuDa

In FA3, which specific Layout_K SW is used for smemO?

#1324 opened Nov 10, 2024 by ziyuhuang123

CUDA 12.6 Performance Issue

#1323 opened Nov 9, 2024 by rchardx

Previous 1 2 3 4 5 … 23 24 Next

Previous Next

ProTip! Mix and match filters to narrow down what you’re looking for.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly