ALiBi for the non-flash code path #858

Markus28 · 2024-02-29T16:25:37Z

Currently, ALiBi is only implemented for the flash code path. This PR also implements it for the non-flash code path.

Here is a short script to verify that the behaviour matches that of the flash attention implementation. To run it, the changes from #846 need to be merged first:

import torch
from flash_attn.modules.mha import get_alibi_slopes, FlashSelfAttention, SelfAttention

CAUSAL = False
# Packed QKV of shape (batch, seqlen, 3, nheads, headdim)
qkv = torch.randn((16, 55, 3, 4, 32)).to('cuda', torch.float16)

# Same ALiBi slopes (4 heads) for the non-flash and the flash module
module = SelfAttention(alibi_slopes=torch.tensor(get_alibi_slopes(4)), causal=CAUSAL).to('cuda', torch.float16)
module_flash = FlashSelfAttention(alibi_slopes=torch.tensor(get_alibi_slopes(4)), causal=CAUSAL).to('cuda', torch.float16)
result = module(qkv)
result_flash = module_flash(qkv)
# Element-wise relative difference between the two code paths, in percent
print((result - result_flash) / result_flash * 100)
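
For readers unfamiliar with ALiBi, the following is a minimal pure-Python sketch of the bias that both code paths are expected to add to the pre-softmax attention scores. It is not the code from this PR: the helper names `alibi_slopes_pow2`, `alibi_bias` and `softmax` are hypothetical, the slope formula assumes a power-of-two number of heads, and the bias convention (`-slope * |i + seqlen_k - seqlen_q - j|` for query `i` and key `j`) is assumed to match the one used by flash attention.

```python
import math


def alibi_slopes_pow2(nheads):
    """Geometric ALiBi slopes 2^(-8 * (h + 1) / nheads) for h = 0..nheads-1.

    Hypothetical helper; this sketch only handles power-of-two head counts.
    """
    if nheads < 1 or nheads & (nheads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-8.0 * (h + 1) / nheads) for h in range(nheads)]


def alibi_bias(slope, seqlen_q, seqlen_k):
    """Bias for one head: -slope * |i + seqlen_k - seqlen_q - j| (assumed convention)."""
    offset = seqlen_k - seqlen_q
    return [
        [-slope * abs(i + offset - j) for j in range(seqlen_k)]
        for i in range(seqlen_q)
    ]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


slopes = alibi_slopes_pow2(4)
bias = alibi_bias(slopes[0], 3, 3)
# With all raw scores equal to zero, only the bias shapes the attention weights.
weights = [softmax(row) for row in bias]
```

In a non-flash path, a bias of this form would be added to the `q @ k^T` scores of each head before the softmax, so keys farther from the query receive less weight.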

Markus28 added 4 commits February 29, 2024 16:50

feat: started implementing ALiBi for non-flash attention

92632dc

fixed buffer registration, refactoring of variable names

732beaa

feat: some further refactoring

e156a15

fix: don't move linear biases to device

4eb887a

Markus28 changed the title ~~feat: ALiBi for the non-flash code path~~ ALiBi for the non-flash code path Feb 29, 2024

Markus28 marked this pull request as ready for review February 29, 2024 16:32
