masked-attention Algorithm Explained - Zhang #201

2024-11-27T08:18:06Z

giscus[bot]
bot Nov 27, 2024

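Works on LLM inference deployment, vision algorithm development, model compression and deployment, and algorithm SDK development; a practitioner of lifelong learning. The essence of the Transformer causal mask mechanism is to build a lower-triangular attention score matrix, so that a causal model attends only to the relationship between the current token and the tokens before it and ignores its relationship with later tokens, that is, only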

https://www.armcvai.cn/2024-11-10/masked-attention.html

yemyhdtrc6088 · 2024-11-27T08:18:10Z


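I have been working on chunked prefill recently and happened to look into attention_mask. When a prompt is split into chunks, the attention_mask is handled differently from the normal case.
1. In the prefill stage today, multiple prompts are concatenated into a single input for inference. Suppose there are three prompts with lengths 4, 5, and 6. The corresponding attention_mask (0 = the position can be attended to, 1 = the position is masked) looks like this: [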
[0,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
[0,0,1,1,1,1,1,1,1,1,1,1,1,1,1],
[0,0,0,1,1,1,1,1,1,1,1,1,1,1,1],
[0,0,0,0,1,1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,0,1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,0,0,1,1,1,1,1,1,1,1,1],
[1,1,1,1,0,0,0,1,1,1,1,1,1,1,1],
[1,1,1,1,0,0,0,0,1,1,1,1,1,1,1],
[1,1,1,1,0,0,0,0,0,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,0,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,0,0,1,1,1,1],
[1,1,1,1,1,1,1,1,1,0,0,0,1,1,1],
[1,1,1,1,1,1,1,1,1,0,0,0,0,1,1],
[1,1,1,1,1,1,1,1,1,0,0,0,0,0,1],
[1,1,1,1,1,1,1,1,1,0,0,0,0,0,0],
]
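The packed mask in point 1 can be reproduced with a short NumPy sketch. This is only an illustration, not code from the article or from any inference framework: the function name `build_packed_causal_mask` is hypothetical, and it assumes the convention of the matrix above, where 0 marks a position that can be attended to and 1 marks a masked position.

```python
import numpy as np


def build_packed_causal_mask(prompt_lens):
    """Hypothetical helper: causal mask for prompts packed into one input.

    Returns an integer matrix where 0 = visible and 1 = masked.
    """
    # Prompt index of every packed token, e.g. [0,0,0,0,1,1,1,1,1,2,...].
    seg_ids = np.repeat(np.arange(len(prompt_lens)), prompt_lens)
    # Position of every token inside the packed input.
    pos = np.arange(seg_ids.size)

    # A query may only look at keys that belong to the same prompt ...
    same_prompt = seg_ids[:, None] == seg_ids[None, :]
    # ... and that are not later than the query itself.
    causal = pos[None, :] <= pos[:, None]

    visible = same_prompt & causal
    return np.where(visible, 0, 1)


mask = build_packed_causal_mask([4, 5, 6])
print(mask.shape)  # (15, 15)
print(mask)
```

Each prompt ends up with its own lower-triangular block on the diagonal, and everything outside those blocks stays masked.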
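2. After a prompt is split into chunks, each prefill step has to read the kv_cache written by the previous chunk and set the attention_mask accordingly. Suppose the chunk size is 8.
The first step can process 4 tokens of p1 and 4 tokens of p2, and the mask looks like this: [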
[0,1,1,1,1,1,1,1],
[0,0,1,1,1,1,1,1],
[0,0,0,1,1,1,1,1],
[0,0,0,0,1,1,1,1],
[1,1,1,1,0,1,1,1],
[1,1,1,1,0,0,1,1],
[1,1,1,1,0,0,0,1],
[1,1,1,1,0,0,0,0],
], so the tokens of p1 can see only p1's own tokens and cannot see the tokens of p2;
The second prefill step processes the remaining 1 token of p2 and the 6 tokens of p3, and the mask looks like this: [
[0,0,0,0,0,1,1,1,1,1,1],
[1,1,1,1,1,0,1,1,1,1,1],
[1,1,1,1,1,0,0,1,1,1,1],
[1,1,1,1,1,0,0,0,1,1,1],
[1,1,1,1,1,0,0,0,0,1,1],
[1,1,1,1,1,0,0,0,0,0,1],
[1,1,1,1,1,0,0,0,0,0,0],
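], where the first 4 zeros in the first row correspond to the 4 tokens of p2 that were already processed in the first prefill step. This way, even after chunking, the token can still compute attention against the tokens before it.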
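The two chunked-prefill masks in point 2 can be generated by a small NumPy sketch. It is an illustration under stated assumptions, not the implementation of any particular framework: the function name `build_chunked_prefill_mask` and its `(num_cached, num_new)` input format are hypothetical, and the key/value columns are assumed to be laid out prompt by prompt, with the cached tokens of a prompt first and its new tokens after them, as in the masks above. As before, 0 means visible and 1 means masked.

```python
import numpy as np


def build_chunked_prefill_mask(seqs):
    """Hypothetical helper for one chunked-prefill step.

    seqs: list of (num_cached, num_new) pairs, one per prompt in this step.
    Rows are the new (query) tokens; columns are cached + new (key) tokens.
    Returns an integer matrix where 0 = visible and 1 = masked.
    """
    total_q = sum(n_new for _, n_new in seqs)
    total_kv = sum(n_cached + n_new for n_cached, n_new in seqs)
    mask = np.ones((total_q, total_kv), dtype=np.int64)

    q_start = 0   # first row belonging to the current prompt
    kv_start = 0  # first column belonging to the current prompt
    for n_cached, n_new in seqs:
        for i in range(n_new):
            # The i-th new token sees all cached tokens of its own prompt,
            # the new tokens before it, and itself.
            mask[q_start + i, kv_start:kv_start + n_cached + i + 1] = 0
        q_start += n_new
        kv_start += n_cached + n_new
    return mask


# Step 1: 4 tokens of p1 and 4 tokens of p2, nothing cached yet.
step1 = build_chunked_prefill_mask([(0, 4), (0, 4)])
# Step 2: the last token of p2 (4 tokens cached) and all 6 tokens of p3.
step2 = build_chunked_prefill_mask([(4, 1), (0, 6)])
print(step1.shape, step2.shape)  # (8, 8) (7, 11)
```

With no cached tokens at all, for example `[(0, 4), (0, 5), (0, 6)]`, the same function reduces to the ordinary packed causal mask from point 1.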
