llama1-3 Model Architecture Explained - Zhang #209
Replies: 2 comments
- d: the number of input tokens, of size batch_size * seq_len.
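  A minimal sketch of what that table entry means, with made-up batch_size / seq_len values (not taken from the article): the model input is a [batch_size, seq_len] tensor of token ids, so the total number of input tokens is batch_size * seq_len.

  ```python
  import torch

  # Hypothetical numbers, only to illustrate the quoted entry.
  batch_size, seq_len = 4, 512
  input_ids = torch.randint(0, 32000, (batch_size, seq_len))  # 32000 ~ llama vocab size
  num_tokens = input_ids.numel()  # 4 * 512 = 2048 input tokens
  print(input_ids.shape, num_tokens)
  ```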
- If MQA and GQA reduce KV memory this way, doesn't that mean the hidden dimensions of Q and of K/V are different? In ordinary attention, the Q, K, and V linear layers are the same size; with MQA and GQA, are the Q and K/V linear layers different sizes?
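  Yes: in typical MQA/GQA implementations the Q projection still produces num_heads heads, while the K and V projections only produce num_kv_heads heads, so their linear layers have different output widths. A minimal sketch, with illustrative dimensions chosen for the example (not quoted from the article):

  ```python
  import torch
  import torch.nn as nn

  hidden_size  = 4096                       # model hidden dimension
  num_heads    = 32                         # query heads
  num_kv_heads = 8                          # K/V heads (GQA); 1 -> MQA, 32 -> ordinary MHA
  head_dim     = hidden_size // num_heads   # 128

  q_proj = nn.Linear(hidden_size, num_heads * head_dim, bias=False)     # 4096 -> 4096
  k_proj = nn.Linear(hidden_size, num_kv_heads * head_dim, bias=False)  # 4096 -> 1024
  v_proj = nn.Linear(hidden_size, num_kv_heads * head_dim, bias=False)  # 4096 -> 1024

  x = torch.randn(2, 16, hidden_size)   # [batch_size, seq_len, hidden_size]
  q = q_proj(x)                          # [2, 16, 32 * 128] = [2, 16, 4096]
  k = k_proj(x)                          # [2, 16,  8 * 128] = [2, 16, 1024]
  v = v_proj(x)                          # [2, 16, 1024]  -> smaller KV cache
  ```

  Before the attention scores are computed, each K/V head is shared by num_heads // num_kv_heads query heads (e.g. by repeating K/V along the head dimension), so the attention math itself is unchanged; only the K/V projections and the cached K/V tensors shrink.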
- llama1-3 Model Architecture Explained - Zhang
  Working on LLM inference deployment, vision algorithm development, model compression and deployment, and algorithm SDK development; a lifelong-learning practitioner. How the llama1-3 (Transformer) model architecture is implemented in code, with an analysis of the model structure.
  https://www.armcvai.cn/2024-10-21/llama1-3-model.html