Improvements to KVCache and refactoring of subclasses of classes in model.py
#1867
Labels: enhancement (New feature or request)
Studying your codebase (trying to learn about transformers in depth), I noted a few things that can be improved:
- `KVCache`: The second dimension of the buffers should always be `n_query_groups`. If `1 < n_query_groups < n_head`, you are wasting memory. Easy to fix; a sketch of the allocation is just below.
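A minimal sketch of what I mean, assuming buffers laid out as `(batch_size, heads, max_seq_length, head_size)`; the shapes and argument names here are illustrative, not the current API:

```python
import torch
import torch.nn as nn


class KVCache(nn.Module):
    """Illustrative cache; shapes and argument names are assumptions."""

    def __init__(
        self,
        batch_size: int,
        n_query_groups: int,
        max_seq_length: int,
        head_size: int,
        dtype: torch.dtype = torch.float16,
    ) -> None:
        super().__init__()
        # Allocate one K/V row per query group, not per head: with grouped-query
        # attention, all heads in a group share the same K/V entries, so sizing
        # this dimension as n_head would store each entry
        # n_head // n_query_groups times over.
        shape = (batch_size, n_query_groups, max_seq_length, head_size)
        self.register_buffer("k", torch.zeros(shape, dtype=dtype), persistent=False)
        self.register_buffer("v", torch.zeros(shape, dtype=dtype), persistent=False)
```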
- `KVCache`: `forward` returns tensors whose final dimension is `max_seq_length`. This is wasteful for the subsequent dot-product attention computation. It can be shortened to a length that just covers all positions in `input_pos`. Relatively easy to fix; see the sketch below.
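A sketch of the shortened return, written as a free function over buffers like the ones above; `kv_cache_forward` and its argument layout (sequence length on dimension 2) are hypothetical, not the existing signature:

```python
import torch


def kv_cache_forward(
    k_buf: torch.Tensor,      # (B, n_query_groups, max_seq_length, head_size)
    v_buf: torch.Tensor,
    input_pos: torch.Tensor,  # (T,) positions being written in this step
    k: torch.Tensor,          # (B, n_query_groups, T, head_size)
    v: torch.Tensor,
) -> tuple[torch.Tensor, torch.Tensor]:
    # Write the new entries at their positions along the sequence dimension.
    k_buf.index_copy_(2, input_pos, k.to(k_buf.dtype))
    v_buf.index_copy_(2, input_pos, v.to(v_buf.dtype))
    # Assuming positions are filled in order (autoregressive decoding), the
    # prefix up to max(input_pos) + 1 covers every position written so far,
    # so attention never has to scan the zero-padded tail of the buffer.
    end = int(input_pos.max()) + 1
    return k_buf[:, :, :end], v_buf[:, :, :end]
```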
- `adapter.py`, `adapter_v2.py`, and `lora.py` do a lot of copy and paste, which makes changing anything in `model.py` hard. I'd refactor this so that as much common code as possible lives only in `model.py`; one possible pattern is sketched after this list.

Let me know if this makes sense.
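For the refactoring, one option (just a sketch; the class and hook names below are made up, not the current litgpt API) is to have the classes in `model.py` create their layers through overridable hooks, so the adapter/LoRA variants replace only what actually differs:

```python
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class Config:
    n_embd: int = 64
    bias: bool = True
    lora_r: int = 8


# model.py: the base block builds its linear layers via a hook ...
class CausalSelfAttention(nn.Module):
    def __init__(self, config: Config) -> None:
        super().__init__()
        # fused q/k/v projection and output projection, as in GPT-style blocks
        self.attn = self.make_linear(config.n_embd, 3 * config.n_embd, config)
        self.proj = self.make_linear(config.n_embd, config.n_embd, config)

    def make_linear(self, in_features: int, out_features: int, config: Config) -> nn.Module:
        return nn.Linear(in_features, out_features, bias=config.bias)


class LoRALinear(nn.Linear):
    """Toy LoRA layer: a base linear plus a low-rank update."""

    def __init__(self, in_features: int, out_features: int, r: int, **kwargs) -> None:
        super().__init__(in_features, out_features, **kwargs)
        self.lora_a = nn.Parameter(torch.zeros(r, in_features))
        self.lora_b = nn.Parameter(torch.zeros(out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return super().forward(x) + (x @ self.lora_a.T) @ self.lora_b.T


# lora.py: only the layer construction is overridden; the forward logic,
# KV-cache handling, etc. stay in model.py and are inherited unchanged.
class LoRACausalSelfAttention(CausalSelfAttention):
    def make_linear(self, in_features: int, out_features: int, config: Config) -> nn.Module:
        return LoRALinear(in_features, out_features, r=config.lora_r, bias=config.bias)
```

With a pattern like this, a change to the attention logic in `model.py` propagates to the adapter and LoRA variants automatically, instead of having to be re-applied in three copies.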
Thanks for doing this project. I really understand the details of transformer models better now.