[Misc] Enhance attention selector #4751
Conversation
LGTM! One question regarding why we change the interface of `get_attn_backend`!
@@ -29,10 +30,22 @@ def __init__(
        num_kv_heads: Optional[int] = None,
        alibi_slopes: Optional[List[float]] = None,
        sliding_window: Optional[int] = None,
        cache_config: Optional[CacheConfig] = None,
QQ: when is this `None`? (Should we just not allow `None` here, since the cache config is supposed to be created by default?)
Good point. In most situations, `cache_config` isn't `None`. However, I wanted to provide the flexibility to initialize the model without `cache_config`, which can be particularly useful in niche scenarios such as testing the model loader. For instance, some tests in `test_tensorizer` only use the HF config to initialize the model, without setting up a `CacheConfig` or `ModelConfig`. Additionally, allowing `cache_config` to be optional helps maintain consistency with the HF model interface, where a model can be instantiated solely with the HF config.

I think this adjustment makes the setup more versatile and aligns better with existing practices.
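For illustration, here is a minimal sketch (not the actual vLLM code) of how a layer can accept an optional cache config and fall back to defaults when it is `None`; the field names and default values below are assumptions made for the example:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheConfig:
    # Hypothetical fields for illustration; the real vLLM CacheConfig has more.
    block_size: int = 16
    cache_dtype: str = "auto"


class Attention:
    def __init__(self, num_heads: int, head_size: int,
                 cache_config: Optional[CacheConfig] = None) -> None:
        # When no cache config is supplied (e.g. when a test builds the model
        # from only the HF config), fall back to defaults instead of forcing
        # every caller to construct a CacheConfig.
        if cache_config is not None:
            block_size = cache_config.block_size
            kv_cache_dtype = cache_config.cache_dtype
        else:
            block_size = 16
            kv_cache_dtype = "auto"
        self.num_heads = num_heads
        self.head_size = head_size
        self.block_size = block_size
        self.kv_cache_dtype = kv_cache_dtype
```

With this shape, `Attention(num_heads=8, head_size=64)` works in a config-free test, while the engine can still pass a fully populated `CacheConfig`.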
        from vllm.attention.backends.flashinfer import FlashInferBackend
        return FlashInferBackend
    else:
        raise ValueError("Invalid attention backend.")


-def _which_attn_to_use(dtype: torch.dtype) -> _Backend:
+def _which_attn_to_use(
+    num_heads: int,
Is this change necessary? (It seems like most of the args are not used?)
Good question! It's actually for PR #3648 and future PRs, where we need to consider block sizes and KV cache dtypes when selecting the backend.
This PR provides more information (such as block size and KV cache dtype) to the attention backend selector so that it can be used to find the appropriate attention backend. It also moves `kv_cache_dtype` from `AttentionMetadata` to `Attention`.

This PR is a prerequisite for #3648.
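As a rough, illustrative sketch (not vLLM's actual selector logic), a backend selector that takes this extra information could look like the following; the backend names and the specific checks are assumptions for the example only:

```python
from enum import Enum, auto

import torch


class _Backend(Enum):
    FLASH_ATTN = auto()
    XFORMERS = auto()
    FLASHINFER = auto()


def select_attn_backend(num_heads: int, head_size: int, dtype: torch.dtype,
                        kv_cache_dtype: str, block_size: int) -> _Backend:
    # Illustrative selection logic only: the real vLLM selector also looks at
    # GPU compute capability, sliding window, installed packages, etc.
    if kv_cache_dtype.startswith("fp8"):
        # Assume only some backends handle an FP8 KV cache.
        return _Backend.XFORMERS
    if dtype not in (torch.float16, torch.bfloat16):
        # Assume the FlashAttention path requires half-precision activations.
        return _Backend.XFORMERS
    if head_size % 8 != 0:
        # Assume unusual head sizes fall back to a more permissive backend.
        return _Backend.XFORMERS
    return _Backend.FLASH_ATTN
```

The point of the signature change is simply that decisions like these need `block_size` and `kv_cache_dtype` at selection time, even if not every argument is consulted yet in this PR.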