[Bug] Qwen2-VL-7B with sglang Performance Degradation #3041
Comments
cc @yizhang2077 can you take a look at this?
Yeah, this could somehow happen since we do not have VLM benchmarking in our CI 🤔 @merrymercy
Could you give me a command or script for reproduction, please?
@yileld Could you share your command and code for reproduction?
@yizhang2077 So I think this is not just an isolated case; other datasets can also be used for comparison.
OK, thanks for your test, I will try it ASAP.
Thanks so much!
Thanks; if there is any progress, please let me know.
@YerongLi will take this on. Thanks!
@yileld I ran a test with qwen2-vl on the MME bench, and I found that even though there is some difference from vllm, the scores are close. It is confusing; can you try the latest version again? I will also try to eval MMMU val.
Can you share the versions of the Python packages in my list? And your GPU? Mine is an A800.
I also tested on an A800 with tp=1; my test method is sketched below.
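A minimal sketch of this kind of check, assuming an sglang server with an OpenAI-compatible endpoint on a hypothetical local port; the model path, sample list, and exact-match scoring are illustrative, not the precise harness used:

```python
from openai import OpenAI

# Hypothetical local sglang endpoint (launched separately with
# --chat-template qwen2-vl); port and model path are illustrative.
client = OpenAI(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")

# Illustrative (image_url, question, answer) triples standing in for
# MME-style yes/no samples.
samples = [
    ("https://example.com/mme_0001.png",
     "Is there a dog in the image? Answer yes or no.", "yes"),
]

def ask(image_url: str, question: str) -> str:
    resp = client.chat.completions.create(
        model="Qwen/Qwen2-VL-7B-Instruct",
        messages=[{"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": question},
        ]}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().lower()

correct = sum(ask(img, q) == ans for img, q, ans in samples)
print(f"accuracy: {correct / len(samples):.3f}")
```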
OK, I found out that's because I didn't add '--chat-template qwen2-vl'. But what template is used if I don't add it?
It may use a default template. I think in the qwen2-vl case you must add '--chat-template qwen2-vl', since the qwen2-vl image token is different from other models' and the qwen2-vl code pads the input using the image token. Do you think this issue can be closed?
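For anyone hitting the same score drop, a minimal sketch of the corrected setup; the launch flags match the fix above, while the port, model path, and probe image are illustrative:

```python
# Corrected server launch (a shell command, shown here as a comment):
#
#   python -m sglang.launch_server \
#       --model-path Qwen/Qwen2-VL-7B-Instruct \
#       --chat-template qwen2-vl \
#       --port 30000
#
# Without --chat-template qwen2-vl, a default template is applied and the
# Qwen2-VL-specific image-token padding is wrong, which degrades scores.

from openai import OpenAI

# Quick sanity probe against the hypothetical local endpoint.
client = OpenAI(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="Qwen/Qwen2-VL-7B-Instruct",
    messages=[{"role": "user", "content": [
        {"type": "image_url",
         "image_url": {"url": "https://example.com/sample.png"}},
        {"type": "text", "text": "Describe this image in one sentence."},
    ]}],
    temperature=0,
)
print(resp.choices[0].message.content)
```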
Describe the bug
As #2112 mentioned, Qwen2-VL performance with sglang is bad.
So I tested the ChartQA_TEST dataset with both sglang and vllm, and the scores are significantly different.
(I also tested the MME bench and the MMMU dataset; see the comments above.)
Score screenshots for sglang and vllm were attached; the sglang score is clearly lower.
By the way, do not use vllm 0.6.3.post1: the score drops and the speed is slow.
Reproduction
Tested on the ChartQA_TEST dataset; a reproduction sketch is below.
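A minimal reproduction sketch, assuming both backends are served through OpenAI-compatible endpoints; the ports, model path, and sample are illustrative, and a full ChartQA_TEST run would score every sample the same way:

```python
from openai import OpenAI

# Hypothetical endpoints: sglang on 30000, vllm on 8000.
backends = {
    "sglang": "http://127.0.0.1:30000/v1",
    "vllm": "http://127.0.0.1:8000/v1",
}

# Illustrative ChartQA-style sample.
image_url = "https://example.com/chartqa_sample.png"
question = "What is the value of the tallest bar?"

for name, base_url in backends.items():
    client = OpenAI(base_url=base_url, api_key="EMPTY")
    resp = client.chat.completions.create(
        model="Qwen/Qwen2-VL-7B-Instruct",
        messages=[{"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": question},
        ]}],
        temperature=0,
    )
    print(f"{name}: {resp.choices[0].message.content}")
```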
Environment
vllm 0.6.4.post1
vllm-flash-attn 2.6.1
flashinfer 0.1.6+cu121torch2.4
sglang 0.4.1.post7
torch 2.5.1
torchao 0.8.0
torchvision 0.20.1
transformers 4.46.2
triton 3.1.0
No flash-attention used.