[Bug] Qwen2-VL-7B with sglang Performance Degradation #3041
Comments
cc @yizhang2077 can you take a look at this?
Yeah, this could somehow happen since we do not have VLM benchmarking in our CI 🤔 @merrymercy
Could you give me a command or script for reproduction, please?
@yileld Could you share your command and code for reproduction?
@yizhang2077 So I think this is not just an isolated case; other datasets can also be used for comparison.
OK, thanks for your test, I will try it ASAP.
Thanks so much!
Thanks; if there is any progress, please let me know.
@YerongLi will take this on. Thanks!
@yileld I ran a test with qwen2-vl on the MME bench, and I found that even though there is some difference from vllm, the scores are close. It is confusing; can you try the latest version again? I will also try to eval MMMU val.
Can you share the versions of the Python packages in my list? And your GPU? Mine is an A800.
I also tested on an A800 with tp=1; my test method is sketched below.
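A minimal sketch of this kind of check, assuming an sglang server with an OpenAI-compatible endpoint on a hypothetical local port; the model path, sample list, and exact-match scoring are illustrative, not the precise harness used:

```python
from openai import OpenAI

# Hypothetical local sglang endpoint (launched separately with
# --chat-template qwen2-vl); port and model path are illustrative.
client = OpenAI(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")

# Illustrative (image_url, question, answer) triples standing in for
# MME-style yes/no samples.
samples = [
    ("https://example.com/mme_0001.png",
     "Is there a dog in the image? Answer yes or no.", "yes"),
]

def ask(image_url: str, question: str) -> str:
    resp = client.chat.completions.create(
        model="Qwen/Qwen2-VL-7B-Instruct",
        messages=[{"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": question},
        ]}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().lower()

correct = sum(ask(img, q) == ans for img, q, ans in samples)
print(f"accuracy: {correct / len(samples):.3f}")
```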
OK, I found out that's because I didn't add '--chat-template qwen2-vl'. But what template is used if I don't add it?
It may use a default template. I think in the qwen2-vl case you must add '--chat-template qwen2-vl', since the qwen2-vl image token is different from other models' and the qwen2-vl code pads the input using the image token. Do you think this issue can be closed?
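For anyone hitting the same score drop, a minimal sketch of the corrected setup; the launch flags match the fix above, while the port, model path, and probe image are illustrative:

```python
# Corrected server launch (a shell command, shown here as a comment):
#
#   python -m sglang.launch_server \
#       --model-path Qwen/Qwen2-VL-7B-Instruct \
#       --chat-template qwen2-vl \
#       --port 30000
#
# Without --chat-template qwen2-vl, a default template is applied and the
# Qwen2-VL-specific image-token padding is wrong, which degrades scores.

from openai import OpenAI

# Quick sanity probe against the hypothetical local endpoint.
client = OpenAI(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="Qwen/Qwen2-VL-7B-Instruct",
    messages=[{"role": "user", "content": [
        {"type": "image_url",
         "image_url": {"url": "https://example.com/sample.png"}},
        {"type": "text", "text": "Describe this image in one sentence."},
    ]}],
    temperature=0,
)
print(resp.choices[0].message.content)
```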
Describe the bug
As #2112 mentioned, Qwen2-VL performance with sglang is bad.
So I tested the ChartQA_TEST dataset with both sglang and vllm, and the scores are significantly different.
(I also tested the MME bench and the MMMU dataset; see the comments above.)
Score screenshots for sglang and vllm were attached; the sglang score is clearly lower.
By the way, do not use vllm 0.6.3.post1: the score drops and the speed is slow.
Reproduction
Tested on the ChartQA_TEST dataset; a reproduction sketch is below.
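A minimal reproduction sketch, assuming both backends are served through OpenAI-compatible endpoints; the ports, model path, and sample are illustrative, and a full ChartQA_TEST run would score every sample the same way:

```python
from openai import OpenAI

# Hypothetical endpoints: sglang on 30000, vllm on 8000.
backends = {
    "sglang": "http://127.0.0.1:30000/v1",
    "vllm": "http://127.0.0.1:8000/v1",
}

# Illustrative ChartQA-style sample.
image_url = "https://example.com/chartqa_sample.png"
question = "What is the value of the tallest bar?"

for name, base_url in backends.items():
    client = OpenAI(base_url=base_url, api_key="EMPTY")
    resp = client.chat.completions.create(
        model="Qwen/Qwen2-VL-7B-Instruct",
        messages=[{"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": question},
        ]}],
        temperature=0,
    )
    print(f"{name}: {resp.choices[0].message.content}")
```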
Environment
vllm 0.6.4.post1
vllm-flash-attn 2.6.1
flashinfer 0.1.6+cu121torch2.4
sglang 0.4.1.post7
torch 2.5.1
torchao 0.8.0
torchvision 0.20.1
transformers 4.46.2
triton 3.1.0
No flash-attention used.