
[Bug] Qwen2-VL-7B with sglang Performance Degradation #3041

Closed
5 tasks done
yileld opened this issue Jan 22, 2025 · 14 comments

@yileld
Contributor

yileld commented Jan 22, 2025

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

As #2112 mentioned, the performance of Qwen2-VL with sglang is poor.
So I tested on the ChartQA_TEST dataset with both sglang and vllm, and the scores are quite different.
(I also tested the MME benchmark and the MMMU dataset; see the replies below.)

This is sglang:

[Image: sglang ChartQA_TEST score]

and this is vllm:

[Image: vllm ChartQA_TEST score]

By the way, don't use vllm version 0.6.3.post1; the score will drop and the speed is slow.

Reproduction

Tested on the ChartQA_TEST dataset.
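
For context, the evaluation queries the launched sglang server through its OpenAI-compatible chat completions endpoint. A minimal sketch of one such request is below; the port (the default 30000), the endpoint path, and the base64 data-URL image encoding are illustrative assumptions, not the exact harness code.

```python
import base64
import requests

# Encode one ChartQA image as a data URL (the file path is illustrative).
with open("chartqa_sample.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "Qwen/Qwen2-VL-7B-Instruct",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text", "text": "What is the highest value in the chart?"},
        ],
    }],
    "temperature": 0,
}

# Assumes the server launched by sglang.launch_server is listening on the default port.
resp = requests.post("http://localhost:30000/v1/chat/completions", json=payload, timeout=120)
print(resp.json()["choices"][0]["message"]["content"])
```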

Environment

vllm 0.6.4.post1
vllm-flash-attn 2.6.1
flashinfer 0.1.6+cu121torch2.4
sglang 0.4.1.post7
torch 2.5.1
torchao 0.8.0
torchvision 0.20.1
transformers 4.46.2
triton 3.1.0

No flash-attention used. Don't use vllm version 0.6.3.post1; the score will drop and the speed is slow.

@merrymercy
Contributor

cc @yizhang2077, can you take a look at this?

@zhaochenyang20
Collaborator

Yeah, this could happen since we do not have benchmarking for VLMs in our CI? 🤔 @merrymercy

@zhaochenyang20 zhaochenyang20 self-assigned this Jan 23, 2025
@yizhang2077 yizhang2077 self-assigned this Jan 23, 2025
@yizhang2077
Collaborator

yizhang2077 commented Jan 23, 2025

Reproduction

Tested on the ChartQA_TEST dataset.

Could you please give me a command or script for reproduction?

@zhaochenyang20
Collaborator

@yileld Could you share your command and codes for reproduction?

@yileld
Contributor Author

yileld commented Jan 24, 2025

@yileld Could you share your command and codes for reproduction?

@yizhang2077
The evaluation code is part of a larger project, not a standalone script, so it is not convenient to share.
Besides, I tested MMMU_val and got 43.56, while the official Qwen2-VL result is 54.1.
The MME benchmark results are also very different.

[Image: MME benchmark results]

So I think this is not an isolated case; other datasets can also be used for comparison.

@yizhang2077
Collaborator

OK, thanks for your test. I will try it ASAP.

@zhaochenyang20
Collaborator

thanks so much!

@yileld
Contributor Author

yileld commented Jan 24, 2025

OK, thanks for your test. I will try it ASAP.

Thanks. Please let me know if there is any progress.

@zhaochenyang20
Collaborator

@YerongLi will take on this. Thanks!

@yizhang2077
Collaborator

yizhang2077 commented Jan 25, 2025

@yileld I ran a test for Qwen2-VL on the MME benchmark. I find that even though there is some difference from vllm, the scores are close. It is confusing; can you try the latest version again? I will try to evaluate MMMU_val.

[Image: MME benchmark results]

@yileld
Contributor Author

yileld commented Jan 25, 2025

@yileld I ran a test for Qwen2-VL on the MME benchmark. I find that even though there is some difference from vllm, the scores are close. It is confusing; can you try the latest version again?

[Image: MME benchmark results]

Can you share the versions of the Python packages on my list? And which GPU? Mine is an A800.

@yizhang2077
Collaborator

yizhang2077 commented Jan 25, 2025

Can you share the versions of the Python packages on my list? And which GPU? Mine is an A800.

I also tested on an A800 with tp=1.
Python env:

  • vllm 0.6.4.post1
  • vllm-flash-attn 2.6.1
  • torch 2.5.1+cu124
  • flashinfer 0.1.6+cu121torch2.4
  • torchao 0.5.0
  • sglang 0.4.1.post7 (latest main)
  • transformers 4.45.2
  • triton 3.1.0

Test method

  • Launch the server: python -m sglang.launch_server --model-path Qwen/Qwen2-VL-7B-Instruct/ --chat-template qwen2-vl
  • Rearrange the images with python get_images.py, then run the eval tool with python3 eval.py and python calculation.py --results_dir Qwen2-VL as described here (eval.py needs a small modification)

@yileld
Contributor Author

yileld commented Jan 26, 2025

Can you share the versions of the Python packages on my list? And which GPU? Mine is an A800.

I also tested on an A800 with tp=1. Python env:

  • vllm 0.6.4.post1
  • vllm-flash-attn 2.6.1
  • torch 2.5.1+cu124
  • flashinfer 0.1.6+cu121torch2.4
  • torchao 0.5.0
  • sglang 0.4.1.post7 (latest main)
  • transformers 4.45.2
  • triton 3.1.0

Test method

  • Launch the server: python -m sglang.launch_server --model-path Qwen/Qwen2-VL-7B-Instruct/ --chat-template qwen2-vl
  • Rearrange the images with python get_images.py, then run the eval tool with python3 eval.py and python calculation.py --results_dir Qwen2-VL as described here (eval.py needs a small modification)

OK, I found out that's because I didn't add '--chat-template qwen2-vl'. But what template will be used if I don't add it?

@yizhang2077
Collaborator

yizhang2077 commented Jan 26, 2025

OK, I found out that's because I didn't add '--chat-template qwen2-vl'. But what template will be used if I don't add it?

It may use the default template. I think in the Qwen2-VL case you must add '--chat-template qwen2-vl', since the Qwen2-VL image token is different from other models', and the Qwen2-VL code pads the input using that image token. Do you think this issue can be closed?
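
To illustrate why the flag matters, one way to see the template Qwen2-VL expects is to render it with the HuggingFace processor. This is only an illustration using the transformers API, not sglang's internal code, and the exact tokens printed depend on the chat template bundled with the model.

```python
from transformers import AutoProcessor

# Render Qwen2-VL's own chat template to see the special image placeholder
# that the model's preprocessing expands into vision tokens.
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this chart."},
    ],
}]

prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# The rendered prompt contains Qwen2-VL-specific vision tokens
# (e.g. <|vision_start|>...<|vision_end|>); a generic default template
# would not emit them, so the image features cannot be spliced in correctly.
```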
