[Performance]: Qwen2-VL-7B AWQ model performance #9863

Open · 1 task done
zzf2grx opened this issue Oct 31, 2024 · 5 comments

Labels
performance Performance-related issues

Comments

zzf2grx commented Oct 31, 2024

Proposal to improve performance

Hi~ I've noticed that the inference time of Qwen2-VL-7B AWQ is not much better than that of Qwen2-VL-7B. Do you have any suggestions for improving performance? Thank you!
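For context, this is roughly the setup being benchmarked. A minimal sketch of running the AWQ checkpoint through vLLM's offline API (the model ID, prompt template, and image path are assumptions based on Qwen's published checkpoints; adjust to your actual setup):

```python
from PIL import Image
from vllm import LLM, SamplingParams

# Load the AWQ-quantized checkpoint; vLLM can usually detect AWQ from the
# model config, but it can also be requested explicitly.
llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct-AWQ", quantization="awq")

# Qwen2-VL expects its vision tokens in the prompt (chat-template format).
prompt = (
    "<|im_start|>user\n"
    "<|vision_start|><|image_pad|><|vision_end|>Describe this image.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

image = Image.open("example.jpg").convert("RGB")
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```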

Report of performance regression

No response

Misc discussion on performance

No response

Your current environment (if you think it is necessary)

The output of `python collect_env.py`

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
zzf2grx added the performance label Oct 31, 2024
DarkLight1337 (Member) commented Oct 31, 2024

I think the inference time may be dominated by the preprocessing, so it might not be related to the model itself. See #9238 for more details.
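One way to sanity-check this is to time the Hugging Face processor in isolation; if preprocessing takes a comparable amount of time to generation, quantizing the weights won't help much. A rough sketch (the image path and prompt are placeholders):

```python
import time

from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
image = Image.open("example.jpg").convert("RGB")

# Time only the preprocessing step (resizing, patching, tokenization).
start = time.perf_counter()
inputs = processor(
    text=["<|vision_start|><|image_pad|><|vision_end|>Describe this image."],
    images=[image],
    return_tensors="pt",
)
print(f"preprocessing: {time.perf_counter() - start:.3f}s")
```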

zzf2grx (Author) commented Nov 1, 2024

> I think the inference time may be dominated by the preprocessing, so it might not be related to the model itself. See #9238 for more details.

But in LMDeploy, AWQ-quantized models are about 2x faster than FP models. Is there any way to improve the speed of AWQ or other quantized models?

DarkLight1337 (Member) commented

> But in LMDeploy, AWQ-quantized models are about 2x faster than FP models. Is there any way to improve the speed of AWQ or other quantized models?

This is a problem specific to Qwen2-VL, because its image preprocessing is very slow. It should not be an issue for other AWQ models.

zzf2grx (Author) commented Nov 1, 2024

> This is a problem specific to Qwen2-VL, because its image preprocessing is very slow. It should not be an issue for other AWQ models.

So is there any advice on how to speed up image preprocessing?

DarkLight1337 (Member) commented

> So is there any advice on how to speed up image preprocessing?

You can try passing smaller images to the model.
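For example, you could cap the longest side before handing the image to vLLM. The 1024-pixel cap below is an arbitrary illustration, not a recommended value; the Qwen2-VL processor also exposes `min_pixels`/`max_pixels` settings that bound how many visual tokens an image produces.

```python
from PIL import Image

def downscale(image: Image.Image, max_side: int = 1024) -> Image.Image:
    """Shrink the longest side to max_side, keeping the aspect ratio."""
    scale = max_side / max(image.size)
    if scale >= 1.0:
        return image  # already small enough
    new_size = (round(image.width * scale), round(image.height * scale))
    return image.resize(new_size, Image.LANCZOS)

image = downscale(Image.open("example.jpg").convert("RGB"))
```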
