[Performance]: Qwen2-VL-7B AWQ model performance #9863

Open · 1 task done
zzf2grx opened this issue Oct 31, 2024 · 5 comments

Labels
performance Performance-related issues

Comments

zzf2grx commented Oct 31, 2024

Proposal to improve performance

Hi~ I've noticed that the inference time of Qwen2-VL-7B AWQ is not much better than that of Qwen2-VL-7B. Do you have any suggestions for improving performance? Thank you!
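For context, this is roughly the setup being benchmarked. A minimal sketch of running the AWQ checkpoint through vLLM's offline API (the model ID, prompt template, and image path are assumptions based on Qwen's published checkpoints; adjust to your actual setup):

```python
from PIL import Image
from vllm import LLM, SamplingParams

# Load the AWQ-quantized checkpoint; vLLM can usually detect AWQ from the
# model config, but it can also be requested explicitly.
llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct-AWQ", quantization="awq")

# Qwen2-VL expects its vision tokens in the prompt (chat-template format).
prompt = (
    "<|im_start|>user\n"
    "<|vision_start|><|image_pad|><|vision_end|>Describe this image.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

image = Image.open("example.jpg").convert("RGB")
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```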

Report of performance regression

No response

Misc discussion on performance

No response

Your current environment (if you think it is necessary)

The output of `python collect_env.py`

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
zzf2grx added the performance label Oct 31, 2024
DarkLight1337 (Member) commented Oct 31, 2024

I think the inference time may be dominated by the preprocessing, so it might not be related to the model itself. See #9238 for more details.
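One way to sanity-check this is to time the Hugging Face processor in isolation; if preprocessing takes a comparable amount of time to generation, quantizing the weights won't help much. A rough sketch (the image path and prompt are placeholders):

```python
import time

from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
image = Image.open("example.jpg").convert("RGB")

# Time only the preprocessing step (resizing, patching, tokenization).
start = time.perf_counter()
inputs = processor(
    text=["<|vision_start|><|image_pad|><|vision_end|>Describe this image."],
    images=[image],
    return_tensors="pt",
)
print(f"preprocessing: {time.perf_counter() - start:.3f}s")
```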

zzf2grx (Author) commented Nov 1, 2024

> I think the inference time may be dominated by the preprocessing, so it might not be related to the model itself. See #9238 for more details.

But in LMDeploy, AWQ-quantized models are about 2x faster than FP models. Is there any way to improve the speed of AWQ or other quantized models?

DarkLight1337 (Member) commented

> But in LMDeploy, AWQ-quantized models are about 2x faster than FP models. Is there any way to improve the speed of AWQ or other quantized models?

This is a problem specific to Qwen2-VL, because its image preprocessing is very slow. It should not be an issue for other AWQ models.

zzf2grx (Author) commented Nov 1, 2024

> This is a problem specific to Qwen2-VL, because its image preprocessing is very slow. It should not be an issue for other AWQ models.

So is there any advice on how to speed up image preprocessing?

DarkLight1337 (Member) commented

> So is there any advice on how to speed up image preprocessing?

You can try passing smaller images to the model.
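For example, you could cap the longest side before handing the image to vLLM. The 1024-pixel cap below is an arbitrary illustration, not a recommended value; the Qwen2-VL processor also exposes `min_pixels`/`max_pixels` settings that bound how many visual tokens an image produces.

```python
from PIL import Image

def downscale(image: Image.Image, max_side: int = 1024) -> Image.Image:
    """Shrink the longest side to max_side, keeping the aspect ratio."""
    scale = max_side / max(image.size)
    if scale >= 1.0:
        return image  # already small enough
    new_size = (round(image.width * scale), round(image.height * scale))
    return image.resize(new_size, Image.LANCZOS)

image = downscale(Image.open("example.jpg").convert("RGB"))
```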
