[Performance]: Qwen2-VL-7B AWQ model performance #9863
Comments
I think the inference time may be dominated by the preprocessing, so it might not be related to the model itself. See #9238 for more details.
But in lmdeploy, AWQ-quantized models are about 2x faster than FP models. Is there any way to improve the speed of AWQ or other quantized models?
This is only a problem for Qwen2-VL in particular, because its image preprocessing is very slow. It should not be a problem for other AWQ models.
So is there any advice on how to improve the speed of image preprocessing?
You can try passing smaller images to the model. |
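The suggestion above can be sketched as a small helper that downscales images before they reach the model, so the preprocessor has fewer pixels to work on. This is an illustrative sketch, not part of vLLM's API: the helper name, the 1024 px cap on the longest side, and the use of Pillow are all assumptions.

```python
# Hypothetical helper: cap an image's longest side before passing it to
# Qwen2-VL, so the (slow) image preprocessing handles fewer pixels.
# The 1024 px default is an arbitrary example value; tune it for your
# quality/latency trade-off.
from PIL import Image


def shrink_image(img: Image.Image, max_side: int = 1024) -> Image.Image:
    """Return `img` resized so its longest side is at most `max_side`."""
    w, h = img.size
    longest = max(w, h)
    if longest <= max_side:
        return img  # already small enough; skip the resample entirely
    scale = max_side / longest
    return img.resize((round(w * scale), round(h * scale)), Image.LANCZOS)
```

The downscaled image can then be passed as the multimodal input in place of the original; aspect ratio is preserved, so only the preprocessing cost changes, not the framing of the image.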
Proposal to improve performance
Hi~ I find that the inference time of Qwen2-VL-7B AWQ is not much improved compared to Qwen2-VL-7B. Do you have any suggestions for improving performance? Thank you!
Report of performance regression
No response
Misc discussion on performance
No response
Your current environment (if you think it is necessary)