[Feature]: LoRA support for Mixtral GPTQ and AWQ #5540
Comments
:)
@StrikerRUS, has the PR you mentioned handled your use case?
@hmellor Nope. LoRA adapters still cannot be used with quantized Mixtral models (see vllm/vllm/model_executor/models/mixtral.py, lines 294 to 300 at 27902d4). Even after adding that attribute and adjusting the method arguments, vLLM crashes with a tensor shape mismatch error. I guess further work is needed to bring full LoRA support.
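For context, the attribute referred to above appears to be the class-level LoRA metadata that vLLM's non-quantized Mixtral implementation declares. Below is a minimal sketch of what that metadata looks like; the names mirror the regular mixtral.py and are shown only for illustration, not as the actual patch that was attempted:

```python
import torch.nn as nn


# Illustrative sketch only, not the actual vLLM source: the kind of class-level
# LoRA metadata that the non-quantized MixtralForCausalLM declares, and that a
# quantized (GPTQ/AWQ) variant would also need before LoRA adapters can load.
class MixtralForCausalLM(nn.Module):
    # Maps each fused layer to the original sub-module names a LoRA adapter
    # targets, so adapter weights can be sliced into the fused projection.
    packed_modules_mapping = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
    }

    # Modules to which LoRA deltas may be applied.
    supported_lora_modules = [
        "qkv_proj",
        "o_proj",
        "embed_tokens",
        "lm_head",
    ]
    embedding_modules = {
        "embed_tokens": "input_embeddings",
        "lm_head": "output_embeddings",
    }
    embedding_padding_modules = ["lm_head"]
```

Declaring this metadata alone is evidently not enough: per the comment above, the run still fails with a tensor shape mismatch, which suggests the quantized linear layers need additional LoRA-aware handling.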
I am facing a similar issue. Did you find any workaround, @StrikerRUS?
@ksjadeja Switched to Llama 3.1 😄
@hmellor Do you think this is going to get picked up by someone?
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!
🚀 The feature, motivation and pitch
Please consider adding LoRA support for GPTQ- and AWQ-quantized Mixtral models.
I assume that after #4012 this is technically possible.
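To make the request concrete, the intended usage would look roughly like the offline-inference sketch below. The checkpoint name, adapter path, and rank are placeholders; as of this issue, combining enable_lora with a GPTQ/AWQ Mixtral checkpoint is exactly the combination that fails:

```python
# Sketch of the desired usage; model name, adapter path, and rank are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ",  # any GPTQ/AWQ Mixtral checkpoint
    quantization="gptq",
    enable_lora=True,
    max_lora_rank=16,
)

outputs = llm.generate(
    ["Write a haiku about quantization."],
    SamplingParams(temperature=0.7, max_tokens=64),
    lora_request=LoRARequest("my_adapter", 1, "/path/to/lora_adapter"),
)
print(outputs[0].outputs[0].text)
```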
Alternatives
No response
Additional context
My Docker compose:
Error log: