[Docs] Add quantization docs #3410
Conversation
docs/backend/quantization.md (outdated)

> --port 30000 --host 0.0.0.0

Our team is working on supporting more online quantization methods. We will soon support methods including but not limited to `["awq", "gptq", "marlin", "gptq_marlin", "awq_marlin", "bitsandbytes", "gguf"]`.
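For context, a launch command combining the flags quoted above with one of these quantization methods might look like the following sketch. The model path is a placeholder, and passing `fp8` to `--quantization` is an assumption about which methods are enabled in a given build; consult the sglang docs for the supported values.

```shell
# Sketch: launch the SGLang server with online quantization enabled.
# The model path below is a placeholder; substitute your own checkpoint.
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --quantization fp8 \
  --port 30000 --host 0.0.0.0
```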
I think this means online quantization? Loading offline AWQ weights is already supported.

Thanks. I will give credit to you, James, and Fan.
We should move it to the reference section and add it in index.rst.

@zhaochenyang20 Added
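The index.rst change suggested above might look like the following sketch. The toctree caption and the exact document path are assumptions; match them to the structure already used in the repository's index.rst.

```rst
.. toctree::
   :maxdepth: 1
   :caption: Backend

   backend/quantization.md
```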
FYI: in the upcoming release, we will default to sgl-kernel's W8A8 INT8 and FP8 kernels instead of vLLM's W8A8. We have achieved the best performance across sm80, sm89, and sm90.
Great. Wait, we need to change this a bit.
Motivation
Re-opens #3253 with reviews addressed.
Modifications
Checklist