[Docs] Add quantization docs #3410

Merged 27 commits into sgl-project:main on Feb 9, 2025
Conversation

@Edenzzzz (Contributor) commented Feb 8, 2025

Motivation

Re-opens #3253 with reviews addressed.

Modifications

Checklist

  • Format your code according to Code Formatting with Pre-Commit.
  • Add unit tests as outlined in Running Unit Tests.
  • Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
  • Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
  • For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.

@Edenzzzz (Contributor, Author) commented Feb 8, 2025

cc @zhaochenyang20

Review thread on docs/references/quantization.md:

```
--port 30000 --host 0.0.0.0
```

Our team is working on supporting more online quantization methods. We will soon support methods including but not limited to `["awq", "gptq", "marlin", "gptq_marlin", "awq_marlin", "bitsandbytes", "gguf"]`.
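For context, a minimal sketch of what launching with one of these online methods could look like once supported, reusing the flags from the snippet above. This is only an illustration: `awq` is one name from the list, and the model path is an example, not something prescribed by this PR.

```
# Sketch only: online (load-time) quantization of a full-precision checkpoint.
# "awq" is one name from the list above; the model path is illustrative.
python -m sglang.launch_server \
    --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \
    --quantization awq \
    --port 30000 --host 0.0.0.0
```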
@Edenzzzz (Contributor, Author):

I think this means online quantization? Loading offline awq weights is already supported.
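For contrast with the online case above, loading an offline-quantized checkpoint needs no extra flag in principle, since the method is recorded in the checkpoint's own config. A hedged sketch, with the AWQ model name chosen only as an illustration:

```
# Sketch only: serving weights that were already quantized offline (AWQ here).
# The quantization method is read from the checkpoint's quantization_config.
python -m sglang.launch_server \
    --model-path hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 \
    --port 30000 --host 0.0.0.0
```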

@zhaochenyang20 (Collaborator):

Thanks. I will give credit to you, James, and Fan.

@zhaochenyang20 (Collaborator) left a review comment:

We should move it to references and add it to index.rst.

@Edenzzzz (Contributor, Author) commented Feb 9, 2025:

@zhaochenyang20 Added.

@zhyncs merged commit 0af1d23 into sgl-project:main on Feb 9, 2025; 10 of 11 checks passed.
@Edenzzzz deleted the quantization_docs branch on February 9, 2025 at 18:19.
@zhyncs (Member) commented Feb 9, 2025:

FYI, in the upcoming release we will default to sgl-kernel's W8A8 Int8 and FP8 instead of vLLM's W8A8. We have achieved the best performance across sm80, sm89, and sm90.
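As a rough sketch of what opting into the W8A8 Int8 path could look like, assuming a `w8a8_int8` value for the existing `--quantization` flag (the exact flag value in the release may differ):

```
# Sketch only: requesting the W8A8 Int8 path; the flag value is an assumption.
python -m sglang.launch_server \
    --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \
    --quantization w8a8_int8 \
    --port 30000 --host 0.0.0.0
```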

@zhaochenyang20 (Collaborator):

Great. Wait, we need to change this a bit.
