[Docs] Add quantization docs #3410
Conversation
docs/backend/quantization.md (outdated)

> --port 30000 --host 0.0.0.0

Our team is working on supporting more online quantization methods. We will soon support methods including but not limited to `["awq", "gptq", "marlin", "gptq_marlin", "awq_marlin", "bitsandbytes", "gguf"]`.
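For context, a launch command combining the flags quoted above with one of these quantization methods might look like the following sketch. The model path is a placeholder, and passing `fp8` to `--quantization` is an assumption about which methods are enabled in a given build; consult the sglang docs for the supported values.

```shell
# Sketch: launch the SGLang server with online quantization enabled.
# The model path below is a placeholder; substitute your own checkpoint.
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --quantization fp8 \
  --port 30000 --host 0.0.0.0
```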
I think this means online quantization? Loading offline AWQ weights is already supported.

Thanks. I will give credit to you, James, and Fan.
We should move it to the reference section and add it in index.rst.

@zhaochenyang20 Added
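The index.rst change suggested above might look like the following sketch. The toctree caption and the exact document path are assumptions; match them to the structure already used in the repository's index.rst.

```rst
.. toctree::
   :maxdepth: 1
   :caption: Backend

   backend/quantization.md
```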
FYI: in the upcoming release, we will default to sgl-kernel's W8A8 INT8 and FP8 kernels instead of vLLM's W8A8. We have achieved the best performance across sm80, sm89, and sm90.
Great. Wait, we need to change this a bit.
Motivation
Re-opens #3253 with reviews addressed.
Modifications
Checklist