Update backend.md
merrymercy authored Nov 29, 2024
1 parent b7038fe commit e7cccbe
Showing 1 changed file with 1 addition and 1 deletion: docs/backend/backend.md
@@ -80,7 +80,7 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
 python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --chunked-prefill-size 4096
 ```
 - To enable torch.compile acceleration, add `--enable-torch-compile`. It accelerates small models on small batch sizes. This does not work for FP8 currently.
-- To enable torchao quantization, add `--torchao-config int4wo-128`. It supports various quantization strategies.
+- To enable torchao quantization, add `--torchao-config int4wo-128`. It supports other [quantization strategies (INT8/FP8)](https://github.com/sgl-project/sglang/blob/9a00e6f453e764c0b286e2a62f652a1202c0bf9c/python/sglang/srt/server_args.py#L671) as well.
 - To enable fp8 weight quantization, add `--quantization fp8` on a fp16 checkpoint or directly load a fp8 checkpoint without specifying any arguments.
 - To enable fp8 kv cache quantization, add `--kv-cache-dtype fp8_e5m2`.
 - If the model does not have a chat template in the Hugging Face tokenizer, you can specify a [custom chat template](../references/custom_chat_template.md).
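For context, every flag touched by this hunk is an option to `sglang.launch_server`. A minimal sketch of the documented usage, built only from the flags named in the bullets above (the diff does not say which flags may be combined, so the pairing in the second command is an assumption):

```
# torchao int4 weight-only quantization with group size 128 (the edited bullet)
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct \
    --torchao-config int4wo-128

# fp8 weight quantization on an fp16 checkpoint, plus fp8 kv cache quantization
# (combining the two flags is an assumption, not stated in the diff)
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct \
    --quantization fp8 --kv-cache-dtype fp8_e5m2
```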
