From e7cccbeeb2d63abe31db3cd6cc720216ecb7c4d6 Mon Sep 17 00:00:00 2001
From: Lianmin Zheng
Date: Thu, 28 Nov 2024 23:14:06 -0800
Subject: [PATCH] Update backend.md

---
 docs/backend/backend.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/backend/backend.md b/docs/backend/backend.md
index a2995455f3d..8f34eb7ce56 100644
--- a/docs/backend/backend.md
+++ b/docs/backend/backend.md
@@ -80,7 +80,7 @@ python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct
 python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --chunked-prefill-size 4096
 ```
 - To enable torch.compile acceleration, add `--enable-torch-compile`. It accelerates small models on small batch sizes. This does not work for FP8 currently.
-- To enable torchao quantization, add `--torchao-config int4wo-128`. It supports various quantization strategies.
+- To enable torchao quantization, add `--torchao-config int4wo-128`. It supports other [quantization strategies (INT8/FP8)](https://github.com/sgl-project/sglang/blob/9a00e6f453e764c0b286e2a62f652a1202c0bf9c/python/sglang/srt/server_args.py#L671) as well.
 - To enable fp8 weight quantization, add `--quantization fp8` on a fp16 checkpoint or directly load a fp8 checkpoint without specifying any arguments.
 - To enable fp8 kv cache quantization, add `--kv-cache-dtype fp8_e5m2`.
 - If the model does not have a chat template in the Hugging Face tokenizer, you can specify a [custom chat template](../references/custom_chat_template.md).
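
For quick reference, below is a sketch of launch commands using the quantization flags this patch documents. The flags and the illustrative model path come from the backend.md bullets above; whether these options can be combined in a single invocation is not claimed here.

```
# torchao int4 weight-only quantization (128 is the torchao group size)
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --torchao-config int4wo-128

# fp8 weight quantization applied to an fp16 checkpoint
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --quantization fp8

# fp8 (e5m2) KV cache quantization
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --kv-cache-dtype fp8_e5m2
```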