add qwen2.5 recipe and refine readme #338

Merged
merged 4 commits into from
Nov 22, 2024
Changes from all commits

12 changes: 8 additions & 4 deletions README.md
@@ -26,14 +26,13 @@ more accuracy data and recipes across various models.
<div align="left">

## What's New
* [2024/11] We provide some tips and tricks for LLM&VLM quantization, please check out [this file](./docs/tips_and_tricks.md).
* [2024/11] We provide experimental support for VLM quantization; please check out the [MLLM README](./auto_round/mllm/README.md)
* [2024/11] We provide some tips and tricks for LLM & VLM quantization; please check out [this file](./docs/tips_and_tricks.md)
* [2024/10] AutoRound has been integrated into [torch/ao](https://github.com/pytorch/ao), check out
their [release note](https://github.com/pytorch/ao/releases/tag/v0.6.1)
* [2024/10] Important update: We now support full-range symmetric quantization and have made it the default
configuration. This configuration is typically better or comparable to asymmetric quantization and significantly
outperforms other symmetric variants, especially at low bit-widths like 2-bit; check out [some accuracy data](./docs/full_range_sym.md).
* [2024/09] The AutoRound format supports several LVM models; check out the
examples [Qwen2-VL](./examples/multimodal-modeling/Qwen-VL), [Phi-3-vision](./examples/multimodal-modeling/Phi-3-vision), [Llava](./examples/multimodal-modeling/Llava)
* [2024/08] AutoRound format supports Intel Gaudi2 devices. Please refer
to [Intel/Qwen2-7B-int4-inc](https://huggingface.co/Intel/Qwen2-7B-int4-inc).
* [2024/08] AutoRound introduces several experimental features, including fast tuning of norm/bias parameters (for 2-bit
@@ -317,7 +316,11 @@ release most of the models ourselves.
| meta-llama/Llama-3.2-11B-Vision | [recipe](./docs/Llama-3.2-11B-Vision-Instruct_sym.md) |
| microsoft/Phi-3.5-vision-instruct | [recipe](./docs/Phi-3.5-vision-instruct_sym.md) |
| liuhaotian/llava-v1.5-7b | [recipe](./docs/llava-v1.5-7b_sym.md) |
| Qwen/Qwen2.5-7B-Instruct | [recipe](./docs/Qwen2.5-7B-Instruct_sym.md) |
| Qwen/Qwen2.5-7B-Instruct | [model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Qwen2.5-7B-Instruct-AutoRound-GPTQ-asym-4bit), [recipe](./docs/Qwen2.5-7B-Instruct_sym.md) |
| Qwen/Qwen2.5-14B-Instruct | [recipe](./docs/Qwen2.5-14B-Instruct_sym.md) |
| Qwen/Qwen2.5-32B-Instruct | [recipe](./docs/Qwen2.5-32B-Instruct_sym.md) |
| Qwen/Qwen2.5-Coder-32B-Instruct | [model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Qwen2.5-Coder-32B-Instruct-AutoRound-GPTQ-4bit) |
| Qwen/Qwen2.5-72B-Instruct | [model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Qwen2.5-72B-Instruct-AutoRound-GPTQ-4bit), [model-kaitchup-autogptq-int2*](https://huggingface.co/kaitchup/Qwen2.5-72B-Instruct-AutoRound-GPTQ-2bit), [recipe](./docs/Qwen2.5-72B-Instruct_sym.md) |
| meta-llama/Meta-Llama-3.1-70B-Instruct | [recipe](https://huggingface.co/Intel/Meta-Llama-3.1-70B-Instruct-int4-inc) |
| meta-llama/Meta-Llama-3.1-8B-Instruct | [model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Meta-Llama-3.1-8B-Instruct-autoround-gptq-4bit-asym), [model-kaitchup-autogptq-sym-int4*](https://huggingface.co/kaitchup/Meta-Llama-3.1-8B-Instruct-autoround-gptq-4bit-sym), [recipe](https://huggingface.co/Intel/Meta-Llama-3.1-8B-Instruct-int4-inc) |
| meta-llama/Meta-Llama-3.1-8B | [model-kaitchup-autogptq-sym-int4*](https://huggingface.co/kaitchup/Meta-Llama-3.1-8B-autoround-gptq-4bit-sym) |
@@ -373,3 +376,4 @@ If you find AutoRound useful for your research, please cite our paper:
}
```


45 changes: 19 additions & 26 deletions auto_round/mllm/README.md
@@ -25,6 +25,14 @@ AutoRound uses the text module of MLLM (LLM component) as the main quantization
autoround.save_quantized(output_dir, format='auto_round', inplace=True)
```

- `dataset`: the dataset for quantization training. Currently supported options are NeelNanda/pile-10k, llava_conv_58k, llava_instruct_80k and llava_instruct_150k; a custom dataset can also be used. Please note that the effectiveness of the Llava calibration dataset has only been validated on five models so far.

- `quant_nontext_module`: whether to quantize non-text modules, e.g. the vision component.

- `extra_data_dir`: the dataset directory for storing images/audio/videos; defaults to None. It can be a single directory path or multiple paths in the format 'image=path_to_image,video=path_to_video,audio=path_to_audio'. By default, the relative path is searched, and if the data is not found there, it is downloaded automatically.

For an introduction to more hyperparameters, please refer to [Homepage Detailed Hyperparameters](../../README.md#api-usage-gaudi2cpugpu).
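
The snippet above shows only the final save call. As an illustration of how the arguments documented here fit together, a minimal end-to-end sketch might look as follows; the `AutoRoundMLLM` class, the model-loading calls, and extra arguments such as `bits`/`group_size` are assumptions drawn from the surrounding README rather than from this diff.

```python
# A hedged sketch, not taken verbatim from this PR: quantize the text module of a
# VLM with the arguments documented above. `AutoRoundMLLM` and the loading calls
# below are assumptions based on the surrounding README, not part of this diff.
from transformers import AutoProcessor, AutoTokenizer, Qwen2VLForConditionalGeneration
from auto_round import AutoRoundMLLM

model_name = "Qwen/Qwen2-VL-7B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
processor = AutoProcessor.from_pretrained(model_name)

autoround = AutoRoundMLLM(
    model,
    tokenizer,
    processor,
    dataset="NeelNanda/pile-10k",  # or llava_conv_58k / llava_instruct_80k / llava_instruct_150k
    quant_nontext_module=False,    # keep the vision component in full precision
    extra_data_dir=None,           # e.g. "image=./images" when a dataset needs local media
    bits=4,
    group_size=128,
)
autoround.quantize()
autoround.save_quantized("./tmp_autoround", format="auto_round", inplace=True)
```

Set `quant_nontext_module=True` only for models listed as supported in the Support Matrix further down.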

<details>
<summary style="font-size:17px;">Basic Usage (Gaudi2/CPU/GPU)</summary>
A user guide detailing the full list of supported arguments is provided by calling `auto-round-mllm -h` on the terminal. Alternatively, you can use `auto_round_mllm` instead of `auto-round-mllm`. Set the format you want in `format` and
@@ -40,11 +48,6 @@ AutoRound uses the text module of MLLM (LLM component) as the main quantization
--output_dir ./tmp_autoround
```

- `dataset`: the dataset for quantization training. current support NeelNanda/pile-10k,llava_conv_58k,llava_instruct_80k. It can be a custom one.

- `quant_nontext_module`: whether to quantize non-text module, e.g. vision component.

- `extra_data_dir`:dataset dir for storing images/audio/videos, default to None. Can be a dir path or multiple dir path with format as 'image=path_to_image,video=path_to_video,audio=path_to_audio' By default, it will search in the relative path, and if not find, will automatic download.

</details>

@@ -56,19 +59,6 @@ For mllm, we used **text-only** calibration dataset (NeelNanda/pile-10k) as our

Through the argument --dataset (text file), users can choose other datasets such as "liuhaotian/llava_conv_58k", "liuhaotian/llava_instruct_80k" and "liuhaotian/llava_instruct_150k", or pass a file path to use a local dataset.


### Support List

The llava calibration dataset supports the five existing MLLMs.

|Model |Eval Lib |calibration dataset|Feasibility of quantification|
|---------------|-----------|-------------------|--------------------|
|Qwen/Qwen2-VL-Instruct |vlmeval |llava |✔ |
|meta-llama/Llama-3.2-11B-Vision |vlmeval/lmms_eval |llava |✔ |
|microsoft/Phi-3.5-vision-instruct |vlmeval |llava |✔ |
|liuhaotian/llava-v1.5-7b |lmms_eval |llava |✔ |
|THUDM/cogvlm2-llama3-chat-19B |lmms_eval |llava |✔ |

</details>


@@ -78,15 +68,17 @@

### Support Matrix

The design of the MLLM model API is not uniform, and some models do not support the quantization nontext module. Quantization of the vision components of Llama-3.2-11B-Vision, Phi-3.5-vision-instruct and llava-v1.5-7b is currently supported.
For typical VLMs, we assume that the default quantization configuration, which excludes the visual component, is supported. However, the design of vision components across MLLM APIs is not standardized, and some models do not support quantizing their non-text modules.

|Model |Eval Lib |quant nontext module|
|---------------|-----------|-------------------|
|Qwen/Qwen2-VL-Instruct |vlmeval |- |
|meta-llama/Llama-3.2-11B-Vision |lmms_eval |✔ |
|microsoft/Phi-3.5-vision-instruct |vlmeval |✔ |
|liuhaotian/llava-v1.5-7b |lmms_eval |- |
|THUDM/cogvlm2-llama3-chat-19B |lmms_eval |✔ |
Currently, quantization of the vision component is supported only for the models marked with ✔ in the table below; a short usage sketch follows the table.

| Model | Eval Lib | calibration dataset | quant nontext module |
|--------------|-----------|---------------------|----------------------|
| Qwen2-VL | vlmeval | pile/llava | - |
| Llama-Vision | lmms_eval | llava | ✔ |
| Phi3-Vision | vlmeval | pile/llava | ✔ |
| Llava-v1.5 | lmms_eval | pile/llava | - |
| CogVLM2 | lmms_eval | pile/llava | ✔ |
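
For a model the table marks with ✔, the `quant_nontext_module` flag documented earlier can simply be flipped. Below is a minimal, hypothetical sketch under that assumption; the class name, loading calls, and dataset choice are illustrative rather than taken from this diff.

```python
# Hypothetical sketch: include the vision (non-text) module in quantization for a
# model the table above marks as supported. Only `quant_nontext_module` itself is
# documented in this README; everything else here is an assumption.
from transformers import AutoModelForCausalLM, AutoProcessor, AutoTokenizer
from auto_round import AutoRoundMLLM

model_name = "microsoft/Phi-3.5-vision-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    _attn_implementation="eager",  # per the model card, use "eager" if flash-attn is unavailable
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)

autoround = AutoRoundMLLM(
    model,
    tokenizer,
    processor,
    dataset="llava_conv_58k",    # llava-style data also exercises the vision path
    quant_nontext_module=True,   # quantize the vision component as well
)
autoround.quantize()
autoround.save_quantized("./tmp_autoround", format="auto_round", inplace=True)
```

A llava-style calibration set is chosen here on the assumption that text-only data would not exercise the vision path during tuning; this is a design consideration, not a requirement stated in the diff.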



@@ -140,3 +132,4 @@ For more details on quantization, inference, evaluation, and environment, see th



