add qwen2.5 recipe and refine readme #338

Merged
merged 4 commits into from
Nov 22, 2024
Changes from all commits

12 changes: 8 additions & 4 deletions README.md
@@ -26,14 +26,13 @@ more accuracy data and recipes across various models.
<div align="left">

## What's New
* [2024/11] We provide some tips and tricks for LLM&VLM quantization, please check out [this file](./docs/tips_and_tricks.md).
* [2024/11] We provide experimental support for VLM quantization; please check out the [MLLM README](./auto_round/mllm/README.md)
* [2024/11] We provide some tips and tricks for LLM & VLM quantization; please check out [this file](./docs/tips_and_tricks.md)
* [2024/10] AutoRound has been integrated into [torch/ao](https://github.com/pytorch/ao), check out
their [release note](https://github.com/pytorch/ao/releases/tag/v0.6.1)
* [2024/10] Important update: We now support full-range symmetric quantization and have made it the default
configuration. This configuration is typically better or comparable to asymmetric quantization and significantly
outperforms other symmetric variants, especially at low bit-widths like 2-bit; check out [some accuracy data](./docs/full_range_sym.md).
* [2024/09] The AutoRound format supports several LVM models; check out the
examples [Qwen2-VL](./examples/multimodal-modeling/Qwen-VL), [Phi-3-vision](./examples/multimodal-modeling/Phi-3-vision), [Llava](./examples/multimodal-modeling/Llava)
* [2024/08] AutoRound format supports Intel Gaudi2 devices. Please refer
to [Intel/Qwen2-7B-int4-inc](https://huggingface.co/Intel/Qwen2-7B-int4-inc).
* [2024/08] AutoRound introduces several experimental features, including fast tuning of norm/bias parameters (for 2-bit
@@ -317,7 +316,11 @@ release most of the models ourselves.
| meta-llama/Llama-3.2-11B-Vision | [recipe](./docs/Llama-3.2-11B-Vision-Instruct_sym.md) |
| microsoft/Phi-3.5-vision-instruct | [recipe](./docs/Phi-3.5-vision-instruct_sym.md) |
| liuhaotian/llava-v1.5-7b | [recipe](./docs/llava-v1.5-7b_sym.md) |
| Qwen/Qwen2.5-7B-Instruct | [recipe](./docs/Qwen2.5-7B-Instruct_sym.md) |
| Qwen/Qwen2.5-7B-Instruct | [model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Qwen2.5-7B-Instruct-AutoRound-GPTQ-asym-4bit), [recipe](./docs/Qwen2.5-7B-Instruct_sym.md) |
| Qwen/Qwen2.5-14B-Instruct | [recipe](./docs/Qwen2.5-14B-Instruct_sym.md) |
| Qwen/Qwen2.5-32B-Instruct | [recipe](./docs/Qwen2.5-32B-Instruct_sym.md) |
| Qwen/Qwen2.5-Coder-32B-Instruct | [model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Qwen2.5-Coder-32B-Instruct-AutoRound-GPTQ-4bit) |
| Qwen/Qwen2.5-72B-Instruct | [model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Qwen2.5-72B-Instruct-AutoRound-GPTQ-4bit), [model-kaitchup-autogptq-int2*](https://huggingface.co/kaitchup/Qwen2.5-72B-Instruct-AutoRound-GPTQ-2bit), [recipe](./docs/Qwen2.5-72B-Instruct_sym.md) |
| meta-llama/Meta-Llama-3.1-70B-Instruct | [recipe](https://huggingface.co/Intel/Meta-Llama-3.1-70B-Instruct-int4-inc) |
| meta-llama/Meta-Llama-3.1-8B-Instruct | [model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Meta-Llama-3.1-8B-Instruct-autoround-gptq-4bit-asym), [model-kaitchup-autogptq-sym-int4*](https://huggingface.co/kaitchup/Meta-Llama-3.1-8B-Instruct-autoround-gptq-4bit-sym), [recipe](https://huggingface.co/Intel/Meta-Llama-3.1-8B-Instruct-int4-inc) |
| meta-llama/Meta-Llama-3.1-8B | [model-kaitchup-autogptq-sym-int4*](https://huggingface.co/kaitchup/Meta-Llama-3.1-8B-autoround-gptq-4bit-sym) |
@@ -373,3 +376,4 @@ If you find AutoRound useful for your research, please cite our paper:
}
```


45 changes: 19 additions & 26 deletions auto_round/mllm/README.md
@@ -25,6 +25,14 @@ AutoRound uses the text module of MLLM (LLM component) as the main quantization
autoround.save_quantized(output_dir, format='auto_round', inplace=True)
```

- `dataset`: the dataset for quantization training. Currently supported options are NeelNanda/pile-10k, llava_conv_58k, llava_instruct_80k and llava_instruct_150k; a custom dataset can also be used. Please note that the effectiveness of the Llava calibration dataset has only been validated on five models so far.

- `quant_nontext_module`: whether to quantize non-text modules, e.g. the vision component.

- `extra_data_dir`: the dataset directory for storing images/audio/videos; defaults to None. It can be a single directory path or multiple paths in the format 'image=path_to_image,video=path_to_video,audio=path_to_audio'. By default, the relative path is searched, and if the data is not found there, it is downloaded automatically.

For an introduction to more hyperparameters, please refer to [Homepage Detailed Hyperparameters](../../README.md#api-usage-gaudi2cpugpu).
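
The snippet above shows only the final save call. As an illustration of how the arguments documented here fit together, a minimal end-to-end sketch might look as follows; the `AutoRoundMLLM` class, the model-loading calls, and extra arguments such as `bits`/`group_size` are assumptions drawn from the surrounding README rather than from this diff.

```python
# A hedged sketch, not taken verbatim from this PR: quantize the text module of a
# VLM with the arguments documented above. `AutoRoundMLLM` and the loading calls
# below are assumptions based on the surrounding README, not part of this diff.
from transformers import AutoProcessor, AutoTokenizer, Qwen2VLForConditionalGeneration
from auto_round import AutoRoundMLLM

model_name = "Qwen/Qwen2-VL-7B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
processor = AutoProcessor.from_pretrained(model_name)

autoround = AutoRoundMLLM(
    model,
    tokenizer,
    processor,
    dataset="NeelNanda/pile-10k",  # or llava_conv_58k / llava_instruct_80k / llava_instruct_150k
    quant_nontext_module=False,    # keep the vision component in full precision
    extra_data_dir=None,           # e.g. "image=./images" when a dataset needs local media
    bits=4,
    group_size=128,
)
autoround.quantize()
autoround.save_quantized("./tmp_autoround", format="auto_round", inplace=True)
```

Set `quant_nontext_module=True` only for models listed as supported in the Support Matrix further down.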

<details>
<summary style="font-size:17px;">Basic Usage (Gaudi2/CPU/GPU)</summary>
A user guide detailing the full list of supported arguments is provided by calling `auto-round-mllm -h` on the terminal. Alternatively, you can use `auto_round_mllm` instead of `auto-round-mllm`. Set the format you want in `format` and
@@ -40,11 +48,6 @@ AutoRound uses the text module of MLLM (LLM component) as the main quantization
--output_dir ./tmp_autoround
```

- `dataset`: the dataset for quantization training. current support NeelNanda/pile-10k,llava_conv_58k,llava_instruct_80k. It can be a custom one.

- `quant_nontext_module`: whether to quantize non-text module, e.g. vision component.

- `extra_data_dir`:dataset dir for storing images/audio/videos, default to None. Can be a dir path or multiple dir path with format as 'image=path_to_image,video=path_to_video,audio=path_to_audio' By default, it will search in the relative path, and if not find, will automatic download.

</details>

@@ -56,19 +59,6 @@ For mllm, we used **text-only** calibration dataset (NeelNanda/pile-10k) as our

Through the argument --dataset (text file), users can choose other datasets such as "liuhaotian/llava_conv_58k", "liuhaotian/llava_instruct_80k" and "liuhaotian/llava_instruct_150k", or pass a file path to use a local dataset.


### Support List

The llava calibration dataset supports the five existing MLLMs.

|Model |Eval Lib |calibration dataset|Feasibility of quantification|
|---------------|-----------|-------------------|--------------------|
|Qwen/Qwen2-VL-Instruct |vlmeval |llava |✔ |
|meta-llama/Llama-3.2-11B-Vision |vlmeval/lmms_eval |llava |✔ |
|microsoft/Phi-3.5-vision-instruct |vlmeval |llava |✔ |
|liuhaotian/llava-v1.5-7b |lmms_eval |llava |✔ |
|THUDM/cogvlm2-llama3-chat-19B |lmms_eval |llava |✔ |

</details>


@@ -78,15 +68,17 @@

### Support Matrix

The design of the MLLM model API is not uniform, and some models do not support the quantization nontext module. Quantization of the vision components of Llama-3.2-11B-Vision, Phi-3.5-vision-instruct and llava-v1.5-7b is currently supported.
For typical VLMs, we assume that the default quantization configuration, which excludes the visual component, is supported. However, the design of vision components across MLLM APIs is not standardized, and some models do not support quantizing their non-text modules.

|Model |Eval Lib |quant nontext module|
|---------------|-----------|-------------------|
|Qwen/Qwen2-VL-Instruct |vlmeval |- |
|meta-llama/Llama-3.2-11B-Vision |lmms_eval |✔ |
|microsoft/Phi-3.5-vision-instruct |vlmeval |✔ |
|liuhaotian/llava-v1.5-7b |lmms_eval |- |
|THUDM/cogvlm2-llama3-chat-19B |lmms_eval |✔ |
Currently, quantization of the vision component is supported only for the models marked with ✔ in the table below; a short usage sketch follows the table.

| Model | Eval Lib | calibration dataset | quant nontext module |
|--------------|-----------|---------------------|----------------------|
| Qwen2-VL | vlmeval | pile/llava | - |
| Llama-Vision | lmms_eval | llava | ✔ |
| Phi3-Vision | vlmeval | pile/llava | ✔ |
| Llava-v1.5 | lmms_eval | pile/llava | - |
| CogVLM2 | lmms_eval | pile/llava | ✔ |
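
For a model the table marks with ✔, the `quant_nontext_module` flag documented earlier can simply be flipped. Below is a minimal, hypothetical sketch under that assumption; the class name, loading calls, and dataset choice are illustrative rather than taken from this diff.

```python
# Hypothetical sketch: include the vision (non-text) module in quantization for a
# model the table above marks as supported. Only `quant_nontext_module` itself is
# documented in this README; everything else here is an assumption.
from transformers import AutoModelForCausalLM, AutoProcessor, AutoTokenizer
from auto_round import AutoRoundMLLM

model_name = "microsoft/Phi-3.5-vision-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    _attn_implementation="eager",  # per the model card, use "eager" if flash-attn is unavailable
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)

autoround = AutoRoundMLLM(
    model,
    tokenizer,
    processor,
    dataset="llava_conv_58k",    # llava-style data also exercises the vision path
    quant_nontext_module=True,   # quantize the vision component as well
)
autoround.quantize()
autoround.save_quantized("./tmp_autoround", format="auto_round", inplace=True)
```

A llava-style calibration set is chosen here on the assumption that text-only data would not exercise the vision path during tuning; this is a design consideration, not a requirement stated in the diff.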



@@ -140,3 +132,4 @@ For more details on quantization, inference, evaluation, and environment, see th



