From aa4a5bc86c29575684ba19836ee34a8bed9b2237 Mon Sep 17 00:00:00 2001
From: wenhuach21 <wenhua.cheng@intel.com>
Date: Mon, 25 Nov 2024 09:14:23 +0800
Subject: [PATCH 1/6] fix typo

---
 README.md                                      | 18 +++++++++---------
 auto_round/mllm/README.md                      | 10 +++++-----
 ...md => Llama-3.2-11B-Vision-Instruct-sym.md} |  2 +-
 ...t_sym.md => Phi-3.5-vision-instruct-sym.md} |  2 +-
 ...ruct_sym.md => Qwen2-VL-7B-Instruct-sym.md} |  0
 ...ruct_sym.md => Qwen2.5-14B-Instruct-sym.md} |  0
 ...ruct_sym.md => Qwen2.5-32B-Instruct-sym.md} |  2 +-
 ...ruct_sym.md => Qwen2.5-72B-Instruct-sym.md} |  0
 ...truct_sym.md => Qwen2.5-7B-Instruct-sym.md} |  0
 ...B_sym.md => cogvlm2-llama3-chat-19B-sym.md} |  2 +-
 ...ava-v1.5-7b_sym.md => llava-v1.5-7b-sym.md} |  2 +-
 11 files changed, 19 insertions(+), 19 deletions(-)
 rename docs/{Llama-3.2-11B-Vision-Instruct_sym.md => Llama-3.2-11B-Vision-Instruct-sym.md} (99%)
 rename docs/{Phi-3.5-vision-instruct_sym.md => Phi-3.5-vision-instruct-sym.md} (99%)
 rename docs/{Qwen2-VL-7B-Instruct_sym.md => Qwen2-VL-7B-Instruct-sym.md} (100%)
 rename docs/{Qwen2.5-14B-Instruct_sym.md => Qwen2.5-14B-Instruct-sym.md} (100%)
 rename docs/{Qwen2.5-32B-Instruct_sym.md => Qwen2.5-32B-Instruct-sym.md} (99%)
 rename docs/{Qwen2.5-72B-Instruct_sym.md => Qwen2.5-72B-Instruct-sym.md} (100%)
 rename docs/{Qwen2.5-7B-Instruct_sym.md => Qwen2.5-7B-Instruct-sym.md} (100%)
 rename docs/{cogvlm2-llama3-chat-19B_sym.md => cogvlm2-llama3-chat-19B-sym.md} (99%)
 rename docs/{llava-v1.5-7b_sym.md => llava-v1.5-7b-sym.md} (99%)

diff --git a/README.md b/README.md
index fb272d9f..27aaaa04 100644
--- a/README.md
+++ b/README.md
@@ -312,16 +312,16 @@ release most of the models ourselves.
  Model | Supported |
 |----------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|
-| THUDM/cogvlm2-llama3-chinese-chat-19B | [recipe](./docs/cogvlm2-llama3-chat-19B_sym.md) |
-| Qwen/Qwen2-VL-Instruct | [recipe](./docs/Qwen2-VL-7B-Instruct_sym.md) |
-| meta-llama/Llama-3.2-11B-Vision | [recipe](./docs/Llama-3.2-11B-Vision-Instruct_sym.md) |
-| microsoft/Phi-3.5-vision-instruct | [recipe](./docs/Phi-3.5-vision-instruct_sym.md) |
-| liuhaotian/llava-v1.5-7b | [recipe](./docs/llava-v1.5-7b_sym.md) |
-| Qwen/Qwen2.5-7B-Instruct |[model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Qwen2.5-7B-Instruct-AutoRound-GPTQ-asym-4bit), [recipe](./docs/Qwen2.5-7B-Instruct_sym.md) |
-| Qwen/Qwen2.5-14B-Instruct |[recipe](./docs/Qwen2.5-14B-Instruct_sym.md) |
-| Qwen/Qwen2.5-32B-Instruct |[recipe](./docs/Qwen2.5-32B-Instruct_sym.md) |
+| THUDM/cogvlm2-llama3-chinese-chat-19B | [recipe](./docs/cogvlm2-llama3-chat-19B-sym) |
+| Qwen/Qwen2-VL-Instruct | [recipe](./docs/Qwen2-VL-7B-Instruct-sym) |
+| meta-llama/Llama-3.2-11B-Vision | [recipe](./docs/Llama-3.2-11B-Vision-Instruct-sym) |
+| microsoft/Phi-3.5-vision-instruct | [recipe](./docs/Phi-3.5-vision-instruct-sym) |
+| liuhaotian/llava-v1.5-7b | [recipe](./docs/llava-v1.5-7b-sym) |
+| Qwen/Qwen2.5-7B-Instruct |[model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Qwen2.5-7B-Instruct-AutoRound-GPTQ-asym-4bit), [recipe](./docs/Qwen2.5-7B-Instruct-sym) |
+| Qwen/Qwen2.5-14B-Instruct |[recipe](./docs/Qwen2.5-14B-Instruct-sym) |
+| Qwen/Qwen2.5-32B-Instruct |[recipe](./docs/Qwen2.5-32B-Instruct-sym) |
 | Qwen/Qwen2.5-Coder-32B-Instruct |[model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Qwen2.5-Coder-32B-Instruct-AutoRound-GPTQ-4bit) |
-| Qwen/Qwen2.5-72B-Instruct |[model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Qwen2.5-72B-Instruct-AutoRound-GPTQ-4bit), [model-kaitchup-autogptq-int2*](https://huggingface.co/kaitchup/Qwen2.5-72B-Instruct-AutoRound-GPTQ-2bit), [recipe](./docs/Qwen2.5-72B-Instruct_sym.md) |
+| Qwen/Qwen2.5-72B-Instruct |[model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Qwen2.5-72B-Instruct-AutoRound-GPTQ-4bit), [model-kaitchup-autogptq-int2*](https://huggingface.co/kaitchup/Qwen2.5-72B-Instruct-AutoRound-GPTQ-2bit), [recipe](./docs/Qwen2.5-72B-Instruct-sym) |
 | meta-llama/Meta-Llama-3.1-70B-Instruct | [recipe](https://huggingface.co/Intel/Meta-Llama-3.1-70B-Instruct-int4-inc) |
 | meta-llama/Meta-Llama-3.1-8B-Instruct | [model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Meta-Llama-3.1-8B-Instruct-autoround-gptq-4bit-asym), [model-kaitchup-autogptq-sym-int4*](https://huggingface.co/kaitchup/Meta-Llama-3.1-8B-Instruct-autoround-gptq-4bit-sym), [recipe](https://huggingface.co/Intel/Meta-Llama-3.1-8B-Instruct-int4-inc) |
 | meta-llama/Meta-Llama-3.1-8B | [model-kaitchup-autogptq-sym-int4*](https://huggingface.co/kaitchup/Meta-Llama-3.1-8B-autoround-gptq-4bit-sym) |
diff --git a/auto_round/mllm/README.md b/auto_round/mllm/README.md
index 80078715..ca3ef68e 100644
--- a/auto_round/mllm/README.md
+++ b/auto_round/mllm/README.md
@@ -125,11 +125,11 @@ from auto_round import AutoRoundConfig ## must import for auto-round format
 
 For more details on quantization, inference, evaluation, and environment, see the following recipe:
 
-- [Qwen2-VL-7B-Instruct](../../docs/Qwen2-VL-7B-Instruct_sym.md)
-- [Llama-3.2-11B-Vision](../../docs/Llama-3.2-11B-Vision-Instruct_sym.md)
-- [Phi-3.5-vision-instruct](../../docs/Phi-3.5-vision-instruct_sym.md)
-- [llava-v1.5-7b](../../docs/llava-v1.5-7b_sym.md)
-- [cogvlm2-llama3-chat-19B](../../docs/cogvlm2-llama3-chat-19B_sym.md)
+- [Qwen2-VL-7B-Instruct](../../docs/Qwen2-VL-7B-Instruct-sym)
+- [Llama-3.2-11B-Vision](../../docs/Llama-3.2-11B-Vision-Instruct-sym)
+- [Phi-3.5-vision-instruct](../../docs/Phi-3.5-vision-instruct-sym)
+- [llava-v1.5-7b](../../docs/llava-v1.5-7b-sym)
+- [cogvlm2-llama3-chat-19B](../../docs/cogvlm2-llama3-chat-19B-sym)
diff --git a/docs/Llama-3.2-11B-Vision-Instruct_sym.md b/docs/Llama-3.2-11B-Vision-Instruct-sym.md
similarity index 99%
rename from docs/Llama-3.2-11B-Vision-Instruct_sym.md
rename to docs/Llama-3.2-11B-Vision-Instruct-sym.md
index 86a17ab9..c41a853a 100644
--- a/docs/Llama-3.2-11B-Vision-Instruct_sym.md
+++ b/docs/Llama-3.2-11B-Vision-Instruct-sym.md
@@ -106,7 +106,7 @@ auto-round-mllm --eval --model Intel/Llama-3.2-11B-Vision-Instruct-inc-private -
 ### Generate the model
 Here is the sample command to reproduce the model.
 ```bash
-pip install auto_round
+pip install auto-round
 
 auto-round-mllm --model meta-llama/Llama-3.2-11B-Vision-Instruct \
 --device 0 \
diff --git a/docs/Phi-3.5-vision-instruct_sym.md b/docs/Phi-3.5-vision-instruct-sym.md
similarity index 99%
rename from docs/Phi-3.5-vision-instruct_sym.md
rename to docs/Phi-3.5-vision-instruct-sym.md
index bb1f9423..3141f00c 100644
--- a/docs/Phi-3.5-vision-instruct_sym.md
+++ b/docs/Phi-3.5-vision-instruct-sym.md
@@ -118,7 +118,7 @@ auto-round-mllm --eval --model Intel/Qwen2-VL-7B-Instruct-inc-private --tasks MM
 ### Generate the model
 Here is the sample command to reproduce the model.
 ```bash
-pip install auto_round
+pip install auto-round
 
 auto-round-mllm --model microsoft/Phi-3.5-vision-instruct \
 --device 0 \
diff --git a/docs/Qwen2-VL-7B-Instruct_sym.md b/docs/Qwen2-VL-7B-Instruct-sym.md
similarity index 100%
rename from docs/Qwen2-VL-7B-Instruct_sym.md
rename to docs/Qwen2-VL-7B-Instruct-sym.md
diff --git a/docs/Qwen2.5-14B-Instruct_sym.md b/docs/Qwen2.5-14B-Instruct-sym.md
similarity index 100%
rename from docs/Qwen2.5-14B-Instruct_sym.md
rename to docs/Qwen2.5-14B-Instruct-sym.md
diff --git a/docs/Qwen2.5-32B-Instruct_sym.md b/docs/Qwen2.5-32B-Instruct-sym.md
similarity index 99%
rename from docs/Qwen2.5-32B-Instruct_sym.md
rename to docs/Qwen2.5-32B-Instruct-sym.md
index 7d7f24fb..277b2ab2 100644
--- a/docs/Qwen2.5-32B-Instruct_sym.md
+++ b/docs/Qwen2.5-32B-Instruct-sym.md
@@ -141,7 +141,7 @@ auto-round --model "Intel/Qwen2.5-32B-Instruct-int4-inc" --eval --eval_bs 16 --
 
 Here is the sample command to generate the model.
 
-For symmetric quantization, we found overflow/NAN will occur for some backends, so better fallback some layers. auto_round requires version >0.4.1
+For symmetric quantization, we found overflow/NAN will occur for some backends, so better fallback some layers. auto_round requires version > 0.3.1
 
 ```bash
 auto-round \
diff --git a/docs/Qwen2.5-72B-Instruct_sym.md b/docs/Qwen2.5-72B-Instruct-sym.md
similarity index 100%
rename from docs/Qwen2.5-72B-Instruct_sym.md
rename to docs/Qwen2.5-72B-Instruct-sym.md
diff --git a/docs/Qwen2.5-7B-Instruct_sym.md b/docs/Qwen2.5-7B-Instruct-sym.md
similarity index 100%
rename from docs/Qwen2.5-7B-Instruct_sym.md
rename to docs/Qwen2.5-7B-Instruct-sym.md
diff --git a/docs/cogvlm2-llama3-chat-19B_sym.md b/docs/cogvlm2-llama3-chat-19B-sym.md
similarity index 99%
rename from docs/cogvlm2-llama3-chat-19B_sym.md
rename to docs/cogvlm2-llama3-chat-19B-sym.md
index bb1601e0..b58b3e87 100644
--- a/docs/cogvlm2-llama3-chat-19B_sym.md
+++ b/docs/cogvlm2-llama3-chat-19B-sym.md
@@ -89,7 +89,7 @@ auto-round-mllm --lmms --model Intel/cogvlm2-llama3-chat-19B-inc-private --tasks
 ### Generate the model
 Here is the sample command to reproduce the model.
 ```bash
-pip install auto_round
+pip install auto-round
 
 auto-round-mllm --model THUDM/cogvlm2-llama3-chat-19B \
 --device 0 \
diff --git a/docs/llava-v1.5-7b_sym.md b/docs/llava-v1.5-7b-sym.md
similarity index 99%
rename from docs/llava-v1.5-7b_sym.md
rename to docs/llava-v1.5-7b-sym.md
index cbb614e0..633a65e0 100644
--- a/docs/llava-v1.5-7b_sym.md
+++ b/docs/llava-v1.5-7b-sym.md
@@ -95,7 +95,7 @@ auto-round-mllm --lmms --model Intel/llava-v1.5-7b-inc-private --tasks pope,text
 ### Generate the model
 Here is the sample command to reproduce the model.
 ```bash
-pip install auto_round
+pip install auto-round
 
 auto-round-mllm --model liuhaotian/llava-v1.5-7b \
 --device 0 \

From b647731835b2ce59384592780b8e679c9a1f35ba Mon Sep 17 00:00:00 2001
From: wenhuach21 <wenhua.cheng@intel.com>
Date: Mon, 25 Nov 2024 09:26:07 +0800
Subject: [PATCH 2/6] fix typo

---
 README.md | 84 +++++++++++++++++++++++++++----------------------------
 1 file changed, 42 insertions(+), 42 deletions(-)

diff --git a/README.md b/README.md
index 27aaaa04..18bb9e70 100644
--- a/README.md
+++ b/README.md
@@ -310,49 +310,49 @@ Please note that an asterisk (*) indicates third-party quantized models, which m
 different recipe. We greatly appreciate their efforts and encourage more users to share their models, as we cannot
 release most of the models ourselves.
 
- Model | Supported |
-|----------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|
-| THUDM/cogvlm2-llama3-chinese-chat-19B | [recipe](./docs/cogvlm2-llama3-chat-19B-sym) |
-| Qwen/Qwen2-VL-Instruct | [recipe](./docs/Qwen2-VL-7B-Instruct-sym) |
-| meta-llama/Llama-3.2-11B-Vision | [recipe](./docs/Llama-3.2-11B-Vision-Instruct-sym) |
-| microsoft/Phi-3.5-vision-instruct | [recipe](./docs/Phi-3.5-vision-instruct-sym) |
-| liuhaotian/llava-v1.5-7b | [recipe](./docs/llava-v1.5-7b-sym) |
-| Qwen/Qwen2.5-7B-Instruct |[model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Qwen2.5-7B-Instruct-AutoRound-GPTQ-asym-4bit), [recipe](./docs/Qwen2.5-7B-Instruct-sym) |
-| Qwen/Qwen2.5-14B-Instruct |[recipe](./docs/Qwen2.5-14B-Instruct-sym) |
-| Qwen/Qwen2.5-32B-Instruct |[recipe](./docs/Qwen2.5-32B-Instruct-sym) |
-| Qwen/Qwen2.5-Coder-32B-Instruct |[model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Qwen2.5-Coder-32B-Instruct-AutoRound-GPTQ-4bit) |
-| Qwen/Qwen2.5-72B-Instruct |[model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Qwen2.5-72B-Instruct-AutoRound-GPTQ-4bit), [model-kaitchup-autogptq-int2*](https://huggingface.co/kaitchup/Qwen2.5-72B-Instruct-AutoRound-GPTQ-2bit), [recipe](./docs/Qwen2.5-72B-Instruct-sym) |
-| meta-llama/Meta-Llama-3.1-70B-Instruct | [recipe](https://huggingface.co/Intel/Meta-Llama-3.1-70B-Instruct-int4-inc) |
+ Model | Supported |
+|----------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| THUDM/cogvlm2-llama3-chinese-chat-19B | [recipe](./docs/cogvlm2-llama3-chat-19B-sym.md) |
+| Qwen/Qwen2-VL-Instruct | [recipe](./docs/Qwen2-VL-7B-Instruct-sym.md) |
+| meta-llama/Llama-3.2-11B-Vision | [recipe](./docs/Llama-3.2-11B-Vision-Instruct-sym.md) |
+| microsoft/Phi-3.5-vision-instruct | [recipe](./docs/Phi-3.5-vision-instruct-sym.md) |
+| liuhaotian/llava-v1.5-7b | [recipe](./docs/llava-v1.5-7b-sym.md) |
+| Qwen/Qwen2.5-7B-Instruct | [model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Qwen2.5-7B-Instruct-AutoRound-GPTQ-asym-4bit), [recipe](./docs/Qwen2.5-7B-Instruct-sym.md) |
+| Qwen/Qwen2.5-14B-Instruct | [recipe](./docs/Qwen2.5-14B-Instruct-sym.md) |
+| Qwen/Qwen2.5-32B-Instruct | [recipe](./docs/Qwen2.5-32B-Instruct-sym.md) |
+| Qwen/Qwen2.5-Coder-32B-Instruct | [model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Qwen2.5-Coder-32B-Instruct-AutoRound-GPTQ-4bit) |
+| Qwen/Qwen2.5-72B-Instruct | [model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Qwen2.5-72B-Instruct-AutoRound-GPTQ-4bit), [model-kaitchup-autogptq-int2*](https://huggingface.co/kaitchup/Qwen2.5-72B-Instruct-AutoRound-GPTQ-2bit), [recipe](./docs/Qwen2.5-72B-Instruct-sym.md) |
+| meta-llama/Meta-Llama-3.1-70B-Instruct | [recipe](https://huggingface.co/Intel/Meta-Llama-3.1-70B-Instruct-int4-inc) |
 | meta-llama/Meta-Llama-3.1-8B-Instruct | [model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Meta-Llama-3.1-8B-Instruct-autoround-gptq-4bit-asym), [model-kaitchup-autogptq-sym-int4*](https://huggingface.co/kaitchup/Meta-Llama-3.1-8B-Instruct-autoround-gptq-4bit-sym), [recipe](https://huggingface.co/Intel/Meta-Llama-3.1-8B-Instruct-int4-inc) |
-| meta-llama/Meta-Llama-3.1-8B | [model-kaitchup-autogptq-sym-int4*](https://huggingface.co/kaitchup/Meta-Llama-3.1-8B-autoround-gptq-4bit-sym) |
-| Qwen/Qwen-VL | [accuracy](./examples/multimodal-modeling/Qwen-VL/README.md), [recipe](./examples/multimodal-modeling/Qwen-VL/run_autoround.sh)
-| Qwen/Qwen2-7B | [model-autoround-sym-int4](https://huggingface.co/Intel/Qwen2-7B-int4-inc), [model-autogptq-sym-int4](https://huggingface.co/Intel/Qwen2-7B-int4-inc) |
-| THUDM/glm-4-9b-chat | [recipe](./docs/glm-4-9b-chat-recipe.md) |
-| Qwen/Qwen2-57B-A14B-Instruct | [model-autoround-sym-int4](https://huggingface.co/Intel/Qwen2-57B-A14B-Instruct-int4-inc),[model-autogptq-sym-int4](https://huggingface.co/Intel/Qwen2-57B-A14B-Instruct-int4-inc) |
-| 01-ai/Yi-1.5-9B | [model-LnL-AI-autogptq-int4*](https://huggingface.co/LnL-AI/Yi-1.5-9B-4bit-gptq-autoround) |
-| 01-ai/Yi-1.5-9B-Chat | [model-LnL-AI-autogptq-int4*](https://huggingface.co/LnL-AI/Yi-1.5-9B-Chat-4bit-gptq-autoround) |
-| Intel/neural-chat-7b-v3-3 | [model-autogptq-int4](https://huggingface.co/Intel/neural-chat-7b-v3-3-int4-inc) |
-| Intel/neural-chat-7b-v3-1 | [model-autogptq-int4](https://huggingface.co/Intel/neural-chat-7b-v3-1-int4-inc) |
-| TinyLlama-1.1B-intermediate | [model-LnL-AI-autogptq-int4*](https://huggingface.co/LnL-AI/TinyLlama-1.1B-intermediate-step-1341k-3T-autoround-lm_head-symFalse) |
-| mistralai/Mistral-7B-v0.1 | [model-autogptq-lmhead-int4](https://huggingface.co/Intel/Mistral-7B-v0.1-int4-inc-lmhead), [model-autogptq-int4](https://huggingface.co/Intel/Mistral-7B-v0.1-int4-inc) |
-| google/gemma-2b | [model-autogptq-int4](https://huggingface.co/Intel/gemma-2b-int4-inc) |
-| tiiuae/falcon-7b | [model-autogptq-int4-G64](https://huggingface.co/Intel/falcon-7b-int4-inc) |
-| sapienzanlp/modello-italia-9b | [model-fbaldassarri-autogptq-int4*](https://huggingface.co/fbaldassarri/modello-italia-9b-autoround-w4g128-cpu) |
-| microsoft/phi-2 | [model-autoround-sym-int4](https://huggingface.co/Intel/phi-2-int4-inc) [model-autogptq-sym-int4](https://huggingface.co/Intel/phi-2-int4-inc) |
-| microsoft/Phi-3.5-mini-instruct | [model-kaitchup-autogptq-sym-int4*](https://huggingface.co/kaitchup/Phi-3.5-Mini-instruct-AutoRound-4bit) |
-| microsoft/Phi-3-vision-128k-instruct | [recipe](./examples/multimodal-modeling/Phi-3-vision/run_autoround.sh)
-| mistralai/Mistral-7B-Instruct-v0.2 | [accuracy](./docs/Mistral-7B-Instruct-v0.2-acc.md), [recipe](./examples/language-modeling/scripts/Mistral-7B-Instruct-v0.2.sh) |
-| mistralai/Mixtral-8x7B-Instruct-v0.1 | [accuracy](./docs/Mixtral-8x7B-Instruct-v0.1-acc.md), [recipe](./examples/language-modeling/scripts/Mixtral-8x7B-Instruct-v0.1.sh) |
-| mistralai/Mixtral-8x7B-v0.1 | [accuracy](./docs/Mixtral-8x7B-v0.1-acc.md), [recipe](./examples/language-modeling/scripts/Mixtral-8x7B-v0.1.sh) |
-| meta-llama/Meta-Llama-3-8B-Instruct | [accuracy](./docs/Meta-Llama-3-8B-Instruct-acc.md), [recipe](./examples/language-modeling/scripts/Meta-Llama-3-8B-Instruct.sh) |
-| google/gemma-7b | [accuracy](./docs/gemma-7b-acc.md), [recipe](./examples/language-modeling/scripts/gemma-7b.sh) |
-| meta-llama/Llama-2-7b-chat-hf | [accuracy](./docs/Llama-2-7b-chat-hf-acc.md), [recipe](./examples/language-modeling/scripts/Llama-2-7b-chat-hf.sh) |
-| Qwen/Qwen1.5-7B-Chat | [accuracy](./docs/Qwen1.5-7B-Chat-acc.md), [sym recipe](./examples/language-modeling/scripts/Qwen1.5-7B-Chat-sym.sh), [asym recipe ](./examples/language-modeling/scripts/Qwen1.5-7B-Chat-asym.sh) |
-| baichuan-inc/Baichuan2-7B-Chat | [accuracy](./docs/baichuan2-7b-chat-acc.md), [recipe](./examples/language-modeling/scripts/baichuan2-7b-chat.sh) |
-| 01-ai/Yi-6B-Chat | [accuracy](./docs/Yi-6B-Chat-acc.md), [recipe](./examples/language-modeling/scripts/Yi-6B-Chat.sh) |
-| facebook/opt-2.7b | [accuracy](./docs/opt-2.7b-acc.md), [recipe](./examples/language-modeling/scripts/opt-2.7b.sh) |
-| bigscience/bloom-3b | [accuracy](./docs/bloom-3B-acc.md), [recipe](./examples/language-modeling/scripts/bloom-3b.sh) |
-| EleutherAI/gpt-j-6b | [accuracy](./docs/gpt-j-6B-acc.md), [recipe](./examples/language-modeling/scripts/gpt-j-6b.sh) |
+| meta-llama/Meta-Llama-3.1-8B | [model-kaitchup-autogptq-sym-int4*](https://huggingface.co/kaitchup/Meta-Llama-3.1-8B-autoround-gptq-4bit-sym) |
+| Qwen/Qwen-VL | [accuracy](./examples/multimodal-modeling/Qwen-VL/README.md), [recipe](./examples/multimodal-modeling/Qwen-VL/run_autoround.sh)
+| Qwen/Qwen2-7B | [model-autoround-sym-int4](https://huggingface.co/Intel/Qwen2-7B-int4-inc), [model-autogptq-sym-int4](https://huggingface.co/Intel/Qwen2-7B-int4-inc) |
+| THUDM/glm-4-9b-chat | [recipe](./docs/glm-4-9b-chat-recipe.md) |
+| Qwen/Qwen2-57B-A14B-Instruct | [model-autoround-sym-int4](https://huggingface.co/Intel/Qwen2-57B-A14B-Instruct-int4-inc),[model-autogptq-sym-int4](https://huggingface.co/Intel/Qwen2-57B-A14B-Instruct-int4-inc) |
+| 01-ai/Yi-1.5-9B | [model-LnL-AI-autogptq-int4*](https://huggingface.co/LnL-AI/Yi-1.5-9B-4bit-gptq-autoround) |
+| 01-ai/Yi-1.5-9B-Chat | [model-LnL-AI-autogptq-int4*](https://huggingface.co/LnL-AI/Yi-1.5-9B-Chat-4bit-gptq-autoround) |
+| Intel/neural-chat-7b-v3-3 | [model-autogptq-int4](https://huggingface.co/Intel/neural-chat-7b-v3-3-int4-inc) |
+| Intel/neural-chat-7b-v3-1 | [model-autogptq-int4](https://huggingface.co/Intel/neural-chat-7b-v3-1-int4-inc) |
+| TinyLlama-1.1B-intermediate | [model-LnL-AI-autogptq-int4*](https://huggingface.co/LnL-AI/TinyLlama-1.1B-intermediate-step-1341k-3T-autoround-lm_head-symFalse) |
+| mistralai/Mistral-7B-v0.1 | [model-autogptq-lmhead-int4](https://huggingface.co/Intel/Mistral-7B-v0.1-int4-inc-lmhead), [model-autogptq-int4](https://huggingface.co/Intel/Mistral-7B-v0.1-int4-inc) |
+| google/gemma-2b | [model-autogptq-int4](https://huggingface.co/Intel/gemma-2b-int4-inc) |
+| tiiuae/falcon-7b | [model-autogptq-int4-G64](https://huggingface.co/Intel/falcon-7b-int4-inc) |
+| sapienzanlp/modello-italia-9b | [model-fbaldassarri-autogptq-int4*](https://huggingface.co/fbaldassarri/modello-italia-9b-autoround-w4g128-cpu) |
+| microsoft/phi-2 | [model-autoround-sym-int4](https://huggingface.co/Intel/phi-2-int4-inc) [model-autogptq-sym-int4](https://huggingface.co/Intel/phi-2-int4-inc) |
+| microsoft/Phi-3.5-mini-instruct | [model-kaitchup-autogptq-sym-int4*](https://huggingface.co/kaitchup/Phi-3.5-Mini-instruct-AutoRound-4bit) |
+| microsoft/Phi-3-vision-128k-instruct | [recipe](./examples/multimodal-modeling/Phi-3-vision/run_autoround.sh)
+| mistralai/Mistral-7B-Instruct-v0.2 | [accuracy](./docs/Mistral-7B-Instruct-v0.2-acc.md), [recipe](./examples/language-modeling/scripts/Mistral-7B-Instruct-v0.2.sh) |
+| mistralai/Mixtral-8x7B-Instruct-v0.1 | [accuracy](./docs/Mixtral-8x7B-Instruct-v0.1-acc.md), [recipe](./examples/language-modeling/scripts/Mixtral-8x7B-Instruct-v0.1.sh) |
+| mistralai/Mixtral-8x7B-v0.1 | [accuracy](./docs/Mixtral-8x7B-v0.1-acc.md), [recipe](./examples/language-modeling/scripts/Mixtral-8x7B-v0.1.sh) |
+| meta-llama/Meta-Llama-3-8B-Instruct | [accuracy](./docs/Meta-Llama-3-8B-Instruct-acc.md), [recipe](./examples/language-modeling/scripts/Meta-Llama-3-8B-Instruct.sh) |
+| google/gemma-7b | [accuracy](./docs/gemma-7b-acc.md), [recipe](./examples/language-modeling/scripts/gemma-7b.sh) |
+| meta-llama/Llama-2-7b-chat-hf | [accuracy](./docs/Llama-2-7b-chat-hf-acc.md), [recipe](./examples/language-modeling/scripts/Llama-2-7b-chat-hf.sh) |
+| Qwen/Qwen1.5-7B-Chat | [accuracy](./docs/Qwen1.5-7B-Chat-acc.md), [sym recipe](./examples/language-modeling/scripts/Qwen1.5-7B-Chat-sym.sh), [asym recipe ](./examples/language-modeling/scripts/Qwen1.5-7B-Chat-asym.sh) |
+| baichuan-inc/Baichuan2-7B-Chat | [accuracy](./docs/baichuan2-7b-chat-acc.md), [recipe](./examples/language-modeling/scripts/baichuan2-7b-chat.sh) |
+| 01-ai/Yi-6B-Chat | [accuracy](./docs/Yi-6B-Chat-acc.md), [recipe](./examples/language-modeling/scripts/Yi-6B-Chat.sh) |
+| facebook/opt-2.7b | [accuracy](./docs/opt-2.7b-acc.md), [recipe](./examples/language-modeling/scripts/opt-2.7b.sh) |
+| bigscience/bloom-3b | [accuracy](./docs/bloom-3B-acc.md), [recipe](./examples/language-modeling/scripts/bloom-3b.sh) |
+| EleutherAI/gpt-j-6b | [accuracy](./docs/gpt-j-6B-acc.md), [recipe](./examples/language-modeling/scripts/gpt-j-6b.sh) |
 
 ## Integration

From 8fa6d62e689433555ac4045dc2cd24487868d930 Mon Sep 17 00:00:00 2001
From: wenhuach21 <wenhua.cheng@intel.com>
Date: Mon, 25 Nov 2024 09:31:38 +0800
Subject: [PATCH 3/6] fix typo

---
 auto_round/mllm/README.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/auto_round/mllm/README.md b/auto_round/mllm/README.md
index ca3ef68e..dc7ff69b 100644
--- a/auto_round/mllm/README.md
+++ b/auto_round/mllm/README.md
@@ -125,11 +125,11 @@ from auto_round import AutoRoundConfig ## must import for auto-round format
 
 For more details on quantization, inference, evaluation, and environment, see the following recipe:
 
-- [Qwen2-VL-7B-Instruct](../../docs/Qwen2-VL-7B-Instruct-sym)
-- [Llama-3.2-11B-Vision](../../docs/Llama-3.2-11B-Vision-Instruct-sym)
-- [Phi-3.5-vision-instruct](../../docs/Phi-3.5-vision-instruct-sym)
-- [llava-v1.5-7b](../../docs/llava-v1.5-7b-sym)
-- [cogvlm2-llama3-chat-19B](../../docs/cogvlm2-llama3-chat-19B-sym)
+- [Qwen2-VL-7B-Instruct](../../docs/Qwen2-VL-7B-Instruct-sym.md)
+- [Llama-3.2-11B-Vision](../../docs/Llama-3.2-11B-Vision-Instruct-sym.md)
+- [Phi-3.5-vision-instruct](../../docs/Phi-3.5-vision-instruct-sym.md)
+- [llava-v1.5-7b](../../docs/llava-v1.5-7b-sym.md)
+- [cogvlm2-llama3-chat-19B](../../docs/cogvlm2-llama3-chat-19B-sym.md)

From b7ee3e29bc00214af0c84bf04e70e01933a74e8e Mon Sep 17 00:00:00 2001
From: wenhuach21 <wenhua.cheng@intel.com>
Date: Mon, 25 Nov 2024 09:52:22 +0800
Subject: [PATCH 4/6] update blog

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 18bb9e70..0bf08ac6 100644
--- a/README.md
+++ b/README.md
@@ -27,7 +27,7 @@ more accuracy data and recipes across various models.
 
 ## What's New
 * [2024/11] We provide experimental support for VLLM quantization, please check out [MLLM README](./auto_round/mllm/README.md)
-* [2024/11] We provide some tips and tricks for LLM&VLM quantization, please check out [this file](./docs/tips_and_tricks.md)
+* [2024/11] We provide some tips and tricks for LLM&VLM quantization, please check out [this blog](https://medium.com/@NeuralCompressor/10-tips-for-quantizing-llms-and-vlms-with-autoround-923e733879a7)
 * [2024/10] AutoRound has been integrated to [torch/ao](https://github.com/pytorch/ao), check out their [release note](https://github.com/pytorch/ao/releases/tag/v0.6.1)
 * [2024/10] Important update: We now support full-range symmetric quantization and have made it the default

From 9e52a9e49ef8807496dd86af30d78eba61a01811 Mon Sep 17 00:00:00 2001
From: wenhuach21 <wenhua.cheng@intel.com>
Date: Mon, 25 Nov 2024 09:53:57 +0800
Subject: [PATCH 5/6] update

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 0bf08ac6..9fea247f 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@ AutoRound
 ---
 <div align="left">
 
-AutoRound is an advanced quantization algorithm for low-bits LLM inference. It's tailored for a wide range
+AutoRound is an advanced quantization algorithm for low-bits LLM/VLM inference. It's tailored for a wide range
 of models. AutoRound adopts sign gradient descent to fine-tune rounding values and minmax values of weights in just 200
 steps, which competes impressively against recent methods without introducing any additional inference overhead and
 keeping low
@@ -26,7 +26,7 @@ more accuracy data and recipes across various models.
 <div align="left">
 
 ## What's New
-* [2024/11] We provide experimental support for VLLM quantization, please check out [MLLM README](./auto_round/mllm/README.md)
+* [2024/11] We provide experimental support for VLLM quantization, please check out the [README](./auto_round/mllm/README.md)
 * [2024/11] We provide some tips and tricks for LLM&VLM quantization, please check out [this blog](https://medium.com/@NeuralCompressor/10-tips-for-quantizing-llms-and-vlms-with-autoround-923e733879a7)
 * [2024/10] AutoRound has been integrated to [torch/ao](https://github.com/pytorch/ao), check out their [release note](https://github.com/pytorch/ao/releases/tag/v0.6.1)

From 21ea08df4a31e3b4a49289744ad86465f011efe4 Mon Sep 17 00:00:00 2001
From: wenhuach21 <wenhua.cheng@intel.com>
Date: Mon, 25 Nov 2024 09:55:53 +0800
Subject: [PATCH 6/6] update

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 9fea247f..7bf68812 100644
--- a/README.md
+++ b/README.md
@@ -76,8 +76,8 @@ pip install auto-round[hpu]
 
 ### Basic Usage (Gaudi2/CPU/GPU)
 
-[//]: # (A user guide detailing the full list of supported arguments is provided by calling ```auto-round -h``` on the terminal.)
-Alternatively, you can use ```auto_round``` instead of ```auto-round```. Set the format you want in `format` and
+A user guide detailing the full list of supported arguments is provided by calling ```auto-round -h``` on the terminal.
+Set the format you want in `format` and
 multiple formats exporting has been supported. Please check out [step-by-step-instruction](./docs/step_by_step.md) for more details about calibration dataset or evaluation.
 
 ```bash
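The Qwen2.5-32B recipe touched by patch 1 notes that symmetric quantization can overflow or produce NaN on some backends, so some layers are better left at higher precision. A minimal sketch of full-range symmetric int4 quantization shows where the extreme value comes from; this is a simplified illustration in plain Python with made-up weights, not AutoRound's actual implementation (which works per-group on tensors with learned rounding):

```python
# Simplified sketch of full-range symmetric int4 quantization.
# Illustrative only: the weights below are hypothetical, and real
# quantizers operate per-group on tensors, not on Python lists.

def quantize_sym_int4(weights):
    """Map floats to int4 codes in [-8, 7] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Full-range symmetric: the scale targets -8, so the most negative
    # code -8 is used even though +8 is not representable.
    scale = max_abs / 8.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Reconstruct approximate floats from int4 codes."""
    return [c * scale for c in codes]

weights = [0.06, -0.8, 0.31, -0.02, 0.64]
codes, scale = quantize_sym_int4(weights)
recon = dequantize(codes, scale)
print(codes)
# The weight with the largest magnitude lands exactly on code -8; a
# backend that effectively handles only [-7, 7] can overflow there,
# which is one reason a recipe may fall back some layers.
```

For the weights above this yields codes `[1, -8, 3, 0, 6]` with scale `0.1`; the `-8` code on the extreme weight is the value that narrow-range backends mishandle.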