refine docs, add accuracy data, add recipe and eval scripts (#226)

* refine docs, add accuracy data, add recipe and eval scripts

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update supported model list

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* add generation results, update supported model list

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* fix typos

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* fix typo

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* follow comments

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* re-sort model list

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* fix typo

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* fix more typos

Signed-off-by: Zhang, Weiwei1 <[email protected]>

* refine table

Signed-off-by: Zhang, Weiwei1 <[email protected]>

---------

Signed-off-by: Zhang, Weiwei1 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
WeiweiZhang1 and pre-commit-ci[bot] authored Aug 27, 2024
1 parent 4455478 commit c6d8bf6
Showing 20 changed files with 930 additions and 148 deletions.
14 changes: 8 additions & 6 deletions README.md
@@ -188,22 +188,24 @@ Please note that an asterisk (*) indicates third-party quantized models, which m

| Model | Supported |
|--------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| meta-llama/Meta-Llama-3.1-70B-Instruct | [recipe](https://huggingface.co/Intel/Meta-Llama-3.1-70B-Instruct-int4-inc) |
| meta-llama/Meta-Llama-3.1-8B-Instruct | [model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Meta-Llama-3.1-8B-Instruct-autoround-gptq-4bit-asym), [model-kaitchup-autogptq-sym-int4*](https://huggingface.co/kaitchup/Meta-Llama-3.1-8B-Instruct-autoround-gptq-4bit-sym), [recipe](https://huggingface.co/Intel/Meta-Llama-3.1-8B-Instruct-int4-inc) |
| meta-llama/Meta-Llama-3.1-8B | [model-kaitchup-autogptq-sym-int4*](https://huggingface.co/kaitchup/Meta-Llama-3.1-8B-autoround-gptq-4bit-sym) |
| Qwen/Qwen-VL | [accuracy](./examples/multimodal-modeling/Qwen-VL/README.md), [recipe](./examples/multimodal-modeling/Qwen-VL/run_autoround.sh) |
| Qwen/Qwen2-7B | [model-autoround-int4](https://huggingface.co/Intel/Qwen2-7B-int4-inc) |
| Qwen/Qwen2-57B-A14B-Instruct | [model-autoround-int4](https://huggingface.co/Intel/Qwen2-57B-A14B-Instruct-int4-inc) |
| Intel/neural-chat-7b-v3-3 | [model-autogptq-int4](https://huggingface.co/Intel/neural-chat-7b-v3-3-int4-inc) |
| Intel/neural-chat-7b-v3-1 | [model-autogptq-int4](https://huggingface.co/Intel/neural-chat-7b-v3-1-int4-inc) |
| TinyLlama-1.1B-intermediate | [model-LnL-AI-autogptq-int4*](https://huggingface.co/LnL-AI/TinyLlama-1.1B-intermediate-step-1341k-3T-autoround-lm_head-symFalse) |
| mistralai/Mistral-7B-v0.1 | [model-autogptq-lmhead-int4](https://huggingface.co/Intel/Mistral-7B-v0.1-int4-inc-lmhead), [model-autogptq-int4](https://huggingface.co/Intel/Mistral-7B-v0.1-int4-inc) |
| google/gemma-2b | [model-autogptq-int4](https://huggingface.co/Intel/gemma-2b-int4-inc) |
| tiiuae/falcon-7b | [model-autogptq-int4-G64](https://huggingface.co/Intel/falcon-7b-int4-inc) |
| 01-ai/Yi-1.5-9B | [model-LnL-AI-autogptq-int4*](https://huggingface.co/LnL-AI/Yi-1.5-9B-4bit-gptq-autoround) |
| 01-ai/Yi-1.5-9B-Chat | [model-LnL-AI-autogptq-int4*](https://huggingface.co/LnL-AI/Yi-1.5-9B-Chat-4bit-gptq-autoround) |
| sapienzanlp/modello-italia-9b | [model-fbaldassarri-autogptq-int4*](https://huggingface.co/fbaldassarri/modello-italia-9b-autoround-w4g128-cpu) |
| microsoft/phi-2 | [model-autogptq-sym-int4](https://huggingface.co/Intel/phi-2-int4-inc) |
| microsoft/Phi-3.5-mini-instruct | [model-kaitchup-autogptq-sym-int4*](https://huggingface.co/kaitchup/Phi-3.5-Mini-instruct-AutoRound-4bit) |
| microsoft/Phi-3-vision-128k-instruct | [recipe](./examples/multimodal-modeling/Phi-3-vision/run_autoround.sh) |
| mistralai/Mistral-7B-Instruct-v0.2 | [accuracy](./docs/Mistral-7B-Instruct-v0.2-acc.md), [recipe](./examples/language-modeling/scripts/Mistral-7B-Instruct-v0.2.sh), [example](./examples/language-modeling/) |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | [accuracy](./docs/Mixtral-8x7B-Instruct-v0.1-acc.md), [recipe](./examples/language-modeling/scripts/Mixtral-8x7B-Instruct-v0.1.sh), [example](./examples/language-modeling/) |
| mistralai/Mixtral-8x7B-v0.1 | [accuracy](./docs/Mixtral-8x7B-v0.1-acc.md), [recipe](./examples/language-modeling/scripts/Mixtral-8x7B-v0.1.sh), [example](./examples/language-modeling/) |
12 changes: 4 additions & 8 deletions examples/multimodal-modeling/Llava/README.md
@@ -6,6 +6,8 @@ This document presents step-by-step instructions for auto-round.

In this example, we introduce a straightforward way to run quantization on popular multimodal models such as LLaVA.

Please note that LLaVA quantization is currently an **experimental feature**, and the exported model does not yet support inference on all devices.
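
For orientation, the example scripts below build on the plain AutoRound API. The following is a minimal sketch on a small text-only model: the model name is an illustrative placeholder, keyword defaults may differ across auto-round versions, and LLaVA itself should be quantized through the provided scripts rather than this snippet.

```python
# Minimal sketch of the AutoRound flow, assuming a small text-only model;
# LLaVA itself is quantized via the example scripts in this folder.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # illustrative placeholder
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 4-bit weight-only quantization with group size 128
autoround = AutoRound(model, tokenizer, bits=4, group_size=128)
autoround.quantize()
autoround.save_quantized("./tmp_autoround")
```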

## Install
If you are not using Linux, do NOT proceed, see instructions for [macOS](https://github.com/haotian-liu/LLaVA/blob/main/docs/macOS.md) and [Windows](https://github.com/haotian-liu/LLaVA/blob/main/docs/Windows.md).

@@ -62,11 +64,11 @@ Include the flag `--adam`. Note that AdamW is less effective than sign gradient

- **Running on Intel Gaudi2**
```bash
bash run_autoround_on_gaudi.sh
bash run_autoround.sh
```

## 4. Results
Quantization calibration uses the [COCO 2017](https://cocodataset.org/) and [LLaVA-Instruct-150K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K) datasets, and evaluation uses the TextVQA dataset. When the vision components are not quantized, the accuracy loss stays within 1%. The results for LLaVA-7B are as follows:
Quantization calibration uses the [COCO 2017](https://cocodataset.org/) and [LLaVA-Instruct-150K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K) datasets, and evaluation uses the TextVQA dataset. When the vision components are not quantized, the accuracy loss stays within 1%. The results for the fake-quantized LLaVA-7B model are as follows:
| Model | Config | Precision | Hyperparameter | Accuracy% | Relative drop |
| :----: | :----: | :----: | :----: | :----: | :----: |
| liuhaotian/llava-v1.5-7b | - | FP16 | - | 58.21 | - |
Expand Down Expand Up @@ -96,9 +98,3 @@ If you find SignRound useful for your research, please cite our paper:
```








71 changes: 68 additions & 3 deletions examples/multimodal-modeling/Phi-3-vision/README.md
@@ -16,6 +16,8 @@ COCO: [train2017](http://images.cocodataset.org/zips/train2017.zip), and unzip t


## 2. Run Examples
PyTorch 1.8 or higher is required.

Enter the examples folder and install lm-eval to run the evaluation:
```bash
pip install -r requirements.txt
@@ -47,13 +49,75 @@ Include the flag `--adam`. Note that AdamW is less effective than sign gradient

- **Running on Intel Gaudi2**
```bash
bash run_autoround_on_gaudi.sh
bash run_autoround.sh
```


## 3. Environment
## 3. Run Inference

```python
from PIL import Image
import requests
import io
from transformers import AutoModelForCausalLM
from transformers import AutoProcessor
from auto_round.auto_quantizer import AutoHfQuantizer
quantized_model_path = "./tmp_autoround"
model = AutoModelForCausalLM.from_pretrained(quantized_model_path, device_map="auto", trust_remote_code=True, torch_dtype="auto", _attn_implementation='flash_attention_2') # use _attn_implementation='eager' to disable flash attention

processor = AutoProcessor.from_pretrained(quantized_model_path, trust_remote_code=True)

messages = [
    {"role": "user", "content": "<|image_1|>\nWhat is shown in this image?"},
    {"role": "assistant", "content": "The chart displays the percentage of respondents who agree with various statements about their preparedness for meetings. It shows five categories: 'Having clear and pre-defined goals for meetings', 'Knowing where to find the information I need for a meeting', 'Understanding my exact role and responsibilities when I'm invited', 'Having tools to manage admin tasks like note-taking or summarization', and 'Having more focus time to sufficiently prepare for meetings'. Each category has an associated bar indicating the level of agreement, measured on a scale from 0% to 100%."},
    {"role": "user", "content": "Provide insightful questions to spark discussion."},
]

url = "https://assets-c4akfrf5b4d3f4b7.z01.azurefd.net/assets/2024/04/BMDataViz_661fb89f3845e.png"
# image = Image.open(requests.get(url, stream=True).raw)
image = Image.open(io.BytesIO(requests.get(url, stream=True).content))

prompt = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = processor(prompt, [image], return_tensors="pt").to("cuda:0")

generation_args = {
    "max_new_tokens": 50,
    "temperature": 0.0,
    "do_sample": False,
}

generate_ids = model.generate(**inputs, eos_token_id=processor.tokenizer.eos_token_id, **generation_args)

# remove input tokens
generate_ids = generate_ids[:, inputs['input_ids'].shape[1]:]
response = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

print(response)
# 1. How does the level of agreement on each statement reflect the overall preparedness of respondents for meetings?
# 2. What are the most and least agreed-upon statements, and why might that be the case?
# 3.
```
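
If flash attention is unavailable (for example in a CPU-only environment), the load step can use eager attention instead, as the comment above notes. The sketch below is an unvalidated variant under that assumption, reusing the same checkpoint path.

```python
# Variant load without flash attention; device_map="cpu" is an assumption for
# CPU-only runs and has not been validated against this checkpoint.
from transformers import AutoModelForCausalLM, AutoProcessor
from auto_round.auto_quantizer import AutoHfQuantizer  # registers the AutoRound quantizer

quantized_model_path = "./tmp_autoround"
model = AutoModelForCausalLM.from_pretrained(
    quantized_model_path,
    device_map="cpu",
    trust_remote_code=True,
    torch_dtype="auto",
    _attn_implementation="eager",  # disable flash attention
)
processor = AutoProcessor.from_pretrained(quantized_model_path, trust_remote_code=True)
```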
<!--
## 4. Results
Quantization calibration uses the [COCO 2017](https://cocodataset.org/) and [LLaVA-Instruct-150K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K) datasets, and evaluation uses lm-eval tasks; please follow the [recipe](./run_autoround.sh) and [evaluation script](./run_eval.sh). The results for Phi-3-vision-128k-instruct are as follows:
| Metric | bf16 | INT4 |
|----------------|--------|--------|
| avg | 0.6014 | 0.5940 |
| mmlu | 0.6369 | 0.6310 |
| lambada_openai | 0.6487 | 0.6406 |
| hellaswag | 0.5585 | 0.5483 |
| winogrande | 0.7395 | 0.7451 |
| piqa | 0.7954 | 0.7889 |
| truthfulqa_mc1 | 0.3084 | 0.2987 |
| openbookqa | 0.3580 | 0.3600 |
| boolq | 0.8532 | 0.8557 |
| arc_easy | 0.8371 | 0.8346 |
| arc_challenge | 0.5572 | 0.5469 |
| cmmlu | 0.4074 | 0.3950 |
| ceval | 0.4027 | 0.4012 |
| gsm8k | 0.7157 | 0.6755 | -->

PyTorch 1.8 or higher is required.


## Reference
@@ -72,3 +136,4 @@ If you find SignRound useful for your research, please cite our paper:




17 changes: 12 additions & 5 deletions examples/multimodal-modeling/Phi-3-vision/eval_042/evaluation.py
@@ -576,6 +576,10 @@ def evaluate(
parser.add_argument(
"--eval_bs", default=1,
)
parser.add_argument(
"--device", default="cuda:0",
help="PyTorch device (e.g. cpu/cuda:0/hpu) for evaluation."
)
parser.add_argument(
"--trust_remote_code", action='store_true',
help="Whether to enable trust_remote_code"
@@ -600,17 +604,20 @@
model_args += f",autogptq=True,gptq_use_triton=True"
if args.trust_remote_code:
    model_args += f",trust_remote_code=True"
model_args += ",dtype=bfloat16"
test_tasks = args.tasks
if isinstance(test_tasks, str):
    test_tasks = test_tasks.split(',')
model_name = args.model_name.rstrip('/')
from lm_eval.utils import make_table
result = simple_evaluate(model="hf",
                         model_args=model_args,
                         tasks=test_tasks,
                         batch_size=args.eval_bs)
with torch.cuda.amp.autocast():
    result = simple_evaluate(model="hf",
                             model_args=model_args,
                             tasks=test_tasks,
                             device=args.device,
                             batch_size=args.eval_bs)
print(make_table(result))

print("cost time: ", time.time() - s)


1 change: 1 addition & 0 deletions examples/multimodal-modeling/Phi-3-vision/main.py
@@ -464,3 +464,4 @@ def create_data_loader(dataset, batch_size=1, data_collator=None):
from lm_eval.utils import make_table

print(make_table(res))

3 changes: 3 additions & 0 deletions examples/multimodal-modeling/Phi-3-vision/run_autoround.sh
@@ -6,6 +6,9 @@ CUDA_VISIBLE_DEVICES=$device \
python3 main.py \
--model_name=$model_name \
--deployment_device 'auto_round' \
--nsamples 512 \
--model_dtype bf16 \
--image_folder /PATH/TO/coco/images/train2017 \
--question_file /PATH/TO/llava_v1_5_mix665k.json \
--output_dir "./tmp_autoround"

This file was deleted.

55 changes: 9 additions & 46 deletions examples/multimodal-modeling/Phi-3-vision/run_eval.sh
@@ -1,48 +1,11 @@
export https_proxy=http://proxy.ims.intel.com:911
export http_proxy=http://proxy.ims.intel.com:911
export HF_HOME=/home/weiweiz1/.cache/
#!/bin/bash
set -x
device=0

# Mistral-7B-Instruct-v0.2
# device=3
# Baichuan2-7B-Chat Phi-3-mini-4k-instruct
# Llama-2-7b-chat-hf
# lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,mmlu,
# ceval-valid,cmmlu
# dir=/data5/zww/test_faster/
# dir=/models
# for model in Phi-3-mini-4k-instruct Meta-Llama-3-8B-Instruct
# do
# echo ${model}/default
# CUDA_VISIBLE_DEVICES=$device \
# python3 eval_042/evaluation.py --model_name ${dir}${model}_default/$model-autoround-w4g128-gpu \
# --trust_remote_code \
# --eval_bs 16 --tasks gsm8k,ceval-valid,cmmlu \
# 2>&1| tee -a /data4/zww/test_faster/rounding_${model}_rtn.txt
# echo ${model}/rtn
# done&

device=2
dir=/data4/zww/tmp/
# dir=/data5/models/
for model in Phi-3-vision-128k-instruct
do
echo ${model}
CUDA_VISIBLE_DEVICES=$device \
python3 eval_042/evaluation.py --model_name ${dir}/$model-autoround-w4g128-round \
--trust_remote_code \
--eval_bs 16 --tasks lambada_openai \
2>&1| tee -a /data4/zww/test_faster/rounding_${model}.txt
echo ${model}
done
# dir=/data5/zww/test_faster/
# for model in Phi-3-mini-4k-instruct Mistral-7B-Instruct-v0.2
# do
# echo ${model}/rtn
# CUDA_VISIBLE_DEVICES=$device \
# python3 eval_042/evaluation.py --model_name ${dir}${model}_rtn/$model-autoround-w4g128-gpu \
# --trust_remote_code \
# --eval_bs 16 --tasks lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,mmlu,gsm8k \
# 2>&1| tee -a /data4/zww/test_faster/rounding_${model}_rtn.txt
# echo ${model}/rtn
# done
model_path='./tmp_autoround'
model=Phi-3-vision-128k-instruct

CUDA_VISIBLE_DEVICES=$device python3 eval_042/evaluation.py \
--model_name ${model_path}/${model} \
--trust_remote_code \
--eval_bs 16
66 changes: 59 additions & 7 deletions examples/multimodal-modeling/Qwen-VL/README.md
@@ -100,17 +100,68 @@ Include the flag `--adam`. Note that AdamW is less effective than sign gradient

- **Running on Intel Gaudi2**
```bash
bash run_autoround_on_gaudi.sh
bash run_autoround.sh
```

## 3. Run Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
import torch
from transformers import set_seed
set_seed(1234)
from auto_round.auto_quantizer import AutoHfQuantizer
quantized_model_path = "./tmp_autoround"
tokenizer = AutoTokenizer.from_pretrained(quantized_model_path, trust_remote_code=True)
# use bf16
model = AutoModelForCausalLM.from_pretrained(quantized_model_path, device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16
# model = AutoModelForCausalLM.from_pretrained(quantized_model_path, device_map="auto", trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained(quantized_model_path, device_map="cpu", trust_remote_code=True).eval()
# use cuda device
# model = AutoModelForCausalLM.from_pretrained(quantized_model_path, device_map="cuda", trust_remote_code=True).eval()
query = tokenizer.from_list_format([
    {'image': 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg'},
    {'text': 'Generate the caption in English with grounding:'},
])
inputs = tokenizer(query, return_tensors='pt')
inputs = inputs.to(model.device)
with torch.cuda.amp.autocast():
    pred = model.generate(**inputs)
response = tokenizer.decode(pred.cpu()[0], skip_special_tokens=False)
print(response)
# <img>https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg</img>Generate the caption in English with grounding:<ref> Woman</ref><box>(451,379),(731,806)</box> and<ref> her dog</ref><box>(219,424),(576,896)</box> playing on the beach<|endoftext|>
image = tokenizer.draw_bbox_on_latest_picture(response)
if image:
    image.save('2.jpg')
else:
    print("no box")

```
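
Qwen-VL's remote code also exposes a `chat()` helper that handles prompt formatting and decoding internally. A minimal sketch is shown below, reusing the quantized `model` and `tokenizer` loaded above; the question text is illustrative.

```python
# Minimal sketch, assuming `model` and `tokenizer` from the block above.
# chat() wraps generate() and returns the decoded response plus the dialogue history.
query = tokenizer.from_list_format([
    {'image': 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg'},
    {'text': 'What is in the picture?'},
])
with torch.cuda.amp.autocast():
    response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```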


## 4. Results
Quantization calibration uses the [COCO 2017](https://cocodataset.org/) and [LLaVA-Instruct-150K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K) datasets, and evaluation uses the TextVQA dataset. The accuracy loss stays within 1% whether or not the vision component is quantized. The results for Qwen-VL are as follows:
| Model | Config | Precision | Hyperparameter | Accuracy% | Relative drop |
| :----: | :----: | :----: | :----: | :----: | :----: |
| Qwen/Qwen-VL | - | FP16 | - | 63.94 | - |
| Qwen/Qwen-VL | W4G128 | FP16 | with vision | 63.68 | -0.41% |
| Qwen/Qwen-VL | W4G128 | FP16 | w/o vision | 63.73 | -0.33% |
Quantization calibration uses the [COCO 2017](https://cocodataset.org/) and [LLaVA-Instruct-150K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K) datasets, and evaluation uses the TextVQA dataset; please follow the [recipe](./run_autoround.sh) and [evaluation script](./run_eval.sh). The results for Qwen-VL are as follows:
| Metric | bf16 | INT4 |
|:----------------|:--------|:--------|
| avg | 0.5628 | 0.5589 |
| paper-avg | 0.5603 | 0.5611 |
| mmlu | 0.4828 | 0.4639 |
| lambada_openai | 0.6782 | 0.6664 |
| hellaswag | 0.5593 | 0.5487 |
| winogrande | 0.6827 | 0.6875 |
| piqa | 0.7786 | 0.7748 |
| truthfulqa_mc1 | 0.2876 | 0.2901 |
| openbookqa | 0.2880 | 0.2940 |
| boolq | 0.7012 | 0.7318 |
| arc_easy | 0.7201 | 0.7327 |
| arc_challenge | 0.4249 | 0.4206 |
| cmmlu | 0.4798 | 0.4618 |
| ceval | 0.4814 | 0.4569 |
| textVQA | 0.6402 | 0.6379 |
| scienceVQA | 0.6748 | 0.6574 |



## 5. Environment
@@ -136,3 +187,4 @@ If you find SignRound useful for your research, please cite our paper:




Empty file.