refine docs, add accuracy data, add recipe and eval scripts #226

Merged
merged 13 commits on Aug 27, 2024
2 changes: 2 additions & 0 deletions README.md
@@ -188,6 +188,8 @@ Please note that an asterisk (*) indicates third-party quantized models, which m

| Model | Supported |
|--------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| microsoft/Phi-3-vision-128k-instruct | [recipe](./examples/multimodal-modeling/Phi-3-vision/run_autoround.sh) |
| Qwen/Qwen-VL | [accuracy](./examples/multimodal-modeling/Qwen-VL/README.md), [recipe](./examples/multimodal-modeling/Qwen-VL/run_autoround.sh) |
| meta-llama/Meta-Llama-3.1-70B-Instruct | [recipe](https://huggingface.co/Intel/Meta-Llama-3.1-70B-Instruct-int4-inc) |
| meta-llama/Meta-Llama-3.1-8B-Instruct | [model-kaitchup-autogptq-int4*](https://huggingface.co/kaitchup/Meta-Llama-3.1-8B-Instruct-autoround-gptq-4bit-asym), [model-kaitchup-autogptq-sym-int4*](https://huggingface.co/kaitchup/Meta-Llama-3.1-8B-Instruct-autoround-gptq-4bit-sym), [recipe](https://huggingface.co/Intel/Meta-Llama-3.1-8B-Instruct-int4-inc) |
| meta-llama/Meta-Llama-3.1-8B | [model-kaitchup-autogptq-sym-int4*](https://huggingface.co/kaitchup/Meta-Llama-3.1-8B-autoround-gptq-4bit-sym) |
12 changes: 4 additions & 8 deletions examples/multimodal-modeling/Llava/README.md
@@ -6,6 +6,8 @@ This document presents step-by-step instructions for auto-round.

In this example, we introduce a straightforward way to run quantization on popular multimodal models such as LLaVA.

Please note that LLaVA quantization is currently an **experimental feature**, and exported models do not yet support inference on various devices.

## Install
If you are not using Linux, do NOT proceed; see the instructions for [macOS](https://github.com/haotian-liu/LLaVA/blob/main/docs/macOS.md) and [Windows](https://github.com/haotian-liu/LLaVA/blob/main/docs/Windows.md).

@@ -62,11 +64,11 @@ Include the flag `--adam`. Note that AdamW is less effective than sign gradient

- **Running on Intel Gaudi2**
```bash
bash run_autoround_on_gaudi.sh
bash run_autoround.sh
```

## 4. Results
We use the [COCO 2017](https://cocodataset.org/) and [LLaVA-Instruct-150K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K) datasets for quantization calibration and the TextVQA dataset for evaluation. When the vision components are not involved in quantization, the accuracy loss stays within 1%. The results for LLaVA-7b are as follows:
We use the [COCO 2017](https://cocodataset.org/) and [LLaVA-Instruct-150K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K) datasets for quantization calibration and the TextVQA dataset for evaluation. When the vision components are not involved in quantization, the accuracy loss stays within 1%. The results for the fake-quantized (simulated quantization) LLaVA-7b are as follows:
| Model | Config | Precision | Hyperparameter | Accuracy% | Relative drop |
| :----: | :----: | :----: | :----: | :----: | :----: |
| liuhaotian/llava-v1.5-7b | - | FP16 | - | 58.21 | - |
@@ -96,9 +98,3 @@ If you find SignRound useful for your research, please cite our paper:
```








71 changes: 68 additions & 3 deletions examples/multimodal-modeling/Phi-3-vision/README.md
@@ -16,6 +16,8 @@ COCO: [train2017](http://images.cocodataset.org/zips/train2017.zip), and unzip t


## 2. Run Examples
PyTorch 1.8 or a higher version is required.

Enter the examples folder and install lm-eval to run the evaluation:
```bash
pip install -r requirements.txt
@@ -47,13 +49,75 @@ Include the flag `--adam`. Note that AdamW is less effective than sign gradient

- **Running on Intel Gaudi2**
```bash
bash run_autoround_on_gaudi.sh
bash run_autoround.sh
```


## 3. Environment
## 3. Run Inference

```python
from PIL import Image
import requests
import io
from transformers import AutoModelForCausalLM
from transformers import AutoProcessor
from auto_round.auto_quantizer import AutoHfQuantizer
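# note: the AutoHfQuantizer import above registers AutoRound's quantization backend
# with Transformers, so that from_pretrained below can load the quantized checkpoint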
quantized_model_path = "./tmp_autoround"
model = AutoModelForCausalLM.from_pretrained(quantized_model_path, device_map="auto", trust_remote_code=True, torch_dtype="auto", _attn_implementation='flash_attention_2') # use _attn_implementation='eager' to disable flash attention

processor = AutoProcessor.from_pretrained(quantized_model_path, trust_remote_code=True)

messages = [
    {"role": "user", "content": "<|image_1|>\nWhat is shown in this image?"},
    {"role": "assistant", "content": "The chart displays the percentage of respondents who agree with various statements about their preparedness for meetings. It shows five categories: 'Having clear and pre-defined goals for meetings', 'Knowing where to find the information I need for a meeting', 'Understanding my exact role and responsibilities when I'm invited', 'Having tools to manage admin tasks like note-taking or summarization', and 'Having more focus time to sufficiently prepare for meetings'. Each category has an associated bar indicating the level of agreement, measured on a scale from 0% to 100%."},
    {"role": "user", "content": "Provide insightful questions to spark discussion."},
]

url = "https://assets-c4akfrf5b4d3f4b7.z01.azurefd.net/assets/2024/04/BMDataViz_661fb89f3845e.png"
# image = Image.open(requests.get(url, stream=True).raw)
image = Image.open(io.BytesIO(requests.get(url, stream=True).content))

prompt = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = processor(prompt, [image], return_tensors="pt").to("cuda:0")

generation_args = {
    "max_new_tokens": 50,
    "temperature": 0.0,
    "do_sample": False,
}

generate_ids = model.generate(**inputs, eos_token_id=processor.tokenizer.eos_token_id, **generation_args)

# remove input tokens
generate_ids = generate_ids[:, inputs['input_ids'].shape[1]:]
response = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

print(response)
# 1. How does the level of agreement on each statement reflect the overall preparedness of respondents for meetings?
# 2. What are the most and least agreed-upon statements, and why might that be the case?
# 3.
```
<!--

## 4. Results
We use the [COCO 2017](https://cocodataset.org/) and [LLaVA-Instruct-150K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K) datasets for quantization calibration and lm-eval tasks for evaluation. Please follow the [recipe](./run_autoround.sh) and [evaluation script](./run_eval.sh) to reproduce the results. The results for Phi-3-vision-128k-instruct are as follows:
| Metric | bf16 | INT4 |
|----------------|--------|--------|
| avg | 0.6014 | 0.5940 |
| mmlu | 0.6369 | 0.6310 |
| lambada_openai | 0.6487 | 0.6406 |
| hellaswag | 0.5585 | 0.5483 |
| winogrande | 0.7395 | 0.7451 |
| piqa | 0.7954 | 0.7889 |
| truthfulqa_mc1 | 0.3084 | 0.2987 |
| openbookqa | 0.3580 | 0.3600 |
| boolq | 0.8532 | 0.8557 |
| arc_easy | 0.8371 | 0.8346 |
| arc_challenge | 0.5572 | 0.5469 |
| cmmlu | 0.4074 | 0.3950 |
| ceval | 0.4027 | 0.4012 |
| gsm8k | 0.7157 | 0.6755 |
-->

PyTorch 1.8 or higher version is needed


## Reference
@@ -72,3 +136,4 @@ If you find SignRound useful for your research, please cite our paper:




17 changes: 12 additions & 5 deletions examples/multimodal-modeling/Phi-3-vision/eval_042/evaluation.py
@@ -576,6 +576,10 @@ def evaluate(
parser.add_argument(
    "--eval_bs", default=1,
)
parser.add_argument(
    "--device", default="cuda:0",
    help="PyTorch device (e.g. cpu/cuda:0/hpu) for evaluation."
)
parser.add_argument(
    "--trust_remote_code", action='store_true',
    help="Whether to enable trust_remote_code"
@@ -600,17 +604,20 @@
model_args += f",autogptq=True,gptq_use_triton=True"
if args.trust_remote_code:
model_args += f",trust_remote_code=True"
model_args += ",dtype=bfloat16"
test_tasks = args.tasks
if isinstance(test_tasks, str):
test_tasks = test_tasks.split(',')
model_name = args.model_name.rstrip('/')
from lm_eval.utils import make_table
result = simple_evaluate(model="hf",
model_args=model_args,
tasks=test_tasks,
batch_size=args.eval_bs)
with torch.cuda.amp.autocast():
result = simple_evaluate(model="hf",
model_args=model_args,
tasks=test_tasks,
device=args.device,
batch_size=args.eval_bs)
print(make_table(result))

print("cost time: ", time.time() - s)


1 change: 1 addition & 0 deletions examples/multimodal-modeling/Phi-3-vision/main.py
@@ -464,3 +464,4 @@ def create_data_loader(dataset, batch_size=1, data_collator=None):
from lm_eval.utils import make_table

print(make_table(res))

3 changes: 3 additions & 0 deletions examples/multimodal-modeling/Phi-3-vision/run_autoround.sh
@@ -6,6 +6,9 @@ CUDA_VISIBLE_DEVICES=$device \
python3 main.py \
--model_name=$model_name \
--deployment_device 'auto_round' \
--nsamples 512 \
--model_dtype bf16 \
--image_folder /PATH/TO/coco/images/train2017 \
--question_file /PATH/TO/llava_v1_5_mix665k.json \
--output_dir "./tmp_autoround"
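For orientation, a minimal sketch of the core auto-round Python API that shell recipes like the one above ultimately drive is shown below. It is a text-only illustration under stated assumptions: it does not reproduce the multimodal calibration (the `--image_folder`/`--question_file` data that `main.py` prepares), and the model name is only a placeholder.

```python
# Minimal sketch of the core auto-round API (text-only calibration data).
# The multimodal recipe above goes through main.py instead, which builds the
# COCO / LLaVA-Instruct calibration set; this only illustrates the knobs that
# run_autoround.sh exposes (bits, group size, nsamples).
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # placeholder text-only model for illustration
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(model, tokenizer, bits=4, group_size=128, nsamples=512)
autoround.quantize()
autoround.save_quantized("./tmp_autoround")
```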

This file was deleted.

55 changes: 9 additions & 46 deletions examples/multimodal-modeling/Phi-3-vision/run_eval.sh
@@ -1,48 +1,11 @@
export https_proxy=http://proxy.ims.intel.com:911
export http_proxy=http://proxy.ims.intel.com:911
export HF_HOME=/home/weiweiz1/.cache/
#!/bin/bash
set -x
device=0

# Mistral-7B-Instruct-v0.2
# device=3
# Baichuan2-7B-Chat Phi-3-mini-4k-instruct
# Llama-2-7b-chat-hf
# lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,mmlu,
# ceval-valid,cmmlu
# dir=/data5/zww/test_faster/
# dir=/models
# for model in Phi-3-mini-4k-instruct Meta-Llama-3-8B-Instruct
# do
# echo ${model}/default
# CUDA_VISIBLE_DEVICES=$device \
# python3 eval_042/evaluation.py --model_name ${dir}${model}_default/$model-autoround-w4g128-gpu \
# --trust_remote_code \
# --eval_bs 16 --tasks gsm8k,ceval-valid,cmmlu \
# 2>&1| tee -a /data4/zww/test_faster/rounding_${model}_rtn.txt
# echo ${model}/rtn
# done&

device=2
dir=/data4/zww/tmp/
# dir=/data5/models/
for model in Phi-3-vision-128k-instruct
do
echo ${model}
CUDA_VISIBLE_DEVICES=$device \
python3 eval_042/evaluation.py --model_name ${dir}/$model-autoround-w4g128-round \
--trust_remote_code \
--eval_bs 16 --tasks lambada_openai \
2>&1| tee -a /data4/zww/test_faster/rounding_${model}.txt
echo ${model}
done
# dir=/data5/zww/test_faster/
# for model in Phi-3-mini-4k-instruct Mistral-7B-Instruct-v0.2
# do
# echo ${model}/rtn
# CUDA_VISIBLE_DEVICES=$device \
# python3 eval_042/evaluation.py --model_name ${dir}${model}_rtn/$model-autoround-w4g128-gpu \
# --trust_remote_code \
# --eval_bs 16 --tasks lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,mmlu,gsm8k \
# 2>&1| tee -a /data4/zww/test_faster/rounding_${model}_rtn.txt
# echo ${model}/rtn
# done
model_path='./tmp_autoround'
model=Phi-3-vision-128k-instruct

CUDA_VISIBLE_DEVICES=$device python3 eval_042/evaluation.py \
--model_name ${model_path}/${model} \
--trust_remote_code \
--eval_bs 16
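For a Python-level view of what `eval_042/evaluation.py` ends up doing when driven by `run_eval.sh`, a rough sketch is below (assuming lm-eval 0.4.x is installed and the quantized model was saved under `./tmp_autoround`); the actual script adds extra handling, such as autocast and argument parsing, that is omitted here.

```python
# Rough sketch of the evaluation flow behind run_eval.sh (assumes lm-eval 0.4.x).
from auto_round.auto_quantizer import AutoHfQuantizer  # noqa: F401, registers the AutoRound loader
from lm_eval import simple_evaluate
from lm_eval.utils import make_table

model_path = "./tmp_autoround/Phi-3-vision-128k-instruct"  # assumed output location
model_args = f"pretrained={model_path},trust_remote_code=True,dtype=bfloat16"

result = simple_evaluate(
    model="hf",
    model_args=model_args,
    tasks=["lambada_openai", "mmlu"],  # any lm-eval task names
    device="cuda:0",
    batch_size=16,
)
print(make_table(result))
```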
66 changes: 59 additions & 7 deletions examples/multimodal-modeling/Qwen-VL/README.md
@@ -100,17 +100,68 @@ Include the flag `--adam`. Note that AdamW is less effective than sign gradient

- **Running on Intel Gaudi2**
```bash
bash run_autoround_on_gaudi.sh
bash run_autoround.sh
```

## 3. Run Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig
import torch
from transformers import set_seed
set_seed(1234)
from auto_round.auto_quantizer import AutoHfQuantizer
quantized_model_path = "./tmp_autoround"
tokenizer = AutoTokenizer.from_pretrained(quantized_model_path, trust_remote_code=True)
# use bf16
model = AutoModelForCausalLM.from_pretrained(quantized_model_path, device_map="auto", trust_remote_code=True, bf16=True).eval()
# use fp16
# model = AutoModelForCausalLM.from_pretrained(quantized_model_path, device_map="auto", trust_remote_code=True, fp16=True).eval()
# use cpu only
# model = AutoModelForCausalLM.from_pretrained(quantized_model_path, device_map="cpu", trust_remote_code=True).eval()
# use cuda device
# model = AutoModelForCausalLM.from_pretrained(quantized_model_path, device_map="cuda", trust_remote_code=True).eval()
query = tokenizer.from_list_format([
    {'image': 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg'},
    {'text': 'Generate the caption in English with grounding:'},
])
inputs = tokenizer(query, return_tensors='pt')
inputs = inputs.to(model.device)
with torch.cuda.amp.autocast():
    pred = model.generate(**inputs)
response = tokenizer.decode(pred.cpu()[0], skip_special_tokens=False)
print(response)
# <img>https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg</img>Generate the caption in English with grounding:<ref> Woman</ref><box>(451,379),(731,806)</box> and<ref> her dog</ref><box>(219,424),(576,896)</box> playing on the beach<|endoftext|>
image = tokenizer.draw_bbox_on_latest_picture(response)
if image:
image.save('2.jpg')
else:
print("no box")

```


## 4. Results
We use the [COCO 2017](https://cocodataset.org/) and [LLaVA-Instruct-150K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K) datasets for quantization calibration and the TextVQA dataset for evaluation. The accuracy loss stays within 1% whether or not the visual component is quantized. The results for Qwen-VL are as follows:
| Model | Config | Precision | Hyperparameter | Accuracy% | Relative drop |
| :----: | :----: | :----: | :----: | :----: | :----: |
| Qwen/Qwen-VL | - | FP16 | - | 63.94 | - |
| Qwen/Qwen-VL | W4G128 | FP16 | with vision | 63.68 | -0.41% |
| Qwen/Qwen-VL | W4G128 | FP16 | w/o vision | 63.73 | -0.33% |
We use the [COCO 2017](https://cocodataset.org/) and [LLaVA-Instruct-150K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K) datasets for quantization calibration, and lm-eval tasks together with TextVQA and ScienceVQA for evaluation. Please follow the [recipe](./run_autoround.sh) and [evaluation script](./run_eval.sh) to reproduce the results. The results for Qwen-VL are as follows:
| Metric | bf16 | INT4 |
|----------------|--------|--------|
| avg | 0.5628 | 0.5589 |
| paper-avg | 0.5603 | 0.5611 |
| mmlu | 0.4828 | 0.4639 |
| lambada_openai | 0.6782 | 0.6664 |
| hellaswag | 0.5593 | 0.5487 |
| winogrande | 0.6827 | 0.6875 |
| piqa | 0.7786 | 0.7748 |
| truthfulqa_mc1 | 0.2876 | 0.2901 |
| openbookqa | 0.2880 | 0.2940 |
| boolq | 0.7012 | 0.7318 |
| arc_easy | 0.7201 | 0.7327 |
| arc_challenge | 0.4249 | 0.4206 |
| cmmlu | 0.4798 | 0.4618 |
| ceval | 0.4814 | 0.4569 |
| textVQA | 0.6402 | 0.6379 |
| scienceVQA | 0.6748 | 0.6574 |
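The `avg` row above is the plain mean of the individual task scores (the `paper-avg` row presumably averages a different subset of tasks and is not reproduced here). A small sketch for recomputing the average and the overall relative drop from the numbers in the table:

```python
# Recompute the "avg" row and the overall relative drop from the table above.
bf16 = {
    "mmlu": 0.4828, "lambada_openai": 0.6782, "hellaswag": 0.5593,
    "winogrande": 0.6827, "piqa": 0.7786, "truthfulqa_mc1": 0.2876,
    "openbookqa": 0.2880, "boolq": 0.7012, "arc_easy": 0.7201,
    "arc_challenge": 0.4249, "cmmlu": 0.4798, "ceval": 0.4814,
    "textVQA": 0.6402, "scienceVQA": 0.6748,
}
int4 = {
    "mmlu": 0.4639, "lambada_openai": 0.6664, "hellaswag": 0.5487,
    "winogrande": 0.6875, "piqa": 0.7748, "truthfulqa_mc1": 0.2901,
    "openbookqa": 0.2940, "boolq": 0.7318, "arc_easy": 0.7327,
    "arc_challenge": 0.4206, "cmmlu": 0.4618, "ceval": 0.4569,
    "textVQA": 0.6379, "scienceVQA": 0.6574,
}

avg_bf16 = sum(bf16.values()) / len(bf16)  # ~0.5628, matching the table
avg_int4 = sum(int4.values()) / len(int4)  # ~0.5589, matching the table
print(f"avg bf16={avg_bf16:.4f}  int4={avg_int4:.4f}  "
      f"relative drop={(avg_int4 - avg_bf16) / avg_bf16:.2%}")
```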



## 5. Environment
@@ -136,3 +187,4 @@ If you find SignRound useful for your research, please cite our paper:




Empty file.