patch 1 for mllm #298

Merged: 31 commits merged from hengguo/mllm_patch into main on Nov 12, 2024
Conversation

@n1ck-guo (Contributor) commented Nov 4, 2024

TODO:

  • add support for phi3, llava, etc.
  • change default config
  • add a default dataset; download both the dataset and the images from a URL

Signed-off-by: n1ck-guo <[email protected]>
self.add_argument("--dataset", type=str, default=None,
help="the dataset for quantization training. It can be a custom one.")
self.add_argument("--dataset", type=str, default="llava_v1_5_mix665k",
help="The dataset for quantization training. It can be a custom one.")
Contributor review comment:

use lowercase for the first letter to follow our current style

processor.chat_template = None
safe_serialization = True
if "phi3_v" in model_type:
    safe_serialization = False
Contributor review comment:

the code is tricky, better move to model config later
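
For illustration only, a minimal sketch of the refactor being suggested, assuming a hypothetical per-model config table (MODEL_SAVE_CONFIG and get_safe_serialization are made-up names, not the project's actual API):

# Hypothetical per-model serialization config; names are illustrative only.
MODEL_SAVE_CONFIG = {
    "phi3_v": {"safe_serialization": False},  # phi3_v is the one model type that opts out above
}

def get_safe_serialization(model_type: str) -> bool:
    # Default to safetensors unless the model's config opts out.
    return MODEL_SAVE_CONFIG.get(model_type, {}).get("safe_serialization", True)

This keeps the model-specific special case in data rather than in the save path, which is what "move to model config" would achieve.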

n1ck-guo and others added 7 commits November 7, 2024 04:29
n1ck-guo and others added 7 commits November 8, 2024 03:25
 device=device_str, seed=args.seed, gradient_accumulate_steps=args.gradient_accumulate_steps,
-scale_dtype=args.scale_dtype, layer_config=layer_config,
+scale_dtype=args.scale_dtype, layer_config=layer_config, template=args.template,
Contributor review comment:

rename template to prompt_template to make it easier to understand?

@n1ck-guo (author) replied Nov 11, 2024:

This template is more like a "model series template": it includes the processor (text/image), the data collator, and special tokens.
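
To make the distinction concrete, here is a rough sketch of what such a "model series template" could bundle; the class and field names are hypothetical and only illustrate the reply, not the PR's actual implementation:

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class Template:
    # Hypothetical "model series template": groups everything a model family
    # needs for calibration, not just a prompt string.
    model_type: str                           # e.g. "llava", "phi3_v"
    prompt_format: str                        # chat/prompt layout for the series
    special_tokens: List[str] = field(default_factory=list)
    processor: Optional[Callable] = None      # joint text/image processor
    data_collator: Optional[Callable] = None  # batches text and image features

TEMPLATES: Dict[str, Template] = {}

def register_template(template: Template) -> None:
    TEMPLATES[template.model_type] = template

Under this reading, "prompt_template" would describe only one field of the object, which is why the author pushes back on the rename.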

    res = os.system(
-        f"{python_path} ../auto_round/__main__.py --model 'facebook/opt-125m' --iter 2 --nsamples 1 --format auto_gptq,auto_round --disable_eval --output_dir ./saved")
+        f"cd .. && {python_path} -m auto_round --mllm --iter 2 --nsamples 10 --format auto_round --output_dir ./saved")
    if res > 0 or res == -1:
        assert False, "cmd line test fail, please have a check"
Contributor review comment:

add another test for auto-round -h
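
A minimal sketch of the extra test being requested, in the same os.system style as the existing cmd-line test (the function name and exact command are assumptions, not code from this PR):

import os

def test_auto_round_help():
    # Hypothetical smoke test for the suggested `auto-round -h` check.
    res = os.system("cd .. && python -m auto_round -h")
    if res > 0 or res == -1:
        assert False, "auto-round -h failed, please have a check"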

@@ -44,18 +46,26 @@ def register(dataset):
    return register


@register_dataset("llava")
Contributor review comment:

better add more information like liuhaotian/llava and 58k or 150k
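
For context, the hunk above uses a registry-decorator pattern; a generic sketch of that pattern follows (the helper names and the llava loader body are illustrative, not the PR's exact code):

from typing import Callable, Dict

DATASETS: Dict[str, Callable] = {}

def register_dataset(name: str) -> Callable:
    # Decorator factory: register a dataset loader under `name`
    # (e.g. "liuhaotian/llava_conv_58k") so it can be looked up from a CLI string.
    def register(dataset: Callable) -> Callable:
        DATASETS[name] = dataset
        return dataset
    return register

@register_dataset("llava")
def load_llava():
    # Illustrative placeholder: return the calibration samples for the llava series.
    return []

Registering datasets under fuller names such as liuhaotian/llava_conv_58k, as the reviewer suggests, makes the source and split size visible at the call site.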

@@ -50,8 +50,10 @@ def __init__(self, *args, **kwargs):
        self.add_argument("--asym", action='store_true',
                          help="whether to use asym quantization")

-        self.add_argument("--dataset", type=str, default=None,
-                          help="the dataset for quantization training. It can be a custom one.")
+        self.add_argument("--dataset", type=str, default="llava_conv_58k",
Contributor review comment:

llava_conv_58k ==> better to change to liuhaotian/llava_conv_58k

@wenhuach21 changed the title from "patch for mllm" to "patch 1 for mllm" on Nov 11, 2024
@n1ck-guo merged commit 73861bb into main on Nov 12, 2024 (12 of 14 checks passed).
@n1ck-guo deleted the hengguo/mllm_patch branch on November 12, 2024 at 01:10.