Simulated W4Afp8 Quantization #331

wenhuach21 · 2024-11-21T08:27:57Z

# $device and $eval_bs are expected to be set before this loop runs.
for model_name in "/models/Meta-Llama-3.1-8B-Instruct" "/models/Meta-Llama-3-8B-Instruct"
do
  CUDA_VISIBLE_DEVICES=$device \
  python3 -m auto_round \
    --model_name "$model_name" \
    --device 0 \
    --act_bits 8 \
    --group_size 128 \
    --bits 4 \
    --tasks "lambada_openai,hellaswag,winogrande,piqa,mmlu" \
    --eval_bs "$eval_bs" \
    --data_type "fp8_to_int_sym" \
    --act_data_type "fp8" \
    --disable_act_dynamic \
    --format "fake" \
    2>&1 | tee -a w4_fp8_act_static.txt
done
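For readers unfamiliar with simulated ("fake") quantization, the sketch below shows the general idea behind a W4Afp8 setup: values are quantized and immediately dequantized, so the model keeps its original dtype while carrying the quantization error. This is a generic, pure-Python illustration and not auto_round's implementation. The function names are hypothetical, the max-abs scale choice is an assumption, and the mapping of the flags above (--bits 4, --group_size 128, --act_data_type "fp8", --disable_act_dynamic) onto these helpers is also an assumption. In particular, it does not model the "fp8_to_int_sym" weight data type.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value of the e4m3fn format


def fake_quant_int4_sym(weights, group_size=128, bits=4):
    """Quantize-dequantize weights per group (symmetric, no zero point).

    Assumption: the scale comes from the group's max absolute value.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4 bits
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        for w in group:
            # Round to the nearest integer level, then clamp to [-8, 7].
            q = max(-qmax - 1, min(qmax, round(w / scale)))
            out.append(q * scale)
    return out


def round_to_e4m3(value):
    """Round a float to the nearest value on the e4m3 grid (3 mantissa bits)."""
    if value == 0.0:
        return 0.0
    clipped = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, value))
    # frexp returns (m, e) with 0.5 <= |m| < 1, so floor(log2(|x|)) == e - 1.
    exponent = math.frexp(abs(clipped))[1] - 1
    exponent = max(exponent, -6)  # below 2**-6 the grid is subnormal
    step = 2.0 ** (exponent - 3)  # spacing between neighbouring fp8 values
    return round(clipped / step) * step


def fake_quant_fp8_static(activations, scale):
    """Static activation fake-quant: one precomputed scale for all inputs."""
    return [round_to_e4m3(a / scale) * scale for a in activations]


if __name__ == "__main__":
    print(fake_quant_int4_sym([7.0, -3.4, 1.2, 0.0], group_size=4))
    print(fake_quant_fp8_static([448.0, 1.06, 0.3, 500.0], scale=1.0))
```

With a static scheme the activation scale is fixed ahead of time (for example from calibration data) rather than recomputed per token, which is why `fake_quant_fp8_static` takes `scale` as an argument instead of deriving it from its input.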

n1ck-guo · 2024-11-28T07:35:49Z

auto_round/quantizer.py

+        self.orig_layer = orig_layer
+        self.device = device
+        self.enable_minmax_tuning = enable_minmax_tuning
+        self.enable_norm_bias_tuning = enable_norm_bias_tuning and orig_layer.bias is not None


enable_norm_bias_tuning and (orig_layer.bias is not None)

Not a must, right? The "is not" comparison is evaluated first, before "and".
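The precedence point in this exchange can be checked directly. The snippet below is a small standalone illustration, not code from the PR: `FakeLayer` is a hypothetical stand-in for `orig_layer`. It shows that Python parses both spellings into the same syntax tree, so the parentheses only affect readability.

```python
import ast


class FakeLayer:
    """Hypothetical stand-in for orig_layer; only the bias attribute matters."""

    def __init__(self, bias):
        self.bias = bias


def without_parens(flag, layer):
    return flag and layer.bias is not None


def with_parens(flag, layer):
    return flag and (layer.bias is not None)


def same_parse():
    """Compare the syntax trees of the two spellings."""
    plain = ast.parse("flag and layer.bias is not None", mode="eval")
    grouped = ast.parse("flag and (layer.bias is not None)", mode="eval")
    return ast.dump(plain) == ast.dump(grouped)


if __name__ == "__main__":
    print(same_parse())
    for flag in (True, False):
        for bias in (None, 0.0, 1.5):
            layer = FakeLayer(bias)
            assert without_parens(flag, layer) == with_parens(flag, layer)
```

Comparisons such as `is not` bind more tightly than `and`, so the suggested parentheses are a style choice rather than a behavior change.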

wenhuach21 and others added 18 commits November 21, 2024 16:27

try to support fp8

23e6ff4

add files

fb17014

[pre-commit.ci] auto fixes from pre-commit.com hooks

35cd376

for more information, see https://pre-commit.ci

Merge branch 'main' into fp8

6bb1716

tiny change

97d1237

fix

6d0c582

fix nan issue and change to dynamic per token

fe41d5a

Merge branch 'main' into fp8

53ccc1c

Merge branch 'fp8' of https://github.com/intel/auto-round into fp8

cf6f719

support static quantization, the code is ugly

fe385c3

fix

73ebb24

refine a little

2b64780

update a little

237f886

refine code, fp16 model are easily gen NAN grad, need to have a study

bd8fea4

Merge branch 'main' into fp8

cba9bd2

tmp change

d907447

fix a critic bug

23d9604

refine code

b590259

wenhuach21 added the draft label Nov 27, 2024

wenhuach21 marked this pull request as draft November 27, 2024 06:10

wenhuach21 added 6 commits November 27, 2024 14:55

merge conv1d and fix conv1d exporting issue

8ba1137

Merge branch 'main' into fp8

dcaee16

tmp change

bebe8f9

Merge branch 'main' into fp8

41a7eab

fix issue

9c11187

update

ac44576

wenhuach21 changed the title ~~[WIP]try to support fp8~~ Simulated W4Afp8 Quantization Nov 28, 2024

wenhuach21 marked this pull request as ready for review November 28, 2024 02:05

wenhuach21 added 2 commits November 28, 2024 10:06

Merge branch 'main' into fp8

0c98a2e

refine a little

2c190ae

wenhuach21 added 3 commits November 28, 2024 10:12

Merge branch 'fp8' of https://github.com/intel/auto-round into fp8

2e095b8

fix preci issue

b068322

fix preci issue

80a74ae

wenhuach21 removed the draft label Nov 28, 2024

wenhuach21 added 3 commits November 28, 2024 11:08

remove debug code

07acfd6

try to fix ut

41ad46c

Merge branch 'main' into fp8

cd18f37

wenhuach21 requested review from yiliu30, WeiweiZhang1 and n1ck-guo November 28, 2024 05:27

WeiweiZhang1 approved these changes Nov 28, 2024

n1ck-guo approved these changes Nov 28, 2024

yiliu30 and others added 3 commits November 28, 2024 04:20

fix numba pack

5a94ac6

Signed-off-by: yiliu30 <[email protected]>

Merge branch 'main' into fp8

2ad66ca

fix comment

1608c97

wenhuach21 merged commit a98175f into main Nov 28, 2024
8 checks passed

wenhuach21 deleted the fp8 branch November 28, 2024 11:05
