INT6/INT4 support for model optimization #1100

Closed

JiaojiaoYe1994 opened this issue Jul 20, 2023 · 10 comments

Comments

@JiaojiaoYe1994

Dear Author,

Thank you so much for the project. I have read your results and have a question about the PTQ implementation: does it support INT4/INT6 or other precisions?

@hshen14 (Contributor) commented Jul 20, 2023

Yes, INT4/INT6/INT8 are supported in the context of weight-only post-training quantization.

@JiaojiaoYe1994 (Author)

> Yes, INT4/INT6/INT8 are supported in the context of weight-only post-training quantization.

I see. For example, if I use post-training static quantization, i.e. quantizing activations and weights at the same time, can I apply INT4/INT6/INT8 quantization?

@hshen14 (Contributor) commented Jul 21, 2023

> Yes, INT4/INT6/INT8 are supported in the context of weight-only post-training quantization.

> I see. For example, if I use post-training static quantization, i.e. quantizing activations and weights at the same time, can I apply INT4/INT6/INT8 quantization?

INT8 is supported for both activations and weights, while INT4/INT6 are weight-only.
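For reference, the two modes map to different PostTrainingQuantConfig settings. A minimal sketch based on the neural-compressor 2.x Python API (defaults and accepted values may vary between releases):

from neural_compressor.config import PostTrainingQuantConfig

# Post-training static quantization: INT8 for both activations and weights
# (requires calibration data).
static_int8_conf = PostTrainingQuantConfig(approach="static")

# Weight-only post-training quantization: sub-8-bit weights (e.g. INT4),
# activations stay in their original precision.
weight_only_int4_conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        ".*": {
            "weight": {
                "bits": 4,         # 4-bit weights
                "group_size": 32,  # group-wise scales; -1 means per-channel
                "scheme": "sym",
                "algorithm": "RTN",
            },
        },
    },
)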

@paul-ang

What is the appropriate workflow to quantize the weights to INT4 and the activations to INT8? Are we able to achieve this in one .fit() session?

@hshen14 (Contributor) commented Sep 13, 2023

> What is the appropriate workflow to quantize the weights to INT4 and the activations to INT8? Are we able to achieve this in one .fit() session?

The quantization flow is essentially the same (driven by fit()), but an additional config is required. See the example: https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md.
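For completeness, a sketch of how that config plugs into a single fit() call, following the linked weight-only doc. The toy model and the RTN choice are placeholders; algorithms such as AWQ/GPTQ additionally take a calibration dataloader:

import torch
from neural_compressor import quantization
from neural_compressor.config import PostTrainingQuantConfig

# Stand-in FP32 model; replace with the model you actually want to quantize.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
)

conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        ".*": {
            "weight": {"bits": 4, "group_size": -1, "scheme": "sym", "algorithm": "RTN"},
        },
    },
)

# RTN is data-free, so no calib_dataloader is needed for this particular algorithm.
q_model = quantization.fit(model, conf)
q_model.save("./saved_int4_model")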

@paul-ang

I tried that example, but unfortunately it doesn't work. I used these two configurations:

Attempt 1

from neural_compressor.config import PostTrainingQuantConfig

qt_conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        ".*": {  # re.match
            "weight": {
                "bits": 4,  # 1-8 bits
                "group_size": -1,  # -1 (per-channel)
                "scheme": "sym",
                "algorithm": "RTN",
            },
        },
    },
)

This raised a KeyError: 'default'.
Error trace:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/neural_compressor/quantization.py", line 223, in fit
    strategy.traverse()
  File "/usr/local/lib/python3.9/dist-packages/neural_compressor/strategy/auto.py", line 134, in traverse
    super().traverse()
  File "/usr/local/lib/python3.9/dist-packages/neural_compressor/strategy/strategy.py", line 482, in traverse
    self._prepare_tuning()
  File "/usr/local/lib/python3.9/dist-packages/neural_compressor/strategy/strategy.py", line 378, in _prepare_tuning
    self.capability = self.capability or self.adaptor.query_fw_capability(self.model)
  File "/usr/local/lib/python3.9/dist-packages/neural_compressor/utils/utility.py", line 301, in fi
    res = func(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/neural_compressor/adaptor/pytorch.py", line 4834, in query_fw_capability
    return self._get_quantizable_ops(model.model)
  File "/usr/local/lib/python3.9/dist-packages/neural_compressor/adaptor/pytorch.py", line 1141, in _get_quantizable_ops
    else copy.deepcopy(capability["default"])
KeyError: 'default'

Attempt 2

qt_conf = PostTrainingQuantConfig(
    domain="object_detection",
    excluded_precisions=["bf16", "fp16"],
    approach="auto",
    quant_level=1,
    op_type_dict={
        "Conv": {
            "weight": {
                "bits": 4,
                "group_size": -1,
                "scheme": "sym",
                "algorithm": "RTN",
            }
        }
    },
)

This ran successfully, but I don't think the weights were quantized to 4 bits: there was no accuracy loss, and I also manually inspected the model.pt weights file.

I am using torch==2.0.1.
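As a side note on verifying this, here is a hypothetical sanity check (not part of the neural-compressor API): with symmetric per-channel RTN at 4 bits, each output channel of a quantized weight can hold at most 16 distinct values, even though the saved tensor is still stored as dequantized fp32. Assuming model.pt contains a plain state_dict:

import torch

def count_levels_per_channel(weight, max_rows=5):
    # A weight RTN-quantized to 4 bits with per-channel symmetric scales should
    # show at most 16 distinct values per output channel after dequantization.
    flat = weight.reshape(weight.shape[0], -1)
    for i in range(min(max_rows, flat.shape[0])):
        print(f"  channel {i}: {flat[i].unique().numel()} distinct values")

state_dict = torch.load("model.pt", map_location="cpu")  # assumes a state_dict was saved
for name, tensor in state_dict.items():
    if isinstance(tensor, torch.Tensor) and name.endswith("weight") and tensor.dim() >= 2:
        print(name)
        count_levels_per_channel(tensor)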

@xin3he (Contributor) commented Sep 21, 2023

@paul-ang Only Linear is supported in the weight-only approach. The issue you hit in case 1 is caused by a configuration mismatch for Conv2d. I will remove Conv2d when fetching quantizable ops.
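To restate that as a config sketch (based on this thread rather than the docs): restricting op_type_dict to Linear avoids matching Conv2d and should sidestep the KeyError above:

from neural_compressor.config import PostTrainingQuantConfig

qt_conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        "Linear": {  # weight-only mode currently covers Linear layers only
            "weight": {
                "bits": 4,
                "group_size": -1,
                "scheme": "sym",
                "algorithm": "RTN",
            },
        },
    },
)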

@paul-ang

Does this mean that quantizing Conv2d below 8 bits is not supported at the moment?

@xin3he (Contributor) commented Sep 21, 2023

> Does this mean that quantizing Conv2d below 8 bits is not supported at the moment?

Yes. Conv2d is usually not a memory-bound operator, so we only support Linear for Large Language Models. If you can point us to a model that uses Conv2d with large weights, we will consider supporting it in weight-only mode.
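Until then, a conv-heavy model can still go through regular INT8 static PTQ, which covers Conv2d at 8 bits. A minimal sketch with a toy model and random calibration data as placeholders:

import torch
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor import quantization
from neural_compressor.config import PostTrainingQuantConfig

# Toy conv model and random calibration data standing in for a real detector.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3), torch.nn.ReLU(), torch.nn.Flatten())
calib_set = TensorDataset(torch.randn(8, 3, 32, 32), torch.zeros(8, dtype=torch.long))
calib_dataloader = DataLoader(calib_set, batch_size=4)

# Static PTQ quantizes both Conv2d and Linear activations/weights to INT8.
conf = PostTrainingQuantConfig(approach="static")
q_model = quantization.fit(model, conf, calib_dataloader=calib_dataloader)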

@xin3he (Contributor) commented Sep 21, 2023

@paul-ang Thank you for reporting the Conv2d issue. We have raised a PR to fix it.

xin3he closed this as completed Oct 31, 2023
chensuyue added a commit to chensuyue/lpot that referenced this issue Feb 21, 2024