INT6/INT4 support for model optimization #1100
Comments
Yes, INT4/INT6/INT8 are supported in the context of weight-only post-training quantization.
I see. For example, assume that I use post-training static quantization, i.e., quantizing activations and weights at the same time. Can I apply INT4/INT6/INT8 quantization?
INT8 is supported for both activation and weight, while INT4/INT6 is weight only.
What is the appropriate workflow to quantize the weights to INT4 and the activations to INT8? Are we able to achieve this in one .fit() session?
The quantization flow is quite similar (still under fit), but additional config is required. See the example: https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md
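For reference, the flow described in that document keeps the single `quantization.fit()` entry point and only changes the config. A minimal sketch, assuming an already-loaded PyTorch `model`; RTN is a data-free rounding scheme, so no calibration dataloader should be needed for it:

```python
from neural_compressor import PostTrainingQuantConfig, quantization

# Weight-only PTQ config: 4-bit, symmetric, per-channel round-to-nearest (RTN).
conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        ".*": {                      # regex over op types
            "weight": {
                "bits": 4,           # 1-8 bits
                "group_size": -1,    # -1 = per-channel
                "scheme": "sym",
                "algorithm": "RTN",  # data-free rounding; GPTQ/AWQ would need calibration data
            },
        },
    },
)

# One fit() call, the same entry point as ordinary post-training quantization.
q_model = quantization.fit(model, conf)
```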
I tried that example, but unfortunately it doesn't work. I tried these two configurations:

Attempt 1:

    qt_conf = PostTrainingQuantConfig(
        approach="weight_only",
        op_type_dict={
            ".*": {  # re.match
                "weight": {
                    "bits": 4,         # 1-8 bit
                    "group_size": -1,  # -1 (per-channel)
                    "scheme": "sym",
                    "algorithm": "RTN",
                },
            },
        },
    )

This raised a KeyError ('default').
Attempt 2:

    qt_conf = PostTrainingQuantConfig(
        domain="object_detection",
        excluded_precisions=["bf16", "fp16"],
        approach="auto",
        quant_level=1,
        op_type_dict={
            "Conv": {
                "weight": {
                    "bits": 4,
                    "group_size": -1,
                    "scheme": "sym",
                    "algorithm": "RTN",
                }
            }
        },
    )

This ran successfully, but I don't think the weights were quantized to 4 bit. There was no accuracy loss, and I also manually inspected the model.pt weights file. I am using torch==2.0.1.
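As a generic sanity check (plain PyTorch, not an INC API; `q_module` is a placeholder for the torch module obtained after quantization): 4-bit rounding collapses each weight tensor onto a coarse grid, so the quantized copy should differ from the original and contain far fewer distinct values, even if the values are stored back as fp32.

```python
import copy

# Snapshot the weights before calling quantization.fit(...).
orig_state = copy.deepcopy(model.state_dict())

# ... run quantization.fit(...) and obtain q_module (placeholder name) ...

for name, q_tensor in q_module.state_dict().items():
    if name.endswith("weight") and name in orig_state:
        delta = (orig_state[name].float() - q_tensor.float()).abs().max().item()
        distinct = q_tensor.flatten().unique().numel()
        print(f"{name}: max|orig - quant| = {delta:.6g}, distinct values = {distinct}")
```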
@paul-ang Only Linear is supported in the weight-only approach. The issue you met in case 1 is caused by a configuration mismatch for Conv2d. I will remove Conv2d when fetching quantizable ops.
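Until that fix is available, a workaround consistent with this explanation would be to scope the weight-only settings to `Linear` and explicitly leave convolutions in fp32, instead of matching every op type with `".*"`. This is a hedged sketch only, not verified against the affected release; the fp32 fallback via `"dtype"` follows INC's usual fallback convention:

```python
from neural_compressor import PostTrainingQuantConfig, quantization

conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        # 4-bit RTN only for Linear, the op type the weight-only path supports.
        "Linear": {
            "weight": {"bits": 4, "group_size": -1, "scheme": "sym", "algorithm": "RTN"},
        },
        # Keep convolution weights unquantized so no Conv2d weight-only config has to be resolved.
        "Conv2d": {
            "weight": {"dtype": "fp32"},
        },
    },
)
q_model = quantization.fit(model, conf)
```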
Does this mean that quantizing Conv2D lower than 8 bit is not supported at the moment?
Yes. Conv2d is usually not a memory-bound operator, so we only support Linear for Large Language Models. If you can point to an implementation that uses Conv2d with large weight sizes, we will consider supporting it in weight-only mode.
@paul-ang Thank you for reporting the Conv2d issue. We have raised a PR to fix it.
Dear Author,
Thank you so much for the project. I have read your results and have a question regarding the implementation of PTQ: does it support INT4/INT6 or other precisions?