Quantization tool: support float 8 with MatMul, support float 16 weights #18043

xadupre · 2023-10-20T17:59:07Z

Description

Whenever a QuantizeLinear or DequantizeLinear node is created, the type of the weights before quantization must be known so that the scale is created with the expected type. Another option would be to add many CastLike operators, but that would push the burden onto the onnxruntime optimizer.
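For context, the ONNX `DequantizeLinear` operator computes `y = (x - x_zero_point) * x_scale`, and its output has the same type as `x_scale`. The NumPy sketch below mimics that rule for a per-tensor uint8 input to show why the scale has to carry the weight type. `dequantize_linear` is a hypothetical helper, not an onnxruntime function.

```python
import numpy as np


def dequantize_linear(x: np.ndarray, scale: np.ndarray, zero_point: np.ndarray) -> np.ndarray:
    """Hypothetical NumPy mimic of ONNX DequantizeLinear (per-tensor).

    The arithmetic is done in the scale's dtype, so the output dtype is the
    scale dtype, mirroring the ONNX rule that y has the type of x_scale.
    """
    dtype = scale.dtype
    return (x.astype(dtype) - zero_point.astype(dtype)) * scale


q = np.array([0, 128, 255], dtype=np.uint8)
zp = np.array(128, dtype=np.uint8)

# float16 scale -> float16 output, matching float16 weights
y16 = dequantize_linear(q, np.array(0.5, dtype=np.float16), zp)
print(y16.dtype)  # float16

# float32 scale -> float32 output, matching float32 weights
y32 = dequantize_linear(q, np.array(0.5, dtype=np.float32), zp)
print(y32.dtype)  # float32
```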

The PR tries to avoid changing function signatures. To do so, it modifies the scale computation to store the result in a numpy array rather than a Python float. The numpy array must have the same type as the weights to quantize.
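A minimal sketch of that idea, assuming asymmetric per-tensor uint8 quantization from the min/max of the weights: the scale is returned as a 0-d numpy array with the weights' dtype instead of a Python float, and the zero point as a 0-d uint8 array. `compute_scale_zero_point` is a hypothetical name; the PR text does not show the tool's actual function.

```python
import numpy as np


def compute_scale_zero_point(weights: np.ndarray, qmin: int = 0, qmax: int = 255):
    """Hypothetical asymmetric uint8 parameters computed from min/max.

    The scale keeps the weights' dtype (float16 stays float16) and is a
    0-d numpy array, never a Python float.
    """
    dtype = weights.dtype
    zero = dtype.type(0)
    # Extend the range to include 0.0 so that it is exactly representable.
    rmin = min(weights.min(), zero)
    rmax = max(weights.max(), zero)
    scale = np.array((rmax - rmin) / dtype.type(qmax - qmin), dtype=dtype)
    if scale == 0:  # all-zero tensor: any non-zero scale works
        scale = np.array(1, dtype=dtype)
    zero_point = np.array(np.clip(np.rint(qmin - rmin / scale), qmin, qmax), dtype=np.uint8)
    return scale, zero_point


w16 = np.random.default_rng(0).standard_normal(8).astype(np.float16)
scale, zero_point = compute_scale_zero_point(w16)
print(scale.dtype, zero_point.dtype)  # float16 uint8
```

Because the division happens in the weights' dtype, a float16 model gets a float16 scale; whether the real tool computes in higher precision and casts back is not stated in the PR.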

The PR adds many `assert` statements to check that the scale is neither a Python type nor a float64. They make sure all the code follows the same logic and were kept for the first review.
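The kind of check described above could look like the following sketch. `check_scale` is a hypothetical helper; it raises `TypeError` instead of using `assert`, a choice of this sketch so that the check still runs under `python -O`, which strips `assert` statements.

```python
import numpy as np


def check_scale(scale, weight_dtype) -> None:
    """Hypothetical check: the scale must be a numpy array (not a Python
    scalar), must not be float64, and must match the weight dtype."""
    if not isinstance(scale, np.ndarray):
        raise TypeError(f"scale must be a numpy array, not {type(scale).__name__}")
    if scale.dtype == np.float64:
        raise TypeError("scale must not be float64")
    if scale.dtype != np.dtype(weight_dtype):
        raise TypeError(
            f"scale dtype {scale.dtype} does not match weight dtype {np.dtype(weight_dtype)}"
        )


check_scale(np.array(0.02, dtype=np.float16), np.float16)  # accepted
for bad in (0.02, np.array(0.02), np.array(0.02, dtype=np.float32)):
    try:
        check_scale(bad, np.float16)
    except TypeError as err:
        print("rejected:", err)
```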

DequantizeLinear and QuantizeLinear cannot be tested with onnx==1.15: PR onnx/onnx#5709 is needed to fix shape inference, and PR onnx/onnx#5473 is needed to support QLinearMatMul with float 16. That explains why some float 16 tests are disabled.
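The float 16 tests can be gated on the installed onnx version with `unittest.skipIf`. This is a sketch under the assumption that the fixes are available in any onnx release newer than 1.15; `float16_qdq_testable` and the test class are hypothetical, not the PR's actual test code.

```python
import unittest


def float16_qdq_testable(onnx_version: str) -> bool:
    """Hypothetical gate: float16 QDQ tests need onnx > 1.15, since 1.15
    lacks onnx/onnx#5709 (shape inference) and onnx/onnx#5473."""
    major, minor = (int(p) for p in onnx_version.split(".")[:2])
    return (major, minor) > (1, 15)


try:
    import onnx

    ONNX_VERSION = onnx.__version__
except ImportError:  # onnx not installed: treat it as too old
    ONNX_VERSION = "0.0"


class TestMatMulFloat16(unittest.TestCase):
    @unittest.skipIf(not float16_qdq_testable(ONNX_VERSION), "float16 QDQ needs onnx > 1.15")
    def test_quantize_matmul_float16(self):
        # Hypothetical body: build a float16 MatMul model, quantize it,
        # and compare outputs with onnxruntime. Omitted in this sketch.
        pass
```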

Motivation and Context

The current quantization tool assumes every weight is float 32. Weights of large models such as LLAMA are usually float 16, and the tool needs to quantize them too.

yufenglee

Please run the E2E resnet50 model to double check: https://github.com/microsoft/onnxruntime-inference-examples/tree/main/quantization/image_classification/cpu

…ta types (#19114)

### Description

- Updates `get_qnn_qdq_config()` to use new scale/zp np.array data types.
- Adds missing unit test to help prevent future regression.

### Motivation and Context

#18043 changed the usage of `extra_options["TensorQuantizationOverrides"]`. We need to update its use in quantization/execution_providers/qnn/quant_config.py

## Describe your changes

PR microsoft/onnxruntime#18043 (onnxruntime) extends the onnxruntime quantization tools to support float16 weights. To do so, it enforces scale and zero_point to be strongly typed (as `numpy.array(single_value, dtype=dtype)`). The scale type should always be the weight type, and the zero_point type the quantized weight type. That convention is checked throughout the quantization tools to make sure there is no loss of information. This change was made to avoid adding new arguments to many functions to carry this information.

## Checklist before requesting a review

- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

## (Optional) Issue link
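The same convention matches how ONNX `QuantizeLinear` types its output: `y = saturate(round(x / y_scale) + y_zero_point)`, rounding half to even, with `y` taking the type of `y_zero_point`. The sketch below is a hypothetical per-tensor NumPy mimic (integer targets only), showing that the zero point's dtype alone selects uint8 or int8 output while the scale must match the weights. `quantize_linear` is not an onnxruntime function.

```python
import numpy as np


def quantize_linear(x: np.ndarray, scale: np.ndarray, zero_point: np.ndarray) -> np.ndarray:
    """Hypothetical per-tensor mimic of ONNX QuantizeLinear for integer outputs."""
    if x.dtype != scale.dtype:
        raise TypeError(f"scale dtype {scale.dtype} does not match weight dtype {x.dtype}")
    info = np.iinfo(zero_point.dtype)
    # Divide in float64 to avoid float16 overflow, then round half to even.
    q = np.rint(x.astype(np.float64) / scale.astype(np.float64)).astype(np.int64)
    q += int(zero_point)
    # Saturate to the range of the quantized type, then cast to it.
    return np.clip(q, info.min, info.max).astype(zero_point.dtype)


w = np.array([-1.0, 0.0, 0.03125, 0.5, 2.0, 20.0], dtype=np.float16)
scale = np.array(0.0625, dtype=w.dtype)  # typed like the weights

q_u8 = quantize_linear(w, scale, np.array(16, dtype=np.uint8))
q_i8 = quantize_linear(w, scale, np.array(0, dtype=np.int8))
print(q_u8.dtype, q_u8.tolist())  # uint8 [0, 16, 16, 24, 48, 255]
print(q_i8.dtype, q_i8.tolist())  # int8 [-16, 0, 0, 8, 32, 127]
```

Float 8 targets, which the PR also adds for MatMul, are outside this integer-only sketch.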

…19182)

### Description

Extends the code coverage to the Entropy, Histogram and Distribution calibration methods and fixes the bugs found while doing so.

### Motivation and Context

Bugs detected in [Olive](https://github.com/microsoft/OLive).

xadupre added 2 commits October 20, 2023 17:28

Update quantization tools to support MatMul with float 8

f09ad3c

support float16

bd47221

github-advanced-security bot found potential problems Oct 20, 2023

onnxruntime/test/python/quantization/test_op_matmul.py Fixed

xadupre added 4 commits October 23, 2023 19:05

more consistent with types

23c9e39

Merge branch 'main' of https://github.com/microsoft/onnxruntime into qdqmm

7e7d0fe

fix types

c073b88

fix many unit tests

35db149

github-advanced-security bot found potential problems Oct 24, 2023

onnxruntime/test/python/quantization/op_test_utils.py Fixed

xadupre added 18 commits October 25, 2023 15:08

Merge branch 'main' of https://github.com/microsoft/onnxruntime into qdqmm

799f7c5

fix conversion, rounding

3a14a31

new fixes

8199196

Merge branch 'main' of https://github.com/microsoft/onnxruntime into qdqmm

87e2ba1

fix softmax qdq

6003524

Merge branch 'main' of https://github.com/microsoft/onnxruntime into qdqmm

aa11d25

fix shape info

0e96668

update test

170e5c9

fix remaining unit tests

47b41b6

add value_info

b15ffcd

Merge branch 'main' of https://github.com/microsoft/onnxruntime into qdqmm

0163699

add subtest

34b7a38

refactoring onnxruntime/test/python/quantization/test_op_matmul.py

8711792

disable f16 for old onnx package

4806a04

disable f16 unit tests

260dd59

support for Conv and float 16

d376f66

extend unit test for Conv

76d9284

fix lint

d2f9294

github-advanced-security bot found potential problems Oct 27, 2023

onnxruntime/test/python/quantization/test_conv_dynamic.py Fixed

onnxruntime/test/python/quantization/test_conv_dynamic.py Fixed

xadupre marked this pull request as ready for review October 27, 2023 16:56

xadupre added 2 commits October 30, 2023 10:06

Merge branch 'main' of https://github.com/microsoft/onnxruntime into qdqmm

6702b81

change the disable condition

a6433e8

github-advanced-security bot found potential problems Dec 21, 2023

onnxruntime/python/tools/quantization/operators/softmax.py Fixed

xadupre added 10 commits December 21, 2023 13:47

fix missing dtype

a90f9f5

Merge branch 'main' of https://github.com/microsoft/onnxruntime into qdqmm

0164a38

use np arrays

e6c39f4

Merge branch 'main' of https://github.com/microsoft/onnxruntime into qdqmm

1c8ae86

improve robustness

1185de0

fix type issue

aee75ff

fix wrong types

63a8ea9

fix one bug

1948c4a

fix dtype issue

67bab54

Merge branch 'main' of https://github.com/microsoft/onnxruntime into qdqmm

fe1d0fe

yufenglee reviewed Jan 9, 2024

onnxruntime/python/tools/quantization/calibrate.py Outdated

yufenglee reviewed Jan 9, 2024

onnxruntime/python/tools/quantization/calibrate.py Outdated

xadupre added 2 commits January 10, 2024 13:23

better error message

1d49bc5

Merge branch 'main' of https://github.com/microsoft/onnxruntime into qdqmm

fc406f9

yufenglee approved these changes Jan 11, 2024

xadupre merged commit c8399a8 into microsoft:main Jan 12, 2024
87 checks passed

adrianlizarraga mentioned this pull request Jan 12, 2024

[Quantization] Fix get_qnn_qdq_config to use new scale/zp np.array data types #19114

Merged

xadupre mentioned this pull request Jan 15, 2024

Fix quantization dtypes after ORT PR #18043 microsoft/Olive#881

Merged

xadupre added a commit to xadupre/onnxruntime that referenced this pull request Jan 17, 2024

Fix untyped float value in quantization tool missing from PR microsoft#18043

6971cf0

fxmarty mentioned this pull request Feb 5, 2024

[ORT 1.17 regression] AttributeError: 'NoneType' object has no attribute 'HasField' #19418

Closed

xadupre mentioned this pull request Feb 7, 2024

onnxruntime 1.17.0: transformers benchmarking failing for int8 quantized inference. #19409

Open

adrianlizarraga mentioned this pull request Feb 28, 2024

[QNN Quant] Ensure 16bit tensor quant overrides set MS domain #19684

Merged

xadupre deleted the qdqmm branch November 7, 2024 10:35