add qdq info
yufenglee committed Jun 17, 2021
1 parent 24ff8d0 commit a34427f
Showing 2 changed files with 12 additions and 1 deletion.
docs/how-to/quantization.md: 12 additions, 1 deletion
@@ -46,10 +46,21 @@ The main difference between dynamic quantization and static quantization is how

In general, it is recommended to use dynamic quantization for RNN and transformer-based models, and static quantization for CNN models.

If neither post-training quantization method can meet your accuracy goal, you can try quantization-aware training (QAT) to retrain the model. ONNX Runtime does not provide retraining at this time, but you can retrain your model with the original framework and convert it back to ONNX.

## ONNX quantization representation format
There are two ways to represent quantized ONNX models:
- Operator Oriented. All the quantized operators have their own ONNX definitions, such as QLinearConv and MatMulInteger.
- Tensor Oriented, aka Quantize and DeQuantize (QDQ). This format uses DQ(Q(tensor)) to simulate the quantize and dequantize process, and the QuantizeLinear and DequantizeLinear operators carry the quantization parameters. Models generated in the following ways are in QDQ format:
  - Models quantized by the quantize_static API below with quant_format=QuantFormat.QDQ.
  - QAT models converted from TensorFlow or exported from PyTorch.
  - Quantized models converted from TFLite and other frameworks.

For the last two cases, you don't need to quantize the model with the quantization tool; the ONNX Runtime CPU execution provider can run them directly as quantized models. The TensorRT and NNAPI execution providers are adding support. For the first case, a usage sketch of quantize_static is shown below.
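
The call looks roughly like the following sketch. The model paths, the input name "input", its shape, and the random calibration reader are all illustrative assumptions; in practice the reader should yield real, representative input batches.

```python
# Minimal sketch of static quantization in QDQ format. Assumes a float32
# model "model.onnx" with one input named "input" of shape (1, 3, 224, 224).
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantFormat, quantize_static

class RandomCalibrationReader(CalibrationDataReader):
    """Feeds a few random batches as stand-in calibration data."""
    def __init__(self, num_batches=8):
        self._batches = iter(
            {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}
            for _ in range(num_batches)
        )

    def get_next(self):
        # Return the next input dict, or None when calibration data is exhausted.
        return next(self._batches, None)

quantize_static(
    "model.onnx",        # float32 input model
    "model.quant.onnx",  # quantized output model
    RandomCalibrationReader(),
    quant_format=QuantFormat.QDQ,  # emit QuantizeLinear/DequantizeLinear pairs
)
```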

The picture below shows the equivalent representations of a quantized Conv in QDQ format and in Operator Oriented format. This [E2E](https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/python/tools/quantization/E2E_example_model/image_classification/cpu/run.py) example demonstrates both formats.

![Quantized Conv in QDQ format and in Operator Oriented format](../../images/QDQ_Format.png)
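
As a textual companion to that picture, the sketch below hand-builds the QDQ pattern around a single Conv with onnx.helper. The names, shapes, and the shared scale/zero-point are illustrative assumptions rather than the quantization tool's exact output; the point is that both the activation and the weight flow through a QuantizeLinear/DequantizeLinear pair while Conv itself remains a regular float operator.

```python
# Hand-built illustration of the QDQ pattern for Conv (assumed names/shapes).
import numpy as np
from onnx import TensorProto, checker, helper, numpy_helper

# Illustrative quantization parameters, shared between tensors only for brevity.
scale = numpy_helper.from_array(np.array(0.02, dtype=np.float32), "scale")
zp = numpy_helper.from_array(np.array(128, dtype=np.uint8), "zp")
weight = numpy_helper.from_array(np.ones((8, 3, 3, 3), dtype=np.float32), "W")

nodes = [
    # Activation Q/DQ pair: DQ(Q(X)) simulates quantizing the input.
    helper.make_node("QuantizeLinear", ["X", "scale", "zp"], ["X_q"]),
    helper.make_node("DequantizeLinear", ["X_q", "scale", "zp"], ["X_dq"]),
    # Weight Q/DQ pair.
    helper.make_node("QuantizeLinear", ["W", "scale", "zp"], ["W_q"]),
    helper.make_node("DequantizeLinear", ["W_q", "scale", "zp"], ["W_dq"]),
    # Conv consumes the dequantized tensors and stays a float operator.
    helper.make_node("Conv", ["X_dq", "W_dq"], ["Y"]),
]
graph = helper.make_graph(
    nodes,
    "qdq_conv_sketch",
    inputs=[helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 3, 32, 32])],
    outputs=[helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 8, 30, 30])],
    initializer=[scale, zp, weight],
)
checker.check_model(helper.make_model(graph))
```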

## List of Supported Quantized Ops
Please refer to [registry](https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/python/tools/quantization/registry.py) for the list of supported Ops.
Binary file added images/QDQ_Format.png
