
Refine PT2ONNX export #901

Merged
merged 17 commits on May 29, 2023
2 changes: 2 additions & 0 deletions .azure-pipelines/scripts/codeScan/pyspelling/inc_dict.txt
@@ -2613,3 +2613,5 @@ efb
netflix
DeBERTa
unilm
aten
hardswish
241 changes: 107 additions & 134 deletions docs/source/export.md
@@ -9,32 +9,70 @@ Export

4. [Appendix](#appendix)

# Introduction
Open Neural Network Exchange (ONNX) is an open standard format for representing machine learning models. Exporting FP32 PyTorch/TensorFlow models has become popular and easy to use. However, for Intel Neural Compressor, we hope to export the INT8 model into the ONNX format to achieve higher applicability in multiple frameworks.

Here we briefly introduce our export API for PyTorch FP32/INT8 models. Note that the INT8 ONNX model is not exported directly from the INT8 PyTorch model; instead, it is quantized after the FP32 ONNX model is obtained with the mature torch.onnx.export API. To ensure that the ONNX quantization process is largely consistent with PyTorch, we reuse three key pieces of information from the Neural Compressor model to perform ONNX quantization.

- Quantized operations: only operations quantized in PyTorch will be quantized during ONNX quantization.
- Scale info: scale information is collected from the PyTorch quantization process.
- Weights of quantization-aware training (QAT): for quantization-aware training, the updated weights are passed to the ONNX model.
## Introduction
Open Neural Network Exchange (ONNX) is an open standard format for representing machine learning models. Exporting FP32 PyTorch/TensorFlow models has become popular and easy to use. For Intel Neural Compressor, we hope to export the INT8 model into the ONNX format to achieve higher applicability in multiple frameworks.

Here is the workflow of our export API for PyTorch/TensorFlow FP32/INT8 models.
<a target="_blank" href="./imgs/export.png" text-align:center>
<center>
<img src="./imgs/export.png" alt="Architecture" width=650 height=200>
<img src="./imgs/export.png" alt="Architecture" width=700 height=200>
</center>
</a>
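
For orientation, the following is a minimal sketch of this workflow, assuming a torchvision ResNet-18 and a small random calibration dataloader purely for illustration (the file name and dataloader are placeholders, not part of the export API):

```python
import torch
import torchvision
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor.quantization import fit
from neural_compressor.config import PostTrainingQuantConfig, Torch2ONNXConfig

model = torchvision.models.resnet18()

# Random calibration data, for illustration only; use a real calibration set in practice.
calib_data = TensorDataset(torch.randn(8, 3, 224, 224), torch.zeros(8, dtype=torch.long))
calib_dataloader = DataLoader(calib_data, batch_size=1)

# 1. Quantize the FP32 PyTorch model with Neural Compressor (post-training static quantization).
conf = PostTrainingQuantConfig(approach="static")
q_model = fit(model, conf=conf, calib_dataloader=calib_dataloader)

# 2. Export the INT8 Neural Compressor model to an INT8 ONNX model,
#    reusing the quantized ops and scale info collected in step 1.
int8_onnx_config = Torch2ONNXConfig(
    dtype="int8",
    opset_version=14,
    quant_format="QDQ",
    example_inputs=torch.randn(1, 3, 224, 224),
    input_names=['input'],
    output_names=['output'],
)
q_model.export('int8-model.onnx', int8_onnx_config)
```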

# Supported Framework Model Matrix
## Supported Framework Model Matrix

<table>
<thead>
<tr>
<th>Framework</th>
<th>model type</th>
<th>exported ONNX model type</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="4">PyTorch</td>
<td>FP32</td>
<td>FP32</td>
</tr>
<tr>
<td>Post-Training Static Quantized INT8</td>
<td>QLinear/QDQ INT8</td>
</tr>
<tr>
<td>Post-Training Dynamic Quantized INT8</td>
<td>/</td>
</tr>
<tr>
<td>Quantization-aware Training INT8</td>
<td>QLinear/QDQ INT8</td>
</tr>
<tr>
<td rowspan="3">TensorFlow</td>
<td>FP32</td>
<td>FP32</td>
</tr>
<tr>
<td>Post-Training Static Quantized INT8</td>
<td>QDQ INT8</td>
</tr>
<tr>
<td>Quantization-aware Training INT8</td>
<td>QDQ INT8</td>
</tr>
</tbody>
</table>

> **Note**: Follow these steps to export a post-training dynamic quantized ONNX model from a PyTorch model (see the sketch below): \
1. Export the FP32 PyTorch model to an FP32 ONNX model. \
2. Use the FP32 ONNX model as the input model for post-training dynamic quantization.
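
A minimal sketch of these two steps; the model and file names are illustrative, and dynamic quantization here relies on Neural Compressor's ONNX Runtime backend, which needs no calibration data:

```python
import torch
import torchvision
from neural_compressor.experimental.common import Model
from neural_compressor.quantization import fit
from neural_compressor.config import Torch2ONNXConfig, PostTrainingQuantConfig

model = torchvision.models.resnet18()
inc_model = Model(model)

# Step 1: export the FP32 PyTorch model to an FP32 ONNX model.
fp32_onnx_config = Torch2ONNXConfig(
    dtype="fp32",
    example_inputs=torch.randn(1, 3, 224, 224),
    input_names=['input'],
    output_names=['output'],
)
inc_model.export('fp32-model.onnx', fp32_onnx_config)

# Step 2: run post-training dynamic quantization on the FP32 ONNX model.
conf = PostTrainingQuantConfig(approach="dynamic")
q_onnx_model = fit('fp32-model.onnx', conf=conf)
q_onnx_model.save('int8-dynamic-model.onnx')
```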

| Export | PyTorch | TensorFlow |
| :---: | :---: |:----------:|
| FP32 Model -> FP32 ONNX Model | &#10004; | &#10004; |
| INT8 Model -> INT8 QDQ ONNX Model | &#10004; | &#10004; |
| INT8 Model -> INT8 QLinear ONNX Model | &#10004; | :x: |
## Examples

# Examples
### PyTorch Model

#### FP32 Model Export

## FP32 Model Export
```python
from neural_compressor.experimental.common import Model
from neural_compressor.config import Torch2ONNXConfig
inc_model = Model(model)
fp32_onnx_config = Torch2ONNXConfig(
dtype="fp32",
example_inputs=torch.randn(1, 3, 224, 224),
input_names=['input'],
output_names=['output'],
dynamic_axes={"input": {0: "batch_size"},
"output": {0: "batch_size"}},
)
inc_model.export('fp32-model.onnx', fp32_onnx_config)
```
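
As a quick sanity check, the exported model can be loaded and run with ONNX Runtime; the input name `'input'` matches the `input_names` given above:

```python
import numpy as np
import onnxruntime as ort

# Load the exported FP32 ONNX model and run one dummy inference.
session = ort.InferenceSession('fp32-model.onnx', providers=['CPUExecutionProvider'])
dummy_input = np.random.randn(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {'input': dummy_input})
print(outputs[0].shape)
```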

## INT8 Model Export
#### INT8 Model Export

```python
# q_model is a Neural Compressor model after performing quantization.
from neural_compressor.config import Torch2ONNXConfig
int8_onnx_config = Torch2ONNXConfig(
dtype="int8",
opset_version=14,
quant_format="QDQ", # or QLinear
quant_format="QLinear", # or QDQ
example_inputs=torch.randn(1, 3, 224, 224),
input_names=['input'],
output_names=['output'],
dynamic_axes={"input": {0: "batch_size"},
"output": {0: "batch_size"}},
)
q_model.export('int8-model.onnx', int8_onnx_config)
```
- [Image recognition](/examples/pytorch/image_recognition/torchvision_models/export/fx/)
- [Text classification](/examples/pytorch/nlp/huggingface_models/text-classification/export/fx/)

# Appendix

Since there is a known quantization gap between PyTorch 'nn.Linear' module and ONNX 'MatMul + Add' subgraph, we provide three recipes.

For different recipes and ONNX INT8 model formats, 'nn.quantized.Linear' will be exported to the following subgraphs:


<table class="docutils">
<thead>
<tr>
<th align="center">Recipe</th>
<th align="center">QDQ</th>
<th align="center">QLinear</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">QDQ_OP_FP32_BIAS</td>
<td>
<pre>
QuantizeLinear
|
DequantizeLinear
|
MatMul
|
Add
</pre>
</td>
<td>
<pre>
QuantizeLinear
|
MatMulIntegerToFloat
|
Add
</pre>
</td>
</tr>
<tr>
<td align="center">QDQ_OP_INT32_BIAS</td>
<td>
<pre>
QuantizeLinear
|
MatMulInteger
|
Add
|
Cast
|
Mul
</pre>
</td>
<td>
<pre>
QuantizeLinear
|
MatMulInteger
|
Add
|
Cast
|
Mul
</pre>
</td>
</tr>
<tr>
<td align="center">QDQ_OP_FP32_BIAS_QDQ</td>
<td>
<pre>
QuantizeLinear
|
DequantizeLinear
|
MatMul
|
Add
|
QuantizeLinear
|
DequantizeLinear
</pre>
</td>
<td>
<pre>
QuantizeLinear
|
MatMulIntegerToFloat
|
Add
|
QuantizeLinear
|
DequantizeLinear
</pre>
</td>
</tr>
</tbody>
</table>
### TensorFlow Model

#### FP32 Model Export

```python
from neural_compressor.experimental.common import Model
from neural_compressor.config import TF2ONNXConfig
inc_model = Model(model)
config = TF2ONNXConfig(dtype='fp32')
inc_model.export('fp32-model.onnx', config)
```

#### INT8 Model Export

The default recipe is `QDQ_OP_FP32_BIAS`. If the accuracy of the exported ONNX INT8 model does not meet your criterion, we recommend trying the `QDQ_OP_INT32_BIAS` and `QDQ_OP_FP32_BIAS_QDQ` recipes as follows:
```python
# q_model is a Neural Compressor model after performing quantization.
from neural_compressor.config import Torch2ONNXConfig
int8_onnx_config = Torch2ONNXConfig(
dtype="int8",
opset_version=14,
quant_format="QDQ", # or QLinear
example_inputs=torch.randn(1, 3, 224, 224),
input_names=['input'],
output_names=['output'],
dynamic_axes={"input": {0: "batch_size"},
"output": {0: "batch_size"}},
recipe='QDQ_OP_INT32_BIAS', # or QDQ_OP_FP32_BIAS_QDQ
)
q_model.export('int8-model.onnx', int8_onnx_config)
```

```python
from neural_compressor.config import TF2ONNXConfig
config = TF2ONNXConfig(dtype='int8')
q_model.export('int8-model.onnx', config)
```

> **Note**: Several export examples for computer vision tasks are provided in the examples directory. Users can leverage them to verify the accuracy and performance of the exported ONNX model.
- [resnet50_v1_5](/examples/tensorflow/image_recognition/tensorflow_models/resnet50_v1_5/export)
- [resnet50_v1](/examples/tensorflow/image_recognition/tensorflow_models/resnet50_v1/export)
- [vgg16](/examples/tensorflow/image_recognition/tensorflow_models/vgg16/export)
- [ssd_mobilenet_v1](/examples/tensorflow/object_detection/tensorflow_models/ssd_mobilenet_v1/export)
- [mobilenet_v2](/examples/tensorflow/image_recognition/tensorflow_models/mobilenet_v2/export)
- [faster_rcnn_resnet50](/examples/tensorflow/object_detection/tensorflow_models/faster_rcnn_resnet50/export)

## Appendix

### Supported quantized ops

This table lists the TorchScript operators that are supported by ONNX export with torch v2.0. Refer to this [link](https://pytorch.org/docs/stable/onnx_supported_aten_ops.html) for more supported/unsupported ops.

| Operator | opset_version(s) |
| ---------------------------- | ---------------- |
| ``quantized::add`` | Since opset 10 |
| ``quantized::add_relu`` | Since opset 10 |
| ``quantized::cat`` | Since opset 10 |
| ``quantized::conv1d_relu`` | Since opset 10 |
| ``quantized::conv2d`` | Since opset 10 |
| ``quantized::conv2d_relu`` | Since opset 10 |
| ``quantized::group_norm`` | Since opset 10 |
| ``quantized::hardswish`` | Since opset 10 |
| ``quantized::instance_norm`` | Since opset 10 |
| ``quantized::layer_norm`` | Since opset 10 |
| ``quantized::leaky_relu`` | Since opset 10 |
| ``quantized::linear`` | Since opset 10 |
| ``quantized::mul`` | Since opset 10 |
| ``quantized::sigmoid`` | Since opset 10 |

> **Note**: The export function may fail due to unsupported operations. Please fall back unsupported quantized ops to FP32 by setting `op_type_dict` or `op_name_dict` in the `QuantizationAwareTrainingConfig` or `PostTrainingQuantConfig` config, as sketched below. For fallback examples, refer to [Text classification](/examples/pytorch/nlp/huggingface_models/text-classification/export/fx/)
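
For instance, a minimal sketch of such a fallback, mirroring the `op_type_dict={"Embedding": FP32}` setting used in the text-classification example updated in this PR:

```python
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.utils.constant import FP32

# Keep unsupported quantized ops (here: Embedding) in FP32 so that
# the exported ONNX model contains only exportable quantized ops.
conf = PostTrainingQuantConfig(
    approach="static",
    op_type_dict={"Embedding": FP32},
)
```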

Binary file modified docs/source/imgs/export.png
@@ -3,9 +3,6 @@
import random
import shutil
import time
import warnings
import sys

import torch
import torch.nn as nn
import torch.nn.parallel
@@ -202,7 +199,7 @@ def eval_func(model):
int8_onnx_config = Torch2ONNXConfig(
dtype="int8",
opset_version=14,
quant_format="QDQ",
quant_format=args.quant_format,
example_inputs=torch.randn(1, 3, 224, 224),
input_names=['input'],
output_names=['output'],
@@ -535,6 +535,7 @@ def eval_func(model):
if model_args.export_dtype == 'int8':
from neural_compressor.quantization import fit
from neural_compressor.config import PostTrainingQuantConfig, TuningCriterion
from neural_compressor.utils.constant import FP32
tuning_criterion = TuningCriterion(
strategy="mse_v2",
strategy_kwargs={"confidence_batches": 1},
@@ -544,6 +545,7 @@
approach="static",
quant_level=1,
tuning_criterion=tuning_criterion,
op_type_dict={"Embedding":FP32},
calibration_sampling_size=[300],
)
q_model = fit(model, conf=conf, calib_dataloader=eval_dataloader, eval_func=eval_func)
30 changes: 30 additions & 0 deletions neural_compressor/adaptor/pytorch.py
@@ -829,6 +829,12 @@ def __init__(self, framework_specific_info):
if not self.benchmark:
assert False, "Unsupport approach: {}".format(self.approach)

# TODO: will be removed once 'op_type_dict' and 'op_name_dicts'
# for quant_aware_training can be handled in strategy
if self.approach == 'quant_aware_training':
self.qat_optype_wise = framework_specific_info.get('qat_optype_wise', None)
self.qat_op_wise = framework_specific_info.get('qat_op_wise', None)

self.fp32_results = []
self.fp32_preds_as_label = False

@@ -3608,6 +3614,7 @@ def _pre_hook_for_qat(self, dataloader=None):
quantizable_ops = []
tmp_model = self.fuse_fx_model(self.model, is_qat=True)
self._get_quantizable_ops_recursively(tmp_model, '', quantizable_ops)
self._remove_fallback_ops_for_qat(quantizable_ops)
bf16_ops = []
if self.version.release >= Version("1.11.0").release and self.use_bf16 and \
(CpuInfo().bf16 or os.getenv('FORCE_BF16') == '1'): # pragma: no cover
@@ -3719,6 +3726,29 @@ def _post_hook_for_qat(self):
self._dump_model_op_stats(self.model._model, self.model.q_config, self.approach)
torch_utils.util.get_embedding_contiguous(self.model._model)

def _get_fallback_ops_for_qat(self):
# get fallback ops for quant aware training approach
fallback_ops = {'op_wise': [], 'optype_wise': []}
if self.qat_optype_wise is not None:
for optype, optype_config in self.qat_optype_wise.items():
if 'weight' in optype_config and optype_config['weight']['dtype'] == ['fp32']:
fallback_ops['optype_wise'].append(optype)
if self.qat_op_wise is not None:
for op, op_config in self.qat_op_wise.items():
if 'weight' in op_config and op_config['weight']['dtype'] == ['fp32']:
fallback_ops['op_wise'].append(op)
return fallback_ops

def _remove_fallback_ops_for_qat(self, quantizable_ops):
# remove fallback ops from quantizable_ops for quant aware training approach
fallback_ops = self._get_fallback_ops_for_qat()
remove_ops = []
for (op_name, op_type) in quantizable_ops:
if op_name in fallback_ops['op_wise'] or op_type in fallback_ops['optype_wise']:
remove_ops.append((op_name, op_type))
for (op_name, op_type) in remove_ops:
quantizable_ops.remove((op_name, op_type))
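
As a rough illustration of the fallback matching above (hypothetical op names; assumes the `FP32` constant from `neural_compressor.utils.constant`):

```python
from neural_compressor.utils.constant import FP32  # {'weight': {'dtype': ['fp32']}, 'activation': {'dtype': ['fp32']}}

# Hypothetical QAT fallback config as it reaches the adaptor via framework_specific_info.
qat_optype_wise = {"Embedding": FP32}

# Given quantizable ops like these, _remove_fallback_ops_for_qat drops the Embedding
# entry, so only the Linear op is prepared for quantization-aware training.
quantizable_ops = [
    ("bert.embeddings.word_embeddings", "Embedding"),
    ("bert.encoder.layer.0.attention.self.query", "Linear"),
]
```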

def train(self, model, dataloader, optimizer_tuple, criterion_tuple, hooks, **kwargs):
"""Execute the train process on the specified model.

2 changes: 0 additions & 2 deletions neural_compressor/config.py
@@ -1936,7 +1936,6 @@ def __init__(
input_names=None,
output_names=None,
dynamic_axes=None,
recipe='QDQ_OP_FP32_BIAS',
**kwargs,
):
"""Init a Torch2ONNXConfig object."""
@@ -1949,7 +1948,6 @@
output_names=output_names,
dynamic_axes=dynamic_axes,
)
self.recipe = recipe
self.kwargs = kwargs

