
Commit

Update after review #3

natke committed Dec 7, 2021
1 parent 01314a2 commit aad47ac
Showing 11 changed files with 157 additions and 30 deletions.
7 changes: 3 additions & 4 deletions docs/build/custom.md
@@ -23,7 +23,7 @@ To build a custom ONNX Runtime package, the [build from source](./index.md) inst
* TOC placeholder
{:toc}

## Reduce operator set
## Reduce operator kernels

To reduce the compiled binary size of ONNX Runtime, the operator kernels included in the build can be reduced to just those required by your model/s.

@@ -40,15 +40,15 @@ The operators that are included are specified at build time, in a [configuration

**`--enable_reduced_operator_type_support`**

* Enables [operator type reduction](../reference/ort-format-model-conversion.md#enable-type-reduction). Requires ONNX Runtime version 1.7 or higher and for type reduction to have been enabled during model conversion
* Enables [operator type reduction](../reference/ort-model-format.md#enable-type-reduction). Requires ONNX Runtime version 1.7 or higher and for type reduction to have been enabled during model conversion

If the configuration file is created using ORT format models, the input/output types that individual operators require can be tracked if `--enable_type_reduction` is specified. This can be used to further reduce the build size if `--enable_reduced_operator_type_support` is specified when building ORT.

ONNX format models are not guaranteed to include the required per-node type information, so cannot be used with this option.
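
For illustration, the two options above are typically combined with a type-aware configuration file. A rough sketch, assuming the conversion script shown elsewhere on this page; the `--include_ops_by_config` build flag and the generated file name are assumptions to confirm against the build script's help output, and paths are placeholders:

```
# 1. Convert ONNX models to ORT format, recording required operators and their input/output types
python <ORT repository root>/tools/python/convert_onnx_models_to_ort.py --enable_type_reduction /models

# 2. Build a reduced ONNX Runtime package from the generated configuration file
./build.sh --config MinSizeRel --skip_tests \
    --include_ops_by_config /models/required_operators_and_types.config \
    --enable_reduced_operator_type_support
```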

## Minimal build

ONNX Runtime can be built to further minimize the binary size, by only including support for loading and executing models in [ORT format](../reference/ort-format-model-conversion.md), and not ONNX format.
ONNX Runtime can be built to further minimize the binary size, by only including support for loading and executing models in [ORT format](../reference/ort-model-format.md), and not ONNX format.

**`--minimal_build`**

@@ -63,7 +63,6 @@ A minimal build has the following limitations:
- Execution providers that compile nodes are optionally supported
  - currently this is limited to the NNAPI and CoreML Execution Providers

We do not currently offer backwards compatibility guarantees for ORT format models, as we will be expanding the capabilities in the short term and may need to update the internal format in an incompatible manner to accommodate these changes. You may need to regenerate the ORT format models to use with a future version of ONNX Runtime. Once the feature set stabilizes we will provide backwards compatibility guarantees.
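
As a sketch, a minimal build is usually combined with a reduced operator configuration. The configuration file name is a placeholder and the `--include_ops_by_config` flag is an assumption; check `./build.sh --help` for the exact options:

```
# Minimal build restricted to the kernels listed in a reduced-operator config file
./build.sh --config MinSizeRel --skip_tests --minimal_build \
    --include_ops_by_config required_operators.config

# Use `--minimal_build extended` instead if you also need execution providers
# that compile nodes, such as NNAPI or CoreML
```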

## Other customizations

2 changes: 1 addition & 1 deletion docs/build/web.md
@@ -77,7 +77,7 @@ To get all build artifacts of ONNX Runtime WebAssembly, it needs 4 times of buil

### Minimal Build Support

ONNX Runtime WebAssembly can be built with the flag `--minimal_build`. This generates smaller artifacts and also uses less runtime memory. An ORT format model is required. Detailed instructions will come soon. See also [ORT format Conversion](../reference/ort-format-model-conversion.md).
ONNX Runtime WebAssembly can be built with the flag `--minimal_build`. This generates smaller artifacts and also uses less runtime memory. An ORT format model is required. Detailed instructions will come soon. See also [ORT format Conversion](../reference/ort-model-format.md).
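
As a hedged sketch (flags other than `--build_wasm` and `--minimal_build` are illustrative and may vary with your toolchain setup):

```
./build.sh --config MinSizeRel --build_wasm --minimal_build --skip_tests
```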

### FAQ

4 changes: 2 additions & 2 deletions docs/performance/mobile-performance-tuning.md
@@ -48,7 +48,7 @@ _Layout_ optimizations may be hardware specific and involve internal conversions

### Outcome of optimizations when creating an optimized ORT format model

Below is an example of the changes that occur in _basic_ and _extended_ optimizations when applied to the MNIST model with only the CPU EP enabled. The optimization level is specified when [creating the ORT format model](../reference/ort-format-model-conversion.md#optimization-level).
Below is an example of the changes that occur in _basic_ and _extended_ optimizations when applied to the MNIST model with only the CPU EP enabled. The optimization level is specified when [creating the ORT format model](../reference/ort-model-format.md#optimization-level).

- At the _basic_ level we combine the Conv and Add nodes (the addition is done via the 'B' input to Conv), we combine the MatMul and Add into a single Gemm node (the addition is done via the 'C' input to Gemm), and constant fold to remove one of the Reshape nodes.
- `python <ORT repository root>/tools/python/convert_onnx_models_to_ort.py --optimization_level basic /dir_with_mnist_onnx_model`
@@ -121,7 +121,7 @@ To create an NNAPI-aware ORT format model please follow these steps.
pip install -U build\Windows\RelWithDebInfo\RelWithDebInfo\dist\onnxruntime_noopenmp-1.7.0-cp37-cp37m-win_amd64.whl
```
3. Create an NNAPI-aware ORT format model by running `convert_onnx_models_to_ort.py` as per the [standard instructions](../reference/ort-format-model-conversion.md), with NNAPI enabled (`--use_nnapi`), and the optimization level set to _extended_ or _all_ (e.g. `--optimization_level extended`). This will allow higher level optimizations to run on any nodes that NNAPI can not handle.
3. Create an NNAPI-aware ORT format model by running `convert_onnx_models_to_ort.py` as per the [standard instructions](../reference/ort-model-format.md), with NNAPI enabled (`--use_nnapi`), and the optimization level set to _extended_ or _all_ (e.g. `--optimization_level extended`). This will allow higher level optimizations to run on any nodes that NNAPI can not handle.
```
python <ORT repository root>/tools/python/convert_onnx_models_to_ort.py --use_nnapi --optimization_level extended /models
```
2 changes: 1 addition & 1 deletion docs/reference/build-web-app.md
@@ -47,7 +47,7 @@ You need to understand your web app's scenario and get an ONNX model that is app

ONNX models can be obtained from the [ONNX model zoo](https://github.com/onnx/models), converted from PyTorch or TensorFlow, and many other places.

You can [convert the ONNX format model to ORT format model](./ort-format-model-conversion.md) for optimized binary size, faster initialization and reduced peak memory usage.
You can [convert the ONNX format model to ORT format model](./ort-model-format.md) for optimized binary size, faster initialization and reduced peak memory usage.

You can [perform a model-specific custom build](../build/custom.md) to further optimize binary size.

139 changes: 139 additions & 0 deletions docs/reference/operators/mobile_package_op_type_support_1.10.md
@@ -0,0 +1,139 @@
---
title: ORT 1.10 Mobile Package Operators
parent: Operators
grand_parent: Reference
---


# ONNX Runtime 1.10 Mobile Pre-Built Package Operator and Type Support

## Supported operators and types

The supported operators and types are based on what is required to support float32 and quantized versions of popular models. The full list of input models used to determine the required operators and types is available [here](https://github.com/microsoft/onnxruntime/blob/master/tools/ci_build/github/android/mobile_package.required_operators.readme.txt).

## Supported data input types

- float
- int8_t
- uint8_t

NOTE: Operators used to manipulate dimensions and indices will support int32 and int64.

## Supported Operators

|Operator|Opsets|
|--------|------|
|**ai.onnx**||
|ai.onnx:Abs|12, 13, 14, 15|
|ai.onnx:Add|12, 13, 14, 15|
|ai.onnx:And|12, 13, 14, 15|
|ai.onnx:ArgMax|12, 13, 14, 15|
|ai.onnx:ArgMin|12, 13, 14, 15|
|ai.onnx:AveragePool|12, 13, 14, 15|
|ai.onnx:Cast|12, 13, 14, 15|
|ai.onnx:Ceil|12, 13, 14, 15|
|ai.onnx:Clip|12, 13, 14, 15|
|ai.onnx:Concat|12, 13, 14, 15|
|ai.onnx:ConstantOfShape|12, 13, 14, 15|
|ai.onnx:Conv|12, 13, 14, 15|
|ai.onnx:ConvTranspose|12, 13, 14, 15|
|ai.onnx:Cos|12, 13, 14, 15|
|ai.onnx:CumSum|12, 13, 14, 15|
|ai.onnx:DepthToSpace|12, 13, 14, 15|
|ai.onnx:DequantizeLinear|12, 13, 14, 15|
|ai.onnx:Div|12, 13, 14, 15|
|ai.onnx:DynamicQuantizeLinear|12, 13, 14, 15|
|ai.onnx:Elu|12, 13, 14, 15|
|ai.onnx:Equal|12, 13, 14, 15|
|ai.onnx:Erf|12, 13, 14, 15|
|ai.onnx:Exp|12, 13, 14, 15|
|ai.onnx:Expand|12, 13, 14, 15|
|ai.onnx:Flatten|12, 13, 14, 15|
|ai.onnx:Floor|12, 13, 14, 15|
|ai.onnx:Gather|12, 13, 14, 15|
|ai.onnx:GatherND|12, 13, 14, 15|
|ai.onnx:Gemm|12, 13, 14, 15|
|ai.onnx:GlobalAveragePool|12, 13, 14, 15|
|ai.onnx:Greater|12, 13, 14, 15|
|ai.onnx:GreaterOrEqual|12, 13, 14, 15|
|ai.onnx:HardSigmoid|12, 13, 14, 15|
|ai.onnx:Identity|12, 13, 14, 15|
|ai.onnx:If|12, 13, 14, 15|
|ai.onnx:InstanceNormalization|12, 13, 14, 15|
|ai.onnx:LRN|12, 13, 14, 15|
|ai.onnx:LayerNormalization|1|
|ai.onnx:LeakyRelu|12, 13, 14, 15|
|ai.onnx:Less|12, 13, 14, 15|
|ai.onnx:LessOrEqual|12, 13, 14, 15|
|ai.onnx:Log|12, 13, 14, 15|
|ai.onnx:LogSoftmax|12, 13, 14, 15|
|ai.onnx:Loop|12, 13, 14, 15|
|ai.onnx:MatMul|12, 13, 14, 15|
|ai.onnx:MatMulInteger|12, 13, 14, 15|
|ai.onnx:Max|12, 13, 14, 15|
|ai.onnx:MaxPool|12, 13, 14, 15|
|ai.onnx:Mean|12, 13, 14, 15|
|ai.onnx:Min|12, 13, 14, 15|
|ai.onnx:Mul|12, 13, 14, 15|
|ai.onnx:Neg|12, 13, 14, 15|
|ai.onnx:NonMaxSuppression|12, 13, 14, 15|
|ai.onnx:NonZero|12, 13, 14, 15|
|ai.onnx:Not|12, 13, 14, 15|
|ai.onnx:Or|12, 13, 14, 15|
|ai.onnx:PRelu|12, 13, 14, 15|
|ai.onnx:Pad|12, 13, 14, 15|
|ai.onnx:Pow|12, 13, 14, 15|
|ai.onnx:QLinearConv|12, 13, 14, 15|
|ai.onnx:QLinearMatMul|12, 13, 14, 15|
|ai.onnx:QuantizeLinear|12, 13, 14, 15|
|ai.onnx:Range|12, 13, 14, 15|
|ai.onnx:Reciprocal|12, 13, 14, 15|
|ai.onnx:ReduceMax|12, 13, 14, 15|
|ai.onnx:ReduceMean|12, 13, 14, 15|
|ai.onnx:ReduceMin|12, 13, 14, 15|
|ai.onnx:ReduceProd|12, 13, 14, 15|
|ai.onnx:ReduceSum|12, 13, 14, 15|
|ai.onnx:Relu|12, 13, 14, 15|
|ai.onnx:Reshape|12, 13, 14, 15|
|ai.onnx:Resize|12, 13, 14, 15|
|ai.onnx:ReverseSequence|12, 13, 14, 15|
|ai.onnx:Round|12, 13, 14, 15|
|ai.onnx:Scan|12, 13, 14, 15|
|ai.onnx:ScatterND|12, 13, 14, 15|
|ai.onnx:Shape|12, 13, 14, 15|
|ai.onnx:Sigmoid|12, 13, 14, 15|
|ai.onnx:Sin|12, 13, 14, 15|
|ai.onnx:Size|12, 13, 14, 15|
|ai.onnx:Slice|12, 13, 14, 15|
|ai.onnx:Softmax|12, 13, 14, 15|
|ai.onnx:SpaceToDepth|12, 13, 14, 15|
|ai.onnx:Split|12, 13, 14, 15|
|ai.onnx:Sqrt|12, 13, 14, 15|
|ai.onnx:Squeeze|12, 13, 14, 15|
|ai.onnx:Sub|12, 13, 14, 15|
|ai.onnx:Sum|12, 13, 14, 15|
|ai.onnx:Tanh|12, 13, 14, 15|
|ai.onnx:ThresholdedRelu|12, 13, 14, 15|
|ai.onnx:Tile|12, 13, 14, 15|
|ai.onnx:TopK|12, 13, 14, 15|
|ai.onnx:Transpose|12, 13, 14, 15|
|ai.onnx:Unique|12, 13, 14, 15|
|ai.onnx:Unsqueeze|12, 13, 14, 15|
|ai.onnx:Where|12, 13, 14, 15|
|||
|**com.microsoft**||
|com.microsoft:DynamicQuantizeMatMul|1|
|com.microsoft:FusedConv|1|
|com.microsoft:FusedGemm|1|
|com.microsoft:FusedMatMul|1|
|com.microsoft:Gelu|1|
|com.microsoft:MatMulIntegerToFloat|1|
|com.microsoft:NhwcMaxPool|1|
|com.microsoft:QLinearAdd|1|
|com.microsoft:QLinearAveragePool|1|
|com.microsoft:QLinearConv|1|
|com.microsoft:QLinearGlobalAveragePool|1|
|com.microsoft:QLinearLeakyRelu|1|
|com.microsoft:QLinearMul|1|
|com.microsoft:QLinearSigmoid|1|
|||
@@ -5,7 +5,7 @@ grand_parent: Reference
redirect_from: /docs/reference/mobile/prebuilt-package/mobile_package_op_type_support_1.9
---

# ONNX Runtime Mobile Pre-Built Package Operator and Type Support
# ONNX Runtime Mobile 1.9 Pre-Built Package Operator and Type Support

## Supported operators and types

@@ -207,6 +207,7 @@ Ort::Session session(env, <path to model>, session_options);
```
Java API
```java
SessionOptions session_options = new SessionOptions();
session_options.addConfigEntry("session.load_model_format", "ORT");
@@ -216,6 +217,7 @@ OrtSession session = env.createSession(<path to model>, session_options);
```

JavaScript API

```js
import * as ort from "onnxruntime-web";

@@ -226,7 +228,7 @@ const session = await ort.InferenceSession.create("<path to model>");

If a session is created using an input byte array containing the ORT format model data, by default we will copy the model bytes at the time of session creation to ensure the model bytes buffer is valid.

You may also enable the option to use the model bytes directly by setting the Session Options config `session.use_ort_model_bytes_directly` to `1`, this may reduce the peak memory usage of ONNX Runtime Mobile, you will need to guarantee that the model bytes are valid throughout the lifespan of the ORT session using the model bytes. For ONNX Runtime Web, this option is set by default.
You may also enable the option to use the model bytes directly by setting the Session Options config `session.use_ort_model_bytes_directly` to `1`. This may reduce the peak memory usage of ONNX Runtime Mobile, but you will need to guarantee that the model bytes are valid throughout the lifespan of the ORT session. For ONNX Runtime Web, this option is set by default.

C++ API
```c++
2 changes: 1 addition & 1 deletion docs/reference/reduced-operator-config-file.md
@@ -11,7 +11,7 @@ nav_order: 3

The reduced operator config file is an input to the ONNX Runtime build-from-source script. It specifies which operators are included in the runtime. A reduced set of operators in ONNX Runtime permits a smaller build binary size. A smaller runtime is used in constrained environments, such as mobile and web deployments.

This article shows you how to generate the reduced operator config file using the `create_reduced_build_config.py` script. You can also generate the reduced operator config file by [converting ONNX models to ORT format](./ort-format-model-conversion.md).
This article shows you how to generate the reduced operator config file using the `create_reduced_build_config.py` script. You can also generate the reduced operator config file by [converting ONNX models to ORT format](./ort-model-format.md).
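
A sketch of the script-based flow, followed by a build that consumes the generated file. The argument layout is an assumption (run the script with `--help` to confirm) and paths are placeholders:

```
# Generate a config listing the operators used by the models in /models
python <ORT repository root>/tools/python/create_reduced_build_config.py /models /models/required_operators.config

# Build ONNX Runtime with only those operators included
./build.sh --config MinSizeRel --include_ops_by_config /models/required_operators.config --skip_tests
```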

## Contents
{: .no_toc}
2 changes: 1 addition & 1 deletion docs/tutorials/mobile/deploy-ios.md
@@ -65,7 +65,7 @@ This example is heavily based on [Google Tensorflow lite - Object Detection Exam

> Conversion of this model is a two-part process. The original model is in tflite format. It is first converted to ONNX format using the [tf2onnx converter](https://github.com/onnx/tensorflow-onnx).
>
> The model is then converted into ORT format using the [onnx to ort converter](../../reference/ort-format-model-conversion.md).
> The model is then converted into ORT format using the [onnx to ort converter](../../reference/ort-model-format.md).
>
> As well as generating the model in ORT format, the conversion script also outputs an [operator config file](../../reference/reduced-operator-config-file.md)
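>
> A rough sketch of that two-step conversion (model file names are placeholders, and the tf2onnx options may vary per model):
>
> ```
> # tflite -> ONNX
> python -m tf2onnx.convert --tflite ssd_mobilenet.tflite --output ssd_mobilenet.onnx --opset 13
>
> # ONNX -> ORT format (this also emits the operator config file mentioned above)
> python <ORT repository root>/tools/python/convert_onnx_models_to_ort.py /dir_with_converted_onnx_model
> ```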
21 changes: 4 additions & 17 deletions docs/tutorials/mobile/index.md
@@ -32,31 +32,18 @@ ONNX Runtime gives you a variety of options to add machine learning to your mobi

ONNX models can be obtained from the [ONNX model zoo](https://github.com/onnx/models), converted from PyTorch or TensorFlow, and many other places.

Once you have sourced or converted the model into ONNX format, there is a further step required to optimize the model for mobile deployments. [Convert the model to ORT format](../../reference/ort-format-model-conversion.md) for optimized model binary size, faster initialization and reduced peak memory usage.
Once you have sourced or converted the model into ONNX format, there is a further step required to optimize the model for mobile deployments. [Convert the model to ORT format](../../reference/ort-model-format.md) for optimized model binary size, faster initialization and reduced peak memory usage.

3. How do I bootstrap my app development?

If you are starting from scratch, bootstrap your mobile application in your mobile framework of choice, Xcode or the Android Development Kit.

a. Add the ONNX Runtime dependency
b. Consume the onnxruntime-web API in your application
b. Consume the onnxruntime API in your application
c. Add pre and post processing appropriate to your application and model

4. How do I optimize my application?

The libraries in step 1 can be optimized to meet memory and processing demands.
The execution environment on mobile devices has fixed memory and disk storage. Therefore, it is essential that any AI execution library is optimized to consume minimum resources in terms of disk footprint, memory and network usage (both model size and binary size).

The size of the ONNX Runtime itself can be reduced by [building a custom package](../../build/custom.md) that only includes support for your specific model/s.

## Helpful resources


TODO

Can this be included anywhere:

The execution environment on mobile devices has fixed memory and disk storage. Therefore, it is essential that any AI execution library is optimized to consume minimum resources in terms of disk footprint, memory and network usage (both model size and binary size).

ONNX Runtime Mobile uses the ORT formatted model which enables us to create a [custom ORT build](../build/custom.md) that minimizes the binary size and reduces memory usage for client side inference. The ORT formatted model file is generated from the regular ONNX model using the `onnxruntime` python package. The custom build does this primarily by only including specified operators and types in the build, as well as trimming down dependencies per custom needs.

An ONNX model must be converted to an ORT format model to be used with minimal build in ONNX Runtime Mobile.
ONNX Runtime Mobile uses the ORT model format which enables us to create a [custom ORT build](../../build/custom.md) that minimizes the binary size and reduces memory usage for client side inference. The ORT model format file is generated from the regular ONNX model using the `onnxruntime` python package. The custom build does this primarily by only including specified operators and types in the build, as well as trimming down dependencies per custom needs.
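
A sketch of generating the ORT format model via the installed `onnxruntime` python package rather than the repository script (the module-style invocation is an assumption; the directory path is a placeholder):

```
pip install onnxruntime
python -m onnxruntime.tools.convert_onnx_models_to_ort /dir_with_onnx_models
```
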
2 changes: 1 addition & 1 deletion docs/tutorials/web/index.md
@@ -72,6 +72,6 @@ For more detail on the steps below, see the [build a web application](../../refe

The libraries and models mentioned in the previous steps can be optimized to meet memory and processing demands.

a. Models in ONNX format can be [converted to ORT format](../../reference/ort-format-model-conversion.md), for optimized model binary size, faster initialization and reduced peak memory usage.
a. Models in ONNX format can be [converted to ORT format](../../reference/ort-model-format.md), for optimized model binary size, faster initialization and reduced peak memory usage.

b. The size of the ONNX Runtime itself can be reduced by [building a custom package](../../build/custom.md) that only includes support for your specific model/s.
