
Commit

Update after review #3

natke committed Dec 7, 2021
1 parent 01314a2 commit aad47ac
Showing 11 changed files with 157 additions and 30 deletions.
7 changes: 3 additions & 4 deletions docs/build/custom.md
@@ -23,7 +23,7 @@ To build a custom ONNX Runtime package, the [build from source](./index.md) inst
* TOC placeholder
{:toc}

## Reduce operator set
## Reduce operator kernels

To reduce the compiled binary size of ONNX Runtime, the operator kernels included in the build can be reduced to just those required by your model/s.

@@ -40,15 +40,15 @@ The operators that are included are specified at build time, in a [configuration

**`--enable_reduced_operator_type_support`**

* Enables [operator type reduction](../reference/ort-format-model-conversion.md#enable-type-reduction). Requires ONNX Runtime version 1.7 or higher and for type reduction to have been enabled during model conversion
* Enables [operator type reduction](../reference/ort-model-format.md#enable-type-reduction). Requires ONNX Runtime version 1.7 or higher and for type reduction to have been enabled during model conversion

If the configuration file is created using ORT format models, the input/output types that individual operators require can be tracked if `--enable_type_reduction` is specified. This can be used to further reduce the build size if `--enable_reduced_operator_type_support` is specified when building ORT.

ONNX format models are not guaranteed to include the required per-node type information, so cannot be used with this option.
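
For illustration, the two options above are typically combined with a type-aware configuration file. A rough sketch, assuming the conversion script shown elsewhere on this page; the `--include_ops_by_config` build flag and the generated file name are assumptions to confirm against the build script's help output, and paths are placeholders:

```
# 1. Convert ONNX models to ORT format, recording required operators and their input/output types
python <ORT repository root>/tools/python/convert_onnx_models_to_ort.py --enable_type_reduction /models

# 2. Build a reduced ONNX Runtime package from the generated configuration file
./build.sh --config MinSizeRel --skip_tests \
    --include_ops_by_config /models/required_operators_and_types.config \
    --enable_reduced_operator_type_support
```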

## Minimal build

ONNX Runtime can be built to further minimize the binary size, by only including support for loading and executing models in [ORT format](../reference/ort-format-model-conversion.md), and not ONNX format.
ONNX Runtime can be built to further minimize the binary size, by only including support for loading and executing models in [ORT format](../reference/ort-model-format.md), and not ONNX format.

**`--minimal_build`**

@@ -63,7 +63,6 @@ A minimal build has the following limitations:
- Execution providers that compile nodes are optionally supported
  - currently this is limited to the NNAPI and CoreML Execution Providers

We do not currently offer backwards compatibility guarantees for ORT format models, as we will be expanding the capabilities in the short term and may need to update the internal format in an incompatible manner to accommodate these changes. You may need to regenerate the ORT format models to use with a future version of ONNX Runtime. Once the feature set stabilizes we will provide backwards compatibility guarantees.
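
As a sketch, a minimal build is usually combined with a reduced operator configuration. The configuration file name is a placeholder and the `--include_ops_by_config` flag is an assumption; check `./build.sh --help` for the exact options:

```
# Minimal build restricted to the kernels listed in a reduced-operator config file
./build.sh --config MinSizeRel --skip_tests --minimal_build \
    --include_ops_by_config required_operators.config

# Use `--minimal_build extended` instead if you also need execution providers
# that compile nodes, such as NNAPI or CoreML
```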

## Other customizations

2 changes: 1 addition & 1 deletion docs/build/web.md
@@ -77,7 +77,7 @@ To get all build artifacts of ONNX Runtime WebAssembly, it needs 4 times of buil

### Minimal Build Support

ONNX Runtime WebAssembly can be built with the flag `--minimal_build`. This generates smaller artifacts and also uses less runtime memory. An ORT format model is required. Detailed instructions will come soon. See also [ORT format Conversion](../reference/ort-format-model-conversion.md).
ONNX Runtime WebAssembly can be built with the flag `--minimal_build`. This generates smaller artifacts and also uses less runtime memory. An ORT format model is required. Detailed instructions will come soon. See also [ORT format Conversion](../reference/ort-model-format.md).
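
As a hedged sketch (flags other than `--build_wasm` and `--minimal_build` are illustrative and may vary with your toolchain setup):

```
./build.sh --config MinSizeRel --build_wasm --minimal_build --skip_tests
```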

### FAQ

4 changes: 2 additions & 2 deletions docs/performance/mobile-performance-tuning.md
@@ -48,7 +48,7 @@ _Layout_ optimizations may be hardware specific and involve internal conversions

### Outcome of optimizations when creating an optimized ORT format model

Below is an example of the changes that occur in _basic_ and _extended_ optimizations when applied to the MNIST model with only the CPU EP enabled. The optimization level is specified when [creating the ORT format model](../reference/ort-format-model-conversion.md#optimization-level).
Below is an example of the changes that occur in _basic_ and _extended_ optimizations when applied to the MNIST model with only the CPU EP enabled. The optimization level is specified when [creating the ORT format model](../reference/ort-model-format.md#optimization-level).

- At the _basic_ level we combine the Conv and Add nodes (the addition is done via the 'B' input to Conv), we combine the MatMul and Add into a single Gemm node (the addition is done via the 'C' input to Gemm), and constant fold to remove one of the Reshape nodes.
- `python <ORT repository root>/tools/python/convert_onnx_models_to_ort.py --optimization_level basic /dir_with_mnist_onnx_model`
@@ -121,7 +121,7 @@ To create an NNAPI-aware ORT format model please follow these steps.
pip install -U build\Windows\RelWithDebInfo\RelWithDebInfo\dist\onnxruntime_noopenmp-1.7.0-cp37-cp37m-win_amd64.whl
```
3. Create an NNAPI-aware ORT format model by running `convert_onnx_models_to_ort.py` as per the [standard instructions](../reference/ort-format-model-conversion.md), with NNAPI enabled (`--use_nnapi`), and the optimization level set to _extended_ or _all_ (e.g. `--optimization_level extended`). This will allow higher level optimizations to run on any nodes that NNAPI can not handle.
3. Create an NNAPI-aware ORT format model by running `convert_onnx_models_to_ort.py` as per the [standard instructions](../reference/ort-model-format.md), with NNAPI enabled (`--use_nnapi`), and the optimization level set to _extended_ or _all_ (e.g. `--optimization_level extended`). This will allow higher level optimizations to run on any nodes that NNAPI can not handle.
```
python <ORT repository root>/tools/python/convert_onnx_models_to_ort.py --use_nnapi --optimization_level extended /models
```
2 changes: 1 addition & 1 deletion docs/reference/build-web-app.md
@@ -47,7 +47,7 @@ You need to understand your web app's scenario and get an ONNX model that is app

ONNX models can be obtained from the [ONNX model zoo](https://github.com/onnx/models), converted from PyTorch or TensorFlow, and many other places.

You can [convert the ONNX format model to ORT format model](./ort-format-model-conversion.md) for optimized binary size, faster initialization and reduced peak memory usage.
You can [convert the ONNX format model to ORT format model](./ort-model-format.md) for optimized binary size, faster initialization and reduced peak memory usage.

You can [perform a model-specific custom build](../build/custom.md) to further optimize binary size.

139 changes: 139 additions & 0 deletions docs/reference/operators/mobile_package_op_type_support_1.10.md
@@ -0,0 +1,139 @@
---
title: ORT 1.10 Mobile Package Operators
parent: Operators
grand_parent: Reference
---


# ONNX Runtime 1.10 Mobile Pre-Built Package Operator and Type Support

## Supported operators and types

The supported operators and types are based on what is required to support float32 and quantized versions of popular models. The full list of input models used to determine the required operators and types is available [here](https://github.com/microsoft/onnxruntime/blob/master/tools/ci_build/github/android/mobile_package.required_operators.readme.txt).

## Supported data input types

- float
- int8_t
- uint8_t

NOTE: Operators used to manipulate dimensions and indices will support int32 and int64.

## Supported Operators

|Operator|Opsets|
|--------|------|
|**ai.onnx**||
|ai.onnx:Abs|12, 13, 14, 15|
|ai.onnx:Add|12, 13, 14, 15|
|ai.onnx:And|12, 13, 14, 15|
|ai.onnx:ArgMax|12, 13, 14, 15|
|ai.onnx:ArgMin|12, 13, 14, 15|
|ai.onnx:AveragePool|12, 13, 14, 15|
|ai.onnx:Cast|12, 13, 14, 15|
|ai.onnx:Ceil|12, 13, 14, 15|
|ai.onnx:Clip|12, 13, 14, 15|
|ai.onnx:Concat|12, 13, 14, 15|
|ai.onnx:ConstantOfShape|12, 13, 14, 15|
|ai.onnx:Conv|12, 13, 14, 15|
|ai.onnx:ConvTranspose|12, 13, 14, 15|
|ai.onnx:Cos|12, 13, 14, 15|
|ai.onnx:CumSum|12, 13, 14, 15|
|ai.onnx:DepthToSpace|12, 13, 14, 15|
|ai.onnx:DequantizeLinear|12, 13, 14, 15|
|ai.onnx:Div|12, 13, 14, 15|
|ai.onnx:DynamicQuantizeLinear|12, 13, 14, 15|
|ai.onnx:Elu|12, 13, 14, 15|
|ai.onnx:Equal|12, 13, 14, 15|
|ai.onnx:Erf|12, 13, 14, 15|
|ai.onnx:Exp|12, 13, 14, 15|
|ai.onnx:Expand|12, 13, 14, 15|
|ai.onnx:Flatten|12, 13, 14, 15|
|ai.onnx:Floor|12, 13, 14, 15|
|ai.onnx:Gather|12, 13, 14, 15|
|ai.onnx:GatherND|12, 13, 14, 15|
|ai.onnx:Gemm|12, 13, 14, 15|
|ai.onnx:GlobalAveragePool|12, 13, 14, 15|
|ai.onnx:Greater|12, 13, 14, 15|
|ai.onnx:GreaterOrEqual|12, 13, 14, 15|
|ai.onnx:HardSigmoid|12, 13, 14, 15|
|ai.onnx:Identity|12, 13, 14, 15|
|ai.onnx:If|12, 13, 14, 15|
|ai.onnx:InstanceNormalization|12, 13, 14, 15|
|ai.onnx:LRN|12, 13, 14, 15|
|ai.onnx:LayerNormalization|1|
|ai.onnx:LeakyRelu|12, 13, 14, 15|
|ai.onnx:Less|12, 13, 14, 15|
|ai.onnx:LessOrEqual|12, 13, 14, 15|
|ai.onnx:Log|12, 13, 14, 15|
|ai.onnx:LogSoftmax|12, 13, 14, 15|
|ai.onnx:Loop|12, 13, 14, 15|
|ai.onnx:MatMul|12, 13, 14, 15|
|ai.onnx:MatMulInteger|12, 13, 14, 15|
|ai.onnx:Max|12, 13, 14, 15|
|ai.onnx:MaxPool|12, 13, 14, 15|
|ai.onnx:Mean|12, 13, 14, 15|
|ai.onnx:Min|12, 13, 14, 15|
|ai.onnx:Mul|12, 13, 14, 15|
|ai.onnx:Neg|12, 13, 14, 15|
|ai.onnx:NonMaxSuppression|12, 13, 14, 15|
|ai.onnx:NonZero|12, 13, 14, 15|
|ai.onnx:Not|12, 13, 14, 15|
|ai.onnx:Or|12, 13, 14, 15|
|ai.onnx:PRelu|12, 13, 14, 15|
|ai.onnx:Pad|12, 13, 14, 15|
|ai.onnx:Pow|12, 13, 14, 15|
|ai.onnx:QLinearConv|12, 13, 14, 15|
|ai.onnx:QLinearMatMul|12, 13, 14, 15|
|ai.onnx:QuantizeLinear|12, 13, 14, 15|
|ai.onnx:Range|12, 13, 14, 15|
|ai.onnx:Reciprocal|12, 13, 14, 15|
|ai.onnx:ReduceMax|12, 13, 14, 15|
|ai.onnx:ReduceMean|12, 13, 14, 15|
|ai.onnx:ReduceMin|12, 13, 14, 15|
|ai.onnx:ReduceProd|12, 13, 14, 15|
|ai.onnx:ReduceSum|12, 13, 14, 15|
|ai.onnx:Relu|12, 13, 14, 15|
|ai.onnx:Reshape|12, 13, 14, 15|
|ai.onnx:Resize|12, 13, 14, 15|
|ai.onnx:ReverseSequence|12, 13, 14, 15|
|ai.onnx:Round|12, 13, 14, 15|
|ai.onnx:Scan|12, 13, 14, 15|
|ai.onnx:ScatterND|12, 13, 14, 15|
|ai.onnx:Shape|12, 13, 14, 15|
|ai.onnx:Sigmoid|12, 13, 14, 15|
|ai.onnx:Sin|12, 13, 14, 15|
|ai.onnx:Size|12, 13, 14, 15|
|ai.onnx:Slice|12, 13, 14, 15|
|ai.onnx:Softmax|12, 13, 14, 15|
|ai.onnx:SpaceToDepth|12, 13, 14, 15|
|ai.onnx:Split|12, 13, 14, 15|
|ai.onnx:Sqrt|12, 13, 14, 15|
|ai.onnx:Squeeze|12, 13, 14, 15|
|ai.onnx:Sub|12, 13, 14, 15|
|ai.onnx:Sum|12, 13, 14, 15|
|ai.onnx:Tanh|12, 13, 14, 15|
|ai.onnx:ThresholdedRelu|12, 13, 14, 15|
|ai.onnx:Tile|12, 13, 14, 15|
|ai.onnx:TopK|12, 13, 14, 15|
|ai.onnx:Transpose|12, 13, 14, 15|
|ai.onnx:Unique|12, 13, 14, 15|
|ai.onnx:Unsqueeze|12, 13, 14, 15|
|ai.onnx:Where|12, 13, 14, 15|
|||
|**com.microsoft**||
|com.microsoft:DynamicQuantizeMatMul|1|
|com.microsoft:FusedConv|1|
|com.microsoft:FusedGemm|1|
|com.microsoft:FusedMatMul|1|
|com.microsoft:Gelu|1|
|com.microsoft:MatMulIntegerToFloat|1|
|com.microsoft:NhwcMaxPool|1|
|com.microsoft:QLinearAdd|1|
|com.microsoft:QLinearAveragePool|1|
|com.microsoft:QLinearConv|1|
|com.microsoft:QLinearGlobalAveragePool|1|
|com.microsoft:QLinearLeakyRelu|1|
|com.microsoft:QLinearMul|1|
|com.microsoft:QLinearSigmoid|1|
|||
@@ -5,7 +5,7 @@ grand_parent: Reference
redirect_from: /docs/reference/mobile/prebuilt-package/mobile_package_op_type_support_1.9
---

# ONNX Runtime Mobile Pre-Built Package Operator and Type Support
# ONNX Runtime Mobile 1.9 Pre-Built Package Operator and Type Support

## Supported operators and types

@@ -207,6 +207,7 @@ Ort::Session session(env, <path to model>, session_options);
```
Java API
```java
SessionOptions session_options = new SessionOptions();
session_options.addConfigEntry("session.load_model_format", "ORT");
@@ -216,6 +217,7 @@ OrtSession session = env.createSession(<path to model>, session_options);
```

JavaScript API

```js
import * as ort from "onnxruntime-web";

@@ -226,7 +228,7 @@ const session = await ort.InferenceSession.create("<path to model>");

If a session is created using an input byte array containing the ORT format model data, by default we will copy the model bytes at the time of session creation to ensure the model bytes buffer is valid.

You may also enable the option to use the model bytes directly by setting the Session Options config `session.use_ort_model_bytes_directly` to `1`, this may reduce the peak memory usage of ONNX Runtime Mobile, you will need to guarantee that the model bytes are valid throughout the lifespan of the ORT session using the model bytes. For ONNX Runtime Web, this option is set by default.
You may also enable the option to use the model bytes directly by setting the Session Options config `session.use_ort_model_bytes_directly` to `1`. This may reduce the peak memory usage of ONNX Runtime Mobile, but you will need to guarantee that the model bytes are valid throughout the lifespan of the ORT session. For ONNX Runtime Web, this option is set by default.

C++ API
```c++
2 changes: 1 addition & 1 deletion docs/reference/reduced-operator-config-file.md
@@ -11,7 +11,7 @@ nav_order: 3

The reduced operator config file is an input to the ONNX Runtime build-from-source script. It specifies which operators are included in the runtime. A reduced set of operators in ONNX Runtime permits a smaller build binary size. A smaller runtime is used in constrained environments, such as mobile and web deployments.

This article shows you how to generate the reduced operator config file using the `create_reduced_build_config.py` script. You can also generate the reduced operator config file by [converting ONNX models to ORT format](./ort-format-model-conversion.md).
This article shows you how to generate the reduced operator config file using the `create_reduced_build_config.py` script. You can also generate the reduced operator config file by [converting ONNX models to ORT format](./ort-model-format.md).
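
A sketch of the script-based flow, followed by a build that consumes the generated file. The argument layout is an assumption (run the script with `--help` to confirm) and paths are placeholders:

```
# Generate a config listing the operators used by the models in /models
python <ORT repository root>/tools/python/create_reduced_build_config.py /models /models/required_operators.config

# Build ONNX Runtime with only those operators included
./build.sh --config MinSizeRel --include_ops_by_config /models/required_operators.config --skip_tests
```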

## Contents
{: .no_toc}
2 changes: 1 addition & 1 deletion docs/tutorials/mobile/deploy-ios.md
@@ -65,7 +65,7 @@ This example is heavily based on [Google Tensorflow lite - Object Detection Exam

> Conversion of this model is a two-part process. The original model is in tflite format. It is first converted to ONNX format using the [tf2onnx converter](https://github.com/onnx/tensorflow-onnx).
>
> The model is then converted into ORT format using the [onnx to ort converter](../../reference/ort-format-model-conversion.md).
> The model is then converted into ORT format using the [onnx to ort converter](../../reference/ort-model-format.md).
>
> As well as generating the model in ORT format, the conversion script also outputs an [operator config file](../../reference/reduced-operator-config-file.md)
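>
> A rough sketch of that two-step conversion (model file names are placeholders, and the tf2onnx options may vary per model):
>
> ```
> # tflite -> ONNX
> python -m tf2onnx.convert --tflite ssd_mobilenet.tflite --output ssd_mobilenet.onnx --opset 13
>
> # ONNX -> ORT format (this also emits the operator config file mentioned above)
> python <ORT repository root>/tools/python/convert_onnx_models_to_ort.py /dir_with_converted_onnx_model
> ```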
21 changes: 4 additions & 17 deletions docs/tutorials/mobile/index.md
@@ -32,31 +32,18 @@ ONNX Runtime gives you a variety of options to add machine learning to your mobi

ONNX models can be obtained from the [ONNX model zoo](https://github.com/onnx/models), converted from PyTorch or TensorFlow, and many other places.

Once you have sourced or converted the model into ONNX format, there is a further step required to optimize the model for mobile deployments. [Convert the model to ORT format](../../reference/ort-format-model-conversion.md) for optimized model binary size, faster initialization and reduced peak memory usage.
Once you have sourced or converted the model into ONNX format, there is a further step required to optimize the model for mobile deployments. [Convert the model to ORT format](../../reference/ort-model-format.md) for optimized model binary size, faster initialization and reduced peak memory usage.

3. How do I bootstrap my app development?

If you are starting from scratch, bootstrap your mobile application in your mobile framework of choice, Xcode or the Android Development Kit.

a. Add the ONNX Runtime dependency
b. Consume the onnxruntime-web API in your application
b. Consume the onnxruntime API in your application
c. Add pre and post processing appropriate to your application and model

4. How do I optimize my application?

The libraries in step 1 can be optimized to meet memory and processing demands.
The execution environment on mobile devices has fixed memory and disk storage. Therefore, it is essential that any AI execution library is optimized to consume minimum resources in terms of disk footprint, memory and network usage (both model size and binary size).

The size of the ONNX Runtime itself can be reduced by [building a custom package](../../build/custom.md) that only includes support for your specific model/s.

## Helpful resources


TODO

Can this be included anywhere:

The execution environment on mobile devices has fixed memory and disk storage. Therefore, it is essential that any AI execution library is optimized to consume minimum resources in terms of disk footprint, memory and network usage (both model size and binary size).

ONNX Runtime Mobile uses the ORT formatted model which enables us to create a [custom ORT build](../build/custom.md) that minimizes the binary size and reduces memory usage for client side inference. The ORT formatted model file is generated from the regular ONNX model using the `onnxruntime` python package. The custom build does this primarily by only including specified operators and types in the build, as well as trimming down dependencies per custom needs.

An ONNX model must be converted to an ORT format model to be used with minimal build in ONNX Runtime Mobile.
ONNX Runtime Mobile uses the ORT model format which enables us to create a [custom ORT build](../../build/custom.md) that minimizes the binary size and reduces memory usage for client side inference. The ORT model format file is generated from the regular ONNX model using the `onnxruntime` python package. The custom build does this primarily by only including specified operators and types in the build, as well as trimming down dependencies per custom needs.
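
A sketch of generating the ORT format model via the installed `onnxruntime` python package rather than the repository script (the module-style invocation is an assumption; the directory path is a placeholder):

```
pip install onnxruntime
python -m onnxruntime.tools.convert_onnx_models_to_ort /dir_with_onnx_models
```
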
2 changes: 1 addition & 1 deletion docs/tutorials/web/index.md
@@ -72,6 +72,6 @@ For more detail on the steps below, see the [build a web application](../../refe

The libraries and models mentioned in the previous steps can be optimized to meet memory and processing demands.

a. Models in ONNX format can be [converted to ORT format](../../reference/ort-format-model-conversion.md), for optimized model binary size, faster initialization and reduced peak memory usage.
a. Models in ONNX format can be [converted to ORT format](../../reference/ort-model-format.md), for optimized model binary size, faster initialization and reduced peak memory usage.

b. The size of the ONNX Runtime itself can be reduced by [building a custom package](../../build/custom.md) that only includes support for your specific model/s.
