diff --git a/docs/build/custom.md b/docs/build/custom.md index 1cd6583ebef31..2b91e3516ae51 100644 --- a/docs/build/custom.md +++ b/docs/build/custom.md @@ -23,7 +23,7 @@ To build a custom ONNX Runtime package, the [build from source](./index.md) inst * TOC placeholder {:toc} -## Reduce operator set +## Reduce operator kernels To reduce the compiled binary size of ONNX Runtime, the operator kernels included in the build can be reduced to just those required by your model/s. @@ -40,7 +40,7 @@ The operators that are included are specified at build time, in a [configuration **`--enable_reduced_operator_type_support`** -* Enables [operator type reduction](../reference/ort-format-model-conversion.md#enable-type-reduction). Requires ONNX Runtime version 1.7 or higher and for type reduction to have been enabled during model conversion +* Enables [operator type reduction](../reference/ort-model-format.md#enable-type-reduction). Requires ONNX Runtime version 1.7 or higher, and type reduction must have been enabled during model conversion. If the configuration file is created using ORT format models, the input/output types that individual operators require can be tracked if `--enable_type_reduction` is specified. This can be used to further reduce the build size if `--enable_reduced_operator_type_support` is specified when building ORT. @@ -48,7 +48,7 @@ ONNX format models are not guaranteed to include the required per-node type info ## Minimal build -ONNX Runtime can be built to further minimize the binary size, by only including support for loading and executing models in [ORT format](../reference/ort-format-model-conversion.md), and not ONNX format. +ONNX Runtime can be built to further minimize the binary size, by only including support for loading and executing models in [ORT format](../reference/ort-model-format.md), and not ONNX format. **`--minimal_build`** @@ -63,7 +63,6 @@ A minimal build has the following limitations: - Execution providers that compile nodes are optionally supported - currently this is limited to the NNAPI and CoreML Execution Providers -We do not currently offer backwards compatibility guarantees for ORT format models, as we will be expanding the capabilities in the short term and may need to update the internal format in an incompatible manner to accommodate these changes. You may need to regenerate the ORT format models to use with a future version of ONNX Runtime. Once the feature set stabilizes we will provide backwards compatibility guarantees. ## Other customizations diff --git a/docs/build/web.md b/docs/build/web.md index 15ea90ceb1eb8..cbb417520f400 100644 --- a/docs/build/web.md +++ b/docs/build/web.md @@ -77,7 +77,7 @@ To get all build artifacts of ONNX Runtime WebAssembly, it needs 4 times of buil ### Minimal Build Support -ONNX Runtime WebAssembly can be built with flag `--minimal_build`. This will generate smaller artifacts and also have a less runtime memory usage. An ORT format model is required. A detailed instruction will come soon. See also [ORT format Conversion](../reference/ort-format-model-conversion.md). +ONNX Runtime WebAssembly can be built with the flag `--minimal_build`. This will generate smaller artifacts and also lower runtime memory usage. An ORT format model is required. Detailed instructions will come soon. See also [ORT format Conversion](../reference/ort-model-format.md).
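To illustrate how the options above fit together, a reduced-size native build might be invoked roughly as follows. This is a sketch only: `--minimal_build` and `--enable_reduced_operator_type_support` are the flags described above, while the build script name, `--config MinSizeRel` and `--include_ops_by_config` come from the general build-from-source instructions and should be checked against your ONNX Runtime release.

```
# Sketch of a size-reduced build; flag availability varies by release (use .\build.bat on Windows).
# The configuration file path is a placeholder.
./build.sh --config MinSizeRel \
  --include_ops_by_config /path/to/required_operators.config \
  --enable_reduced_operator_type_support \
  --minimal_build
```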
### FAQ diff --git a/docs/performance/mobile-performance-tuning.md b/docs/performance/mobile-performance-tuning.md index bd1ea0edf44d2..57bb9b30abc3e 100644 --- a/docs/performance/mobile-performance-tuning.md +++ b/docs/performance/mobile-performance-tuning.md @@ -48,7 +48,7 @@ _Layout_ optimizations may be hardware specific and involve internal conversions ### Outcome of optimizations when creating an optimized ORT format model -Below is an example of the changes that occur in _basic_ and _extended_ optimizations when applied to the MNIST model with only the CPU EP enabled. The optimization level is specified when [creating the ORT format model](../reference/ort-format-model-conversion.md#optimization-level). +Below is an example of the changes that occur in _basic_ and _extended_ optimizations when applied to the MNIST model with only the CPU EP enabled. The optimization level is specified when [creating the ORT format model](../reference/ort-model-format.md#optimization-level). - At the _basic_ level we combine the Conv and Add nodes (the addition is done via the 'B' input to Conv), we combine the MatMul and Add into a single Gemm node (the addition is done via the 'C' input to Gemm), and constant fold to remove one of the Reshape nodes. - `python /tools/python/convert_onnx_models_to_ort.py --optimization_level basic /dir_with_mnist_onnx_model` @@ -121,7 +121,7 @@ To create an NNAPI-aware ORT format model please follow these steps. pip install -U build\Windows\RelWithDebInfo\RelWithDebInfo\dist\onnxruntime_noopenmp-1.7.0-cp37-cp37m-win_amd64.whl ``` -3. Create an NNAPI-aware ORT format model by running `convert_onnx_models_to_ort.py` as per the [standard instructions](../reference/ort-format-model-conversion.md), with NNAPI enabled (`--use_nnapi`), and the optimization level set to _extended_ or _all_ (e.g. `--optimization_level extended`). This will allow higher level optimizations to run on any nodes that NNAPI can not handle. +3. Create an NNAPI-aware ORT format model by running `convert_onnx_models_to_ort.py` as per the [standard instructions](../reference/ort-model-format.md), with NNAPI enabled (`--use_nnapi`), and the optimization level set to _extended_ or _all_ (e.g. `--optimization_level extended`). This will allow higher-level optimizations to run on any nodes that NNAPI cannot handle. ``` python /tools/python/convert_onnx_models_to_ort.py --use_nnapi --optimization_level extended /models ``` diff --git a/docs/reference/build-web-app.md b/docs/reference/build-web-app.md index a39dda0ce427e..4fc1c51b1ef2a 100644 --- a/docs/reference/build-web-app.md +++ b/docs/reference/build-web-app.md @@ -47,7 +47,7 @@ You need to understand your web app's scenario and get an ONNX model that is app ONNX models can be obtained from the [ONNX model zoo](https://github.com/onnx/models), converted from PyTorch or TensorFlow, and many other places. -You can [convert the ONNX format model to ORT format model](./ort-format-model-conversion.md), for optimized binary size, faster initialization and peak memory usage. +You can [convert the ONNX format model to an ORT format model](./ort-model-format.md) for optimized binary size, faster initialization and lower peak memory usage. You can [perform a model-specific custom build](../build/custom.md) to further optimize binary size.
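As an example of sourcing a model as mentioned above, an ONNX model can be exported from PyTorch with `torch.onnx.export`. The snippet below is a minimal sketch that is not part of the original page; the tiny `Sequential` model, input shape, file name and opset version are placeholders for your own model.

```python
# Minimal sketch: export a placeholder PyTorch model to ONNX for use with ONNX Runtime.
import torch

# Stand-in for your trained torch.nn.Module.
model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2))
model.eval()

dummy_input = torch.randn(1, 4)  # example input with the shape your model expects

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    opset_version=15,            # pick an opset supported by your ONNX Runtime version
    input_names=["input"],
    output_names=["output"],
)
```

The resulting `model.onnx` can then be converted to ORT format and used for a model-specific custom build as described above.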
diff --git a/docs/reference/operators/mobile_package_op_type_support_1.10.md b/docs/reference/operators/mobile_package_op_type_support_1.10.md new file mode 100644 index 0000000000000..8e9b16ba8629c --- /dev/null +++ b/docs/reference/operators/mobile_package_op_type_support_1.10.md @@ -0,0 +1,139 @@ +--- +title: ORT 1.10 Mobile Package Operators +parent: Operators +grand_parent: Reference +--- + + +# ONNX Runtime 1.10 Mobile Pre-Built Package Operator and Type Support + +## Supported operators and types + +The supported operators and types are based on what is required to support float32 and quantized versions of popular models. The full list of input models used to determine this list is available [here](https://github.com/microsoft/onnxruntime/blob/master/tools/ci_build/github/android/mobile_package.required_operators.readme.txt) + +## Supported data input types + + - float + - int8_t + - uint8_t + +NOTE: Operators used to manipulate dimensions and indices will support int32 and int64. + +## Supported Operators + +|Operator|Opsets| +|--------|------| +|**ai.onnx**|| +|ai.onnx:Abs|12, 13, 14, 15| +|ai.onnx:Add|12, 13, 14, 15| +|ai.onnx:And|12, 13, 14, 15| +|ai.onnx:ArgMax|12, 13, 14, 15| +|ai.onnx:ArgMin|12, 13, 14, 15| +|ai.onnx:AveragePool|12, 13, 14, 15| +|ai.onnx:Cast|12, 13, 14, 15| +|ai.onnx:Ceil|12, 13, 14, 15| +|ai.onnx:Clip|12, 13, 14, 15| +|ai.onnx:Concat|12, 13, 14, 15| +|ai.onnx:ConstantOfShape|12, 13, 14, 15| +|ai.onnx:Conv|12, 13, 14, 15| +|ai.onnx:ConvTranspose|12, 13, 14, 15| +|ai.onnx:Cos|12, 13, 14, 15| +|ai.onnx:CumSum|12, 13, 14, 15| +|ai.onnx:DepthToSpace|12, 13, 14, 15| +|ai.onnx:DequantizeLinear|12, 13, 14, 15| +|ai.onnx:Div|12, 13, 14, 15| +|ai.onnx:DynamicQuantizeLinear|12, 13, 14, 15| +|ai.onnx:Elu|12, 13, 14, 15| +|ai.onnx:Equal|12, 13, 14, 15| +|ai.onnx:Erf|12, 13, 14, 15| +|ai.onnx:Exp|12, 13, 14, 15| +|ai.onnx:Expand|12, 13, 14, 15| +|ai.onnx:Flatten|12, 13, 14, 15| +|ai.onnx:Floor|12, 13, 14, 15| +|ai.onnx:Gather|12, 13, 14, 15| +|ai.onnx:GatherND|12, 13, 14, 15| +|ai.onnx:Gemm|12, 13, 14, 15| +|ai.onnx:GlobalAveragePool|12, 13, 14, 15| +|ai.onnx:Greater|12, 13, 14, 15| +|ai.onnx:GreaterOrEqual|12, 13, 14, 15| +|ai.onnx:HardSigmoid|12, 13, 14, 15| +|ai.onnx:Identity|12, 13, 14, 15| +|ai.onnx:If|12, 13, 14, 15| +|ai.onnx:InstanceNormalization|12, 13, 14, 15| +|ai.onnx:LRN|12, 13, 14, 15| +|ai.onnx:LayerNormalization|1| +|ai.onnx:LeakyRelu|12, 13, 14, 15| +|ai.onnx:Less|12, 13, 14, 15| +|ai.onnx:LessOrEqual|12, 13, 14, 15| +|ai.onnx:Log|12, 13, 14, 15| +|ai.onnx:LogSoftmax|12, 13, 14, 15| +|ai.onnx:Loop|12, 13, 14, 15| +|ai.onnx:MatMul|12, 13, 14, 15| +|ai.onnx:MatMulInteger|12, 13, 14, 15| +|ai.onnx:Max|12, 13, 14, 15| +|ai.onnx:MaxPool|12, 13, 14, 15| +|ai.onnx:Mean|12, 13, 14, 15| +|ai.onnx:Min|12, 13, 14, 15| +|ai.onnx:Mul|12, 13, 14, 15| +|ai.onnx:Neg|12, 13, 14, 15| +|ai.onnx:NonMaxSuppression|12, 13, 14, 15| +|ai.onnx:NonZero|12, 13, 14, 15| +|ai.onnx:Not|12, 13, 14, 15| +|ai.onnx:Or|12, 13, 14, 15| +|ai.onnx:PRelu|12, 13, 14, 15| +|ai.onnx:Pad|12, 13, 14, 15| +|ai.onnx:Pow|12, 13, 14, 15| +|ai.onnx:QLinearConv|12, 13, 14, 15| +|ai.onnx:QLinearMatMul|12, 13, 14, 15| +|ai.onnx:QuantizeLinear|12, 13, 14, 15| +|ai.onnx:Range|12, 13, 14, 15| +|ai.onnx:Reciprocal|12, 13, 14, 15| +|ai.onnx:ReduceMax|12, 13, 14, 15| +|ai.onnx:ReduceMean|12, 13, 14, 15| +|ai.onnx:ReduceMin|12, 13, 14, 15| +|ai.onnx:ReduceProd|12, 13, 14, 15| +|ai.onnx:ReduceSum|12, 13, 14, 15| +|ai.onnx:Relu|12, 13, 14, 15| +|ai.onnx:Reshape|12, 13, 14, 15| +|ai.onnx:Resize|12, 13, 14, 
15| +|ai.onnx:ReverseSequence|12, 13, 14, 15| +|ai.onnx:Round|12, 13, 14, 15| +|ai.onnx:Scan|12, 13, 14, 15| +|ai.onnx:ScatterND|12, 13, 14, 15| +|ai.onnx:Shape|12, 13, 14, 15| +|ai.onnx:Sigmoid|12, 13, 14, 15| +|ai.onnx:Sin|12, 13, 14, 15| +|ai.onnx:Size|12, 13, 14, 15| +|ai.onnx:Slice|12, 13, 14, 15| +|ai.onnx:Softmax|12, 13, 14, 15| +|ai.onnx:SpaceToDepth|12, 13, 14, 15| +|ai.onnx:Split|12, 13, 14, 15| +|ai.onnx:Sqrt|12, 13, 14, 15| +|ai.onnx:Squeeze|12, 13, 14, 15| +|ai.onnx:Sub|12, 13, 14, 15| +|ai.onnx:Sum|12, 13, 14, 15| +|ai.onnx:Tanh|12, 13, 14, 15| +|ai.onnx:ThresholdedRelu|12, 13, 14, 15| +|ai.onnx:Tile|12, 13, 14, 15| +|ai.onnx:TopK|12, 13, 14, 15| +|ai.onnx:Transpose|12, 13, 14, 15| +|ai.onnx:Unique|12, 13, 14, 15| +|ai.onnx:Unsqueeze|12, 13, 14, 15| +|ai.onnx:Where|12, 13, 14, 15| +||| +|**com.microsoft**|| +|com.microsoft:DynamicQuantizeMatMul|1| +|com.microsoft:FusedConv|1| +|com.microsoft:FusedGemm|1| +|com.microsoft:FusedMatMul|1| +|com.microsoft:Gelu|1| +|com.microsoft:MatMulIntegerToFloat|1| +|com.microsoft:NhwcMaxPool|1| +|com.microsoft:QLinearAdd|1| +|com.microsoft:QLinearAveragePool|1| +|com.microsoft:QLinearConv|1| +|com.microsoft:QLinearGlobalAveragePool|1| +|com.microsoft:QLinearLeakyRelu|1| +|com.microsoft:QLinearMul|1| +|com.microsoft:QLinearSigmoid|1| +||| \ No newline at end of file diff --git a/docs/reference/operators/mobile_package_op_type_support_1.9.md b/docs/reference/operators/mobile_package_op_type_support_1.9.md index f8394681e89b2..d8c68efc82f01 100644 --- a/docs/reference/operators/mobile_package_op_type_support_1.9.md +++ b/docs/reference/operators/mobile_package_op_type_support_1.9.md @@ -5,7 +5,7 @@ grand_parent: Reference redirect_from: /docs/reference/mobile/prebuilt-package/mobile_package_op_type_support_1.9 --- -# ONNX Runtime Mobile Pre-Built Package Operator and Type Support +# ONNX Runtime Mobile 1.9 Pre-Built Package Operator and Type Support ## Supported operators and types diff --git a/docs/reference/ort-format-model-conversion.md b/docs/reference/ort-model-format.md similarity index 97% rename from docs/reference/ort-format-model-conversion.md rename to docs/reference/ort-model-format.md index 7fe0bbef2d7c1..eb780368b226a 100644 --- a/docs/reference/ort-format-model-conversion.md +++ b/docs/reference/ort-model-format.md @@ -207,6 +207,7 @@ Ort::Session session(env, , session_options); ``` Java API + ```java SessionOptions session_options = new SessionOptions(); session_options.addConfigEntry("session.load_model_format", "ORT"); @@ -216,6 +217,7 @@ OrtSession session = env.createSession(, session_options); ``` JavaScript API + ```js import * as ort from "onnxruntime-web"; @@ -226,7 +228,7 @@ const session = await ort.InferenceSession.create(""); If a session is created using an input byte array containing the ORT format model data, by default we will copy the model bytes at the time of session creation to ensure the model bytes buffer is valid. -You may also enable the option to use the model bytes directly by setting the Session Options config `session.use_ort_model_bytes_directly` to `1`, this may reduce the peak memory usage of ONNX Runtime Mobile, you will need to guarantee that the model bytes are valid throughout the lifespan of the ORT session using the model bytes. For ONNX Runtime Web, this option is set by default. +You may also enable the option to use the model bytes directly by setting the Session Options config `session.use_ort_model_bytes_directly` to `1`. 
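For comparison with the C++, Java and JavaScript snippets above, the corresponding configuration can be expressed with the Python API. This is an illustrative sketch rather than part of the original page: the file name is a placeholder, and the two config keys are the ones described in this section, so confirm that your ONNX Runtime version supports them.

```python
# Sketch: load an ORT format model from a bytes buffer with the Python API.
import onnxruntime as ort

with open("model.ort", "rb") as f:
    model_bytes = f.read()

so = ort.SessionOptions()
# Tell ORT the bytes contain an ORT format model (there is no file extension to infer from).
so.add_session_config_entry("session.load_model_format", "ORT")
# Use the bytes buffer directly instead of copying it at session creation time.
so.add_session_config_entry("session.use_ort_model_bytes_directly", "1")

# Keep a reference to model_bytes for the whole lifetime of the session when using it directly.
session = ort.InferenceSession(model_bytes, sess_options=so)
```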
Using the model bytes directly may reduce the peak memory usage of ONNX Runtime Mobile, but you will need to guarantee that the model bytes remain valid throughout the lifespan of the ORT session. For ONNX Runtime Web, this option is set by default. C++ API ```c++ diff --git a/docs/reference/reduced-operator-config-file.md b/docs/reference/reduced-operator-config-file.md index fabb7bd932e2a..775885c58b2a2 100644 --- a/docs/reference/reduced-operator-config-file.md +++ b/docs/reference/reduced-operator-config-file.md @@ -11,7 +11,7 @@ nav_order: 3 The reduced operator config file is an input to the ONNX Runtime build-from-source script. It specifies which operators are included in the runtime. A reduced set of operators in ONNX Runtime permits a smaller build binary size. A smaller runtime is used in constrained environments, such as mobile and web deployments. -This article shows you how to generate the reduced operator config file using the `create_reduced_build_config.py` script. You can also generate the reduced operator config file by [converting ONNX models to ORT format](./ort-format-model-conversion.md). +This article shows you how to generate the reduced operator config file using the `create_reduced_build_config.py` script. You can also generate the reduced operator config file by [converting ONNX models to ORT format](./ort-model-format.md). ## Contents {: .no_toc} diff --git a/docs/tutorials/mobile/deploy-ios.md b/docs/tutorials/mobile/deploy-ios.md index e43bc90162817..562b69660916a 100644 --- a/docs/tutorials/mobile/deploy-ios.md +++ b/docs/tutorials/mobile/deploy-ios.md @@ -65,7 +65,7 @@ This example is heavily based on [Google Tensorflow lite - Object Detection Exam > Conversion of this model is a two part process. The original model is in tflite format. This is firstly converted to ONNX format using the [tf2onnx converter](https://github.com/onnx/tensorflow-onnx). > - > The model is then converted into ORT format using the [onnx to ort converter](../../reference/ort-format-model-conversion.md). + > The model is then converted into ORT format using the [onnx to ort converter](../../reference/ort-model-format.md). > > As well as generating the model in ORT format, the conversion script also outputs an [operator config file](../../reference/reduced-operator-config-file.md) diff --git a/docs/tutorials/mobile/index.md b/docs/tutorials/mobile/index.md index 15537d76f7924..cf1e842a15b19 100644 --- a/docs/tutorials/mobile/index.md +++ b/docs/tutorials/mobile/index.md @@ -32,31 +32,18 @@ ONNX Runtime gives you a variety of options to add machine learning to your mobi ONNX models can be obtained from the [ONNX model zoo](https://github.com/onnx/models), converted from PyTorch or TensorFlow, and many other places. - Once you have sourced or converted the model into ONNX format, there is a further step required to optimize the model for mobile deployments. [Convert the model to ORT format](../../reference/ort-format-model-conversion.md) for optimized model binary size, faster initialization and peak memory usage. + Once you have sourced or converted the model into ONNX format, there is a further step required to optimize the model for mobile deployments. [Convert the model to ORT format](../../reference/ort-model-format.md) for optimized model binary size, faster initialization and lower peak memory usage. 3. How do I bootstrap my app development? If you are starting from scratch, bootstrap your mobile application using your mobile framework's tooling, such as Xcode or Android Studio. a.
Add the ONNX Runtime dependency - b. Consume the onnxruntime-web API in your application + b. Consume the onnxruntime API in your application c. Add pre and post processing appropriate to your application and model 4. How do I optimize my application? - The libraries in step 1 can be optimized to meet memory and processing demands. + The execution environment on mobile devices has fixed memory and disk storage. Therefore, it is essential that any AI execution library is optimized to consume minimum resources in terms of disk footprint, memory and network usage (both model size and binary size). - The size of the ONNX Runtime itself can reduced by [building a custom package](../../build/custom.md) that only includes support for your specific model/s. - -## Helpful resources - - -TODO - -Can this be included anywhere: - -The execution environment on mobile devices has fixed memory and disk storage. Therefore, it is essential that any AI execution library is optimized to consume minimum resources in terms of disk footprint, memory and network usage (both model size and binary size). - -ONNX Runtime Mobile uses the ORT formatted model which enables us to create a [custom ORT build](../build/custom.md) that minimizes the binary size and reduces memory usage for client side inference. The ORT formatted model file is generated from the regular ONNX model using the `onnxruntime` python package. The custom build does this primarily by only including specified operators and types in the build, as well as trimming down dependencies per custom needs. - -An ONNX model must be converted to an ORT format model to be used with minimal build in ONNX Runtime Mobile. \ No newline at end of file + ONNX Runtime Mobile uses the ORT model format which enables us to create a [custom ORT build](../../build/custom.md) that minimizes the binary size and reduces memory usage for client side inference. The ORT model format file is generated from the regular ONNX model using the `onnxruntime` python package. The custom build does this primarily by only including specified operators and types in the build, as well as trimming down dependencies per custom needs. diff --git a/docs/tutorials/web/index.md b/docs/tutorials/web/index.md index 5396e06e834a5..c45100ee4fca9 100644 --- a/docs/tutorials/web/index.md +++ b/docs/tutorials/web/index.md @@ -72,6 +72,6 @@ For more detail on the steps below, see the [build a web application](../../refe The libraries and models mentioned in the previous steps can be optimized to meet memory and processing demands. - a. Models in ONNX format can be [converted to ORT format](../../reference/ort-format-model-conversion.md), for optimized model binary size, faster initialization and peak memory usage. + a. Models in ONNX format can be [converted to ORT format](../../reference/ort-model-format.md), for optimized model binary size, faster initialization and lower peak memory usage, as shown in the example below. b. The size of the ONNX Runtime itself can be reduced by [building a custom package](../../build/custom.md) that only includes support for your specific model/s.
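The example referred to in item a above: recent `onnxruntime` Python packages also expose the ONNX-to-ORT converter as a module (an assumption to verify against your installed version; otherwise use the repository script shown earlier).

```
python -m onnxruntime.tools.convert_onnx_models_to_ort /path/to/onnx/models
```

As noted in the iOS tutorial above, the conversion also produces an operator config file that can be passed to the custom build mentioned in item b.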