Merge branch 'gh-pages' into genai_page_addition
MaanavD authored Sep 11, 2024
2 parents 17bfb52 + fc3672c commit 1cc299b
Showing 25 changed files with 337 additions and 300 deletions.
8 changes: 4 additions & 4 deletions docs/build/eps.md
@@ -260,13 +260,13 @@ See more information on the OpenVINO™ Execution Provider [here](../execution-providers/OpenVINO-ExecutionProvider.md).
### Prerequisites
{: .no_toc }

- 1. Install the OpenVINO™ offline/online installer from Intel<sup>®</sup> Distribution of OpenVINO™<sup>TM</sup> Toolkit **Release 2024.1** for the appropriate OS and target hardware:
-    * [Windows - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?VERSION=v_2023_1_0&OP_SYSTEM=WINDOWS&DISTRIBUTION=ARCHIVE)
-    * [Linux - CPU, GPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?VERSION=v_2023_1_0&OP_SYSTEM=LINUX&DISTRIBUTION=ARCHIVE)
1. Install the OpenVINO™ offline/online installer from the Intel<sup>®</sup> Distribution of OpenVINO™ Toolkit **Release 2024.3** for the appropriate OS and target hardware:
   * [Windows - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2024_3_0&OP_SYSTEM=WINDOWS&DISTRIBUTION=ARCHIVE)
   * [Linux - CPU, GPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2024_3_0&OP_SYSTEM=LINUX&DISTRIBUTION=ARCHIVE)

Follow [documentation](https://docs.openvino.ai/2024/home.html) for detailed instructions.

- *2024.1 is the current recommended OpenVINO™ version. [OpenVINO™ 2023.1](https://docs.openvino.ai/archive/2023.1/home.html) is minimal OpenVINO™ version requirement.*
*2024.3 is the current recommended OpenVINO™ version. [OpenVINO™ 2023.3](https://docs.openvino.ai/2023.3/home.html) is the minimum OpenVINO™ version required.*

2. Configure the target hardware with the specific follow-on instructions:
   * To configure Intel<sup>®</sup> Processor Graphics (GPU), please follow these instructions: [Windows](https://docs.openvino.ai/latest/openvino_docs_install_guides_configurations_for_intel_gpu.html#gpu-guide-windows), [Linux](https://docs.openvino.ai/latest/openvino_docs_install_guides_configurations_for_intel_gpu.html#linux)
15 changes: 14 additions & 1 deletion docs/execution-providers/CoreML-ExecutionProvider.md
@@ -128,10 +128,12 @@ Operators that are supported by the CoreML Execution Provider when a NeuralNetwork model is created:
|ai.onnx:ReduceSum||
|ai.onnx:Relu||
|ai.onnx:Reshape||
- |ai.onnx:Resize||
|ai.onnx:Resize|4D input.<br/>`coordinate_transformation_mode` == `asymmetric`.<br/>`mode` == `linear` or `nearest`.<br/>`nearest_mode` == `floor`.<br/>`exclude_outside` == false<br/>`scales` or `sizes` must be constant.|
|ai.onnx:Shape|Attribute `start` with non-default value is not supported.<br/>Attribute `end` is not supported.|
|ai.onnx:Sigmoid||
|ai.onnx:Slice|Inputs `starts`, `ends`, `axes`, and `steps` should be constant. Empty slice is not supported.|
|ai.onnx:Softmax||
|ai.onnx:Split|If provided, `splits` must be constant.|
|ai.onnx:Squeeze||
|ai.onnx:Sqrt||
|ai.onnx:Sub||
@@ -147,15 +149,26 @@ Operators that are supported by the CoreML Execution Provider when an MLProgram model is created:
|ai.onnx:Add||
|ai.onnx:AveragePool|Only 2D Pool is supported currently. 3D and 5D support can be added if needed.|
|ai.onnx:Clip||
|ai.onnx:Concat||
|ai.onnx:Conv|Only 1D/2D Conv is supported.<br/>Bias if provided must be constant.|
|ai.onnx:ConvTranspose|Weight and bias must be constant.<br/>padding_type of SAME_UPPER/SAME_LOWER is not supported.<br/>kernel_shape must have default values.<br/>output_shape is not supported.<br/>output_padding must have default values.|
|ai.onnx:DepthToSpace|If `mode` is `CRD`, the input must have a fixed shape.|
|ai.onnx:Div||
|ai.onnx:Gemm|Input B must be constant.|
|ai.onnx:GlobalAveragePool|Only 2D Pool is supported currently. 3D and 5D support can be added if needed.|
|ai.onnx:GlobalMaxPool|Only 2D Pool is supported currently. 3D and 5D support can be added if needed.|
|ai.onnx:GridSample|4D input.<br/>'mode' of 'linear' or 'zeros'.<br/>(mode==linear && padding_mode==reflection && align_corners==0) is not supported.|
|ai.onnx:LeakyRelu||
|ai.onnx:MatMul|Only support for transA == 0, alpha == 1.0 and beta == 1.0 is currently implemented.|
|ai.onnx:MaxPool|Only 2D Pool is supported currently. 3D and 5D support can be added if needed.|
|ai.onnx:Mul||
|ai.onnx:Pow|Only supports cases when both inputs are fp32.|
|ai.onnx:Relu||
|ai.onnx:Reshape||
|ai.onnx:Resize|See [resize_op_builder.cc](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/coreml/builders/impl/resize_op_builder.cc) implementation. There are too many permutations to describe the valid combinations.|
|ai.onnx:Slice|`starts`/`ends`/`axes`/`steps` must be constant initializers.|
|ai.onnx:Split|If provided, `splits` must be constant.|
|ai.onnx:Sub||
|ai.onnx:Sigmoid||
|ai.onnx:Tanh||
|ai.onnx:Transpose||
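
To target the ML Program operator set above, the CoreML EP has to be created with the ML Program flag. A minimal C++ sketch, assuming the flag and append helper from `coreml_provider_factory.h`; `model.onnx` is a placeholder path:

```
#include <onnxruntime_cxx_api.h>
#include <coreml_provider_factory.h>

int main() {
  Ort::Env env;
  Ort::SessionOptions session_options;
  // Request the ML Program model format; without this flag the CoreML EP
  // builds a NeuralNetwork model and the first table applies instead.
  Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_CoreML(
      session_options, COREML_FLAG_CREATE_MLPROGRAM));
  Ort::Session session(env, "model.onnx", session_options);
  return 0;
}
```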
37 changes: 28 additions & 9 deletions docs/execution-providers/OpenVINO-ExecutionProvider.md
@@ -20,7 +20,7 @@ Accelerate ONNX models on Intel CPUs, GPUs, and NPUs with the Intel OpenVINO™ Execution Provider.
## Install

Pre-built packages and Docker images are published for OpenVINO™ Execution Provider for ONNX Runtime by Intel for each release.
- * OpenVINO™ Execution Provider for ONNX Runtime Release page: [Latest v5.2 Release](https://github.com/intel/onnxruntime/releases)
* OpenVINO™ Execution Provider for ONNX Runtime Release page: [Latest v5.4 Release](https://github.com/intel/onnxruntime/releases)
* Python wheels Ubuntu/Windows: [onnxruntime-openvino](https://pypi.org/project/onnxruntime-openvino/)
* Docker image: [openvino/onnxruntime_ep_ubuntu20](https://hub.docker.com/r/openvino/onnxruntime_ep_ubuntu20)

@@ -30,10 +30,9 @@ ONNX Runtime OpenVINO™ Execution Provider is compatible with the three latest releases of OpenVINO™.

|ONNX Runtime|OpenVINO™|Notes|
|---|---|---|
|1.19.0|2024.3|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.4)|
|1.18.0|2024.1|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.3)|
|1.17.1|2023.3|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.2)|
- |1.16.0|2023.1|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.1)|
- |1.15.0|2023.0|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.0.0)|
- |1.14.0|2022.3|[Details](https://github.com/intel/onnxruntime/releases/tag/v4.3)|
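
When matching an installed runtime against this table, the ONNX Runtime version can be queried at run time. A small sketch:

```
#include <iostream>
#include <onnxruntime_cxx_api.h>

int main() {
  // Prints the ONNX Runtime version string, e.g. "1.19.0".
  std::cout << OrtGetApiBase()->GetVersionString() << std::endl;
  return 0;
}
```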

## Build

@@ -200,8 +199,30 @@ For more information on the Multi-Device plugin of OpenVINO™, please refer to the
[Intel OpenVINO™ Multi Device Plugin](https://docs.openvino.ai/latest/openvino_docs_OV_UG_Running_on_multiple_devices.html).
### Export OpenVINO Compiled Blob
- Export the OpenVINO compiled blob as an ONNX model. Using this ONNX model for subsequent inferences avoids model recompilation and could have a positive impact on Session creation time. The exported model is saved to the same directory as the source model with the suffix -ov_{device}_blob.onnx where device can be one of the supported like CPU or NPU. This feature is currently enabled for fully supported models only.
- Refer to [Configuration Options](#configuration-options) for more information about using these runtime options.
Export the OpenVINO compiled blob as an ONNX model. Using this ONNX model for subsequent inferences avoids model recompilation and could have a positive impact on Session creation time. This feature is currently enabled for fully supported models only. It is controlled through the following ORT session config keys:
```
Ort::SessionOptions session_options;

// Enable the EP context feature to dump the partitioned graph,
// which includes the EP context, into an ONNX file.
// "0": disable (default). "1": enable.
session_options.AddConfigEntry(kOrtSessionOptionEpContextEnable, "1");

// Flag specifying whether to dump the EP context into a single ONNX model
// or to pass a bin path.
// "0": dump the EP context into a separate file and keep that file name in the ONNX model.
// "1": dump the EP context into the ONNX model (default).
session_options.AddConfigEntry(kOrtSessionOptionEpContextEmbedMode, "1");

// Specify the file path for the ONNX model that carries the EP context.
// Defaults to <actual_model_path>/original_file_name_ctx.onnx if not specified.
session_options.AddConfigEntry(kOrtSessionOptionEpContextFilePath, "./ov_compiled_epctx.onnx");

Ort::Session session(env, ORT_TSTR("<path_to_model_file>"), session_options);
```
Refer to [Session Options](https://github.com/microsoft/onnxruntime/blob/main/include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h) for more information about these session config keys.
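
On later runs, the dumped EP-context model can be loaded like any other ONNX model, skipping OpenVINO recompilation. A sketch, reusing the file path from the example above:

```
// Later runs: load the dumped model as usual; the embedded EP context
// lets Session creation skip recompilation.
Ort::SessionOptions load_options;
Ort::Session cached_session(env, ORT_TSTR("./ov_compiled_epctx.onnx"), load_options);
```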
### Enable QDQ Optimization Passes
Optimizes ORT quantized models for the NPU device, keeping QDQ nodes only around ops the NPU supports, to improve performance and accuracy. Generally this feature gives better performance/accuracy with ORT optimizations disabled.
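
For example, via the key–value provider options — a sketch, assuming the `AppendExecutionProvider_OpenVINO_V2` wrapper available in recent ONNX Runtime C++ headers and the `enable_qdq_optimizer` key from the configuration table below:

```
std::unordered_map<std::string, std::string> ov_options;
ov_options["device_type"] = "NPU";
// Keep QDQ pairs only around ops the NPU supports.
ov_options["enable_qdq_optimizer"] = "True";

Ort::SessionOptions session_options;
// Per the note above, this is typically paired with ORT optimizations disabled.
session_options.SetGraphOptimizationLevel(ORT_DISABLE_ALL);
session_options.AppendExecutionProvider_OpenVINO_V2(ov_options);
```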
@@ -239,8 +260,7 @@ The session configuration options are passed to the SessionOptionsAppendExecutionProvider API as shown below:
```
OrtOpenVINOProviderOptions options;
- options.device_type = "GPU";
- options.precision = "FP32";
options.device_type = "GPU_FP32";
options.num_of_threads = 8;
options.cache_dir = "";
options.context = 0x123456ff;
```
@@ -277,7 +297,6 @@ The following table lists all the available configuration options for API 2.0:
| context | string | OpenCL Context | void* | This option is only available when OpenVINO EP is built with OpenCL flags enabled. It takes in the remote context, i.e. the cl_context address, as a void pointer.|
| enable_opencl_throttling | string | True/False | boolean | This option enables OpenCL queue throttling for GPU devices (reduces CPU utilization when using GPU). |
| enable_qdq_optimizer | string | True/False | boolean | This option enables QDQ Optimization to improve model performance and accuracy on NPU. |
- | export_ep_ctx_blob | string | True/False | boolean | This option enables exporting the OpenVINO Compiled Blob as an ONNX Operator EPContext. |
Valid Hetero or Multi or Auto Device combinations:
166 changes: 58 additions & 108 deletions docs/genai/howto/build-from-source.md
@@ -16,7 +16,7 @@ nav_order: 2
## Pre-requisites

- `cmake`
- - `.Net v6` (if building C#)
- `.NET 6` (if building C#)

## Clone the onnxruntime-genai repo

Expand All @@ -25,11 +25,10 @@ git clone https://github.com/microsoft/onnxruntime-genai
cd onnxruntime-genai
```

- ## Install ONNX Runtime
## Download ONNX Runtime binaries

By default, the onnxruntime-genai build expects to find the ONNX Runtime include files and binaries in a folder called `ort` in the root directory of onnxruntime-genai. You can put the ONNX Runtime files in a different location and pass that location to the onnxruntime-genai build via the `--ort_home` command line argument (e.g. `python build.py --ort_home <path>`).

- ### Option 1: Install from release

These instructions assume you are in the `onnxruntime-genai` folder.

@@ -38,161 +37,96 @@

#### Windows

These instructions use `win-x64`. Replace this if you are using a different architecture.

```bash
- curl -L https://github.com/microsoft/onnxruntime/releases/download/v1.18.0/onnxruntime-win-x64-1.18.0.zip -o onnxruntime-win-x64-1.18.0.zip
- tar xvf onnxruntime-win-x64-1.18.0.zip
- move onnxruntime-win-x64-1.18.0 ort
curl -L https://github.com/microsoft/onnxruntime/releases/download/v1.19.2/onnxruntime-win-x64-1.19.2.zip -o onnxruntime-win-x64-1.19.2.zip
tar xvf onnxruntime-win-x64-1.19.2.zip
move onnxruntime-win-x64-1.19.2 ort
```

#### Linux and Mac

These instructions use `linux-x64-gpu`. Replace this if you are using a different architecture.

```bash
- curl -L https://github.com/microsoft/onnxruntime/releases/download/v1.18.0/onnxruntime-linux-x64-gpu-1.18.0.tgz -o onnxruntime-linux-x64-gpu-1.18.0.tgz
- tar xvzf onnxruntime-linux-x64-gpu-1.18.0.tgz
- mv onnxruntime-linux-x64-gpu-1.18.0 ort
curl -L https://github.com/microsoft/onnxruntime/releases/download/v1.19.2/onnxruntime-linux-x64-gpu-1.19.2.tgz -o onnxruntime-linux-x64-gpu-1.19.2.tgz
tar xvzf onnxruntime-linux-x64-gpu-1.19.2.tgz
mv onnxruntime-linux-x64-gpu-1.19.2 ort
```

- ### Option 2: Install from nightly
#### Android

- Download the nightly nuget package `Microsoft.ML.OnnxRuntime` from: https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly.

- Extract the nuget package.

```bash
- tar xvf Microsoft.ML.OnnxRuntime.1.18.0-dev-20240322-0323-ca825cb6e6.nupkg
```

- Copy the include and lib files into `ort`.

- On Windows

- Example is given for `win-x64`. Change this to your architecture if different.

```cmd
- copy build\native\include\onnxruntime_c_api.h ort\include
- copy runtimes\win-x64\native\*.dll ort\lib
```

- On Linux

- Example is given for `linux-x64`. Change this to your architecture if different.

```cmd
- cp build/native/include/onnxruntime_c_api.h ort/include
- cp build/linux-x64/native/libonnxruntime*.so* ort/lib
```

- ### Option 3: Build from source

- #### Clone the onnxruntime repo
If you do not already have an `ort` folder, create one.

```bash
- cd ..
- git clone https://github.com/microsoft/onnxruntime.git
- cd onnxruntime
mkdir ort
```

- #### Build ONNX Runtime for CPU on Windows

```bash
- build.bat --build_shared_lib --skip_tests --parallel --config Release
- copy include\onnxruntime\core\session\onnxruntime_c_api.h ..\onnxruntime-genai\ort\include
- copy build\Windows\Release\Release\*.dll ..\onnxruntime-genai\ort\lib
- copy build\Windows\Release\Release\onnxruntime.lib ..\onnxruntime-genai\ort\lib
```

- #### Build ONNX Runtime for DirectML on Windows

```bash
- build.bat --build_shared_lib --skip_tests --parallel --use_dml --config Release
- copy include\onnxruntime\core\session\onnxruntime_c_api.h ..\onnxruntime-genai\ort\include
- copy include\onnxruntime\core\providers\dml\dml_provider_factory.h ..\onnxruntime-genai\ort\include
- copy build\Windows\Release\Release\*.dll ..\onnxruntime-genai\ort\lib
- copy build\Windows\Release\Release\onnxruntime.lib ..\onnxruntime-genai\ort\lib
curl -L https://repo1.maven.org/maven2/com/microsoft/onnxruntime/onnxruntime-android/1.19.2/onnxruntime-android-1.19.2.aar -o ort/onnxruntime-android-1.19.2.aar
cd ort
tar xvf onnxruntime-android-1.19.2.aar
cd ..
```

## Build the generate() API

- #### Build ONNX Runtime for CUDA on Windows

```bash
- build.bat --build_shared_lib --skip_tests --parallel --use_cuda --config Release
- copy include\onnxruntime\core\session\onnxruntime_c_api.h ..\onnxruntime-genai\ort\include
- copy include\onnxruntime\core\providers\cuda\*.h ..\onnxruntime-genai\ort\include
- copy build\Windows\Release\Release\*.dll ..\onnxruntime-genai\ort\lib
- copy build\Windows\Release\Release\onnxruntime.lib ..\onnxruntime-genai\ort\lib
```
This step assumes that you are in the root of the onnxruntime-genai repo, and that you have followed the previous steps to copy the ONNX Runtime headers and binaries into the folder specified by `<ORT_HOME>`, which defaults to `onnxruntime-genai/ort`.

- #### Build ONNX Runtime on Linux
All of the build commands below have a `--config` argument, which takes the following options:
- `Release` builds release binaries
- `Debug` builds binaries with debug symbols
- `RelWithDebInfo` builds release binaries with debug info

```bash
- ./build.sh --build_shared_lib --skip_tests --parallel [--use_cuda] --config Release
- cp include/onnxruntime/core/session/onnxruntime_c_api.h ../onnxruntime-genai/ort/include
- cp build/Linux/Release/libonnxruntime*.so* ../onnxruntime-genai/ort/lib
```
### Build Python API

- You may need to provide extra command line options for building with CUDA on Linux. An example full command is as follows.
#### Windows CPU build

```bash
- ./build.sh --parallel --build_shared_lib --use_cuda --cuda_version 11.8 --cuda_home /usr/local/cuda-11.8 --cudnn_home /usr/lib/x86_64-linux-gnu/ --config Release --build_wheel --skip_tests --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES="80" --cmake_extra_defines CMAKE_CUDA_COMPILER=/usr/local/cuda-11.8/bin/nvcc
python build.py --config Release
```

- Replace the values given above for different versions and locations of CUDA.

- #### Build ONNX Runtime on Mac
#### Windows DirectML build

```bash
- ./build.sh --build_shared_lib --skip_tests --parallel --config Release
- cp include/onnxruntime/core/session/onnxruntime_c_api.h ../onnxruntime-genai/ort/include
- cp build/MacOS/Release/libonnxruntime*.dylib* ../onnxruntime-genai/ort/lib
python build.py --use_dml --config Release
```

- ## Build the generate() API

- This step assumes that you are in the root of the onnxruntime-genai repo, and you have followed the previos steps to copy the onnxruntime headers and binaries into the folder specified by <ORT_HOME>, which defaults to `onnxruntime-genai/ort`.
#### Linux build

```bash
- cd ../onnxruntime-genai
python build.py --config Release
```

- ### Build Python API

- #### Build for Windows CPU
#### Linux CUDA build

```bash
- python build.py
python build.py --use_cuda --config Release
```

- #### Build for Windows DirectML
#### Mac build

```bash
- python build.py --use_dml
python build.py --config Release
```

- #### Build on Linux
### Build Java API

```bash
- python build.py
python build.py --build_java --config Release
```

- #### Build on Linux with CUDA

```bash
- python build.py --use_cuda
```
### Build for Android

- #### Build on Mac
If building on Windows, install `ninja`.

```bash
- python build.py
pip install ninja
```

- ### Build Java API
Run the build script.

```bash
- python build.py --build_java --config Release
python build.py --build_java --android --android_home <path to your Android SDK> --android_ndk_path <path to your NDK installation> --android_abi [armeabi-v7a|arm64-v8a|x86|x86_64] --config Release
```
Change `--config` to `Debug` for debug builds.

## Install the library into your application

@@ -203,12 +137,28 @@
cd build/wheel
pip install *.whl
```

- ### Install .jar
### Install NuGet

_Coming soon_

### Install JAR

Copy `build/Windows/Release/src/java/build/libs/*.jar` into your application.

- ### Install Nuget package
### Install AAR

Copy `build/Android/Release/src/java/build/android/outputs/aar/onnxruntime-genai-release.aar` into your application.


### Install C/C++ header file and library

- _Coming soon_
#### Windows

Use the header in `src\ort_genai.h` and the libraries in `build\Windows\Release`.

#### Linux

Use the header in `src/ort_genai.h` and the libraries in `build/Linux/Release`.
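
As a quick sanity check that the header and libraries are wired up correctly, here is a minimal generation loop — a sketch based on the current `ort_genai.h` C++ API (the model folder and prompt are placeholders; exact names can differ between releases):

```
#include <iostream>
#include "ort_genai.h"

int main() {
  // Placeholder: folder containing the ONNX model and its genai_config.json.
  auto model = OgaModel::Create("path/to/model");
  auto tokenizer = OgaTokenizer::Create(*model);

  auto sequences = OgaSequences::Create();
  tokenizer->Encode("Tell me a joke about GPUs.", *sequences);

  auto params = OgaGeneratorParams::Create(*model);
  params->SetSearchOption("max_length", 128);
  params->SetInputSequences(*sequences);

  auto output = model->Generate(*params);
  std::cout << tokenizer->Decode(output->SequenceData(0), output->SequenceCount(0))
            << std::endl;
  return 0;
}
```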


