Merge branch 'gh-pages' into genai_page_addition
MaanavD authored Sep 11, 2024
2 parents 17bfb52 + fc3672c commit 1cc299b
Showing 25 changed files with 337 additions and 300 deletions.
8 changes: 4 additions & 4 deletions docs/build/eps.md
@@ -260,13 +260,13 @@ See more information on the OpenVINO™ Execution Provider [here](../execution-providers/OpenVINO-ExecutionProvider.md).
### Prerequisites
{: .no_toc }

- 1. Install the OpenVINO™ offline/online installer from Intel<sup>®</sup> Distribution of OpenVINO™<sup>TM</sup> Toolkit **Release 2024.1** for the appropriate OS and target hardware:
-    * [Windows - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?VERSION=v_2023_1_0&OP_SYSTEM=WINDOWS&DISTRIBUTION=ARCHIVE)
-    * [Linux - CPU, GPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?VERSION=v_2023_1_0&OP_SYSTEM=LINUX&DISTRIBUTION=ARCHIVE)
1. Install the OpenVINO™ offline/online installer from the Intel<sup>®</sup> Distribution of OpenVINO™ Toolkit **Release 2024.3** for the appropriate OS and target hardware:
   * [Windows - CPU, GPU, NPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2024_3_0&OP_SYSTEM=WINDOWS&DISTRIBUTION=ARCHIVE)
   * [Linux - CPU, GPU](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html?PACKAGE=OPENVINO_BASE&VERSION=v_2024_3_0&OP_SYSTEM=LINUX&DISTRIBUTION=ARCHIVE)

Follow [documentation](https://docs.openvino.ai/2024/home.html) for detailed instructions.

- *2024.1 is the current recommended OpenVINO™ version. [OpenVINO™ 2023.1](https://docs.openvino.ai/archive/2023.1/home.html) is minimal OpenVINO™ version requirement.*
*2024.3 is the current recommended OpenVINO™ version. [OpenVINO™ 2023.3](https://docs.openvino.ai/2023.3/home.html) is the minimum OpenVINO™ version required.*

2. Configure the target hardware with the specific follow-on instructions:
   * To configure Intel<sup>®</sup> Processor Graphics (GPU), please follow these instructions: [Windows](https://docs.openvino.ai/latest/openvino_docs_install_guides_configurations_for_intel_gpu.html#gpu-guide-windows), [Linux](https://docs.openvino.ai/latest/openvino_docs_install_guides_configurations_for_intel_gpu.html#linux)
15 changes: 14 additions & 1 deletion docs/execution-providers/CoreML-ExecutionProvider.md
@@ -128,10 +128,12 @@ Operators that are supported by the CoreML Execution Provider when a NeuralNetwork model is created:
|ai.onnx:ReduceSum||
|ai.onnx:Relu||
|ai.onnx:Reshape||
- |ai.onnx:Resize||
|ai.onnx:Resize|4D input.<br/>`coordinate_transformation_mode` == `asymmetric`.<br/>`mode` == `linear` or `nearest`.<br/>`nearest_mode` == `floor`.<br/>`exclude_outside` == false<br/>`scales` or `sizes` must be constant.|
|ai.onnx:Shape|Attribute `start` with non-default value is not supported.<br/>Attribute `end` is not supported.|
|ai.onnx:Sigmoid||
|ai.onnx:Slice|Inputs `starts`, `ends`, `axes`, and `steps` should be constant. Empty slice is not supported.|
|ai.onnx:Softmax||
|ai.onnx:Split|If provided, `splits` must be constant.|
|ai.onnx:Squeeze||
|ai.onnx:Sqrt||
|ai.onnx:Sub||
@@ -147,15 +149,26 @@ Operators that are supported by the CoreML Execution Provider when an MLProgram model is created:
|ai.onnx:Add||
|ai.onnx:AveragePool|Only 2D Pool is supported currently. 3D and 5D support can be added if needed.|
|ai.onnx:Clip||
|ai.onnx:Concat||
|ai.onnx:Conv|Only 1D/2D Conv is supported.<br/>Bias if provided must be constant.|
|ai.onnx:ConvTranspose|Weight and bias must be constant.<br/>padding_type of SAME_UPPER/SAME_LOWER is not supported.<br/>kernel_shape must have default values.<br/>output_shape is not supported.<br/>output_padding must have default values.|
|ai.onnx:DepthToSpace|If `mode` is `CRD`, the input must have a fixed shape.|
|ai.onnx:Div||
|ai.onnx:Gemm|Input B must be constant.|
|ai.onnx:GlobalAveragePool|Only 2D Pool is supported currently. 3D and 5D support can be added if needed.|
|ai.onnx:GlobalMaxPool|Only 2D Pool is supported currently. 3D and 5D support can be added if needed.|
|ai.onnx:GridSample|4D input.<br/>'mode' of 'linear' or 'zeros'.<br/>(mode==linear && padding_mode==reflection && align_corners==0) is not supported.|
|ai.onnx:LeakyRelu||
|ai.onnx:MatMul|Only support for transA == 0, alpha == 1.0 and beta == 1.0 is currently implemented.|
|ai.onnx:MaxPool|Only 2D Pool is supported currently. 3D and 5D support can be added if needed.|
|ai.onnx:Mul||
|ai.onnx:Pow|Only supports cases when both inputs are fp32.|
|ai.onnx:Relu||
|ai.onnx:Reshape||
|ai.onnx:Resize|See [resize_op_builder.cc](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/coreml/builders/impl/resize_op_builder.cc) implementation. There are too many permutations to describe the valid combinations.|
|ai.onnx:Slice|`starts`/`ends`/`axes`/`steps` must be constant initializers.|
|ai.onnx:Split|If provided, `splits` must be constant.|
|ai.onnx:Sub||
|ai.onnx:Sigmoid||
|ai.onnx:Tanh||
|ai.onnx:Transpose||
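
To target the ML Program operator set above, the CoreML EP has to be created with the ML Program flag. A minimal C++ sketch, assuming the flag and append helper from `coreml_provider_factory.h`; `model.onnx` is a placeholder path:

```
#include <onnxruntime_cxx_api.h>
#include <coreml_provider_factory.h>

int main() {
  Ort::Env env;
  Ort::SessionOptions session_options;
  // Request the ML Program model format; without this flag the CoreML EP
  // builds a NeuralNetwork model and the first table applies instead.
  Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_CoreML(
      session_options, COREML_FLAG_CREATE_MLPROGRAM));
  Ort::Session session(env, "model.onnx", session_options);
  return 0;
}
```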
37 changes: 28 additions & 9 deletions docs/execution-providers/OpenVINO-ExecutionProvider.md
@@ -20,7 +20,7 @@ Accelerate ONNX models on Intel CPUs, GPUs, and NPUs with the Intel OpenVINO™ Execution Provider.
## Install

Pre-built packages and Docker images are published for OpenVINO™ Execution Provider for ONNX Runtime by Intel for each release.
- * OpenVINO™ Execution Provider for ONNX Runtime Release page: [Latest v5.2 Release](https://github.com/intel/onnxruntime/releases)
* OpenVINO™ Execution Provider for ONNX Runtime Release page: [Latest v5.4 Release](https://github.com/intel/onnxruntime/releases)
* Python wheels Ubuntu/Windows: [onnxruntime-openvino](https://pypi.org/project/onnxruntime-openvino/)
* Docker image: [openvino/onnxruntime_ep_ubuntu20](https://hub.docker.com/r/openvino/onnxruntime_ep_ubuntu20)

@@ -30,10 +30,9 @@ ONNX Runtime OpenVINO™ Execution Provider is compatible with the three latest releases of OpenVINO™.

|ONNX Runtime|OpenVINO™|Notes|
|---|---|---|
|1.19.0|2024.3|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.4)|
|1.18.0|2024.1|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.3)|
|1.17.1|2023.3|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.2)|
- |1.16.0|2023.1|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.1)|
- |1.15.0|2023.0|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.0.0)|
- |1.14.0|2022.3|[Details](https://github.com/intel/onnxruntime/releases/tag/v4.3)|
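
When matching an installed runtime against this table, the ONNX Runtime version can be queried at run time. A small sketch:

```
#include <iostream>
#include <onnxruntime_cxx_api.h>

int main() {
  // Prints the ONNX Runtime version string, e.g. "1.19.0".
  std::cout << OrtGetApiBase()->GetVersionString() << std::endl;
  return 0;
}
```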

## Build

@@ -200,8 +199,30 @@ For more information on the Multi-Device plugin of OpenVINO™, please refer to the
[Intel OpenVINO™ Multi Device Plugin](https://docs.openvino.ai/latest/openvino_docs_OV_UG_Running_on_multiple_devices.html).
### Export OpenVINO Compiled Blob
- Export the OpenVINO compiled blob as an ONNX model. Using this ONNX model for subsequent inferences avoids model recompilation and could have a positive impact on Session creation time. The exported model is saved to the same directory as the source model with the suffix -ov_{device}_blob.onnx where device can be one of the supported like CPU or NPU. This feature is currently enabled for fully supported models only.
- Refer to [Configuration Options](#configuration-options) for more information about using these runtime options.
Export the OpenVINO compiled blob as an ONNX model. Using this ONNX model for subsequent inferences avoids model recompilation and could have a positive impact on Session creation time. This feature is currently enabled for fully supported models only. It is controlled through the following ORT session config keys:
```
Ort::SessionOptions session_options;

// Enable the EP context feature to dump the partitioned graph,
// which includes the EP context, into an ONNX file.
// "0": disable (default). "1": enable.
session_options.AddConfigEntry(kOrtSessionOptionEpContextEnable, "1");

// Flag specifying whether to dump the EP context into a single ONNX model
// or to pass a bin path.
// "0": dump the EP context into a separate file and keep that file name in the ONNX model.
// "1": dump the EP context into the ONNX model (default).
session_options.AddConfigEntry(kOrtSessionOptionEpContextEmbedMode, "1");

// Specify the file path for the ONNX model that carries the EP context.
// Defaults to <actual_model_path>/original_file_name_ctx.onnx if not specified.
session_options.AddConfigEntry(kOrtSessionOptionEpContextFilePath, "./ov_compiled_epctx.onnx");

Ort::Session session(env, ORT_TSTR("<path_to_model_file>"), session_options);
```
Refer to [Session Options](https://github.com/microsoft/onnxruntime/blob/main/include/onnxruntime/core/session/onnxruntime_session_options_config_keys.h) for more information about these session config keys.
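
On later runs, the dumped EP-context model can be loaded like any other ONNX model, skipping OpenVINO recompilation. A sketch, reusing the file path from the example above:

```
// Later runs: load the dumped model as usual; the embedded EP context
// lets Session creation skip recompilation.
Ort::SessionOptions load_options;
Ort::Session cached_session(env, ORT_TSTR("./ov_compiled_epctx.onnx"), load_options);
```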
### Enable QDQ Optimization Passes
Optimizes ORT quantized models for the NPU device, keeping QDQ nodes only around ops the NPU supports, to improve performance and accuracy. Generally this feature gives better performance/accuracy with ORT optimizations disabled.
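
For example, via the key–value provider options — a sketch, assuming the `AppendExecutionProvider_OpenVINO_V2` wrapper available in recent ONNX Runtime C++ headers and the `enable_qdq_optimizer` key from the configuration table below:

```
std::unordered_map<std::string, std::string> ov_options;
ov_options["device_type"] = "NPU";
// Keep QDQ pairs only around ops the NPU supports.
ov_options["enable_qdq_optimizer"] = "True";

Ort::SessionOptions session_options;
// Per the note above, this is typically paired with ORT optimizations disabled.
session_options.SetGraphOptimizationLevel(ORT_DISABLE_ALL);
session_options.AppendExecutionProvider_OpenVINO_V2(ov_options);
```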
@@ -239,8 +260,7 @@ The session configuration options are passed to the SessionOptionsAppendExecutionProvider API as shown below:
```
OrtOpenVINOProviderOptions options;
- options.device_type = "GPU";
- options.precision = "FP32";
options.device_type = "GPU_FP32";
options.num_of_threads = 8;
options.cache_dir = "";
options.context = 0x123456ff;
```
@@ -277,7 +297,6 @@ The following table lists all the available configuration options for API 2.0:
| context | string | OpenCL Context | void* | This option is only available when OpenVINO EP is built with OpenCL flags enabled. It takes in the remote context, i.e. the cl_context address, as a void pointer.|
| enable_opencl_throttling | string | True/False | boolean | This option enables OpenCL queue throttling for GPU devices (reduces CPU utilization when using GPU). |
| enable_qdq_optimizer | string | True/False | boolean | This option enables QDQ Optimization to improve model performance and accuracy on NPU. |
- | export_ep_ctx_blob | string | True/False | boolean | This option enables exporting the OpenVINO Compiled Blob as an ONNX Operator EPContext. |
Valid Hetero or Multi or Auto Device combinations:
166 changes: 58 additions & 108 deletions docs/genai/howto/build-from-source.md
@@ -16,7 +16,7 @@ nav_order: 2
## Pre-requisites

- `cmake`
- - `.Net v6` (if building C#)
- `.NET 6` (if building C#)

## Clone the onnxruntime-genai repo

Expand All @@ -25,11 +25,10 @@ git clone https://github.com/microsoft/onnxruntime-genai
cd onnxruntime-genai
```

- ## Install ONNX Runtime
## Download ONNX Runtime binaries

By default, the onnxruntime-genai build expects to find the ONNX Runtime include files and binaries in a folder called `ort` in the root directory of onnxruntime-genai. You can put the ONNX Runtime files in a different location and pass that location to the onnxruntime-genai build via the `--ort_home` command line argument (e.g. `python build.py --ort_home <path>`).

- ### Option 1: Install from release

These instructions assume you are in the `onnxruntime-genai` folder.

@@ -38,161 +37,96 @@

#### Windows

These instructions use `win-x64`. Replace this if you are using a different architecture.

```bash
- curl -L https://github.com/microsoft/onnxruntime/releases/download/v1.18.0/onnxruntime-win-x64-1.18.0.zip -o onnxruntime-win-x64-1.18.0.zip
- tar xvf onnxruntime-win-x64-1.18.0.zip
- move onnxruntime-win-x64-1.18.0 ort
curl -L https://github.com/microsoft/onnxruntime/releases/download/v1.19.2/onnxruntime-win-x64-1.19.2.zip -o onnxruntime-win-x64-1.19.2.zip
tar xvf onnxruntime-win-x64-1.19.2.zip
move onnxruntime-win-x64-1.19.2 ort
```

#### Linux and Mac

These instructions use `linux-x64-gpu`. Replace this if you are using a different architecture.

```bash
- curl -L https://github.com/microsoft/onnxruntime/releases/download/v1.18.0/onnxruntime-linux-x64-gpu-1.18.0.tgz -o onnxruntime-linux-x64-gpu-1.18.0.tgz
- tar xvzf onnxruntime-linux-x64-gpu-1.18.0.tgz
- mv onnxruntime-linux-x64-gpu-1.18.0 ort
curl -L https://github.com/microsoft/onnxruntime/releases/download/v1.19.2/onnxruntime-linux-x64-gpu-1.19.2.tgz -o onnxruntime-linux-x64-gpu-1.19.2.tgz
tar xvzf onnxruntime-linux-x64-gpu-1.19.2.tgz
mv onnxruntime-linux-x64-gpu-1.19.2 ort
```

- ### Option 2: Install from nightly
#### Android

- Download the nightly nuget package `Microsoft.ML.OnnxRuntime` from: https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly.

- Extract the nuget package.

```bash
- tar xvf Microsoft.ML.OnnxRuntime.1.18.0-dev-20240322-0323-ca825cb6e6.nupkg
```

- Copy the include and lib files into `ort`.

- On Windows

- Example is given for `win-x64`. Change this to your architecture if different.

```cmd
- copy build\native\include\onnxruntime_c_api.h ort\include
- copy runtimes\win-x64\native\*.dll ort\lib
```

- On Linux

- Example is given for `linux-x64`. Change this to your architecture if different.

```cmd
- cp build/native/include/onnxruntime_c_api.h ort/include
- cp build/linux-x64/native/libonnxruntime*.so* ort/lib
```

- ### Option 3: Build from source

- #### Clone the onnxruntime repo
If you do not already have an `ort` folder, create one.

```bash
- cd ..
- git clone https://github.com/microsoft/onnxruntime.git
- cd onnxruntime
mkdir ort
```

- #### Build ONNX Runtime for CPU on Windows

```bash
- build.bat --build_shared_lib --skip_tests --parallel --config Release
- copy include\onnxruntime\core\session\onnxruntime_c_api.h ..\onnxruntime-genai\ort\include
- copy build\Windows\Release\Release\*.dll ..\onnxruntime-genai\ort\lib
- copy build\Windows\Release\Release\onnxruntime.lib ..\onnxruntime-genai\ort\lib
```

- #### Build ONNX Runtime for DirectML on Windows

```bash
- build.bat --build_shared_lib --skip_tests --parallel --use_dml --config Release
- copy include\onnxruntime\core\session\onnxruntime_c_api.h ..\onnxruntime-genai\ort\include
- copy include\onnxruntime\core\providers\dml\dml_provider_factory.h ..\onnxruntime-genai\ort\include
- copy build\Windows\Release\Release\*.dll ..\onnxruntime-genai\ort\lib
- copy build\Windows\Release\Release\onnxruntime.lib ..\onnxruntime-genai\ort\lib
curl -L https://repo1.maven.org/maven2/com/microsoft/onnxruntime/onnxruntime-android/1.19.2/onnxruntime-android-1.19.2.aar -o ort/onnxruntime-android-1.19.2.aar
cd ort
tar xvf onnxruntime-android-1.19.2.aar
cd ..
```

## Build the generate() API

- #### Build ONNX Runtime for CUDA on Windows

```bash
- build.bat --build_shared_lib --skip_tests --parallel --use_cuda --config Release
- copy include\onnxruntime\core\session\onnxruntime_c_api.h ..\onnxruntime-genai\ort\include
- copy include\onnxruntime\core\providers\cuda\*.h ..\onnxruntime-genai\ort\include
- copy build\Windows\Release\Release\*.dll ..\onnxruntime-genai\ort\lib
- copy build\Windows\Release\Release\onnxruntime.lib ..\onnxruntime-genai\ort\lib
```
This step assumes that you are in the root of the onnxruntime-genai repo, and that you have followed the previous steps to copy the ONNX Runtime headers and binaries into the folder specified by `<ORT_HOME>`, which defaults to `onnxruntime-genai/ort`.

- #### Build ONNX Runtime on Linux
All of the build commands below have a `--config` argument, which takes the following options:
- `Release` builds release binaries
- `Debug` builds binaries with debug symbols
- `RelWithDebInfo` builds release binaries with debug info

```bash
- ./build.sh --build_shared_lib --skip_tests --parallel [--use_cuda] --config Release
- cp include/onnxruntime/core/session/onnxruntime_c_api.h ../onnxruntime-genai/ort/include
- cp build/Linux/Release/libonnxruntime*.so* ../onnxruntime-genai/ort/lib
```
### Build Python API

- You may need to provide extra command line options for building with CUDA on Linux. An example full command is as follows.
#### Windows CPU build

```bash
- ./build.sh --parallel --build_shared_lib --use_cuda --cuda_version 11.8 --cuda_home /usr/local/cuda-11.8 --cudnn_home /usr/lib/x86_64-linux-gnu/ --config Release --build_wheel --skip_tests --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES="80" --cmake_extra_defines CMAKE_CUDA_COMPILER=/usr/local/cuda-11.8/bin/nvcc
python build.py --config Release
```

- Replace the values given above for different versions and locations of CUDA.

- #### Build ONNX Runtime on Mac
#### Windows DirectML build

```bash
- ./build.sh --build_shared_lib --skip_tests --parallel --config Release
- cp include/onnxruntime/core/session/onnxruntime_c_api.h ../onnxruntime-genai/ort/include
- cp build/MacOS/Release/libonnxruntime*.dylib* ../onnxruntime-genai/ort/lib
python build.py --use_dml --config Release
```

- ## Build the generate() API

- This step assumes that you are in the root of the onnxruntime-genai repo, and you have followed the previos steps to copy the onnxruntime headers and binaries into the folder specified by <ORT_HOME>, which defaults to `onnxruntime-genai/ort`.
#### Linux build

```bash
- cd ../onnxruntime-genai
python build.py --config Release
```

- ### Build Python API

- #### Build for Windows CPU
#### Linux CUDA build

```bash
- python build.py
python build.py --use_cuda --config Release
```

- #### Build for Windows DirectML
#### Mac build

```bash
- python build.py --use_dml
python build.py --config Release
```

- #### Build on Linux
### Build Java API

```bash
- python build.py
python build.py --build_java --config Release
```

- #### Build on Linux with CUDA

```bash
- python build.py --use_cuda
```
### Build for Android

- #### Build on Mac
If building on Windows, install `ninja`.

```bash
- python build.py
pip install ninja
```

- ### Build Java API
Run the build script.

```bash
- python build.py --build_java --config Release
python build.py --build_java --android --android_home <path to your Android SDK> --android_ndk_path <path to your NDK installation> --android_abi [armeabi-v7a|arm64-v8a|x86|x86_64] --config Release
```
Change `--config` to `Debug` for debug builds.

## Install the library into your application

@@ -203,12 +137,28 @@
cd build/wheel
pip install *.whl
```

- ### Install .jar
### Install NuGet

_Coming soon_

### Install JAR

Copy `build/Windows/Release/src/java/build/libs/*.jar` into your application.

- ### Install Nuget package
### Install AAR

Copy `build/Android/Release/src/java/build/android/outputs/aar/onnxruntime-genai-release.aar` into your application.


### Install C/C++ header file and library

- _Coming soon_
#### Windows

Use the header in `src\ort_genai.h` and the libraries in `build\Windows\Release`.

#### Linux

Use the header in `src/ort_genai.h` and the libraries in `build/Linux/Release`.
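
As a quick sanity check that the header and libraries are wired up correctly, here is a minimal generation loop — a sketch based on the current `ort_genai.h` C++ API (the model folder and prompt are placeholders; exact names can differ between releases):

```
#include <iostream>
#include "ort_genai.h"

int main() {
  // Placeholder: folder containing the ONNX model and its genai_config.json.
  auto model = OgaModel::Create("path/to/model");
  auto tokenizer = OgaTokenizer::Create(*model);

  auto sequences = OgaSequences::Create();
  tokenizer->Encode("Tell me a joke about GPUs.", *sequences);

  auto params = OgaGeneratorParams::Create(*model);
  params->SetSearchOption("max_length", 128);
  params->SetInputSequences(*sequences);

  auto output = model->Generate(*params);
  std::cout << tokenizer->Decode(output->SequenceData(0), output->SequenceCount(0))
            << std::endl;
  return 0;
}
```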


