Update phi3-v.md with DirectML #21868

Merged 7 commits on Sep 11, 2024
62 changes: 49 additions & 13 deletions in docs/genai/tutorials/phi3-v.md

The Phi-3 vision model is a small but powerful multimodal model that accepts both image and text input and produces text output. It is used in scenarios such as describing the content of images in detail.

The Phi-3 vision model is supported by onnxruntime-genai versions 0.3.0 and later.

You can download the models here:

* [https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx-cpu](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx-cpu)
* [https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx-directml](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx-directml)
* [https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx-cuda](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx-cuda)


* TOC placeholder
{:toc}
## Choose your platform

If you have an NVIDIA GPU, it will give the best performance right now.

The models will also run on CPU, but they will be slower.


**Note: Only one package and model is required, based on your hardware. That is, only execute the steps in one of the following sections.**
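As a rough way to decide which section applies, you can check for an NVIDIA GPU from the shell. This is a sketch that assumes `nvidia-smi` is on `PATH` whenever an NVIDIA driver is installed:

```shell
# Sketch: pick a tutorial section based on detected hardware.
# Assumption: nvidia-smi is on PATH when an NVIDIA driver is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  echo "NVIDIA GPU detected: follow the CUDA section"
else
  echo "No NVIDIA GPU found: follow the CPU section (or DirectML on Windows)"
fi
```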


## Run with NVIDIA CUDA

1. Download the model

```bash
huggingface-cli download microsoft/Phi-3-vision-128k-instruct-onnx-cuda --include cuda-int4-rtn-block-32/* --local-dir .
```

This command downloads the model into a folder called `cuda-int4-rtn-block-32`.

2. Set up your CUDA environment
* CUDA 11

```bash
pip install onnxruntime-genai-cuda --index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-11/pypi/simple/
```

* CUDA 12

```bash
pip install onnxruntime-genai-cuda
```
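Whichever CUDA variant you install, a quick sanity check confirms the package imports. This is a sketch; the `__version__` attribute is an assumption about the package, so the code falls back gracefully if it is absent:

```python
# Sketch: verify that onnxruntime-genai can be imported after installation.
def genai_status():
    try:
        import onnxruntime_genai as og
        # __version__ is assumed; fall back if the attribute is not present.
        return "onnxruntime-genai available: " + getattr(og, "__version__", "unknown")
    except ImportError:
        return "onnxruntime-genai not installed; re-run the pip install step above"

print(genai_status())
```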

4. Run the model

```bash
curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3v.py -o phi3v.py
pip install pyreadline3
python phi3v.py -m cuda-int4-rtn-block-32
```


2. Install the generate() API for CPU

```bash
pip install onnxruntime-genai
```

3. Run the model

```bash
curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3v.py -o phi3v.py
pip install pyreadline3
python phi3v.py -m cpu-int4-rtn-block-32-acc-level-4
```

```
The products include Chocolade, Gummibarchen, Scottish Longbreads, Sir Rodney's Scones, Tarte au sucre,
and Chocolate Biscuits. The Grand Total column sums up the sales for each product across the two quarters.</s>
```
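Under the hood, the example script wraps the question and an image placeholder in Phi-3's chat template before generation. A minimal sketch of that formatting follows; the exact marker strings are an assumption based on the Phi-3 vision chat format, so consult phi3v.py for the authoritative template:

```python
# Sketch: build a Phi-3 vision prompt from a user question and N images.
# The <|user|>/<|image_k|>/<|end|>/<|assistant|> markers follow the Phi-3
# vision chat format; phi3v.py is the authoritative source for the template.
def build_prompt(text: str, image_count: int = 1) -> str:
    # One numbered image tag per attached image, starting at 1.
    image_tags = "".join(f"<|image_{i}|>\n" for i in range(1, image_count + 1))
    return f"<|user|>\n{image_tags}{text}<|end|>\n<|assistant|>\n"

print(build_prompt("What does the sign say?"))
```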

## Run with DirectML

1. Download the model

```bash
huggingface-cli download microsoft/Phi-3-vision-128k-instruct-onnx-directml --include directml-int4-rtn-block-32/* --local-dir .
```

This command downloads the model into a folder called `directml-int4-rtn-block-32`.

2. Install the generate() API

```bash
pip install onnxruntime-genai-directml
```

3. Run the model

Run the model with [phi3v.py](https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/phi3v.py).

```bash
curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3v.py -o phi3v.py
pip install pyreadline3
python phi3v.py -m directml-int4-rtn-block-32
```

Enter the path to an image file and a prompt. The model uses the image and prompt to give you an answer.

For example: `What does the sign say?`

![nashville](../../../images/nashville.jpg)

```
The sign says 'DO NOT ENTER'.
```



Binary file added: images/nashville.jpg