Add LLM sample for DirectML #1082

Merged 57 commits on Apr 18, 2024

Changes from 50 commits

Commits
0b55774
Add GQA support (batch version still looks buggy?)
PatriceVignola Dec 7, 2023
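
The GQA support named in this commit corresponds to the standard grouped-query attention pattern, in which a small number of key/value heads is broadcast across all query heads. A minimal PyTorch sketch of that broadcast, purely illustrative and not Olive's actual implementation (the later "Add kv broadcasting" commit below tackles the same head-count mismatch):

```python
import torch

def repeat_kv(kv: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Repeat each KV head n_rep times so K/V match the query head count."""
    batch, num_kv_heads, seq_len, head_dim = kv.shape
    if n_rep == 1:
        return kv
    kv = kv[:, :, None, :, :].expand(batch, num_kv_heads, n_rep, seq_len, head_dim)
    return kv.reshape(batch, num_kv_heads * n_rep, seq_len, head_dim)

# Example: 32 query heads sharing 8 KV heads -> each KV head repeated 4 times.
k = torch.randn(2, 8, 128, 64)      # (batch, kv_heads, seq_len, head_dim)
print(repeat_kv(k, 32 // 8).shape)  # torch.Size([2, 32, 128, 64])
```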
b3dba4c
Fix batching
PatriceVignola Dec 7, 2023
9a17f08
Fix chat app pipeline
PatriceVignola Dec 7, 2023
4e3b686
Remove argmax sampling model
PatriceVignola Dec 8, 2023
acbff0d
Merge branch 'main' of https://github.com/microsoft/Olive into user/p…
PatriceVignola Dec 17, 2023
0cbf150
Use HF version of LLaMA
PatriceVignola Dec 17, 2023
b7c7f3e
Use json config from huggingface instead of tokenizer.model
PatriceVignola Dec 17, 2023
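
Replacing tokenizer.model with the Hugging Face JSON config typically means loading the fast tokenizer, which is built from tokenizer.json and tokenizer_config.json rather than the SentencePiece file. A hedged sketch assuming the transformers library (the repo id is illustrative):

```python
from transformers import AutoTokenizer

# The fast tokenizer is constructed from tokenizer.json / tokenizer_config.json
# and does not need the SentencePiece tokenizer.model file.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf", use_fast=True)
print(tokenizer("Hello, world!").input_ids)
```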
bc92843
Add kv broadcasting (not working yet)
PatriceVignola Dec 18, 2023
69ab184
Change key permutate location to enable MHA fusion
PatriceVignola Dec 18, 2023
de3d8d0
Onboard mistral
PatriceVignola Dec 18, 2023
16002a5
Rename the llama_v2 example to llm
PatriceVignola Dec 18, 2023
e73a376
Fix chat app
PatriceVignola Dec 18, 2023
8fdbc74
Remove non-chat llama from list of supported models
PatriceVignola Dec 18, 2023
9e13431
Allow multiple models to be generated at the same time
PatriceVignola Dec 18, 2023
ba3d44c
Remove llama-2-7b from the list of models
PatriceVignola Dec 18, 2023
3df03ed
Fix models
PatriceVignola Dec 19, 2023
b9d2b6e
Remove onnxruntime-directml 1.16.2 dependency
PatriceVignola Jan 4, 2024
f06a339
Perf improvements
PatriceVignola Jan 17, 2024
79c1a56
Use torch LayerNorm instead of custom one
PatriceVignola Jan 21, 2024
884e5ad
Add residual connection post layernorm option
PatriceVignola Jan 21, 2024
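
The "residual connection post layernorm" option distinguishes architectures that take the residual from the normalized activation from those that add back the raw block input. A rough PyTorch sketch of the two wirings, inferred from the commit message rather than taken from the converter's code:

```python
import torch
from torch import nn

class Block(nn.Module):
    def __init__(self, dim: int, residual_post_layernorm: bool = False):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Linear(dim, dim)
        self.residual_post_layernorm = residual_post_layernorm

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        # Some architectures add the post-layernorm activation back as the
        # residual instead of the un-normalized block input.
        residual = h if self.residual_post_layernorm else x
        return residual + self.mlp(h)
```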
2852811
Add option to keep LayerNorm nodes in fp32
PatriceVignola Jan 21, 2024
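
Keeping LayerNorm in fp32 is a common mixed-precision guard, since LayerNorm's mean/variance reduction is sensitive to fp16 range. One way to express it with ONNX Runtime's transformers tooling, sketched under the assumption that an op block list is used (the example's actual mechanism may differ):

```python
import onnx
from onnxruntime.transformers.float16 import convert_float_to_float16

model = onnx.load("model.onnx")  # path is a placeholder
model_fp16 = convert_float_to_float16(
    model,
    keep_io_types=True,                    # graph inputs/outputs stay fp32
    op_block_list=["LayerNormalization"],  # LayerNorm nodes are left in fp32
)
onnx.save(model_fp16, "model_fp16.onnx")
```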
ccc3e97
Include model_type parameter
PatriceVignola Jan 21, 2024
a28a01e
Fix layer norm
PatriceVignola Jan 21, 2024
212b060
Add default chat template option
PatriceVignola Jan 21, 2024
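
A default chat template is the fallback used when a tokenizer ships without one. A sketch against the transformers chat-template API; the template string and model id here are assumptions, not what this PR ships:

```python
from transformers import AutoTokenizer

# Illustrative Jinja template: role-tagged turns plus a generation prompt.
DEFAULT_CHAT_TEMPLATE = (
    "{% for message in messages %}"
    "<|{{ message['role'] }}|>\n{{ message['content'] }}\n"
    "{% endfor %}<|assistant|>\n"
)

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
if tokenizer.chat_template is None:
    tokenizer.chat_template = DEFAULT_CHAT_TEMPLATE  # fall back to the default

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}], tokenize=False
)
print(prompt)
```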
b37949b
Add LLaVA support
PatriceVignola Jan 28, 2024
437b504
Fix chat app
PatriceVignola Jan 28, 2024
69130ba
Add codellama
PatriceVignola Jan 30, 2024
f141cbc
Add OpenOrca Mistral
PatriceVignola Jan 30, 2024
f7dbe6d
Add tiiuae/falcon-7b-instruct model support to llm-combined (#907)
aamajumder Feb 1, 2024
04223f9
Onboard more models
PatriceVignola Feb 1, 2024
9aa5acd
Merge branch 'user/pavignol/llm-combined' of https://github.com/micro…
PatriceVignola Feb 1, 2024
6947392
WIP
PatriceVignola Feb 1, 2024
5381d24
Add AWQ (WIP)
PatriceVignola Feb 20, 2024
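
AWQ (activation-aware weight quantization) protects salient weight channels by scaling them up before quantization and folding the inverse scale into the layer input. A very rough sketch of that core identity, illustrative only; the PR would integrate an existing AWQ implementation rather than code like this:

```python
import numpy as np

def awq_scale(w: np.ndarray, act_magnitude: np.ndarray, alpha: float = 0.5):
    """w: (out_features, in_features); act_magnitude: per-input-channel stats."""
    s = np.maximum(act_magnitude, 1e-5) ** alpha  # larger activations -> larger scale
    w_scaled = w * s  # quantize w_scaled; divide the layer input by s at runtime
    return w_scaled, s

w = np.random.randn(8, 4).astype(np.float32)
x = np.random.randn(4).astype(np.float32)
w_scaled, s = awq_scale(w, np.abs(x))
# The rescaling is mathematically a no-op before quantization:
assert np.allclose(w_scaled @ (x / s), w @ x, atol=1e-4)
```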
7577f72
Cleanup
PatriceVignola Feb 20, 2024
f29a123
Add Phi support
PatriceVignola Mar 5, 2024
496f977
Fix opset issue
PatriceVignola Mar 21, 2024
3961bd2
WIP
PatriceVignola Apr 14, 2024
92dcd23
Merge branch 'main' of https://github.com/microsoft/Olive into user/p…
PatriceVignola Apr 14, 2024
5d54bda
Fix Phi-2
PatriceVignola Apr 14, 2024
7e10835
Update README and delete phi/dolly-v2 folders
PatriceVignola Apr 14, 2024
4c5e9e9
Add block_size and bit_size options
PatriceVignola Apr 15, 2024
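
block_size and bit_size are the usual knobs of blockwise weight quantization: each group of block_size consecutive weights shares one scale, and bit_size fixes the integer range. A minimal symmetric sketch; the parameter names come from the commit message, everything else is an assumption:

```python
import numpy as np

def quantize_blockwise(w: np.ndarray, block_size: int = 32, bit_size: int = 4):
    qmax = 2 ** (bit_size - 1) - 1         # e.g. 7 for 4-bit symmetric
    blocks = w.reshape(-1, block_size)     # weight count must divide evenly
    scales = np.abs(blocks).max(axis=1, keepdims=True) / qmax
    scales = np.maximum(scales, 1e-8)      # guard all-zero blocks
    q = np.clip(np.round(blocks / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales                       # dequantize with q * scales

q, scales = quantize_blockwise(np.random.randn(4096 * 32).astype(np.float32))
print(q.shape, scales.shape)  # (4096, 32) (4096, 1)
```

Larger blocks shrink the per-scale overhead; smaller blocks track outliers more tightly.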
6ef9bd6
Add local model support
PatriceVignola Apr 15, 2024
c73f55d
Merge branch 'main' of https://github.com/microsoft/Olive into user/p…
PatriceVignola Apr 15, 2024
1e0f8d0
Revert optimum_merging changes
PatriceVignola Apr 15, 2024
25fdc62
Address PR comments
PatriceVignola Apr 15, 2024
f2846a5
Fix lint errors
PatriceVignola Apr 15, 2024
04b46c2
Use sys.exit(1) instead of exit(1)
PatriceVignola Apr 15, 2024
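
The exit() to sys.exit() change is a small robustness fix: exit() is injected by the site module for interactive use and can be absent under python -S or in embedded/frozen interpreters, while sys.exit() is always importable:

```python
import sys

def main() -> int:
    # ... real work would go here ...
    return 1

if __name__ == "__main__":
    sys.exit(main())  # raises SystemExit(1); bare exit(1) needs the site module
```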
4346041
Fix lint errors
PatriceVignola Apr 15, 2024
b7637a9
Update examples.md
PatriceVignola Apr 15, 2024
fa18cab
Fixes
PatriceVignola Apr 15, 2024
6225f08
Remove dolly from example.md
PatriceVignola Apr 15, 2024
01341af
Make NUM_KEY_VALUE_HEADS_NAMES a tuple
PatriceVignola Apr 15, 2024
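
Making NUM_KEY_VALUE_HEADS_NAMES a tuple makes the module-level constant immutable, so it cannot be mutated at a distance and satisfies common lint rules. The entries below are illustrative guesses at config-key aliases, not the example's actual values:

```python
# A tuple cannot be appended to or reordered in place, unlike a list.
NUM_KEY_VALUE_HEADS_NAMES = ("num_key_value_heads", "num_kv_heads")  # assumed values

def get_num_kv_heads(config: dict) -> int:
    for name in NUM_KEY_VALUE_HEADS_NAMES:
        if name in config:
            return config[name]
    raise KeyError("no key/value head count found in config")
```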
424dbcd
Address PR comments
PatriceVignola Apr 16, 2024
b90f9f4
Address PR comments
PatriceVignola Apr 16, 2024
eafe308
Fix dataloader and update neural-compressor dependency
PatriceVignola Apr 16, 2024
0c70dae
Merge branch 'main' of https://github.com/microsoft/Olive into user/p…
PatriceVignola Apr 16, 2024
be09c7a
Address PR comments
PatriceVignola Apr 18, 2024
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -22,7 +22,7 @@ repos:
     rev: v0.3.1
     hooks:
       - id: absolufy-imports
-        exclude: examples/directml/llama_v2/chat_app/
+        exclude: examples/directml/llm/chat_app/
   - repo: https://github.com/astral-sh/ruff-pre-commit
     # Ruff version.
     rev: v0.2.0
2 changes: 1 addition & 1 deletion docs/source/examples.md
@@ -11,7 +11,7 @@
 ||red pajama|[Link](https://github.com/microsoft/Olive/tree/main/examples/red_pajama)| `CPU`: with Optimum conversion and merging and ONNX Runtime optimizations for a single optimized ONNX model
 ||bert|[Link](https://github.com/microsoft/Olive/tree/main/examples/bert)|`CPU`: with ONNX Runtime optimizations and quantization for optimized INT8 ONNX model<br>`CPU`: with ONNX Runtime optimizations and Intel® Neural Compressor quantization for optimized INT8 ONNX model<br>`CPU`: with PyTorch QAT Customized Training Loop and ONNX Runtime optimizations for optimized ONNX INT8 model<br>`GPU`: with ONNX Runtime optimizations for CUDA EP<br>`GPU`: with ONNX Runtime optimizations for TRT EP
 ||deberta|[Link](https://github.com/microsoft/Olive/tree/main/examples/deberta)|`GPU`: Optimize Azureml Registry Model with ONNX Runtime optimizations and quantization
-||dolly_v2|[Link](https://github.com/microsoft/Olive/tree/main/examples/directml/dolly_v2)|`GPU`: with Optimum conversion and merging and ONNX Runtime optimizations with DirectML EP
+||Multiple LLMs|[Link](https://github.com/microsoft/Olive/tree/main/examples/directml/llm)|`GPU`: DirectML optimizations for multiple LLMs, including LLaMA 2, Mistral, Phi-2, Falcon, Zephyr, etc.
 ||gptj|[Link](https://github.com/microsoft/Olive/tree/main/examples/gptj)|`CPU`: with Intel® Neural Compressor static/dynamic quantization for INT8 ONNX model
 |Audio|whisper|[Link](https://github.com/microsoft/Olive/tree/main/examples/whisper)|`CPU`: with ONNX Runtime optimizations for all-in-one ONNX model in FP32<br>`CPU`: with ONNX Runtime optimizations for all-in-one ONNX model in INT8<br>`CPU`: with ONNX Runtime optimizations and Intel® Neural Compressor Dynamic Quantization for all-in-one ONNX model in INT8<br>`GPU`: with ONNX Runtime optimizations for all-in-one ONNX model in FP32<br>`GPU`: with ONNX Runtime optimizations for all-in-one ONNX model in FP16<br>`GPU`: with ONNX Runtime optimizations for all-in-one ONNX model in INT8
 ||audio spectrogram<br>transformer|[Link](https://github.com/microsoft/Olive/tree/main/examples/AST)|`CPU`: with ONNX Runtime optimizations and quantization for optimized INT8 ONNX model
1 change: 0 additions & 1 deletion examples/directml/dolly_v2/.gitignore (file deleted)
43 changes: 0 additions & 43 deletions examples/directml/dolly_v2/README.md (file deleted)
6 changes: 0 additions & 6 deletions examples/directml/dolly_v2/config.py (file deleted)
132 changes: 0 additions & 132 deletions examples/directml/dolly_v2/dolly_v2.py (file deleted)
7 changes: 0 additions & 7 deletions examples/directml/dolly_v2/requirements.txt (file deleted)
45 changes: 0 additions & 45 deletions examples/directml/dolly_v2/user_script.py (file deleted)
125 changes: 0 additions & 125 deletions examples/directml/llama_v2/LICENSE (file deleted)