Add LLM sample for DirectML #1082

Merged 57 commits on Apr 18, 2024

Changes from 50 commits

Commits
0b55774
Add GQA support (batch version still looks buggy?)
PatriceVignola Dec 7, 2023
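
The GQA support named in this commit corresponds to the standard grouped-query attention pattern, in which a small number of key/value heads is broadcast across all query heads. A minimal PyTorch sketch of that broadcast, purely illustrative and not Olive's actual implementation (the later "Add kv broadcasting" commit below tackles the same head-count mismatch):

```python
import torch

def repeat_kv(kv: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Repeat each KV head n_rep times so K/V match the query head count."""
    batch, num_kv_heads, seq_len, head_dim = kv.shape
    if n_rep == 1:
        return kv
    kv = kv[:, :, None, :, :].expand(batch, num_kv_heads, n_rep, seq_len, head_dim)
    return kv.reshape(batch, num_kv_heads * n_rep, seq_len, head_dim)

# Example: 32 query heads sharing 8 KV heads -> each KV head repeated 4 times.
k = torch.randn(2, 8, 128, 64)      # (batch, kv_heads, seq_len, head_dim)
print(repeat_kv(k, 32 // 8).shape)  # torch.Size([2, 32, 128, 64])
```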
b3dba4c
Fix batching
PatriceVignola Dec 7, 2023
9a17f08
Fix chat app pipeline
PatriceVignola Dec 7, 2023
4e3b686
Remove argmax sampling model
PatriceVignola Dec 8, 2023
acbff0d
Merge branch 'main' of https://github.com/microsoft/Olive into user/p…
PatriceVignola Dec 17, 2023
0cbf150
Use HF version of LLaMA
PatriceVignola Dec 17, 2023
b7c7f3e
Use json config from huggingface instead of tokenizer.model
PatriceVignola Dec 17, 2023
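
Replacing tokenizer.model with the Hugging Face JSON config typically means loading the fast tokenizer, which is built from tokenizer.json and tokenizer_config.json rather than the SentencePiece file. A hedged sketch assuming the transformers library (the repo id is illustrative):

```python
from transformers import AutoTokenizer

# The fast tokenizer is constructed from tokenizer.json / tokenizer_config.json
# and does not need the SentencePiece tokenizer.model file.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf", use_fast=True)
print(tokenizer("Hello, world!").input_ids)
```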
bc92843
Add kv broadcasting (not working yet)
PatriceVignola Dec 18, 2023
69ab184
Change key permutate location to enable MHA fusion
PatriceVignola Dec 18, 2023
de3d8d0
Onboard mistral
PatriceVignola Dec 18, 2023
16002a5
Rename the llama_v2 example to llm
PatriceVignola Dec 18, 2023
e73a376
Fix chat app
PatriceVignola Dec 18, 2023
8fdbc74
Remove non-chat llama from list of supported models
PatriceVignola Dec 18, 2023
9e13431
Allow multiple models to be generated at the same time
PatriceVignola Dec 18, 2023
ba3d44c
Remove llama-2-7b from the list of models
PatriceVignola Dec 18, 2023
3df03ed
Fix models
PatriceVignola Dec 19, 2023
b9d2b6e
Remove onnxruntime-directml 1.16.2 dependency
PatriceVignola Jan 4, 2024
f06a339
Perf improvements
PatriceVignola Jan 17, 2024
79c1a56
Use torch LayerNorm instead of custom one
PatriceVignola Jan 21, 2024
884e5ad
Add residual connection post layernorm option
PatriceVignola Jan 21, 2024
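
The "residual connection post layernorm" option distinguishes architectures that take the residual from the normalized activation from those that add back the raw block input. A rough PyTorch sketch of the two wirings, inferred from the commit message rather than taken from the converter's code:

```python
import torch
from torch import nn

class Block(nn.Module):
    def __init__(self, dim: int, residual_post_layernorm: bool = False):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Linear(dim, dim)
        self.residual_post_layernorm = residual_post_layernorm

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        # Some architectures add the post-layernorm activation back as the
        # residual instead of the un-normalized block input.
        residual = h if self.residual_post_layernorm else x
        return residual + self.mlp(h)
```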
2852811
Add option to keep LayerNorm nodes in fp32
PatriceVignola Jan 21, 2024
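
Keeping LayerNorm in fp32 is a common mixed-precision guard, since LayerNorm's mean/variance reduction is sensitive to fp16 range. One way to express it with ONNX Runtime's transformers tooling, sketched under the assumption that an op block list is used (the example's actual mechanism may differ):

```python
import onnx
from onnxruntime.transformers.float16 import convert_float_to_float16

model = onnx.load("model.onnx")  # path is a placeholder
model_fp16 = convert_float_to_float16(
    model,
    keep_io_types=True,                    # graph inputs/outputs stay fp32
    op_block_list=["LayerNormalization"],  # LayerNorm nodes are left in fp32
)
onnx.save(model_fp16, "model_fp16.onnx")
```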
ccc3e97
Include model_type parameter
PatriceVignola Jan 21, 2024
a28a01e
Fix layer norm
PatriceVignola Jan 21, 2024
212b060
Add default chat template option
PatriceVignola Jan 21, 2024
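
A default chat template is the fallback used when a tokenizer ships without one. A sketch against the transformers chat-template API; the template string and model id here are assumptions, not what this PR ships:

```python
from transformers import AutoTokenizer

# Illustrative Jinja template: role-tagged turns plus a generation prompt.
DEFAULT_CHAT_TEMPLATE = (
    "{% for message in messages %}"
    "<|{{ message['role'] }}|>\n{{ message['content'] }}\n"
    "{% endfor %}<|assistant|>\n"
)

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
if tokenizer.chat_template is None:
    tokenizer.chat_template = DEFAULT_CHAT_TEMPLATE  # fall back to the default

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}], tokenize=False
)
print(prompt)
```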
b37949b
Add LLaVA support
PatriceVignola Jan 28, 2024
437b504
Fix chat app
PatriceVignola Jan 28, 2024
69130ba
Add codellama
PatriceVignola Jan 30, 2024
f141cbc
Add OpenOrca Mistral
PatriceVignola Jan 30, 2024
f7dbe6d
Add tiiuae/falcon-7b-instruct model support to llm-combined (#907)
aamajumder Feb 1, 2024
04223f9
Onboard more models
PatriceVignola Feb 1, 2024
9aa5acd
Merge branch 'user/pavignol/llm-combined' of https://github.com/micro…
PatriceVignola Feb 1, 2024
6947392
WIP
PatriceVignola Feb 1, 2024
5381d24
Add AWQ (WIP)
PatriceVignola Feb 20, 2024
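
AWQ (activation-aware weight quantization) protects salient weight channels by scaling them up before quantization and folding the inverse scale into the layer input. A very rough sketch of that core identity, illustrative only; the PR would integrate an existing AWQ implementation rather than code like this:

```python
import numpy as np

def awq_scale(w: np.ndarray, act_magnitude: np.ndarray, alpha: float = 0.5):
    """w: (out_features, in_features); act_magnitude: per-input-channel stats."""
    s = np.maximum(act_magnitude, 1e-5) ** alpha  # larger activations -> larger scale
    w_scaled = w * s  # quantize w_scaled; divide the layer input by s at runtime
    return w_scaled, s

w = np.random.randn(8, 4).astype(np.float32)
x = np.random.randn(4).astype(np.float32)
w_scaled, s = awq_scale(w, np.abs(x))
# The rescaling is mathematically a no-op before quantization:
assert np.allclose(w_scaled @ (x / s), w @ x, atol=1e-4)
```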
7577f72
Cleanup
PatriceVignola Feb 20, 2024
f29a123
Add Phi support
PatriceVignola Mar 5, 2024
496f977
Fix opset issue
PatriceVignola Mar 21, 2024
3961bd2
WIP
PatriceVignola Apr 14, 2024
92dcd23
Merge branch 'main' of https://github.com/microsoft/Olive into user/p…
PatriceVignola Apr 14, 2024
5d54bda
Fix Phi-2
PatriceVignola Apr 14, 2024
7e10835
Update README and delete phi/dolly-v2 folders
PatriceVignola Apr 14, 2024
4c5e9e9
Add block_size and bit_size options
PatriceVignola Apr 15, 2024
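
block_size and bit_size are the usual knobs of blockwise weight quantization: each group of block_size consecutive weights shares one scale, and bit_size fixes the integer range. A minimal symmetric sketch; the parameter names come from the commit message, everything else is an assumption:

```python
import numpy as np

def quantize_blockwise(w: np.ndarray, block_size: int = 32, bit_size: int = 4):
    qmax = 2 ** (bit_size - 1) - 1         # e.g. 7 for 4-bit symmetric
    blocks = w.reshape(-1, block_size)     # weight count must divide evenly
    scales = np.abs(blocks).max(axis=1, keepdims=True) / qmax
    scales = np.maximum(scales, 1e-8)      # guard all-zero blocks
    q = np.clip(np.round(blocks / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales                       # dequantize with q * scales

q, scales = quantize_blockwise(np.random.randn(4096 * 32).astype(np.float32))
print(q.shape, scales.shape)  # (4096, 32) (4096, 1)
```

Larger blocks shrink the per-scale overhead; smaller blocks track outliers more tightly.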
6ef9bd6
Add local model support
PatriceVignola Apr 15, 2024
c73f55d
Merge branch 'main' of https://github.com/microsoft/Olive into user/p…
PatriceVignola Apr 15, 2024
1e0f8d0
Revert optimum_merging changes
PatriceVignola Apr 15, 2024
25fdc62
Address PR comments
PatriceVignola Apr 15, 2024
f2846a5
Fix lint errors
PatriceVignola Apr 15, 2024
04b46c2
Use sys.exit(1) instead of exit(1)
PatriceVignola Apr 15, 2024
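
The exit() to sys.exit() change is a small robustness fix: exit() is injected by the site module for interactive use and can be absent under python -S or in embedded/frozen interpreters, while sys.exit() is always importable:

```python
import sys

def main() -> int:
    # ... real work would go here ...
    return 1

if __name__ == "__main__":
    sys.exit(main())  # raises SystemExit(1); bare exit(1) needs the site module
```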
4346041
Fix lint errors
PatriceVignola Apr 15, 2024
b7637a9
Update examples.md
PatriceVignola Apr 15, 2024
fa18cab
Fixes
PatriceVignola Apr 15, 2024
6225f08
Remove dolly from example.md
PatriceVignola Apr 15, 2024
01341af
Make NUM_KEY_VALUE_HEADS_NAMES a tuple
PatriceVignola Apr 15, 2024
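
Making NUM_KEY_VALUE_HEADS_NAMES a tuple makes the module-level constant immutable, so it cannot be mutated at a distance and satisfies common lint rules. The entries below are illustrative guesses at config-key aliases, not the example's actual values:

```python
# A tuple cannot be appended to or reordered in place, unlike a list.
NUM_KEY_VALUE_HEADS_NAMES = ("num_key_value_heads", "num_kv_heads")  # assumed values

def get_num_kv_heads(config: dict) -> int:
    for name in NUM_KEY_VALUE_HEADS_NAMES:
        if name in config:
            return config[name]
    raise KeyError("no key/value head count found in config")
```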
424dbcd
Address PR comments
PatriceVignola Apr 16, 2024
b90f9f4
Address PR comments
PatriceVignola Apr 16, 2024
eafe308
Fix dataloader and update neural-compressor dependency
PatriceVignola Apr 16, 2024
0c70dae
Merge branch 'main' of https://github.com/microsoft/Olive into user/p…
PatriceVignola Apr 16, 2024
be09c7a
Address PR comments
PatriceVignola Apr 18, 2024
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -22,7 +22,7 @@ repos:
     rev: v0.3.1
     hooks:
       - id: absolufy-imports
-        exclude: examples/directml/llama_v2/chat_app/
+        exclude: examples/directml/llm/chat_app/
   - repo: https://github.com/astral-sh/ruff-pre-commit
     # Ruff version.
     rev: v0.2.0
2 changes: 1 addition & 1 deletion docs/source/examples.md
@@ -11,7 +11,7 @@
 ||red pajama|[Link](https://github.com/microsoft/Olive/tree/main/examples/red_pajama)| `CPU`: with Optimum conversion and merging and ONNX Runtime optimizations for a single optimized ONNX model
 ||bert|[Link](https://github.com/microsoft/Olive/tree/main/examples/bert)|`CPU`: with ONNX Runtime optimizations and quantization for optimized INT8 ONNX model<br>`CPU`: with ONNX Runtime optimizations and Intel® Neural Compressor quantization for optimized INT8 ONNX model<br>`CPU`: with PyTorch QAT Customized Training Loop and ONNX Runtime optimizations for optimized ONNX INT8 model<br>`GPU`: with ONNX Runtime optimizations for CUDA EP<br>`GPU`: with ONNX Runtime optimizations for TRT EP
 ||deberta|[Link](https://github.com/microsoft/Olive/tree/main/examples/deberta)|`GPU`: Optimize Azureml Registry Model with ONNX Runtime optimizations and quantization
-||dolly_v2|[Link](https://github.com/microsoft/Olive/tree/main/examples/directml/dolly_v2)|`GPU`: with Optimum conversion and merging and ONNX Runtime optimizations with DirectML EP
+||Multiple LLMs|[Link](https://github.com/microsoft/Olive/tree/main/examples/directml/llm)|`GPU`: DirectML optimizations for multiple LLMs, including LLaMA 2, Mistral, Phi-2, Falcon, Zephyr, etc.
 ||gptj|[Link](https://github.com/microsoft/Olive/tree/main/examples/gptj)|`CPU`: with Intel® Neural Compressor static/dynamic quantization for INT8 ONNX model
 |Audio|whisper|[Link](https://github.com/microsoft/Olive/tree/main/examples/whisper)|`CPU`: with ONNX Runtime optimizations for all-in-one ONNX model in FP32<br>`CPU`: with ONNX Runtime optimizations for all-in-one ONNX model in INT8<br>`CPU`: with ONNX Runtime optimizations and Intel® Neural Compressor Dynamic Quantization for all-in-one ONNX model in INT8<br>`GPU`: with ONNX Runtime optimizations for all-in-one ONNX model in FP32<br>`GPU`: with ONNX Runtime optimizations for all-in-one ONNX model in FP16<br>`GPU`: with ONNX Runtime optimizations for all-in-one ONNX model in INT8
 ||audio spectrogram<br>transformer|[Link](https://github.com/microsoft/Olive/tree/main/examples/AST)|`CPU`: with ONNX Runtime optimizations and quantization for optimized INT8 ONNX model
1 change: 0 additions & 1 deletion examples/directml/dolly_v2/.gitignore (file deleted)
43 changes: 0 additions & 43 deletions examples/directml/dolly_v2/README.md (file deleted)
6 changes: 0 additions & 6 deletions examples/directml/dolly_v2/config.py (file deleted)
132 changes: 0 additions & 132 deletions examples/directml/dolly_v2/dolly_v2.py (file deleted)
7 changes: 0 additions & 7 deletions examples/directml/dolly_v2/requirements.txt (file deleted)
45 changes: 0 additions & 45 deletions examples/directml/dolly_v2/user_script.py (file deleted)
125 changes: 0 additions & 125 deletions examples/directml/llama_v2/LICENSE (file deleted)