Cherry pick 1.17.3 - Round 2 #20178
Conversation
See the comments inside the changed files for more detailed information. The files onnxruntime/core/platform/windows/hardware_core_enumerator.cc and onnxruntime/core/platform/windows/hardware_core_enumerator.h were copied from the WinML source folder in this repo, with minor coding-style changes. I had an offline discussion with Sheil; we agree that, given the lack of a future-proof solution, we can check in this temporary fix first and rework it later. I will meet with @ivberg to discuss the issue in depth and look for a long-term solution. Thanks for offering to help, @ivberg! With this change, we will see about a 2x perf improvement on some Intel CPUs.
### Description

This PR adds flash attention v2 and support for INT4 CUDA benchmarking in PyTorch.

### Motivation and Context

The [flash attention v2](https://github.com/Dao-AILab/flash-attention) algorithm helps improve model performance in PyTorch. Support for INT4 CUDA in PyTorch is done through the [`bitsandbytes`](https://github.com/TimDettmers/bitsandbytes) package.
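For context, below is a minimal sketch (not the benchmarking script added in this PR) of how a PyTorch model can be loaded with INT4 weights via `bitsandbytes` through its Hugging Face `transformers` integration. The model name, device placement, and generation settings are illustrative placeholders, not values taken from this change.

```python
# Sketch: load a causal LM with 4-bit (INT4/NF4) weights using bitsandbytes,
# then run a short generation on CUDA. Model id below is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder, not from this PR

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize linear-layer weights to 4-bit
    bnb_4bit_compute_dtype=torch.float16,  # run matmuls in fp16
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="cuda",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Hello, ONNX Runtime!", return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```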
/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline
/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Android CI Pipeline
/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline
Azure Pipelines successfully started running 10 pipeline(s).
Azure Pipelines successfully started running 9 pipeline(s).
Azure Pipelines successfully started running 2 pipeline(s).
### Description

See #19921. This addresses one comment there: #19921 (comment). Since that is an external branch, another pull request needs to be opened for this.

### Motivation and Context

---------

Co-authored-by: Sai Kishan Pampana <[email protected]>
Co-authored-by: rachguo <[email protected]>
Co-authored-by: Jian Chen <[email protected]>
Adds an example to demonstrate exporting the OpenAI Whisper implementation with batch_size > 1 and the addition of prompts for each audio snippet. Also handles the scenario where the prompts are not the same length. For example, if our prompt ids are [p1_id_1, p1_id_2] and [p2_id_1], the final decoder_input_ids will look as follows after padding: `[prev_token, p1_id_1, p1_id_2, start_token, lang_token, transcribe_token] [prev_token, p2_id_1, PAD_TOKEN, start_token, lang_token, transcribe_token]` A sketch of this padding scheme is shown below. --------- Co-authored-by: kunal-vaishnavi <[email protected]>
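Below is a minimal Python sketch of the padding scheme described above; the token names and id values (PREV_TOKEN, START_TOKEN, PAD_TOKEN, etc.) are illustrative placeholders, not necessarily the ids used by the exported Whisper model.

```python
# Placeholder special-token ids; the real exported model defines its own.
PREV_TOKEN = 50361
START_TOKEN = 50258
LANG_TOKEN = 50259
TRANSCRIBE_TOKEN = 50359
PAD_TOKEN = 50257

def build_decoder_input_ids(prompt_ids_per_sample):
    """Pad each sample's prompt to a common length, then add the forced tokens."""
    max_prompt_len = max(len(p) for p in prompt_ids_per_sample)
    decoder_input_ids = []
    for prompt in prompt_ids_per_sample:
        # Pad the prompt so every row has the same number of prompt tokens.
        padded_prompt = prompt + [PAD_TOKEN] * (max_prompt_len - len(prompt))
        decoder_input_ids.append(
            [PREV_TOKEN] + padded_prompt + [START_TOKEN, LANG_TOKEN, TRANSCRIBE_TOKEN]
        )
    return decoder_input_ids

# Two audio snippets with prompts of different lengths, as in the example above:
# row 1 -> [prev, p1_id_1, p1_id_2, start, lang, transcribe]
# row 2 -> [prev, p2_id_1, PAD,     start, lang, transcribe]
print(build_decoder_input_ids([[101, 102], [201]]))
```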