
Cherry pick 1.17.3 - Round 2 #20178

Merged: 4 commits into rel-1.17.3 on Apr 3, 2024
Conversation

YUNQIUGUO
Contributor

Description

Motivation and Context

snnn and others added 2 commits April 2, 2024 10:24
See the comments inside of the changed files for more detailed
information.

The files onnxruntime/core/platform/windows/hardware_core_enumerator.cc
and onnxruntime/core/platform/windows/hardware_core_enumerator.h were
copied from the WinML source folder in this repo, with minor coding-style
changes.

I had an offline discussion with Sheil. We agree that, given the lack of
a future-proof solution, we can check in this temporary fix first and rework
it later. I will meet with @ivberg to discuss the issue in depth and seek a
long-term solution. Thanks for offering to help, @ivberg!

With this change, we see roughly a 2x performance improvement on some Intel
CPUs.
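The perf gain comes from sizing the thread pool by performance cores only on hybrid Intel CPUs. The following is a minimal pure-Python sketch of that idea; the `LogicalCore` type and `default_intra_op_threads` helper are hypothetical illustrations, not the actual logic in hardware_core_enumerator.cc:

```python
from dataclasses import dataclass

@dataclass
class LogicalCore:
    """Hypothetical descriptor for one logical core (not the real WinML type)."""
    index: int
    is_efficiency_core: bool

def default_intra_op_threads(cores):
    """Count only performance cores: on hybrid CPUs, sizing the thread pool
    by P-cores avoids placing latency-sensitive work on slower E-cores."""
    p_cores = [c for c in cores if not c.is_efficiency_core]
    # Fall back to all cores on homogeneous CPUs.
    return len(p_cores) or len(cores)

# Example: a hybrid CPU with 6 P-cores and 8 E-cores.
cores = [LogicalCore(i, is_efficiency_core=(i >= 6)) for i in range(14)]
print(default_intra_op_threads(cores))  # → 6
```

On a homogeneous CPU the helper degrades gracefully to the full core count.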
### Description
This PR adds flash attention v2 and support for INT4 CUDA benchmarking
in PyTorch.

### Motivation and Context
The [flash attention v2](https://github.com/Dao-AILab/flash-attention)
algorithm helps improve model performance in PyTorch. Support for INT4
CUDA in PyTorch is done through the
[`bitsandbytes`](https://github.com/TimDettmers/bitsandbytes) package.
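INT4 storage packs two 4-bit values into each byte. The sketch below shows that general packing scheme in pure Python as an illustration only; it is not the actual layout or kernel used by `bitsandbytes`:

```python
def pack_int4(values):
    """Pack unsigned 4-bit integers (0..15) two per byte, low nibble first."""
    if len(values) % 2:
        values = values + [0]  # pad to an even count
    out = []
    for lo, hi in zip(values[0::2], values[1::2]):
        assert 0 <= lo < 16 and 0 <= hi < 16, "values must fit in 4 bits"
        out.append(lo | (hi << 4))
    return bytes(out)

def unpack_int4(data, count):
    """Inverse of pack_int4: recover `count` 4-bit values."""
    vals = []
    for b in data:
        vals.append(b & 0x0F)
        vals.append(b >> 4)
    return vals[:count]

weights = [1, 15, 7, 0, 9]
packed = pack_int4(weights)
print(unpack_int4(packed, len(weights)))  # → [1, 15, 7, 0, 9]
```

Halving the bytes per weight versus INT8 is what makes INT4 attractive for benchmarking large models under tight GPU memory budgets.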
@YUNQIUGUO YUNQIUGUO marked this pull request as ready for review April 2, 2024 17:38
@YUNQIUGUO
Contributor Author

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

@YUNQIUGUO
Contributor Author

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Android CI Pipeline

@YUNQIUGUO
Contributor Author

/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline


Azure Pipelines successfully started running 10 pipeline(s).


Azure Pipelines successfully started running 9 pipeline(s).


Azure Pipelines successfully started running 2 pipeline(s).

YUNQIUGUO and others added 2 commits April 2, 2024 17:31
### Description

See #19921. This just addresses one comment:
#19921 (comment)

Since that change came from an external branch, another pull request needs
to be opened for this.

### Motivation and Context

---------

Co-authored-by: Sai Kishan Pampana <[email protected]>
Co-authored-by: rachguo <[email protected]>
Co-authored-by: Jian Chen <[email protected]>
Adds an example demonstrating export of the OpenAI Whisper
implementation with batch_size > 1 and the addition of prompts for each
audio snippet.

Also handles the scenario where prompts are not all the same size. For
example, if our prompt ids are [p1_id_1, p1_id_2] and [p2_id_1], the
final decoder_input_ids will look like this after padding:
`[prev_token, p1_id_1, p1_id_2, start_token, lang_token,
transcribe_token]
[prev_token, p2_id_1, PAD_TOKEN, start_token, lang_token,
transcribe_token]`
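The padding layout above can be sketched in a few lines of pure Python. The token names are the symbolic placeholders from the example, not real Whisper token ids, and `build_decoder_input_ids` is a hypothetical helper, not the function in the actual export script:

```python
def build_decoder_input_ids(prompts, forced_tokens,
                            prev_token="prev_token", pad_token="PAD_TOKEN"):
    """Prepend prev_token to each prompt, right-pad every prompt with
    pad_token to the longest prompt's length, then append the forced
    decoder tokens so all rows end up the same length."""
    max_len = max(len(p) for p in prompts)
    rows = []
    for p in prompts:
        rows.append([prev_token] + p
                    + [pad_token] * (max_len - len(p))
                    + forced_tokens)
    return rows

forced = ["start_token", "lang_token", "transcribe_token"]
rows = build_decoder_input_ids([["p1_id_1", "p1_id_2"], ["p2_id_1"]], forced)
for r in rows:
    print(r)
```

Padding sits between the prompt and the forced tokens, matching the layout shown in the commit message.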

---------

Co-authored-by: kunal-vaishnavi <[email protected]>
@YUNQIUGUO YUNQIUGUO requested review from kunal-vaishnavi, smk2007, snnn, jchen351, sophies927 and mszhanyi and removed request for kunal-vaishnavi and smk2007 April 3, 2024 00:32
@tianleiwu tianleiwu merged commit a61add2 into rel-1.17.3 Apr 3, 2024
102 of 108 checks passed
@tianleiwu tianleiwu deleted the yguo/cherry-pick-1.17.3-round2 branch April 3, 2024 21:14
7 participants