
Update Intel Thread Counts #22894

Merged: 2 commits merged into microsoft:main, Dec 6, 2024

Conversation

@A-Satti (Contributor) commented Nov 19, 2024

Description

The default thread count methodology used by onnxruntime did not account for new and upcoming Intel microarchitectures, leading to suboptimal thread counts. Optimizing the thread count for these microarchitectures yields gains on the majority of models across datatypes, with speedups of up to ~1.5x.

Motivation and Context

Applications should run on Intel hardware with the most performant thread configuration for the majority of models. With new microarchitectures, the thread count methodology must be adjusted to take advantage of their differences.
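For readers experimenting with this change, here is a minimal sketch (not part of this PR) of how an application selects the intra-op thread count through ONNX Runtime's public C++ API; `model.onnx` is a placeholder path:

```cpp
// Sketch: choosing the intra-op thread count via the public C++ API.
// SetIntraOpNumThreads(0) asks ORT to use its default heuristic, i.e.
// the methodology this PR tunes for newer Intel microarchitectures.
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "thread-count-demo");
  Ort::SessionOptions opts;
  opts.SetIntraOpNumThreads(0);    // 0 = let ORT's heuristic decide
  // opts.SetIntraOpNumThreads(8); // or pin an explicit count to compare
  Ort::Session session(env, ORT_TSTR("model.onnx"), opts);
  return 0;
}
```

Pinning an explicit count and comparing it against the default is the simplest way to measure the effect of the new heuristic on a given model.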

@jywu-msft jywu-msft requested a review from liqunfu November 21, 2024 00:11
@tianleiwu (Contributor) commented:
/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline

@tianleiwu (Contributor) commented:
/azp run Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline

@tianleiwu (Contributor) commented:
/azp run Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline,CoreML CI Pipeline,Linux DNNL CI Pipeline,Linux MIGraphX CI Pipeline,Linux ROCm CI Pipeline

Azure Pipelines successfully started running 7 pipeline(s).

Azure Pipelines successfully started running 8 pipeline(s).

Azure Pipelines successfully started running 10 pipeline(s).

@tianleiwu merged commit f5293d2 into microsoft:main on Dec 6, 2024; 91 checks passed.
@snnn (Member) commented Dec 9, 2024

@A-Satti, this PR removes the "/Qspectre" compile flag, which is a critical security flag. Although the flag carries a performance penalty, we cannot trade security for performance. You are free to omit the flag in your private builds, but all of ORT's official binaries must be built with it. Please add it back.
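For context, /Qspectre enables MSVC's mitigations for Spectre variant 1 (bounds-check bypass). Below is a minimal CMake sketch, assumed for illustration rather than taken from ORT's actual build scripts, of keeping the flag on for all MSVC builds:

```cmake
# Sketch (not ORT's real build logic): always compile with the MSVC
# Spectre-mitigation flag so official binaries carry the protection.
if(MSVC)
  add_compile_options(/Qspectre)
endif()
```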

@A-Satti (Contributor, Author) commented Dec 10, 2024

Hi @snnn, I created #23060 to address this. It restores the original flag and removes only the stale Meteorlake flag and its comment.

ankitm3k pushed a commit to intel/onnxruntime that referenced this pull request on Dec 11, 2024 (the same commit message was pushed three times; duplicates omitted):

### Description
The default thread count methodology used by onnxruntime did not account for new and upcoming Intel microarchitectures, leading to suboptimal thread counts. Optimizing the thread count for these microarchitectures yields gains on the majority of models across datatypes, with speedups of up to ~1.5x.

### Motivation and Context
Applications should run on Intel hardware with the most performant thread configuration for the majority of models. With new microarchitectures, the thread count methodology must be adjusted to take advantage of their differences.