
[CPU EP] optimize qlinearsoftmax #22686

Open: wants to merge 5 commits into main

wejoncy (Contributor) commented Nov 1, 2024

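Description

X86-64: AVX2/AVX512
ARM: not implemented

Improvements (measured in cycles):

| Shape | Implementation | Cycles |
|---|---|---|
| 1x1917x81 | Naive | 778138 |
| 1x1917x81 | AVX2 | 233498 |

The AVX2 path takes about 3.3x fewer cycles than the naive implementation for this shape.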
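For context on what the "Naive" row computes, here is a minimal scalar sketch of QLinearSoftmax, assuming uint8 input and output, softmax over the last axis, and the semantics Y = quantize(softmax(dequantize(X))). The function name `QLinearSoftmaxReference` and its signature are hypothetical and are not the MLAS API. The 256-entry exp lookup table is one common trick assumed here, not necessarily what this PR or ONNX Runtime uses; it works because `x - row_max` for uint8 inputs always falls in [-255, 0].

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference for a uint8 QLinearSoftmax over the last axis.
// Assumed semantics: Y = quantize(softmax(dequantize(X))).
// This is NOT the PR's AVX2/AVX512 kernel. Requires C++17 (std::clamp).
void QLinearSoftmaxReference(const uint8_t* x, uint8_t* y,
                             size_t rows, size_t cols,
                             float x_scale,
                             float y_scale, uint8_t y_zero_point) {
  assert(cols > 0);  // std::max_element below needs a non-empty row

  // table[d] = exp(-x_scale * d) for d = row_max - x in [0, 255].
  // The input zero point is not needed: softmax is invariant to a constant
  // shift, so it cancels when we subtract the row maximum.
  float table[256];
  for (int d = 0; d < 256; ++d) {
    table[d] = std::exp(-x_scale * static_cast<float>(d));
  }

  std::vector<float> e(cols);
  for (size_t r = 0; r < rows; ++r) {
    const uint8_t* xr = x + r * cols;
    uint8_t* yr = y + r * cols;

    // Pass 1: row maximum (for numerical stability and table indexing).
    const uint8_t row_max = *std::max_element(xr, xr + cols);

    // Pass 2: exponentials via table lookup, plus their sum.
    float sum = 0.0f;
    for (size_t c = 0; c < cols; ++c) {
      e[c] = table[row_max - xr[c]];
      sum += e[c];
    }

    // Pass 3: normalize and requantize with round-half-to-even
    // (default rounding mode) and saturation to [0, 255].
    for (size_t c = 0; c < cols; ++c) {
      const float q = std::nearbyint(e[c] / sum / y_scale) + y_zero_point;
      yr[c] = static_cast<uint8_t>(std::clamp(q, 0.0f, 255.0f));
    }
  }
}
```

With `y_scale = 1/256` and a zero point of 0, a row of four equal inputs maps every output to 64 (0.25 × 256), and a probability of 1.0 saturates to 255. The max, exp-and-sum, and requantize passes are the loops a SIMD kernel would typically vectorize.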

Motivation and Context

wejoncy requested a review from yihonglyu on November 1, 2024 05:43
Outdated, resolved review threads on onnxruntime/test/mlas/bench/bench_qsoftmax.cpp
wejoncy force-pushed the jicwen/optimize_qsoftmax branch from 32a8c8c to 6df1676 on November 1, 2024 10:25
wejoncy force-pushed the jicwen/optimize_qsoftmax branch from 4233379 to f7d97e1 on November 1, 2024 12:24
wejoncy marked this pull request as ready for review on November 5, 2024 03:30
wejoncy requested a review from a team as a code owner on November 5, 2024 03:30
yihonglyu (Contributor) commented Nov 7, 2024


Could you also include cycles for Softmax, using the existing microbenchmark or by adding a new one?

wejoncy (Contributor, Author) commented Nov 7, 2024

What do you mean by "include cycles for Softmax"?

The MLAS BENCHMARK_CAPTURE benchmark should already have a timer function to estimate the time cost.

yihonglyu (Contributor) commented:


Since the unoptimized QLinearSoftmax is slower than Softmax, I wanted the Softmax numbers included as well, for comparison.


2 participants