optimize SLN with large dimension #18138

Merged
merged 2 commits into from
Oct 30, 2023


yufenglee
Member

Description

Optimize SkipLayerNorm for large dimensions (>= 2048) by handling 8 elements per thread. This avoids writing the sum of input, skip, and bias to main memory and then loading it back. It reduces the latency for dimension 4096 with small batch sizes from ~18 us to ~3.8 us on A100.
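
As a rough illustration of the idea only (not the kernel from this PR), the following CPU sketch emulates one row of SkipLayerNorm. Each emulated "thread" keeps its 8 summed elements in a local array that stands in for GPU registers, so the sum of input, skip, and bias is never written to an intermediate buffer and read back. The function name `SkipLayerNormRow`, the constant `kElementsPerThread`, and the assumption that the hidden size is a multiple of 8 are hypothetical choices made for this sketch.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical constant: number of elements each emulated thread holds.
constexpr std::size_t kElementsPerThread = 8;

// CPU emulation of one row: out = LayerNorm(input + skip + bias) * gamma + beta.
// Assumes all vectors have the same length and that length is a multiple of 8.
std::vector<float> SkipLayerNormRow(const std::vector<float>& input,
                                    const std::vector<float>& skip,
                                    const std::vector<float>& bias,
                                    const std::vector<float>& gamma,
                                    const std::vector<float>& beta,
                                    float epsilon) {
  const std::size_t hidden = input.size();
  assert(hidden % kElementsPerThread == 0);
  const std::size_t num_threads = hidden / kElementsPerThread;

  // One 8-element array per emulated thread; stands in for registers.
  std::vector<std::array<float, kElementsPerThread>> regs(num_threads);
  float sum = 0.0f;
  float sum_sq = 0.0f;

  // Step 1: each "thread" loads its 8 elements once, keeps the sum locally,
  // and contributes to the row-wide statistics (a block reduction on a GPU).
  for (std::size_t t = 0; t < num_threads; ++t) {
    for (std::size_t i = 0; i < kElementsPerThread; ++i) {
      const std::size_t idx = t * kElementsPerThread + i;
      const float v = input[idx] + skip[idx] + bias[idx];
      regs[t][i] = v;
      sum += v;
      sum_sq += v * v;
    }
  }

  const float mean = sum / static_cast<float>(hidden);
  const float variance = sum_sq / static_cast<float>(hidden) - mean * mean;
  const float inv_std = 1.0f / std::sqrt(variance + epsilon);

  // Step 2: normalize directly from the locally held values; the summed
  // tensor is not re-read from memory.
  std::vector<float> out(hidden);
  for (std::size_t t = 0; t < num_threads; ++t) {
    for (std::size_t i = 0; i < kElementsPerThread; ++i) {
      const std::size_t idx = t * kElementsPerThread + i;
      out[idx] = (regs[t][i] - mean) * inv_std * gamma[idx] + beta[idx];
    }
  }
  return out;
}
```

With a hidden size of 4096, this layout corresponds to 512 emulated threads per row. On a GPU the two loops over `t` would run in parallel within one thread block, with a block-wide reduction producing `sum` and `sum_sq`.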

Motivation and Context

@yufenglee yufenglee merged commit 90d1f53 into main Oct 30, 2023
@yufenglee yufenglee deleted the yufeng/SLN_large_dim branch October 30, 2023 21:12
tianleiwu pushed a commit that referenced this pull request Oct 31, 2023
kleiti pushed a commit to kleiti/onnxruntime that referenced this pull request Mar 22, 2024
4 participants