
Optimize LayerNormalization for better cache efficiency + SIMD #465

Merged — robertknight merged 3 commits into main from simd-layer-norm on Dec 17, 2024

Conversation

robertknight (Owner)

Optimize the LayerNormalization implementation by:

  1. Re-organizing the process to be more cache-efficient. Instead of applying each step to the whole input before moving on to the next, apply the whole normalization process to each normalized slice individually. This means each slice is loaded into cache only once (assuming each slice fits in L1).
  2. Fusing the steps that normalize the variance and apply the bias and scale into one vectorized pass (see the sketch below).

Tested on docvqa, this speeds up LayerNormalization in the encoder by 2.5-3x.
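
For illustration, here is a minimal sketch of the per-slice, fused approach described above. It is not the actual rten implementation: the function name, signature, and the assumption of a contiguous `f32` input (with `slice_len` equal to the product of the normalized axes) are all illustrative, and the plain scalar loop at the end stands in for the explicitly vectorized pass mentioned in point 2.

```rust
/// Sketch of per-slice layer normalization (illustrative, not rten's code).
///
/// `input` is a contiguous buffer viewed as rows of `slice_len` elements,
/// where `slice_len` is the product of the normalized axes. `scale` and
/// `bias` both have length `slice_len`.
fn layer_norm(
    input: &mut [f32],
    scale: &[f32],
    bias: &[f32],
    slice_len: usize,
    epsilon: f32,
) {
    assert!(slice_len > 0);
    assert_eq!(input.len() % slice_len, 0);
    assert_eq!(scale.len(), slice_len);
    assert_eq!(bias.len(), slice_len);

    // Process one normalized slice at a time so that each slice is loaded
    // into cache once and every step runs while it is still resident.
    for slice in input.chunks_exact_mut(slice_len) {
        // Pass 1: mean of the slice.
        let mean = slice.iter().sum::<f32>() / slice_len as f32;

        // Pass 2: variance of the mean-centered values.
        let variance = slice
            .iter()
            .map(|&x| {
                let d = x - mean;
                d * d
            })
            .sum::<f32>()
            / slice_len as f32;
        let inv_std = 1.0 / (variance + epsilon).sqrt();

        // Pass 3: fused normalization plus elementwise scale and bias.
        // Each element is independent, so this loop vectorizes well.
        for ((x, &s), &b) in slice.iter_mut().zip(scale).zip(bias) {
            *x = (*x - mean) * inv_std * s + b;
        }
    }
}
```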

@robertknight force-pushed the simd-layer-norm branch 4 times, most recently from b9041a6 to 6ed9b72, on December 17, 2024 at 10:27
@robertknight marked this pull request as ready for review on December 17, 2024 at 15:35
Instead of performing each step of normalization on the whole input before
moving on to the next, perform the full normalization over each input slice
before moving on to the next slice. This is more cache-efficient. Also fuse and
vectorize the steps that scale the input to normalize the variance and apply
elementwise scales.

With these changes the operator is ~2.5-3x faster on x64, assuming the input is
already contiguous.

The `LayerNormalization` operator specification allows the `bias` and
`scale` values to have any shape that can be broadcast to the input shape.
However, actual models seen so far always set these shapes to match the
normalized axes of the input, so this change drops support for other
bias/scale input shapes for the time being.
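
To illustrate the restriction, a validation step along these lines could reject unsupported bias/scale shapes. The function and parameter names here are hypothetical and do not reflect rten's actual API.

```rust
// Hypothetical check for the restricted bias/scale shapes: both must
// exactly match the trailing (normalized) dimensions of the input.
fn check_scale_bias_shape(
    input_shape: &[usize],
    normalized_shape: &[usize], // trailing axes being normalized over
    param_shape: &[usize],      // shape of `scale` or `bias`
) -> Result<(), String> {
    if !input_shape.ends_with(normalized_shape) {
        return Err("normalized axes do not match input shape".into());
    }
    if param_shape != normalized_shape {
        // The spec would allow any shape broadcastable to the input, but
        // only an exact match with the normalized axes is accepted here.
        return Err(format!(
            "scale/bias shape {:?} must equal normalized shape {:?}",
            param_shape, normalized_shape
        ));
    }
    Ok(())
}
```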
@robertknight merged commit 0f9eac6 into main on Dec 17, 2024
2 checks passed
@robertknight deleted the simd-layer-norm branch on December 17, 2024 at 15:52