Optimize LayerNormalization for better cache efficiency + SIMD usage
Instead of performing each normalization step on the whole input before moving on to the next step, perform the full normalization over each input slice before moving on to the next slice. This is more cache efficient. Also fuse and vectorize the steps that scale the input to normalize the variance and apply the elementwise scales. With these changes the operator is ~2.5-3x faster on x64, assuming the input is already contiguous.

The `LayerNormalization` operator specification allows the `bias` and `scale` values to have any shape that can be broadcast to the input shape. However, actual models seen so far always set these shapes to match the normalized axes of the input, so this change drops support for other bias/scale input shapes for the time being.
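A minimal sketch of the per-slice approach described above, not the actual diff: it assumes normalization over the last axis of a contiguous `f32` tensor, with `scale` and `bias` shaped to match the normalized axis. Function and parameter names are illustrative and do not come from the repository.

```rust
// Sketch: fully normalize one slice at a time so it stays in cache,
// and fuse the variance normalization with the elementwise scale/bias.
fn layer_norm_last_axis(
    input: &[f32],
    output: &mut [f32],
    scale: &[f32], // length == slice_len (matches the normalized axis)
    bias: &[f32],  // length == slice_len
    slice_len: usize,
    epsilon: f32,
) {
    assert_eq!(input.len(), output.len());
    assert_eq!(input.len() % slice_len, 0);
    assert_eq!(scale.len(), slice_len);
    assert_eq!(bias.len(), slice_len);

    for (in_slice, out_slice) in input
        .chunks_exact(slice_len)
        .zip(output.chunks_exact_mut(slice_len))
    {
        // Compute mean and variance for this slice only.
        let mean = in_slice.iter().sum::<f32>() / slice_len as f32;
        let var = in_slice
            .iter()
            .map(|&x| (x - mean) * (x - mean))
            .sum::<f32>()
            / slice_len as f32;
        let inv_std = 1.0 / (var + epsilon).sqrt();

        // Fused pass: subtract the mean, normalize the variance and apply
        // scale and bias in one loop, which auto-vectorizes well.
        for i in 0..slice_len {
            out_slice[i] = (in_slice[i] - mean) * inv_std * scale[i] + bias[i];
        }
    }
}
```

Processing each slice end to end means the slice's data is read from cache for the mean, variance and scaling passes instead of being streamed through memory three separate times.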
1 parent f9ad7af · commit 1e03a4f
Showing 3 changed files with 69 additions and 70 deletions.