Implement normalization methods (BatchNorm/LayerNorm/BatchRenorm) as functions in a common header file #5685
Comments
Layer normalization is essentially batch normalization applied to the transposed input (the moving averages of mean and std are no longer needed). Would it be possible to simply wrap batch norm to implement layer normalization?
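For concreteness, here is a minimal standalone sketch (plain C++, not Paddle code; all names are illustrative) of the relationship described above: for a 2-D input of shape [N, C], layer norm is batch norm applied to the transposed input, transposed back.

```cpp
#include <cmath>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // row-major [rows][cols]

// Batch norm (no scale/shift, batch statistics only): normalize each column
// over the row (batch) dimension using that column's mean and variance.
Matrix BatchNorm(const Matrix& x, float eps = 1e-5f) {
  const size_t n = x.size(), c = x[0].size();
  Matrix y(n, std::vector<float>(c));
  for (size_t j = 0; j < c; ++j) {
    float mean = 0.f, var = 0.f;
    for (size_t i = 0; i < n; ++i) mean += x[i][j];
    mean /= n;
    for (size_t i = 0; i < n; ++i) var += (x[i][j] - mean) * (x[i][j] - mean);
    var /= n;
    for (size_t i = 0; i < n; ++i)
      y[i][j] = (x[i][j] - mean) / std::sqrt(var + eps);
  }
  return y;
}

Matrix Transpose(const Matrix& x) {
  Matrix y(x[0].size(), std::vector<float>(x.size()));
  for (size_t i = 0; i < x.size(); ++i)
    for (size_t j = 0; j < x[i].size(); ++j) y[j][i] = x[i][j];
  return y;
}

// Layer norm as "transpose -> batch norm -> transpose": each sample (row)
// is normalized over its features instead of each feature over the batch.
Matrix LayerNormViaBatchNorm(const Matrix& x, float eps = 1e-5f) {
  return Transpose(BatchNorm(Transpose(x), eps));
}
```

The moving-average machinery of batch norm is indeed unnecessary here, since layer norm statistics are computed per sample at both training and inference time; a fused kernel would also avoid the two explicit transposes.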
Another potentially useful normalization method I am interested in is weight normalization. As I understand it, most user-defined normalizations can be implemented by combining a few primitive arithmetic operators. Implementing a given normalization method as an independent operator helps improve computation and memory efficiency (much of the time, many intermediate results can be simplified by manually checking the formulas). But I have not checked all the potential normalization methods.
Yes, we could, and that's what TensorFlow does in their repo: they wrap it around a non-fused BatchNorm implementation. There are some subtleties about LayerNorm: the estimated mean and variance have the size of the batch rather than the number of channels, but they bypass this using broadcasting. However, I think it's not optimal. As you have already pointed out, non-fused BatchNorm is significantly slower, which makes a big difference if we want to use it in deep models. Since one big advantage of LayerNorm is that it is directly usable in RNN units, it also makes sense to reuse code between a standalone LayerNorm layer and "LSTMUnitsWithLayerNorm".
I also checked weight normalization. It seems simple enough to be implemented efficiently with basic operators, so I didn't mention it in the title. If it turns out to be necessary, we can also put it in the common header.
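To illustrate the point above about weight normalization being expressible with elementary arithmetic, here is a minimal sketch (plain C++, hypothetical names, not a Paddle operator) of the reparameterization w = g * v / ||v||:

```cpp
#include <cmath>
#include <vector>

// Weight normalization: reparameterize a weight vector v with a learned
// scalar gain g so that the effective weights are w = g * v / ||v||_2.
std::vector<float> WeightNorm(const std::vector<float>& v, float g) {
  float sq = 0.f;
  for (float vi : v) sq += vi * vi;
  const float norm = std::sqrt(sq);
  std::vector<float> w(v.size());
  for (size_t i = 0; i < v.size(); ++i) w[i] = g * v[i] / norm;
  return w;
}
```

For a weight matrix this would typically be applied per output row; since the whole computation is a reduction, a division, and a scale, it maps directly onto existing element-wise and reduction operators.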
Current Status

In the new Paddle core, we currently have a batch_norm_op, which is essentially an implementation of the fused spatial batch normalization method. Like most other operators, most of the computation is directly included in the BatchNorm kernels.

Suggestion
I suggest we abstract away some normalization calculations and implement them as functions/functors in a common header file like normalization.h to better reuse existing code (a rough sketch follows).
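As a rough, hypothetical sketch of what such a header could expose (names and signatures are illustrative only, not actual Paddle code), the shared pieces might be a statistics helper plus a scale-and-shift normalizer that both BatchNorm-style and LayerNorm-style kernels can call on different slices of the input:

```cpp
// normalization.h (hypothetical sketch)
#pragma once
#include <cmath>
#include <cstddef>

namespace normalization {

// Computes the mean and (biased) variance of n contiguous values.
template <typename T>
void MeanVar(const T* x, size_t n, T* mean, T* var) {
  T m = 0, v = 0;
  for (size_t i = 0; i < n; ++i) m += x[i];
  m /= static_cast<T>(n);
  for (size_t i = 0; i < n; ++i) v += (x[i] - m) * (x[i] - m);
  *mean = m;
  *var = v / static_cast<T>(n);
}

// Normalizes and applies scale/shift:
//   y = gamma * (x - mean) / sqrt(var + eps) + beta.
// A BatchNorm kernel would call this once per channel over its batch slice;
// a LayerNorm kernel (or an LSTM-with-LayerNorm kernel) once per sample over
// its features, so both reuse the same arithmetic.
template <typename T>
void ScaleShiftNormalize(const T* x, size_t n, T mean, T var, T gamma, T beta,
                         T eps, T* y) {
  const T inv_std = static_cast<T>(1) / std::sqrt(var + eps);
  for (size_t i = 0; i < n; ++i) y[i] = gamma * (x[i] - mean) * inv_std + beta;
}

}  // namespace normalization
```

With something along these lines, the fused kernels keep their layout-specific loops while the normalization arithmetic is written once and shared.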
Reasons