WeightNorm causes NaN for Conv layer gradients #95
Comments
In general, I think (if you know of any contradicting example, I am open to checking it)
I think you're right about the bias not being normalized. The original paper doesn't mention normalizing the biases at all (I guess it is weight normalization, after all). Throwing an error is a good call.
Once #106 is merged, the following error will be thrown:
ERROR: ArgumentError: Parameter bias is completely zero. This will result in NaN gradients.
Either remove this parameter from `which_params` or modify the initialization in the actual layer.
Typically this is controlled using the `init_bias` keyword argument.
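
For context, here is a short sketch of why an all-zero parameter leads to NaN gradients. It assumes the layer uses the standard reparameterization from the weight normalization paper, with direction `v` and scale `g`; the symbols below are that paper's notation, not names taken from this package, and the actual implementation may differ in details.

```latex
% Weight normalization reparameterization (assumed, as in the original paper)
w = \frac{g}{\lVert v \rVert}\, v

% Gradients of a loss L with respect to the new parameters
\nabla_g L = \frac{\nabla_w L \cdot v}{\lVert v \rVert},
\qquad
\nabla_v L = \frac{g}{\lVert v \rVert}\, \nabla_w L
           - \frac{g\, \nabla_g L}{\lVert v \rVert^{2}}\, v
```

Every term divides by the norm of `v`. A bias initialized to zero has a norm of exactly zero, so the gradient with respect to `v` is undefined, and in floating point the resulting 0/0 evaluates to NaN.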
When normalizing the bias of a conv layer, Zygote returns NaNs for the gradient of bias_v. This also happens with 2d conv layers. The gradient works as expected without normalizing the bias.
prints
with