WeightNorm causes NaN for Conv layer gradients #95

Closed
carterjgreen opened this issue Jul 20, 2022 · 3 comments · Fixed by #106
Labels: bug (Something isn't working)

Comments

@carterjgreen

When normalizing the bias of a conv layer, Zygote returns NaNs for the gradient of bias_v. The same happens with 2-d conv layers. The gradient works as expected when the bias is not normalized.

using Lux, Random, Zygote

function test_weightnorm()
    Random.seed!(12345)
    rng = Random.default_rng()
    x = randn(Float32, 300, 72, 32)

    model = WeightNorm(Conv((9,), 72=>72, stride=1, pad=1, dilation=1), (:weight, :bias))
    ps, st = Lux.setup(rng, model)

    ∇params, _ = gradient(ps, x) do p, x
        pred, _ = Lux.apply(model, x, p, st)
        sum(pred)
    end
    println(∇params[:normalized][:bias_v])
end

test_weightnorm()

prints

[NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN;;;]

with

Julia v1.7.3
Lux v0.4.9
NNlib v0.8.8
Zygote v0.6.41
@avik-pal
Member

In general, I think the bias should not be normalized; the same goes for any parameter that is completely zero. (If you know of a contradicting example, I am open to checking it.) I will update the code to throw an error if such normalization is requested.
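
To make the failure concrete, here is a minimal sketch of the underlying problem (a hand-rolled reparameterization, not Lux's actual implementation): weight norm computes w = g .* v ./ ‖v‖, and for a zero-initialized bias ‖v‖ == 0, so the pullback multiplies Inf (the derivative of sqrt at 0) by 0 and every entry of the gradient comes back NaN.

using Zygote

# v plays the role of bias_v: a zero-initialized parameter being normalized.
v = zeros(Float32, 4)
g = 1.0f0  # the scale parameter (plays the role of bias_g)

∇v, = gradient(v) do v
    # w = g .* v ./ ‖v‖, with ‖v‖ = sqrt(sum(abs2, v)) == 0 here
    sum(g .* v ./ sqrt(sum(abs2, v)))
end
println(∇v)  # every entry is NaN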

avik-pal added the bug label on Jul 21, 2022
@carterjgreen
Author

I think you're right that the bias should not be normalized. The original paper doesn't mention normalizing the biases at all (I guess it is weight normalization, after all). Throwing an error is a good call.
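
For reference, the reparameterization in the paper (Salimans & Kingma, 2016) is w = (g / ‖v‖) v, with gradients

∇_g L = (∇_w L ⋅ v) / ‖v‖
∇_v L = (g / ‖v‖) ∇_w L − (g ∇_g L / ‖v‖²) v

Both divide by ‖v‖, so they are undefined at v = 0, which is exactly where a default-initialized bias starts.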

avik-pal added a commit that referenced this issue Jul 26, 2022
@avik-pal
Member

Once #106 is merged, the following error will be thrown:

ERROR: ArgumentError: Parameter bias is completely zero. This will result in NaN gradients.
Either remove this parameter from `which_params` or modify the initialization in the actual layer.
Typically this is controlled using the `init_bias` keyword argument.
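
For anyone who hits this, a minimal sketch of the two fixes the error message suggests, applied to the reproducer's layer (the init_bias closure is a hypothetical example of a nonzero initializer, following Lux's (rng, dims...) initializer convention):

# Option 1: drop :bias from which_params and normalize only the weight.
model = WeightNorm(Conv((9,), 72 => 72, stride=1, pad=1, dilation=1), (:weight,))

# Option 2: keep normalizing both, but initialize the bias away from zero.
model = WeightNorm(
    Conv((9,), 72 => 72, stride=1, pad=1, dilation=1,
         init_bias=(rng, dims...) -> 0.01f0 .* randn(rng, Float32, dims...)),
    (:weight, :bias))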

avik-pal added a commit that referenced this issue Jul 26, 2022
avik-pal linked a pull request on Jul 26, 2022 that will close this issue
avik-pal added a commit that referenced this issue Sep 8, 2022