WeightNorm causes NaN for Conv layer gradients #95

Closed
carterjgreen opened this issue Jul 20, 2022 · 3 comments · Fixed by #106
Labels: bug (Something isn't working)

Comments

@carterjgreen

When normalizing the bias of a conv layer, Zygote returns NaNs for the gradient of bias_v. The same happens with 2-d conv layers. The gradient works as expected when the bias is not normalized.

using Lux, Random, Zygote

function test_weightnorm()
    Random.seed!(12345)
    rng = Random.default_rng()
    x = randn(Float32, 300, 72, 32)

    model = WeightNorm(Conv((9,), 72=>72, stride=1, pad=1, dilation=1), (:weight, :bias))
    ps, st = Lux.setup(rng, model)

    ∇params, _ = gradient(ps, x) do p, x
        pred, _ = Lux.apply(model, x, p, st)
        sum(pred)
    end
    println(∇params[:normalized][:bias_v])
end

test_weightnorm()

prints

[NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN;;;]

with

Julia v1.7.3
Lux v0.4.9
NNlib v0.8.8
Zygote v0.6.41
@avik-pal
Member

In general, I think the bias should not be normalized; the same goes for any parameter that is completely zero. (If you know of a contradicting example, I am open to checking it.) I will update the code to throw an error if such normalization is requested.
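
To make the failure concrete, here is a minimal sketch of the underlying problem (a hand-rolled reparameterization, not Lux's actual implementation): weight norm computes w = g .* v ./ ‖v‖, and for a zero-initialized bias ‖v‖ == 0, so the pullback multiplies Inf (the derivative of sqrt at 0) by 0 and every entry of the gradient comes back NaN.

using Zygote

# v plays the role of bias_v: a zero-initialized parameter being normalized.
v = zeros(Float32, 4)
g = 1.0f0  # the scale parameter (plays the role of bias_g)

∇v, = gradient(v) do v
    # w = g .* v ./ ‖v‖, with ‖v‖ = sqrt(sum(abs2, v)) == 0 here
    sum(g .* v ./ sqrt(sum(abs2, v)))
end
println(∇v)  # every entry is NaN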

avik-pal added the bug label on Jul 21, 2022
@carterjgreen
Author

I think you're right that the bias should not be normalized. The original paper doesn't mention normalizing the biases at all (I guess it is weight normalization, after all). Throwing an error is a good call.
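
For reference, the reparameterization in the paper (Salimans & Kingma, 2016) is w = (g / ‖v‖) v, with gradients

∇_g L = (∇_w L ⋅ v) / ‖v‖
∇_v L = (g / ‖v‖) ∇_w L − (g ∇_g L / ‖v‖²) v

Both divide by ‖v‖, so they are undefined at v = 0, which is exactly where a default-initialized bias starts.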

avik-pal added a commit that referenced this issue Jul 26, 2022
@avik-pal
Member

Once #106 is merged, the following error will be thrown:

ERROR: ArgumentError: Parameter bias is completely zero. This will result in NaN gradients.
Either remove this parameter from `which_params` or modify the initialization in the actual layer.
Typically this is controlled using the `init_bias` keyword argument.
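
For anyone who hits this, a minimal sketch of the two fixes the error message suggests, applied to the reproducer's layer (the init_bias closure is a hypothetical example of a nonzero initializer, following Lux's (rng, dims...) initializer convention):

# Option 1: drop :bias from which_params and normalize only the weight.
model = WeightNorm(Conv((9,), 72 => 72, stride=1, pad=1, dilation=1), (:weight,))

# Option 2: keep normalizing both, but initialize the bias away from zero.
model = WeightNorm(
    Conv((9,), 72 => 72, stride=1, pad=1, dilation=1,
         init_bias=(rng, dims...) -> 0.01f0 .* randn(rng, Float32, dims...)),
    (:weight, :bias))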

avik-pal added a commit that referenced this issue Jul 26, 2022
avik-pal linked a pull request on Jul 26, 2022 that will close this issue
avik-pal added a commit that referenced this issue Sep 8, 2022