Gradient incorrect for Conv-layer and complex numbers #1876

Closed
zsoerenm opened this issue Feb 15, 2022 · 0 comments · Fixed by FluxML/NNlib.jl#389
zsoerenm commented Feb 15, 2022

I created a real-valued convolution layer to verify whether the gradients calculated for the complex-valued Conv layer are correct:

using Flux
struct ComplexWeight w_re; w_im end # a 1-dim conv layer whose real and imaginary weight parts are kept separate
eachmat(A) = (view(A, :, :, i) for i in axes(A, 3))
# complex multiplication written out on the real/imaginary parts: (re, im) -> (re*w_re - im*w_im, im*w_re + re*w_im)
(cw::ComplexWeight)(A) = reshape(mapreduce(x -> [x[:,1:4] * cw.w_re .- x[:,5:8] * cw.w_im x[:,5:8] * cw.w_re .+ x[:,1:4] * cw.w_im], hcat, eachmat(A)), (size(A,1), 2, 1))
Flux.@functor ComplexWeight
complex_init = randn(ComplexF32, 1, 4, 1)
real_convl = ComplexWeight(real.(vec(complex_init)), imag.(vec(complex_init)))
convl = Conv((1,), 4 => 1, identity; pad=SamePad(), init=(dims...) -> complex_init, bias=false)
xs = randn(ComplexF32, 256, 4, 1);
ys = randn(ComplexF32, 256, 1, 1);
to_real(A) = hcat(real.(A), imag.(A))
to_complex(A) = complex.(A[:,1:size(A,2) >> 1,:], A[:,size(A,2) >> 1 + 1:end,:])

# Check if layers produce the same output
real_y = real_convl(to_real(xs));
convl(xs) ≈ complex.(real_y[:,1], real_y[:,2]) # true

# Create loss functions and check if they result in the same output
loss_real(model, xs, ys) = Flux.Losses.mse(to_complex(model(xs)), ys)
loss(model, xs, ys) = Flux.Losses.mse(model(xs), ys)
loss_real(real_convl, to_real(xs), ys) ≈ loss(convl, xs, ys) # true

# Calculate gradients
params_real = Flux.params(real_convl)
grads_real = Flux.gradient(params_real) do
    loss_real(real_convl, to_real(xs), ys)
end
params = Flux.params(convl)
grads = Flux.gradient(params) do
    loss(convl, xs, ys)
end
vec(grads[params[1]]) ≈ complex.(grads_real[params_real[1]], grads_real[params_real[2]]) # false

The layers and the loss functions produce the same output given the same weights, yet the gradients differ. To rule out Zygote's complex gradient rules themselves, I checked a basic gradient calculation:

using Statistics: mean
using Flux
xs = randn(ComplexF64, 12, 4)
w = randn(ComplexF64, 4)
y = xs * w + randn(ComplexF64, 12)
f(w) = mean(abs2.(xs * w - y))
f2(w) = Flux.Losses.mse(xs * w, y)
function f_real(w) 
    re_part = real.(xs) * w[1] - imag.(xs) * w[2] - real.(y)
    im_part = imag.(xs) * w[1] + real.(xs) * w[2] - imag.(y)
    mean(re_part .* re_part + im_part .* im_part)
end
f_real2(w) = mean(abs2.(complex.(real.(xs) * w[1] - imag.(xs) * w[2], imag.(xs) * w[1] + real.(xs) * w[2]) - y))
f(w) ≈ f2(w) ≈ f_real([real.(w), imag.(w)]) ≈ f_real2([real.(w), imag.(w)]) # true
df(w) = gradient(f, w)[1]
df2(w) = gradient(f2, w)[1]
df_real(w) = gradient(f_real, w)[1]
df_real2(w) = gradient(f_real2, w)[1]
df_real_w = df_real([real.(w), imag.(w)])
df_real2_w = df_real2([real.(w), imag.(w)])
df(w) ≈ df2(w) ≈ complex.(df_real_w[1], df_real_w[2]) ≈ complex.(df_real2_w[1], df_real2_w[2]) # true

This is correct, so Zygote's basic complex gradient rules are not the problem. My guess is that the error is somewhere in the Conv layer.
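
To narrow this down, the convolution can be checked in isolation at the NNlib level, since Flux's Conv calls NNlib.conv under the hood. Below is a minimal, untested sketch of such a check (the helper fd and the basis vector e are just for illustration, and it assumes NNlib.conv accepts complex arrays, which the working forward pass above suggests). Per the convention verified in the previous example, for a real-valued loss Zygote returns ∂loss/∂(re w) + i·∂loss/∂(im w) entrywise, so both comparisons at the end should be true if the conv pullback is correct:

using NNlib, Zygote

x = randn(ComplexF64, 8, 4, 1)   # (width, in_channels, batch)
w = randn(ComplexF64, 1, 4, 1)   # (kernel_width, in_channels, out_channels)
loss(w) = sum(abs2, conv(x, w))  # real-valued loss of the complex kernel

g = Zygote.gradient(loss, w)[1]

# central finite difference of the loss along a (complex) direction dw
fd(dw; h=1e-6) = (loss(w .+ h .* dw) - loss(w .- h .* dw)) / (2h)

e = zeros(ComplexF64, size(w)); e[1] = 1  # perturb only the first weight entry
real(g[1]) ≈ fd(e)        # ∂loss/∂(re w[1]); true if the pullback is correct
imag(g[1]) ≈ fd(im .* e)  # ∂loss/∂(im w[1]); true if the pullback is correct

Checking real and imaginary perturbations separately per entry keeps the test independent of any complex-gradient convention beyond the one already verified above.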
