
nonzero beta + flipkernel bugfix #519

Merged 4 commits into FluxML:master on Jul 8, 2023

Conversation
Conversation

nikopj
Contributor

@nikopj nikopj commented Jul 4, 2023

Addresses #518.

@CarloLucibello
Member

can you add a test?

@ToucheSir
Member

It also feels like it should be possible to address the issue with a fix to how the function does indexing in the inner loop instead of wholesale flipping the kernel gradients inside the function (won't they be flipped back and forth for multiple calls?) but I may be missing something.

@nikopj
Contributor Author

nikopj commented Jul 6, 2023

> It also feels like it should be possible to address the issue with a fix to how the function does indexing in the inner loop instead of wholesale flipping the kernel gradients inside the function (won't they be flipped back and forth for multiple calls?) but I may be missing something.

You're right, it can be done much more cleanly with view.
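To illustrate the idea (a hypothetical sketch, not the exact diff in this PR): instead of materializing a flipped copy of the kernel gradient, the inner loop can index it through a reversed view, which allocates nothing and leaves the underlying array untouched.

```julia
# Illustrative kernel-gradient array with two spatial dims and one extra dim.
dw = reshape(collect(1.0:8.0), 2, 2, 2)

# A reversed view over the spatial dimensions (dims 1 and 2 here):
# same memory as `dw`, just flipped indexing, so nothing is copied or mutated.
dw_flipped = view(dw, size(dw, 1):-1:1, size(dw, 2):-1:1, :)

@assert dw_flipped[1, 1, 1] == dw[end, end, 1]
```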

On writing the tests for this, I've come across a much wider set of bugs across conv implementations with non-zero beta. For example,

for beta in (0f0, 1f0)
    @show beta
    x = fill(1f0, 2, 1, 1)
    w = [-1f0; 1f0;;;]
    y = fill(1f0, 1, 1, 1)
    cdims = NNlib.DenseConvDims(x, w)

    x_direct = NNlib.∇conv_data_direct!(copy(x), y, w, cdims; alpha=1f0, beta=beta)
    x_im2col = NNlib.∇conv_data_im2col!(copy(x), y, w, cdims; alpha=1f0, beta=beta)

    @show x_direct
    @show x_im2col
    @show x_direct ≈ x_im2col
end

output:

beta = 0.0f0
x_direct = [1.0; -1.0;;;]
x_im2col = [1.0; -1.0;;;]
x_direct ≈ x_im2col = true
beta = 1.0f0
x_direct = [2.0; 0.0;;;]
x_im2col = [1.0; -1.0;;;]
x_direct ≈ x_im2col = false

Here the direct version is correct and the im2col version is incorrect. This can be seen in src/impl/conv_im2col.jl, lines 163 and 165. I think the cleanest solution would be to add alpha/beta arguments to col2im! and im2col!. Let me know what you think and I can work on fixing this too.
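For reference, the BLAS-style contract the kernels are meant to satisfy is out .= alpha .* result .+ beta .* out, i.e. beta scales whatever is already in the output buffer. A minimal sketch of that semantics (names here are illustrative, not actual NNlib internals):

```julia
# Hypothetical helper showing the alpha/beta accumulation contract:
# the freshly computed `result` is scaled by alpha and added to the
# existing contents of `out` scaled by beta.
function axpby_accumulate!(out, result; alpha=1f0, beta=0f0)
    @. out = alpha * result + beta * out
    return out
end

out = fill(1f0, 2)     # pre-existing contents that beta should scale
res = Float32[1, -1]   # freshly computed convolution result
axpby_accumulate!(out, res; alpha=1f0, beta=1f0)
# With beta = 1, the old contents survive: out is now [2.0, 0.0],
# which matches x_direct in the example above.
```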

@ToucheSir
Member

I haven't worked through the math yet, but assuming it's correct I think it would be easiest to add the tests first and mark them as broken for the im2col path. Then we can merge those quickly and allow the fixes to be made at a more leisurely pace. It concerns me greatly that nobody bothered to write tests with a non-zero beta (or a non-one alpha!) when the conv kernels were first added.

@nikopj
Contributor Author

nikopj commented Jul 6, 2023

Tests are added. I kept alpha=2, beta=-1 in all tests, and I generate some random data for the conv! tests. I chose rand(rng, -9e0:9e0) so that the answers stay integer-valued, which makes debugging easier. I can change this if there is a standard protocol for these that I'm not aware of.
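A quick illustration of why that draw keeps things debuggable (sketch; the seed and shape here are arbitrary): the range -9e0:9e0 has unit step, so every element is an exact integer stored as a float, and every convolution of such inputs is exactly representable.

```julia
using Random

rng = MersenneTwister(0)  # arbitrary seed for the illustration

# Drawing from a unit-step float range yields integer-valued floats,
# so downstream conv results are exact and mismatches are easy to eyeball.
x = rand(rng, -9e0:9e0, 4, 4, 1, 1)

@assert all(isinteger, x)
@assert all(v -> -9 <= v <= 9, x)
```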

In general, the filter and data im2col implementations are broken with non-zero beta. It seems that the direct implementations are also broken for some cases of spatial-rank, stride, and dilation.

The original flipkernel + non-zero beta bug with ∇conv_filter_direct! is fixed.

Member

@ToucheSir ToucheSir left a comment


I like the change, and the expanded test suite is great!

Review thread on src/impl/conv_direct.jl (outdated, resolved)
@ToucheSir ToucheSir merged commit 629475a into FluxML:master Jul 8, 2023