Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed NaN error in dilation #45

Merged
merged 4 commits into from
May 31, 2018
Merged

Fixed NaN error in dilation #45

merged 4 commits into from
May 31, 2018

Conversation

tejank10
Copy link
Contributor

@tejank10 tejank10 commented May 23, 2018

@staticfloat please review

@staticfloat
Copy link
Contributor

Confirmed that this suppresses the symptoms I was seeing. Let's go ahead and add this workload to test suite (since the previous test didn't seem to catch this), remove the commented out dilation_dims() calls, and call it good.

As to why this was happening, I think this was due to us passing in the incorrect dimensions to the gemm() calls, which caused OpenBLAS to read from the wrong memory locations.

@staticfloat
Copy link
Contributor

Uh oh, we need to fix gradients as well:

# 2-length convolution with 1 channel in and 1 channel out
w = reshape(Float64[1, 1], (2, 1, 1))

# 128-length input vector with 1 channel and 1 batch.
x = reshape(Float64[1:128...], (128, 1, 1))

# Do the convolution (this doesn't have any NaN's)
y = NNlib.conv(x, w; dilation = 2)
@test !any(isnan.(y))

# Generate random upstream derivative
dy = randn(size(y))

# Calculate backprop for w
dws = []
for idx in 1:1000
    dw = NNlib.∇conv_filter(dy, x, w; dilation=2)
    push!(dws, dw)
end

# Calculate backprop for x
dxs = []
for idx in 1:1000
    dx = NNlib.∇conv_data(dy, x, w; dilation=2)
    push!(dxs, dx)
end

@show any([any(isnan.(dw)) for dw in dws])
@show any([any(isnan.(dx)) for dx in dxs])

In my experiments, dx gets NaN's sometimes, dw has not, but that could just be because my w is very small, so whatever memory region the BLAS routines are reaching into to get uninitialized memory might not be getting read from if w and x are too small.

@tejank10
Copy link
Contributor Author

tejank10 commented May 24, 2018

im2col_dims was returning extra memory size. Also, I had goofed up the order of dimensions in dilation parameter of col2im2d!. Sorry about that.

@staticfloat
Copy link
Contributor

Yep, that's great! Thanks Tejan! Can you add some of these workloads as test cases in the test suite, so that we can be sure to not run into these problems in the future? I can confirm that these changes fix my NaN issues.

@MikeInnes
Copy link
Member

Thanks! I'm getting segfaults on 0.7, so hopefully this will address those.

@MikeInnes MikeInnes merged commit 1ed81ee into FluxML:master May 31, 2018
ToucheSir added a commit that referenced this pull request Feb 13, 2023
Missing Identity Activation Import
ToucheSir added a commit that referenced this pull request Feb 13, 2023
Missing Identity Activation Import
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants