bug in RNN docs #1638

Closed
CarloLucibello opened this issue Jun 27, 2021 · 6 comments · Fixed by #1639
CarloLucibello commented Jun 27, 2021

The documentation for recurrent neural networks
https://github.com/FluxML/Flux.jl/blob/master/docs/src/models/recurrence.md
provides examples that would lead to erroneous gradient computations.

This is because Recur special-cases broadcasting and forwards it to the map machinery, whose adjoint reverses the execution order for vector inputs, while Chain and generic stateful functions receive no such special treatment.

Here I show the issue explicitly:

using Flux
using FiniteDifferences
using Random

m = Chain(RNN(2, 5), Dense(5, 1))

# Broadcast-based loss, as shown in the docs: applies the stateful model with `m.(x)`.
function loss(x, y)
    Flux.reset!(m)
    sum((Flux.stack(m.(x), 1) .- y) .^ 2)
end

# Loop-based loss: feeds the time steps through the model in an explicit order.
function loss_loop(x, y)
    Flux.reset!(m)
    l = 0f0
    for (xi, yi) in zip(x, y)
        l += sum((m(xi) .- yi) .^ 2)
    end
    return l
end

Random.seed!(17)
x = [rand(Float32, 2) for i = 1:3]
y = rand(Float32, 3)

print(loss_loop(x, y)) # 3.1758256
print(loss(x, y))      # 3.1758256

println(gradient(x -> loss_loop(x, y), x)[1])
# Vector{Float32}[[-0.27259377, 0.3039751], [1.0806422, 0.79439837], [-0.6706671, 0.6848929]]

println(gradient(x -> loss(x, y), x)[1]) # WRONG GRADIENT
# Vector{Float32}[[-0.09357249, 0.2245327], [0.42562985, 0.81408167], [0.044034332, 1.131884]]

fdm = FiniteDifferences.central_fdm(5, 1)
println(FiniteDifferences.grad(fdm, x -> loss(x, y), x)[1])
# Vector{Float32}[[-0.27259293, 0.3039818], [1.0806386, 0.7944023], [-0.670666, 0.68489385]]

I think we should stop directing users toward broadcast and map when using stateful layers; the Julia language doesn't give any ordering guarantees, so why should we?

Related to FluxML/Zygote.jl#807


ToucheSir commented Jun 27, 2021

For single-output RNNs, foldl/foldr have well-defined semantics and can still be used as a one-liner:

loss_func(foldl(m, x), y) # if m is a Recur
loss_func(foldl((state, xi) -> m(state, xi)[1], x), y) # if m is an RNNCell

The latter option could also let us decouple the mutable state field from Recur (and all the issues that come with it).
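
For concreteness, here is a minimal sketch of the cell-based variant, assuming the Flux 0.12 cell interface in which `cell(state, x)` returns `(new_state, output)` and the initial state is stored in `cell.state0`; the names `cell`, `xs`, and `h_last` are illustrative, not part of the thread. An explicit `init` is needed so the first input is not mistaken for the initial state.

using Flux

cell = Flux.RNNCell(2, 5)                 # bare cell, no Recur wrapper
xs = [rand(Float32, 2) for _ = 1:3]

# foldl threads the hidden state left to right; `cell(state, xi)` returns
# (new_state, output), and for a vanilla RNN the output equals the state.
h_last = foldl((state, xi) -> cell(state, xi)[1], xs; init = cell.state0)

Threading the state explicitly like this is also what would let the mutable state field be dropped from Recur.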

@CarloLucibello

foldl is a good option for the case in which only the final output is used in the loss computation; I'll add an example to #1639


darsnack commented Jun 27, 2021

Does map not have an ordering guarantee? I thought it was reduce that didn't have a guarantee.


CarloLucibello commented Jun 27, 2021

loss_func(foldl(m, x), y) # if m is a Recur

Wait, this doesn't really work; for foldl, m would need to take 2 args.
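
One way to patch that one-liner, sketched here as an assumption rather than something settled in the thread, is a two-argument closure that ignores the accumulator, so the stateful `m` (a Recur, as in the comment above) carries its state internally and the fold just keeps the last output; `loss_func` is the placeholder from that comment.

# Accumulator is ignored; the Recur `m` updates its state internally,
# so the fold simply returns the output of the last time step.
last_out = foldl((acc, xi) -> m(xi), x; init = nothing)
loss_func(last_out, y)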


CarloLucibello commented Jun 27, 2021

Does map not have an ordering guarantee?

I don't see it explicitly stated, although it's likely to be ordered for 1d arrays or generic iterators. I'm not sure what happens when mapping over higher-dimensional arrays instead.

@darsnack

I think you want mapfoldl(x -> loss(m(x), y), +, xs).
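
A minimal sketch of that pattern against the `m`, `x`, and `y` from the reproduction script above; `loss_mapfoldl` is an illustrative name, and the point is that mapfoldl runs the steps left to right, so the recurrent state is updated in sequence order.

# Per-timestep loss accumulated with mapfoldl instead of broadcasting.
function loss_mapfoldl(x, y)
    Flux.reset!(m)
    mapfoldl(+, zip(x, y)) do (xi, yi)
        sum((m(xi) .- yi) .^ 2)
    end
end

println(gradient(x -> loss_mapfoldl(x, y), x)[1])

Whether Zygote's pullback for mapfoldl preserves this ordering is exactly the guarantee the thread is asking about, so treat this as an illustration of the suggested API, not a verified fix.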

bors bot closed this as completed in 1a14301 on Jul 13, 2021