bug in RNN docs #1638

Closed
CarloLucibello opened this issue Jun 27, 2021 · 6 comments · Fixed by #1639
CarloLucibello commented Jun 27, 2021

The documentation for recurrent neural networks
https://github.com/FluxML/Flux.jl/blob/master/docs/src/models/recurrence.md
provides examples that would lead to erroneous gradient computations.

This is because Recur special-cases broadcasting and forwards it to the map machinery, whose adjoint reverses the execution order for vector inputs, while Chain and generic stateful functions receive no such special treatment.

Here I show the issue explicitly:

using Flux
using FiniteDifferences
using Random

m = Chain(RNN(2, 5), Dense(5, 1))

# Broadcast-based loss, as shown in the docs: applies the stateful model with `m.(x)`.
function loss(x, y)
    Flux.reset!(m)
    sum((Flux.stack(m.(x), 1) .- y) .^ 2)
end

# Loop-based loss: feeds the time steps through the model in an explicit order.
function loss_loop(x, y)
    Flux.reset!(m)
    l = 0f0
    for (xi, yi) in zip(x, y)
        l += sum((m(xi) .- yi) .^ 2)
    end
    return l
end

Random.seed!(17)
x = [rand(Float32, 2) for i = 1:3]
y = rand(Float32, 3)

print(loss_loop(x, y)) # 3.1758256
print(loss(x, y))      # 3.1758256

println(gradient(x -> loss_loop(x, y), x)[1])
# Vector{Float32}[[-0.27259377, 0.3039751], [1.0806422, 0.79439837], [-0.6706671, 0.6848929]]

println(gradient(x -> loss(x, y), x)[1]) # WRONG GRADIENT
# Vector{Float32}[[-0.09357249, 0.2245327], [0.42562985, 0.81408167], [0.044034332, 1.131884]]

fdm = FiniteDifferences.central_fdm(5, 1)
println(FiniteDifferences.grad(fdm, x -> loss(x, y), x)[1])
# Vector{Float32}[[-0.27259293, 0.3039818], [1.0806386, 0.7944023], [-0.670666, 0.68489385]]

I think we should stop directing users toward broadcast and map when using stateful layers; the Julia language doesn't give any ordering guarantees, so why should we?

Related to FluxML/Zygote.jl#807


ToucheSir commented Jun 27, 2021

For single-output RNNs, foldl/foldr have well-defined semantics and can still be used as a one-liner:

loss_func(foldl(m, x), y) # if m is a Recur
loss_func(foldl((state, xi) -> m(state, xi)[1], x), y) # if m is an RNNCell

The latter option could also let us decouple the mutable state field from Recur (and all the issues that come with it).
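
For concreteness, here is a minimal sketch of the cell-based variant, assuming the Flux 0.12 cell interface in which `cell(state, x)` returns `(new_state, output)` and the initial state is stored in `cell.state0`; the names `cell`, `xs`, and `h_last` are illustrative, not part of the thread. An explicit `init` is needed so the first input is not mistaken for the initial state.

using Flux

cell = Flux.RNNCell(2, 5)                 # bare cell, no Recur wrapper
xs = [rand(Float32, 2) for _ = 1:3]

# foldl threads the hidden state left to right; `cell(state, xi)` returns
# (new_state, output), and for a vanilla RNN the output equals the state.
h_last = foldl((state, xi) -> cell(state, xi)[1], xs; init = cell.state0)

Threading the state explicitly like this is also what would let the mutable state field be dropped from Recur.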

@CarloLucibello

foldl is a good option for the case in which only the final output is used in the loss computation; I'll add an example to #1639


darsnack commented Jun 27, 2021

Does map not have an ordering guarantee? I thought it was reduce that didn't have a guarantee.


CarloLucibello commented Jun 27, 2021

loss_func(foldl(m, x), y) # if m is a Recur

Wait, this doesn't really work; for foldl, m would need to take 2 args.
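
One way to patch that one-liner, sketched here as an assumption rather than something settled in the thread, is a two-argument closure that ignores the accumulator, so the stateful `m` (a Recur, as in the comment above) carries its state internally and the fold just keeps the last output; `loss_func` is the placeholder from that comment.

# Accumulator is ignored; the Recur `m` updates its state internally,
# so the fold simply returns the output of the last time step.
last_out = foldl((acc, xi) -> m(xi), x; init = nothing)
loss_func(last_out, y)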


CarloLucibello commented Jun 27, 2021

Does map not have an ordering guarantee?

I don't see it explicitly stated, although it's likely to be ordered for 1d arrays or generic iterators. I'm not sure what happens when mapping over higher-dimensional arrays instead.

@darsnack

I think you want mapfoldl(x -> loss(m(x), y), +, xs).
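
A minimal sketch of that pattern against the `m`, `x`, and `y` from the reproduction script above; `loss_mapfoldl` is an illustrative name, and the point is that mapfoldl runs the steps left to right, so the recurrent state is updated in sequence order.

# Per-timestep loss accumulated with mapfoldl instead of broadcasting.
function loss_mapfoldl(x, y)
    Flux.reset!(m)
    mapfoldl(+, zip(x, y)) do (xi, yi)
        sum((m(xi) .- yi) .^ 2)
    end
end

println(gradient(x -> loss_mapfoldl(x, y), x)[1])

Whether Zygote's pullback for mapfoldl preserves this ordering is exactly the guarantee the thread is asking about, so treat this as an illustration of the suggested API, not a verified fix.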

bors bot closed this as completed in 1a14301 on Jul 13, 2021