bug in RNN docs #1638
Comments
For single-output RNNs:

```julia
loss_func(foldl(m, x), y)                               # if m is a Recur
loss_func(foldl((state, xi) -> m(state, xi)[1], x), y)  # if m is an RNNCell
```

The latter option could also let us decouple the mutable …
Does …

Wait, this doesn't really work, …
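A hedged sketch of the `foldl`-over-`RNNCell` idea with an explicit initial state, assuming a Flux version contemporary with this issue (`RNNCell(in, out)` callable as `cell(h, x)` and returning `(new_state, output)`); the data, target, and `h0` below are illustrative placeholders:

```julia
using Flux

cell = Flux.RNNCell(2, 3)               # the cell holds no running state; it is threaded by hand
xs   = [rand(Float32, 2) for _ in 1:4]  # a sequence of 4 time steps
y    = rand(Float32, 3)                 # target for the final output only

h0 = zeros(Float32, 3)                  # explicit initial hidden state
# Left-to-right fold: each step returns (new_state, output); keep the new state.
h_final = foldl((h, xi) -> cell(h, xi)[1], xs; init = h0)

loss = Flux.Losses.mse(h_final, y)      # single-output loss on the last step
```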
I don't see it explicitly stated, although it's likely to be ordered for 1-d arrays or generic iterators. I'm not sure what happens when mapping over higher-dimensional arrays instead.
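A quick empirical check of that ordering question, just a sketch of what base Julia does in practice (not something the `map` docstring promises):

```julia
order = Int[]
map(x -> (push!(order, x); x), [1 3; 2 4])   # map over a Matrix
@show order   # [1, 2, 3, 4]: down the first column, then the second (column-major)
```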
I think you want …
The documentation for recurrent neural networks (https://github.com/FluxML/Flux.jl/blob/master/docs/src/models/recurrence.md) provides examples that would lead to erroneous gradient computations. This is because, while `Recur` special-cases broadcast and forwards it to the `map` machinery, which in turn reverses the execution order when computing the adjoint for vector inputs, the same does not apply to `Chain` or to generic stateful functions. Here I show the issue explicitly:
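A minimal sketch of the kind of comparison being described, assuming a Flux version contemporary with this issue; the model, data, and losses below are illustrative placeholders rather than the docs' exact example:

```julia
using Flux

m  = Chain(RNN(2, 3), Dense(3, 1))       # stateful model that is not a bare Recur
xs = [rand(Float32, 2) for _ in 1:3]     # a sequence of 3 time steps

# Broadcasting style, as suggested for sequences in the docs.
loss_bcast(m, xs) = sum(sum, m.(xs))

# Explicit, ordered recurrence for reference.
function loss_loop(m, xs)
    s = 0f0
    for xi in xs
        s += sum(m(xi))
    end
    return s
end

Flux.reset!(m)
g1 = gradient(() -> loss_bcast(m, xs), Flux.params(m))
Flux.reset!(m)
g2 = gradient(() -> loss_loop(m, xs), Flux.params(m))
# Nothing guarantees that the adjoint of the broadcast runs in sequence order
# for a Chain, so g1 and g2 can disagree -- that is the erroneous gradient.
```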
I think we should stop directing users toward `broadcast` and `map` when using stateful layers; the Julia language doesn't give any ordering guarantees there, so why should we?

Related to FluxML/Zygote.jl#807