
LSTM gradient calculation fails on GPU, works on CPU #1586

Closed
mahowald opened this issue Apr 28, 2021 · 5 comments

@mahowald

I have encountered an error when attempting to train an LSTM model that only appears on GPU. Below is a minimal example that reproduces it. The code works fine without the |> gpu, and it also works fine on the GPU if the LSTM layer is replaced with RNN(5, 64).

using Flux
using CUDA
using Statistics

m = Chain(LSTM(5, 64), Dense(64, 64, relu), Dense(64, 1)) |> gpu
pars = Flux.params(m)

x = rand(5, 10) |> gpu
y = rand(1, 10) |> gpu

grads = gradient(pars) do
    mean( abs.(m(x) .- y) )
end

Here is the stacktrace for the error:

ERROR: LoadError: ArgumentError: NamedTuple{(:σ, :Wi, :Wh, :b, :h, :c), Tuple{Nothing, LinearAlgebra.Transpose{Float32, CuArray{Float32, 2}}, LinearAlgebra.Transpose{Float32, CuArray{Float32, 2}}, CuArray{Float32, 1}, Nothing, Nothing}} keys must be a subset of NamedTuple{(:Wi, :Wh, :b, :h, :c), NTuple{5, Nothing}} keys
Stacktrace:
  [1] #s65#84
    @ ~/.julia/packages/Zygote/RxTZu/src/lib/lib.jl:20 [inlined]
  [2] var"#s65#84"(::Any, x::Any, y::Any)
    @ Zygote ./none:0
  [3] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any, N} where N)
    @ Core ./boot.jl:571
  [4] struct_grad!(cx::Zygote.Context, x::Flux.LSTMCell{CuArray{Float32, 2}, CuArray{Float32, 1}}, x̄::NamedTuple{(:σ, :Wi, :Wh, :b, :h, :c), Tuple{Nothing, LinearAlgebra.Transpose{Float32, CuArray{Float32, 2}}, LinearAlgebra.Transpose{Float32, CuArray{Float32, 2}}, CuArray{Float32, 1}, Nothing, Nothing}})
    @ Flux.CUDAint ~/.julia/packages/Flux/goUGu/src/cuda/curnn.jl:64
  [5] #13
    @ ~/.julia/packages/Flux/goUGu/src/cuda/curnn.jl:86 [inlined]
  [6] #541#back
    @ ~/.julia/packages/ZygoteRules/OjfTt/src/adjoint.jl:59 [inlined]
  [7] #178
    @ ~/.julia/packages/Zygote/RxTZu/src/lib/lib.jl:194 [inlined]
  [8] (::Zygote.var"#1686#back#180"{Zygote.var"#178#179"{Tuple{Tuple{Nothing}, Tuple{Nothing}}, Flux.CUDAint.var"#541#back#15"{Flux.CUDAint.var"#13#14"{Zygote.Context, Flux.LSTMCell{CuArray{Float32, 2}, CuArray{Float32, 1}}, CuArray{Float32, 1}, CuArray{Float32, 1}, CUDA.CUDNN.var"#56#57"{CUDA.CUDNN.RNNDesc{Float32}, CuArray{Float32, 2}, CuArray{Float32, 1}, CuArray{Float32, 1}, CuArray{Float32, 2}, CuArray{UInt8, 1}}}}}})(Δ::Tuple{Nothing, CuArray{Float32, 2}})
    @ Zygote ~/.julia/packages/ZygoteRules/OjfTt/src/adjoint.jl:59
  [9] Pullback
    @ ~/.julia/packages/Flux/goUGu/src/layers/recurrent.jl:36 [inlined]
 [10] (::typeof(∂(λ)))(Δ::CuArray{Float32, 2})
    @ Zygote ~/.julia/packages/Zygote/RxTZu/src/compiler/interface2.jl:0
 [11] Pullback
    @ ~/.julia/packages/Flux/goUGu/src/layers/basic.jl:36 [inlined]
 [12] (::typeof(∂(applychain)))(Δ::CuArray{Float32, 2})
    @ Zygote ~/.julia/packages/Zygote/RxTZu/src/compiler/interface2.jl:0
 [13] Pullback
    @ ~/.julia/packages/Flux/goUGu/src/layers/basic.jl:38 [inlined]
 [14] (::typeof(∂(λ)))(Δ::CuArray{Float32, 2})
    @ Zygote ~/.julia/packages/Zygote/RxTZu/src/compiler/interface2.jl:0
 [15] Pullback
    @ /tmp/error.jl:12 [inlined]
 [16] (::typeof(∂(#1)))(Δ::Float32)
    @ Zygote ~/.julia/packages/Zygote/RxTZu/src/compiler/interface2.jl:0
 [17] (::Zygote.var"#69#70"{Zygote.Params, typeof(∂(#1)), Zygote.Context})(Δ::Float32)
    @ Zygote ~/.julia/packages/Zygote/RxTZu/src/compiler/interface.jl:252
 [18] gradient(f::Function, args::Zygote.Params)
    @ Zygote ~/.julia/packages/Zygote/RxTZu/src/compiler/interface.jl:59
 [19] top-level scope
    @ /tmp/error.jl:11
in expression starting at /tmp/error.jl:11

This occurs on Linux, CUDA v10.1, Julia v1.6. Any ideas about what's going wrong?

@DhairyaLGandhi
Member

Cc @mzgubic should we relax this bound somewhat?

I can take a look in a bit too

@mzgubic

mzgubic commented Apr 29, 2021

For reference FluxML/Zygote.jl#924

In a nutshell, Zygote used to quietly drop certain derivatives when accumulating two NamedTuples. After that PR it throws an error.
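The accumulation behaviour described above can be sketched with Zygote.accum directly. This is a minimal illustration, assuming Zygote ≥ 0.6 (i.e. after FluxML/Zygote.jl#924); the field names are chosen to mirror the LSTMCell gradient in the stacktrace:

using Zygote

# Gradients w.r.t. a struct are represented as NamedTuples of its fields.
# Accumulating two such gradients works when the keys line up:
Zygote.accum((Wi = 1.0, Wh = 2.0), (Wi = 0.5, Wh = 0.5))
# (Wi = 1.5, Wh = 2.5)

# If one gradient carries an extra field (here :σ, as in the error above),
# the derivative is no longer silently dropped; accum now throws the
# "keys must be a subset" ArgumentError instead:
Zygote.accum((Wi = 1.0, Wh = 2.0), (σ = nothing, Wi = 0.5, Wh = 0.5))
# ArgumentError: ... keys must be a subset of ... keys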

The underlying issue this exposes is that one of the custom @adjoints is incorrect (i.e. at least one of the NamedTuples representing gradients w.r.t. a struct is missing at least one of the fields).

That said, I don't know why it would work on CPU and not on GPU. My only guess is that CUDA has some compatibility bounds that pin a different Zygote version than the one used on CPU?

@mahowald
Author

@mzgubic Good catch: differing Zygote versions look like they were the root issue. After updating Zygote and CUDA the error disappears. I'll go ahead and close the issue.

@ToucheSir
Member

ToucheSir commented Apr 29, 2021

Flux depends on CUDA unconditionally, so the Zygote version should be the same regardless of whether GPU functionality is used.

@mahowald can you update to a newer version of Flux? From the stacktrace, it looks like this is a version that still relies on the (often buggy) CuDNN RNN path, which we've more or less removed in recent versions. The presence of a σ field in the first NamedTuple is especially suspect.

Edit: looks like Zygote and CUDA may have been held back for some reason, glad you resolved it. For posterity, do you mind listing what Flux, Zygote and CUDA versions you were and are on?
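For anyone diagnosing a similar held-back dependency, the installed versions and an update attempt can be checked with the stdlib Pkg. A generic sketch, not specific to this issue (it assumes Flux, Zygote and CUDA are in the active environment):

using Pkg

# Show the resolved versions of the packages involved:
Pkg.status(["Flux", "Zygote", "CUDA"])

# Try to move everything to the latest compatible versions; if a package
# stays behind, some compat bound in the environment is holding it back:
Pkg.update()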

@mahowald
Author

@ToucheSir Current versions are:

Flux - v0.12.1
CUDA - v2.6.3
Zygote - v0.6.10

If I remember correctly, the previous versions were:

Flux - v0.11.6
CUDA - v2.6.2
Zygote - v0.6.8
