
LSTM gradient calculation fails on GPU, works on CPU #1586

Closed
mahowald opened this issue Apr 28, 2021 · 5 comments

@mahowald

I have encountered an error when attempting to train an LSTM model that only appears on GPU. Below is a minimal example that reproduces it. The code works fine without the |> gpu, and it also works fine on the GPU if the LSTM layer is replaced with RNN(5, 64).

using Flux
using CUDA
using Statistics

m = Chain(LSTM(5, 64), Dense(64, 64, relu), Dense(64, 1)) |> gpu
pars = Flux.params(m)

x = rand(5, 10) |> gpu
y = rand(1, 10) |> gpu

grads = gradient(pars) do
    mean( abs.(m(x) .- y) )
end

Here is the stacktrace for the error:

ERROR: LoadError: ArgumentError: NamedTuple{(:σ, :Wi, :Wh, :b, :h, :c), Tuple{Nothing, LinearAlgebra.Transpose{Float32, CuArray{Float32, 2}}, LinearAlgebra.Transpose{Float32, CuArray{Float32, 2}}, CuArray{Float32, 1}, Nothing, Nothing}} keys must be a subset of NamedTuple{(:Wi, :Wh, :b, :h, :c), NTuple{5, Nothing}} keys
Stacktrace:
  [1] #s65#84
    @ ~/.julia/packages/Zygote/RxTZu/src/lib/lib.jl:20 [inlined]
  [2] var"#s65#84"(::Any, x::Any, y::Any)
    @ Zygote ./none:0
  [3] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any, N} where N)
    @ Core ./boot.jl:571
  [4] struct_grad!(cx::Zygote.Context, x::Flux.LSTMCell{CuArray{Float32, 2}, CuArray{Float32, 1}}, x̄::NamedTuple{(:σ, :Wi, :Wh, :b, :h, :c), Tuple{Nothing, LinearAlgebra.Transpose{Float32, CuArray{Float32, 2}}, LinearAlgebra.Transpose{Float32, CuArray{Float32, 2}}, CuArray{Float32, 1}, Nothing, Nothing}})
    @ Flux.CUDAint ~/.julia/packages/Flux/goUGu/src/cuda/curnn.jl:64
  [5] #13
    @ ~/.julia/packages/Flux/goUGu/src/cuda/curnn.jl:86 [inlined]
  [6] #541#back
    @ ~/.julia/packages/ZygoteRules/OjfTt/src/adjoint.jl:59 [inlined]
  [7] #178
    @ ~/.julia/packages/Zygote/RxTZu/src/lib/lib.jl:194 [inlined]
  [8] (::Zygote.var"#1686#back#180"{Zygote.var"#178#179"{Tuple{Tuple{Nothing}, Tuple{Nothing}}, Flux.CUDAint.var"#541#back#15"{Flux.CUDAint.var"#13#14"{Zygote.Context, Flux.LSTMCell{CuArray{Float32, 2}, CuArray{Float32, 1}}, CuArray{Float32, 1}, CuArray{Float32, 1}, CUDA.CUDNN.var"#56#57"{CUDA.CUDNN.RNNDesc{Float32}, CuArray{Float32, 2}, CuArray{Float32, 1}, CuArray{Float32, 1}, CuArray{Float32, 2}, CuArray{UInt8, 1}}}}}})(Δ::Tuple{Nothing, CuArray{Float32, 2}})
    @ Zygote ~/.julia/packages/ZygoteRules/OjfTt/src/adjoint.jl:59
  [9] Pullback
    @ ~/.julia/packages/Flux/goUGu/src/layers/recurrent.jl:36 [inlined]
 [10] (::typeof(∂(λ)))(Δ::CuArray{Float32, 2})
    @ Zygote ~/.julia/packages/Zygote/RxTZu/src/compiler/interface2.jl:0
 [11] Pullback
    @ ~/.julia/packages/Flux/goUGu/src/layers/basic.jl:36 [inlined]
 [12] (::typeof(∂(applychain)))(Δ::CuArray{Float32, 2})
    @ Zygote ~/.julia/packages/Zygote/RxTZu/src/compiler/interface2.jl:0
 [13] Pullback
    @ ~/.julia/packages/Flux/goUGu/src/layers/basic.jl:38 [inlined]
 [14] (::typeof(∂(λ)))(Δ::CuArray{Float32, 2})
    @ Zygote ~/.julia/packages/Zygote/RxTZu/src/compiler/interface2.jl:0
 [15] Pullback
    @ /tmp/error.jl:12 [inlined]
 [16] (::typeof(∂(#1)))(Δ::Float32)
    @ Zygote ~/.julia/packages/Zygote/RxTZu/src/compiler/interface2.jl:0
 [17] (::Zygote.var"#69#70"{Zygote.Params, typeof(∂(#1)), Zygote.Context})(Δ::Float32)
    @ Zygote ~/.julia/packages/Zygote/RxTZu/src/compiler/interface.jl:252
 [18] gradient(f::Function, args::Zygote.Params)
    @ Zygote ~/.julia/packages/Zygote/RxTZu/src/compiler/interface.jl:59
 [19] top-level scope
    @ /tmp/error.jl:11
in expression starting at /tmp/error.jl:11

This occurs on Linux, CUDA v10.1, Julia v1.6. Any ideas about what's going wrong?

@DhairyaLGandhi
Member

Cc @mzgubic should we relax this bound somewhat?

I can take a look in a bit too

@mzgubic

mzgubic commented Apr 29, 2021

For reference FluxML/Zygote.jl#924

In a nutshell, Zygote used to quietly drop certain derivatives when accumulating two NamedTuples. After that PR it throws an error.
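The accumulation behaviour described above can be sketched with Zygote.accum directly. This is a minimal illustration, assuming Zygote ≥ 0.6 (i.e. after FluxML/Zygote.jl#924); the field names are chosen to mirror the LSTMCell gradient in the stacktrace:

using Zygote

# Gradients w.r.t. a struct are represented as NamedTuples of its fields.
# Accumulating two such gradients works when the keys line up:
Zygote.accum((Wi = 1.0, Wh = 2.0), (Wi = 0.5, Wh = 0.5))
# (Wi = 1.5, Wh = 2.5)

# If one gradient carries an extra field (here :σ, as in the error above),
# the derivative is no longer silently dropped; accum now throws the
# "keys must be a subset" ArgumentError instead:
Zygote.accum((Wi = 1.0, Wh = 2.0), (σ = nothing, Wi = 0.5, Wh = 0.5))
# ArgumentError: ... keys must be a subset of ... keys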

The underlying issue this exposes is that one of the custom @adjoints is incorrect (i.e. at least one of the NamedTuples representing gradients w.r.t. a struct is missing at least one of the fields).

That said, I don't know why it would work on CPU and not on GPU. My only guess is that CUDA has some compatibility bounds that pin a different Zygote version than the one used on CPU?

@mahowald
Author

@mzgubic Good catch: differing Zygote versions look like they were the root issue. After updating Zygote and CUDA the error disappears. I'll go ahead and close the issue.

@ToucheSir
Member

ToucheSir commented Apr 29, 2021

Flux depends on CUDA unconditionally, so the Zygote version should be the same regardless of whether GPU functionality is used.

@mahowald can you update to a newer version of Flux? From the stacktrace, it looks like this is a version that still relies on the (often buggy) CuDNN RNN path, which we've more or less removed in recent versions. The presence of a σ field in the first NamedTuple is especially suspect.

Edit: looks like Zygote and CUDA may have been held back for some reason, glad you resolved it. For posterity, do you mind listing what Flux, Zygote and CUDA versions you were and are on?
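For anyone diagnosing a similar held-back dependency, the installed versions and an update attempt can be checked with the stdlib Pkg. A generic sketch, not specific to this issue (it assumes Flux, Zygote and CUDA are in the active environment):

using Pkg

# Show the resolved versions of the packages involved:
Pkg.status(["Flux", "Zygote", "CUDA"])

# Try to move everything to the latest compatible versions; if a package
# stays behind, some compat bound in the environment is holding it back:
Pkg.update()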

@mahowald
Author

@ToucheSir Current versions are:

Flux - v0.12.1
CUDA - v2.6.3
Zygote - v0.6.10

If I remember correctly, the previous versions were:

Flux - v0.11.6
CUDA - v2.6.2
Zygote - v0.6.8
