
Dropout erroring on latest CUDA #1960

Closed
CarloLucibello opened this issue May 6, 2022 · 11 comments

CarloLucibello (Member) commented May 6, 2022

On

  [052768ef] CUDA v3.9.0
  [587475ba] Flux v0.13.0

I get the following error:

julia> x = rand(10) |> gpu;

julia> d = Dropout(0.5)
Dropout(0.5)

julia> gradient(x -> sum(d(x)), x)
ERROR: ArgumentError: x isa CuArray, but rng isa Random.TaskLocalRNG. dropout_mask only support CUDA.RNG for CuArrays.
Stacktrace:
  [1] dropout_mask(rng::Random.TaskLocalRNG, x::CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, p::Float64; kwargs::Base.Pairs{Symbol, Colon, Tuple{Symbol}, NamedTuple{(:dims,), Tuple{Colon}}})
    @ Flux ~/.julia/packages/Flux/18YZE/src/layers/normalise.jl:42
  [2] chain_rrule_kw
    @ ~/.julia/packages/Zygote/Y6SC4/src/compiler/chainrules.jl:229 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface2.jl:0 [inlined]
  [4] _pullback(::Zygote.Context, ::Flux.var"#dropout_mask##kw", ::NamedTuple{(:dims,), Tuple{Colon}}, ::typeof(Flux.dropout_mask), ::Random.TaskLocalRNG, ::CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, ::Float64)
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface2.jl:9
  [5] _pullback
    @ ~/.julia/packages/Flux/18YZE/src/layers/normalise.jl:36 [inlined]
  [6] _pullback
    @ ~/.julia/packages/Flux/18YZE/src/layers/normalise.jl:35 [inlined]
  [7] _pullback(::Zygote.Context, ::Flux.var"#dropout##kw", ::NamedTuple{(:dims, :active), Tuple{Colon, Bool}}, ::typeof(Flux.dropout), ::Random.TaskLocalRNG, ::CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, ::Float64)
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface2.jl:0
  [8] _pullback
    @ ~/.julia/packages/Flux/18YZE/src/layers/normalise.jl:87 [inlined]
  [9] _pullback(ctx::Zygote.Context, f::Dropout{Float64, Colon, Random.TaskLocalRNG}, args::CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface2.jl:0
 [10] _pullback
    @ ./REPL[8]:1 [inlined]
 [11] _pullback(ctx::Zygote.Context, f::var"#3#4", args::CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface2.jl:0
 [12] _pullback(f::Function, args::CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface.jl:34
 [13] pullback(f::Function, args::CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface.jl:40
 [14] gradient(f::Function, args::CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface.jl:75
 [15] top-level scope
    @ REPL[8]:1
 [16] top-level scope
    @ ~/.julia/packages/CUDA/Uurn4/src/initialization.jl:52

darsnack (Member) commented May 6, 2022

This is expected since the model wasn't moved to the GPU. I don't think we've ever guaranteed that layers left on the CPU will work with CuArray inputs.
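
For reference, a minimal sketch of the expected usage per the comment above: move the layer to the GPU as well, so its stored RNG matches the CuArray input.

using Flux, CUDA

x = rand(Float32, 10) |> gpu        # CuArray input
d = Dropout(0.5) |> gpu             # move the layer too, so it carries a GPU-compatible RNG
gradient(x -> sum(d(x)), x)         # no RNG/array mismatch error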

CarloLucibello (Member, Author) commented

Ah, sorry, silly mistake.

chengchingwen (Member) commented

This seems to break models whose functor definition ignores the dropout layer.

ToucheSir (Member) commented

Do you have an example? I don't understand your comment.

darsnack (Member) commented

Do you mean a layer that wraps a dropout layer and does not include the dropout layer as part of the functor leaves? That seems like an improper use of functor when implementing the wrapper layer (though I can believe there's some good reason for doing this).

Alternatively, as a workaround, the RNG can be set when constructing the dropout layer:

Dropout(p; rng = Flux.rng_from_array(CuArray))
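A hedged usage sketch of that workaround (not from the thread): CUDA.default_rng() is used here as one way to obtain a CUDA.RNG at construction time, so the layer never needs to be moved with gpu.

using Flux, CUDA

# Give the layer a GPU RNG up front so it matches CuArray inputs.
d = Dropout(0.5; rng = CUDA.default_rng())
x = CUDA.rand(Float32, 10)
gradient(x -> sum(d(x)), x)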

chengchingwen (Member) commented

Like this one:

struct MultiheadAttention{Q<:Dense, K<:Dense, V<:Dense, O<:Dense, DP<:Dropout} <: AbstractAttention
    head::Int
    future::Bool
    iqproj::Q
    ikproj::K
    ivproj::V
    oproj::O
    drop::DP
end

Flux.functor(mh::MultiheadAttention) = (mh.iqproj, mh.ikproj, mh.ivproj, mh.oproj), m -> MultiheadAttention(mh.head, mh.future, m..., mh.drop)

This is a really old code snippet from Transformers.jl, which should definitely be updated. But I think the change is potentially breaking, because functor was being used to extract parameters, and Dropout was not previously treated as a layer with parameters. So maybe the broader question is: how do we know whether a Flux layer needs to be fmapped recursively when composing layers?

ToucheSir (Member) commented

So maybe the broader question is: how do we know whether a Flux layer needs to be fmapped recursively when composing layers?

I would say always assume a sublayer could have parameters. There is basically no performance loss from doing so.
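A hedged sketch of that advice (not from the thread), in place of the manual functor definition above: let Flux.@functor include every field, so fmap/gpu recurses into the Dropout and its RNG, and restrict what the optimiser updates via Flux.trainable instead.

# Include all fields in functor so `gpu`/`fmap` reaches the Dropout (and its RNG),
# then limit which fields count as trainable parameters.
Flux.@functor MultiheadAttention

Flux.trainable(mh::MultiheadAttention) = (mh.iqproj, mh.ikproj, mh.ivproj, mh.oproj)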

pawbz commented Jun 10, 2022

I still face the same issue with AD when using Dropout.

x = rand(10) |> gpu;
d = Dropout(0.5) |> gpu
gradient(x -> sum(d(x)), x)

(screenshot of the error attached)

darsnack (Member) commented

Can you post a full stack trace?

ToucheSir (Member) commented

As well as a full list of packages in your environment. If I had to guess, something is holding essential Flux dependencies back.

pawbz commented Jun 11, 2022

Thank you all. The issue was resolved when I updated CUDA.jl.
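
For anyone hitting the same thing, one way to update and check the package from the Julia REPL, using standard Pkg commands:

using Pkg
Pkg.update("CUDA")                  # pull the latest compatible CUDA.jl
Pkg.status(["CUDA", "Flux"])        # confirm the resolved versions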
