
Dropout erroring on latest CUDA #1960

Closed
CarloLucibello opened this issue May 6, 2022 · 11 comments

CarloLucibello (Member) commented May 6, 2022

On

  [052768ef] CUDA v3.9.0
  [587475ba] Flux v0.13.0

I get the following error:

julia> x = rand(10) |> gpu;

julia> d = Dropout(0.5)
Dropout(0.5)

julia> gradient(x -> sum(d(x)), x)
ERROR: ArgumentError: x isa CuArray, but rng isa Random.TaskLocalRNG. dropout_mask only support CUDA.RNG for CuArrays.
Stacktrace:
  [1] dropout_mask(rng::Random.TaskLocalRNG, x::CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, p::Float64; kwargs::Base.Pairs{Symbol, Colon, Tuple{Symbol}, NamedTuple{(:dims,), Tuple{Colon}}})
    @ Flux ~/.julia/packages/Flux/18YZE/src/layers/normalise.jl:42
  [2] chain_rrule_kw
    @ ~/.julia/packages/Zygote/Y6SC4/src/compiler/chainrules.jl:229 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface2.jl:0 [inlined]
  [4] _pullback(::Zygote.Context, ::Flux.var"#dropout_mask##kw", ::NamedTuple{(:dims,), Tuple{Colon}}, ::typeof(Flux.dropout_mask), ::Random.TaskLocalRNG, ::CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, ::Float64)
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface2.jl:9
  [5] _pullback
    @ ~/.julia/packages/Flux/18YZE/src/layers/normalise.jl:36 [inlined]
  [6] _pullback
    @ ~/.julia/packages/Flux/18YZE/src/layers/normalise.jl:35 [inlined]
  [7] _pullback(::Zygote.Context, ::Flux.var"#dropout##kw", ::NamedTuple{(:dims, :active), Tuple{Colon, Bool}}, ::typeof(Flux.dropout), ::Random.TaskLocalRNG, ::CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, ::Float64)
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface2.jl:0
  [8] _pullback
    @ ~/.julia/packages/Flux/18YZE/src/layers/normalise.jl:87 [inlined]
  [9] _pullback(ctx::Zygote.Context, f::Dropout{Float64, Colon, Random.TaskLocalRNG}, args::CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface2.jl:0
 [10] _pullback
    @ ./REPL[8]:1 [inlined]
 [11] _pullback(ctx::Zygote.Context, f::var"#3#4", args::CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface2.jl:0
 [12] _pullback(f::Function, args::CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface.jl:34
 [13] pullback(f::Function, args::CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface.jl:40
 [14] gradient(f::Function, args::CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
    @ Zygote ~/.julia/packages/Zygote/Y6SC4/src/compiler/interface.jl:75
 [15] top-level scope
    @ REPL[8]:1
 [16] top-level scope
    @ ~/.julia/packages/CUDA/Uurn4/src/initialization.jl:52

darsnack (Member) commented May 6, 2022

This is expected since the model wasn't moved to the GPU. I don't think we've ever guaranteed that layers left on the CPU will work with CuArray inputs.
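
For reference, a minimal sketch of the expected usage per the comment above: move the layer to the GPU as well, so its stored RNG matches the CuArray input.

using Flux, CUDA

x = rand(Float32, 10) |> gpu        # CuArray input
d = Dropout(0.5) |> gpu             # move the layer too, so it carries a GPU-compatible RNG
gradient(x -> sum(d(x)), x)         # no RNG/array mismatch error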

CarloLucibello (Member, Author) commented

Ah, sorry, silly mistake.

chengchingwen (Member) commented

This seems to break models whose functor definition ignores the dropout layer.

ToucheSir (Member) commented

Do you have an example? I don't understand your comment.

darsnack (Member) commented

Do you mean a layer that wraps a dropout layer and does not include the dropout layer as part of the functor leaves? That seems like an improper use of functor when implementing the wrapper layer (though I can believe there's some good reason for doing this).

Alternatively, as a workaround, the RNG can be set when constructing the dropout layer:

Dropout(p; rng = Flux.rng_from_array(CuArray))
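A hedged usage sketch of that workaround (not from the thread): CUDA.default_rng() is used here as one way to obtain a CUDA.RNG at construction time, so the layer never needs to be moved with gpu.

using Flux, CUDA

# Give the layer a GPU RNG up front so it matches CuArray inputs.
d = Dropout(0.5; rng = CUDA.default_rng())
x = CUDA.rand(Float32, 10)
gradient(x -> sum(d(x)), x)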

chengchingwen (Member) commented

Like this one:

struct MultiheadAttention{Q<:Dense, K<:Dense, V<:Dense, O<:Dense, DP<:Dropout} <: AbstractAttention
    head::Int
    future::Bool
    iqproj::Q
    ikproj::K
    ivproj::V
    oproj::O
    drop::DP
end

Flux.functor(mh::MultiheadAttention) = (mh.iqproj, mh.ikproj, mh.ivproj, mh.oproj), m -> MultiheadAttention(mh.head, mh.future, m..., mh.drop)

This is a really old code snippet from Transformers.jl, which should definitely be updated. But I think the change is potentially breaking, because functor was being used to extract parameters, and Dropout was not previously treated as a layer with parameters. So maybe the broader question is: how do we know whether a Flux layer needs to be fmapped recursively when composing layers?

ToucheSir (Member) commented

So maybe the broader question is: how do we know whether a Flux layer needs to be fmapped recursively when composing layers?

I would say always assume a sublayer could have parameters. There is basically no performance loss from doing so.
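A hedged sketch of that advice (not from the thread), in place of the manual functor definition above: let Flux.@functor include every field, so fmap/gpu recurses into the Dropout and its RNG, and restrict what the optimiser updates via Flux.trainable instead.

# Include all fields in functor so `gpu`/`fmap` reaches the Dropout (and its RNG),
# then limit which fields count as trainable parameters.
Flux.@functor MultiheadAttention

Flux.trainable(mh::MultiheadAttention) = (mh.iqproj, mh.ikproj, mh.ivproj, mh.oproj)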

pawbz commented Jun 10, 2022

I still face the same issue with AD when using Dropout.

x = rand(10) |> gpu;
d = Dropout(0.5) |> gpu
gradient(x -> sum(d(x)), x)

(screenshot of the error attached)

darsnack (Member) commented

Can you post a full stack trace?

ToucheSir (Member) commented

As well as a full list of packages in your environment. If I had to guess, something is holding essential Flux dependencies back.

pawbz commented Jun 11, 2022

Thank you all. The issue was resolved when I updated CUDA.jl.
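
For anyone hitting the same thing, one way to update and check the package from the Julia REPL, using standard Pkg commands:

using Pkg
Pkg.update("CUDA")                  # pull the latest compatible CUDA.jl
Pkg.status(["CUDA", "Flux"])        # confirm the resolved versions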
