Non-isbits arguments passed to CUDAnative #353

Closed
maleadt opened this issue Aug 17, 2018 · 8 comments

maleadt commented Aug 17, 2018

From https://discourse.julialang.org/t/llvm-crash-when-running-flux-and-cuarray-examples-in-julia-0-7/12967

using Flux, CuArrays
x = cu(rand(10))
d = Dense(10, 10)
g = gpu(d)
g(x)

This crashes with an "Illegal inttoptr" error due to JuliaLang/julia#28645.

Removing the abort(), we get to see the underlying reason: a regular CuArray is being used (these are passed as GC-tracked pointers, which are broken by CUDAnative, JuliaGPU/CUDAnative.jl#223):

ERROR: GPU compilation failed, try inspecting generated code with any of the @device_code_... macros
CompilerError: could not compile #19(CuArrays.CuKernelState, CUDAnative.CuDeviceArray{ForwardDiff.Dual{Nothing,Float32,2},1,CUDAnative.AS.Global}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},getfield(Base.Broadcast, Symbol("##1#2")){Base.Broadcast.Broadcasted{Flux.Tracker.TrackedStyle,Tuple{Base.OneTo{Int64}},typeof(identity),Tuple{Base.Broadcast.Broadcasted{Flux.Tracker.TrackedStyle,Nothing,typeof(+),Tuple{TrackedArray{…,CuArray{Float32,1}},TrackedArray{…,CuArray{Float32,1}}}}}},getfield(Base.Broadcast, Symbol("##7#8")){Base.Broadcast.Broadcasted{Flux.Tracker.TrackedStyle,Nothing,typeof(+),Tuple{TrackedArray{…,CuArray{Float32,1}},TrackedArray{…,CuArray{Float32,1}}}},getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##3#4"))}}}},Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{ForwardDiff.Dual{Nothing,Float32,2},1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{ForwardDiff.Dual{Nothing,Float32,2},1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}}}}); passing and using non-bitstype argument
- argument_type =Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},getfield(Base.Broadcast, Symbol("##1#2")){Base.Broadcast.Broadcasted{Flux.Tracker.TrackedStyle,Tuple{Base.OneTo{Int64}},typeof(identity),Tuple{Base.Broadcast.Broadcasted{Flux.Tracker.TrackedStyle,Nothing,typeof(+),Tuple{TrackedArray{…,CuArray{Float32,1}},TrackedArray{…,CuArray{Float32,1}}}}}},getfield(Base.Broadcast, Symbol("##7#8")){Base.Broadcast.Broadcasted{Flux.Tracker.TrackedStyle,Nothing,typeof(+),Tuple{TrackedArray{…,CuArray{Float32,1}},TrackedArray{…,CuArray{Float32,1}}}},getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##9#10")){getfield(Base.Broadcast, Symbol("##11#12"))}},getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##13#14")){getfield(Base.Broadcast, Symbol("##15#16"))}},getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##5#6")){getfield(Base.Broadcast, Symbol("##3#4"))}}}},Tuple{Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{ForwardDiff.Dual{Nothing,Float32,2},1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}},Base.Broadcast.Extruded{CUDAnative.CuDeviceArray{ForwardDiff.Dual{Nothing,Float32,2},1,CUDAnative.AS.Global},Tuple{Bool},Tuple{Int64}}}}
- argument = 4

The kernel is being passed arguments of type TrackedArray{…,CuArray{Float32,1}}.
This probably needs an adapt rule for TrackedArray.
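
A hedged sketch of what such a rule might look like, using an Adapt.jl-style hook to unwrap the tracked value before the kernel sees it (this is a sketch, not the actual fix; it assumes Flux.Tracker.data is the right accessor for the underlying array):

```julia
using Adapt, Flux

# Sketch: when converting arguments for the GPU, replace a TrackedArray by
# its underlying data and recurse, so the kernel only ever sees the CuArray.
Adapt.adapt_structure(to, x::Flux.Tracker.TrackedArray) =
    Adapt.adapt(to, Flux.Tracker.data(x))
```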

MikeInnes commented Aug 20, 2018

We shouldn't be passing TrackedArrays to the kernel; Flux's copy(::Broadcasted) should be unwrapping them. A more minimal test case:

xs = cu(param([1,2,3]))
ys = cu([4,5,6])
b1 = Base.Broadcast.broadcasted(+, xs, ys)
b2 = Base.Broadcast.broadcasted(identity, b1)
copy(b2)

@MikeInnes

The problem is that the flattened broadcast function closes over the original broadcast struct:

julia> Base.Broadcast.flatten(b2).f.bc
Base.Broadcast.Broadcasted(identity, (Base.Broadcast.Broadcasted(+, (Flux.Tracker.TrackedReal{Float32}[1.0 (tracked), 2.0 (tracked), 3.0 (tracked)], Float32[4.0, 5.0, 6.0])),))

@mbauman any ideas how we can avoid this?

@MikeInnes

OK, this is simple enough to fix. We'll have to copy these functions into Flux for now.

diff --git a/base/broadcast.jl b/base/broadcast.jl
index 88d83d2..aac0320 100644
--- a/base/broadcast.jl
+++ b/base/broadcast.jl
@@ -304,9 +304,9 @@ function flatten(bc::Broadcasted{Style}) where {Style}
     #     makeargs(w, x, y, z) = (w, makeargs1(x, y, z)...)
     #                          = (w, g(x, y), makeargs2(z)...)
     #                          = (w, g(x, y), z)
-    let makeargs = make_makeargs(bc)
+    let makeargs = make_makeargs(bc), f = bc.f
         newf = @inline function(args::Vararg{Any,N}) where N
-            bc.f(makeargs(args...)...)
+            f(makeargs(args...)...)
         end
         return Broadcasted{Style}(newf, args, bc.axes)
     end
@@ -332,13 +332,13 @@ make_makeargs(bc::Broadcasted) = make_makeargs(()->(), bc.args)
 end
 @inline function make_makeargs(makeargs, t::Tuple{<:Broadcasted,Vararg{Any}})
     bc = t[1]
-    let makeargs = make_makeargs(makeargs, tail(t))
+    let makeargs = make_makeargs(makeargs, tail(t)), f = bc.f
         let makeargs = make_makeargs(makeargs, bc.args)
             headargs, tailargs = make_headargs(bc.args), make_tailargs(bc.args)
             return @inline function(args::Vararg{Any,N}) where N
                 args1 = makeargs(args...)
                 a, b = headargs(args1...), tailargs(args1...)
-                (bc.f(a...), b...)
+                (f(a...), b...)
             end
         end
     end
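
The capture problem this diff works around can be illustrated in isolation (a minimal sketch, not Base code): a closure that mentions bc.f captures the whole Broadcasted, so any non-isbits argument inside it makes the closure non-isbits too, while binding the function to a local first captures only the function.

```julia
# Stand-in for Base.Broadcast.Broadcasted, just to show the capture behavior.
struct FakeBroadcasted{F,A}
    f::F
    args::A
end

bc = FakeBroadcasted(+, ([1, 2, 3],))       # the Array field is non-isbits

captures_bc(bc) = (xs...) -> bc.f(xs...)    # closure captures `bc` itself
captures_f(bc)  = let f = bc.f              # closure captures only the function
    (xs...) -> f(xs...)
end

isbits(captures_bc(bc))   # false: the closure carries the Array along
isbits(captures_f(bc))    # true: typeof(+) is an isbits singleton
```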

Keno commented Oct 12, 2018

Could you submit this change to base and use a version check? Overriding a method in Base in this way is rather frowned upon.
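
A hedged sketch of what that guard could look like once the fix is upstream (the cutoff version and file name here are hypothetical):

```julia
# Hypothetical: only load the copied, fixed broadcast definitions on Julia
# versions that still have the closure-capture bug; newer versions use Base.
@static if VERSION < v"1.1.0-DEV"
    include("broadcast_patch.jl")   # the copied flatten/make_makeargs methods
end
```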

Keno commented Oct 12, 2018

(Also it slows down loading Flux)

@vchuravy

Maybe avoiding flattening altogether would be good? Why are you calling flatten on the broadcast?
I am happy to help write the recursive version ;)

@MikeInnes

It's for AD through broadcast, for which it's quite a lot easier to inject the Dual numbers and pull gradients out of a tuple rather than a tree. But if you want to make that work on the tree, that'd be an improvement, I'm sure.
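
For context, the flat-tuple approach looks roughly like this with ForwardDiff (a sketch of the idea, not Flux's actual implementation): seed each of the N flattened arguments with a one-hot Dual, broadcast once, then read the value and all N gradients out of each element's partials.

```julia
using ForwardDiff: Dual, value, partials

f(x, y) = x * y + x
xs, ys = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]

# One-hot seeding over the flat argument tuple: argument 1 gets partials
# (1, 0), argument 2 gets (0, 1), so each output Dual carries both gradients.
out = broadcast(xs, ys) do x, y
    f(Dual(x, 1.0, 0.0), Dual(y, 0.0, 1.0))
end

value.(out)         # the broadcast result itself
partials.(out, 1)   # elementwise ∂f/∂x
partials.(out, 2)   # elementwise ∂f/∂y
```

With a nested (tree-shaped) Broadcasted, the seeding and extraction would instead have to recurse through every level, which is what makes the flattened form convenient here.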

Keno commented Oct 15, 2018

Yes, I have no problem with the change, just the piracy.
