Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Two errors on Julia 1.6 #897

Closed
mcabbott opened this issue Feb 3, 2021 · 12 comments
Closed

Two errors on Julia 1.6 #897

mcabbott opened this issue Feb 3, 2021 · 12 comments

Comments

@mcabbott
Copy link
Member

mcabbott commented Feb 3, 2021

These two probably unrelated things work fine on 1.5, but give errors on 1.6 or master.

First, only for long enough generators:

julia> using Zygote, LinearAlgebra

julia> gradient(x -> sum(norm, collect(eachcol(x))), rand(3,4))[1]
3×4 Matrix{Float64}:
 0.0980459  0.916691  0.397663  0.548254
 0.78843    0.216233  0.900898  0.0887359
 0.60726    0.336038  0.173914  0.831591

julia> gradient(x -> sum(norm, collect(eachcol(x))), rand(3,400))[1] # only size changed
ERROR: Can't differentiate loopinfo expression
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] macro expansion
    @ ./simdloop.jl:79 [inlined]
  [3] Pullback
    @ ./reduce.jl:245 [inlined]
  [4] (::typeof((mapreduce_impl)))(Δ::Float64)
    @ Zygote ~/.julia/dev/Zygote/src/compiler/interface2.jl:0
  [5] Pullback
    @ ./reduce.jl:259 [inlined]
  [6] (::typeof((mapreduce_impl)))(Δ::Float64)
    @ Zygote ~/.julia/dev/Zygote/src/compiler/interface2.jl:0
  [7] Pullback
    @ ./reduce.jl:417 [inlined]
  [8] (::typeof((_mapreduce)))(Δ::Float64)
    @ Zygote ~/.julia/dev/Zygote/src/compiler/interface2.jl:0
  [9] Pullback
    @ ./reducedim.jl:330 [inlined]
 [10] Pullback
    @ ./reducedim.jl:322 [inlined]
 [11] (::typeof((mapreduce)))(Δ::Float64)
    @ Zygote ~/.julia/dev/Zygote/src/compiler/interface2.jl:0
 [12] Pullback
    @ ./reducedim.jl:890 [inlined]
 [13] (::typeof((#_sum#717)))(Δ::Float64)
    @ Zygote ~/.julia/dev/Zygote/src/compiler/interface2.jl:0
 [14] Pullback
    @ ./reducedim.jl:890 [inlined]
 [15] (::typeof((_sum)))(Δ::Float64)
    @ Zygote ~/.julia/dev/Zygote/src/compiler/interface2.jl:0
 [16] Pullback
    @ ./reducedim.jl:886 [inlined]
 [17] Pullback
    @ ./REPL[288]:1 [inlined]
 [18] (::typeof((#359)))(Δ::Float64)
    @ Zygote ~/.julia/dev/Zygote/src/compiler/interface2.jl:0
 [19] (::Zygote.var"#43#44"{typeof((#359))})(Δ::Float64)
    @ Zygote ~/.julia/dev/Zygote/src/compiler/interface.jl:40
 [20] gradient(f::Function, args::Matrix{Float64})
    @ Zygote ~/.julia/dev/Zygote/src/compiler/interface.jl:58
 [21] top-level scope
    @ REPL[288]:1

And second, seemingly for any structured matrix:

julia> using Zygote, LinearAlgebra

julia> gradient(x -> sum(sin, Diagonal(x)), rand(2))[1]
ERROR: ArgumentError: cannot set off-diagonal entry (1, 2) to a nonzero value (1.0)
Stacktrace:
  [1] setindex!
    @ ~/.julia/dev/julia/usr/share/julia/stdlib/v1.7/LinearAlgebra/src/diagonal.jl:101 [inlined]
  ...
   [10] materialize!
    @ ./broadcast.jl:890 [inlined]
 [11] (::Zygote.var"#382#384"{Diagonal{Float64, Vector{Float64}}, Tuple{CartesianIndex{2}}})(dy::Float64)
    @ Zygote ~/.julia/dev/Zygote/src/lib/array.jl:42
 [12] (::Zygote.var"#2221#back#378"{Zygote.var"#382#384"{Diagonal{Float64, Vector{Float64}}, Tuple{CartesianIndex{2}}}})(Δ::Float64)
    @ Zygote ~/.julia/packages/ZygoteRules/OjfTt/src/adjoint.jl:59
 [13] Pullback
    @ ./abstractarray.jl:1090 [inlined]
 [14] (::typeof((iterate)))(Δ::Tuple{Float64, Tuple{Nothing, NamedTuple{(:I,), Tuple{Tuple{Nothing, Nothing}}}}})
    @ Zygote ~/.julia/dev/Zygote/src/compiler/interface2.jl:0
...
 [22] (::typeof((mapfoldl)))(Δ::Float64)
    @ Zygote ~/.julia/dev/Zygote/src/compiler/interface2.jl:0
...
 [36] top-level scope
    @ REPL[289]:1

julia> gradient(x -> sum(sin, Symmetric(x)), rand(2,2))[1]
ERROR: ArgumentError: Cannot set a non-diagonal index in a symmetric matrix
Stacktrace:
  [1] setindex!
    @ ~/.julia/dev/julia/usr/share/julia/stdlib/v1.7/LinearAlgebra/src/symmetric.jl:225 [inlined]
...
@mcabbott mcabbott changed the title An error on Julia 1.6 Two errors on Julia 1.6 Feb 3, 2021
@mcabbott
Copy link
Member Author

mcabbott commented Feb 5, 2021

Maybe worth adding: the following work fine, with either explicit broadcasting, or with map instead. So perhaps the error is specific to sum(f, xs).

gradient(x -> sum(norm.(collect(eachcol(x)))), rand(3,400))[1] 

gradient(x -> sum(map(norm, collect(eachcol(x)))), rand(3,400))[1]

gradient(x -> sum(sin.(Diagonal(x))), rand(2))[1]

gradient(x -> sum(map(sin, Diagonal(x))), rand(2))[1]

@haozhangphd
Copy link

I'm not sure if this is a Julia issue or a Zygote issue. I checked all the Julia files in the above stack trace, it seems the only relevant commit at the Julia side is JuliaLang/julia#36188, but I didn't find anything obvious that would break Zygote. Also prod/maximum/minimum are also covered by this commit, but there is no similar errors for these three functions. I have not checked for Julia commits unrelated to the source files mentioned in the stack trace.

On the other hand, the following expression gradient(x -> mapreduce(norm, Base.add_sum, eachcol(x)), rand(3,400)) fails with error message Can't differentiate loopinfo expression on both Julia 1.5 and 1.6. I cannot understand why it is the case, as sum is defined as sum(f, a) = mapreduce(f, add_sum, a) for Julia 1.5, and gradient(x -> sum(norm, eachcol(x)), rand(3,400)) works without error on Julia 1.5.

Another weird issue is that the expression gradient(x -> mapreduce(norm, Base.mul_prod, eachcol(x)), rand(3,400)) fails with both Julia 1.5 and 1.6, but gradient(x -> prod(norm, eachcol(x)), rand(3,400)) works on both versions. Again the the later case is defined in terms of the former.

@haozhangphd
Copy link

The code posted at #157 shows the same error message on Julia 1.6 but not on Julia 1.5.

@mcabbott
Copy link
Member Author

mcabbott commented Apr 5, 2021

Thanks for investigating.

The reason sum(f, xs) and mapreduce(f, Base.add_sum, xs) can differ is that Zygote's rules are attached to the former, not the latter. Which hints that the problem must be with that rule, which was last adjusted in #684.

Reversing that change appears to fix it, on 1.6. I think the version which takes a keyword isn't being run at all, and somehow doesn't break on 1.5.

julia> @eval Zygote begin
       function _pullback(cx::AContext, ::typeof(sum), f, xs::AbstractArray)
          y, back = pullback(cx, ((f, xs) -> sum(f.(xs))), f, xs)
          @show y   # just to check this gets run
          y, ȳ -> (nothing, back(ȳ)...)
       end
       end
_pullback (generic function with 492 methods)

julia> gradient(x -> sum(norm, collect(eachcol(x))), rand(3,400))[1] # only size changed
y = 386.0308068393151
3×400 Matrix{Float64}:
 0.913379  0.269357  0.55777   0.666639  0.354963    0.0933578  0.740294  0.230064  0.236234
 0.167705  0.501518  0.10804   0.687187  0.189529     0.674529   0.487126  0.901705  0.178678
 0.370963  0.822148  0.822934  0.288732  0.915467     0.732322   0.463328  0.366059  0.955127

julia> gradient(x -> sum(sin, Diagonal(x)), rand(2))[1]
y = 1.3417542054446077
2-element Vector{Float64}:
 0.8518952755965761
 0.5751584407885143

@haozhangphd
Copy link

Yes the rule for sum is not used at all in Julia 1.6. I set a breakpoint at line 277 of array.jl in Zygote. The breakpoint is only hit with Julia 1.5 and not Julia 1.6.

Within _pullback(ctx::AContext, f, args...), the variables on Julia 1.5 is as follows:

About to run: <(ZygoteRules._pullback)(Zygote.Context(nothing), Base.#sum#628, Colon(), sum, sum, SubArray{Float64,1...>
2|debug> fr
[2] _pullback(ctx, f, args) at ~/.julia/packages/Zygote/lwmfx/src/compiler/interface2.jl:9
  | ctx::Zygote.Context = Zygote.Context(nothing)
  | f::typeof(sum) = sum
  | args::Tuple{typeof(sum),Array{SubArray{Float64,1,Array{Float64,2},Tuple{Base.Slice{Base.OneTo{Int64}},Int64},true},1}} = <(sum, SubArray{Float64,1,Array{Float64,2},Tuple{Base.Slice{Base.OneTo{Int64}},Int64},true}[[0.6741320...>

And on Julia 1.6 it is as follows:

About to run: <(ZygoteRules._pullback)(Zygote.Context(nothing), sum, sum, SubArray{Float64, 1, Matrix{Float64}, Tupl...>
1|debug> fr
[1] _pullback(ctx, f, args) at ~/.julia/packages/Zygote/lwmfx/src/compiler/interface2.jl:9
  | ctx::Zygote.Context = Zygote.Context(nothing)
  | f::typeof(sum) = sum
  | args::Tuple{typeof(sum), Vector{SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}}} = <(sum, SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}[[0.598...>

I do not see any difference between the two except the first line (the line with "About to run..."). In particular, the second argument is displayed as Base.#sum#628 on Julia 1.5, but only as sum on 1.6. I am not sure if this will invalidate the Zygote rule though...

@DhairyaLGandhi
Copy link
Member

Is there a separate function to hit instead?

@haozhangphd
Copy link

I don't think so. The lowered code becomes the following on 1.6

In _pullback(ctx, f, args) at ~/.julia/packages/Zygote/lwmfx/src/compiler/interface2.jl:9
  1  1 ─       $(Expr(:meta, :inline))
  2  │   %2  = Core.getfield(_4, 1)
  3  │   %3  = Core.getfield(_4, 2)
  4  │   %4  = ZygoteRules._pullback(_2, $(QuoteNode(NamedTuple)))
  5  │   %5  = Base.getindex(%4, 1)
  6  │   %6  = Base.getindex(%4, 2)
  7  │   %7  = ZygoteRules._pullback(_2, $(QuoteNode(pairs)), %5)
  8  │   %8  = Base.getindex(%7, 1)
  9  │   %9  = Base.getindex(%7, 2)
 10  │   %10 = ZygoteRules._pullback(_2, $(QuoteNode(Base.var"#sum#680")), $(QuoteNode(Colon())), %8, _3, %2, %3)
 11  │   %11 = Base.getindex(%10, 1)
 12  │   %12 = Base.getindex(%10, 2)
 13  │   %13 = Core.tuple(%9, %12, %6)
 14  │   %14 = (typeof(∂(sum)))(%13)
 15  │   %15 = Core.tuple(%11, %14)
 16  └──       return %15

The line ZygoteRules._pullback(_2, $(QuoteNode(Base.var"#sum#680")), $(QuoteNode(Colon())), %8, _3, %2, %3) gets lowered to

In _pullback(ctx, f, args) at ~/.julia/packages/Zygote/lwmfx/src/compiler/interface2.jl:9
  1  1 ─       $(Expr(:meta, :inline))
  2  │   %2  = Core.getfield(_4, 1)
  3  │   %3  = Core.getfield(_4, 2)
  4  │         Core.getfield(_4, 3)
  5  │   %5  = Core.getfield(_4, 4)
  6  │   %6  = Core.getfield(_4, 5)
  7  │   %7  = (Vector{Any})()
  8  │   %8  = (Vector{Any})()
  9  │   %9  = ZygoteRules._pullback(_2, $(QuoteNode(NamedTuple)))
 10  │   %10 = Base.getindex(%9, 1)
 11  │   %11 = Base.getindex(%9, 2)
 12  │   %12 = ZygoteRules._pullback(_2, $(QuoteNode(merge)), %10, %3)
 13  │   %13 = Base.getindex(%12, 1)
 14  │   %14 = Base.getindex(%12, 2)
 15  │   %15 = ZygoteRules._pullback(_2, $(QuoteNode(isempty)), %13)
 16  │   %16 = Base.getindex(%15, 1)
 17  │   %17 = Base.getindex(%15, 2)
 18  └──       goto #3 if not %16
 19  2 ─ %19 = ZygoteRules._pullback(_2, $(QuoteNode(Base._sum)), %5, %6, %2)
 20  │         _5 = Base.getindex(%19, 1)
 21  │   %21 = Base.getindex(%19, 2)
 22  │         Zygote._push!(%8, %21)
 23  │         _6 = 0x01
 24  └──       goto #4
 25  3 ─ %25 = Core.kwfunc($(QuoteNode(Base._sum)))
 26  │   %26 = ZygoteRules._pullback(_2, %25, %13, $(QuoteNode(Base._sum)), %5, %6, %2)
 27  │         _5 = Base.getindex(%26, 1)
 28  │   %28 = Base.getindex(%26, 2)
 29  │         Zygote._push!(%7, %28)
 30  │         _6 = 0x02
 31  └──       goto #4
 32  4 ┄ %32 = Core.tuple(%17, %11, %14, %8, %7, _6)
 33  │   %33 = (typeof(∂(#sum#680)))(%32)
 34  │   %34 = Core.tuple(_5, %33)
 35  └──       return %34

The line ZygoteRules._pullback(_2, $(QuoteNode(Base._sum)), %5, %6, %2) in turn has a very convoluted call-chains, and eventually calls ZygoteRules._pullback(_2, $(QuoteNode(Base._mapreduce)), %2, %3, %8, %5), at which point I gave up tracing...

On 1.5, we have the following, which is a lot simpler:

In _pullback(ctx, f, args) at ~/.julia/packages/Zygote/lwmfx/src/compiler/interface2.jl:9
  1  1 ─      $(Expr(:meta, :inline))
  2  │   %2 = (getfield)(args, 1)
  3  │   %3 = (getfield)(args, 2)
  4  │   %4 = (ZygoteRules._pullback)(ctx, Base.#sum#628, Colon(), f, %2, %3)
  5  │   %5 = (getindex)(%4, 1)
  6  │   %6 = (getindex)(%4, 2)
  7  │   %7 = (tuple)(%6)
  8  │   %8 = (typeof(∂(sum)))(%7)
  9  │   %9 = (tuple)(%5, %8)
 10  └──      return %9

And the line (ZygoteRules._pullback)(ctx, Base.#sum#628, Colon(), f, %2, %3) just calls _pullback(cx::AContext, kwtype, kws, ::typeof(sum), f, xs::AbstractArray) in array.jl.

@DhairyaLGandhi
Copy link
Member

Right thanks, I meant it is following a different code path, which this seems like it is. Julia seems to be playing around with kwarg handling much earlier now. Maybe we can get away with wrapping the function as a QuoteNode cc @simeonschaub who might be interested in this

@simeonschaub
Copy link
Member

A bit of a shot in the dark, but are you sure the only difference is the Julia version? I don't really know about any kwargs changes in Base, is there any chance this could be due to JuliaDiff/ChainRulesCore.jl#308? Otherwise, I might be able to take a look at this tomorrow.

@haozhangphd
Copy link

I was testing on two almost identical Julia environments. The only difference between the two environments is the Julia version itself. Also both environments have the same Zygote version (0.6.8) as well as the same ChainRulesCore version (0.9.37). Thus I think the different Julia version is the most likely cause here.

simeonschaub added a commit to simeonschaub/Zygote.jl that referenced this issue Apr 26, 2021
addresses the second part of FluxML#897
simeonschaub added a commit to simeonschaub/Zygote.jl that referenced this issue Apr 26, 2021
simeonschaub added a commit to simeonschaub/Zygote.jl that referenced this issue Apr 26, 2021
bors bot added a commit that referenced this issue Apr 30, 2021
956: fix differentiation of loopinfo exprs r=DhairyaLGandhi a=simeonschaub

addresses the first part of #897

Co-authored-by: Simeon Schaub <[email protected]>
bors bot added a commit that referenced this issue Apr 30, 2021
955: fix adjoint for sum r=DhairyaLGandhi a=simeonschaub

addresses the second part of #897

Co-authored-by: Simeon Schaub <[email protected]>
bors bot added a commit that referenced this issue May 5, 2021
955: fix adjoint for sum r=DhairyaLGandhi a=simeonschaub

addresses the second part of #897

Co-authored-by: Simeon Schaub <[email protected]>
@ToucheSir
Copy link
Member

Just tried the failing cases up top and they all return a result now. Can we call this closed by #955 and #956?

@simeonschaub
Copy link
Member

Yes, both should be fixed now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants