Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

recursive stagedfunction and type inference #8853

Closed
Jutho opened this issue Oct 30, 2014 · 26 comments
Closed

recursive stagedfunction and type inference #8853

Jutho opened this issue Oct 30, 2014 · 26 comments
Labels
bug Indicates an unexpected problem or unintended behavior needs tests Unit tests are required for this change

Comments

@Jutho
Copy link
Contributor

Jutho commented Oct 30, 2014

I was trying to explore the possibility of using staged functions with tuple arguments in a recursive manner, to reimplement e.g. #6517 . Therefore, I first tried to implement the Fibonacci function using staged functions, where I have an nt::NTuple{N,Int} argument, where the first integer nt[1] corresponds to the Fibonacci index/parameter, and I just do some trivial operations on the other values in the tuple.

This is the code

stagedfunction fibrec{N}(nt::NTuple{N,Int})
    statements=[Expr(:(=),symbol("n$i"),Expr(:call,:+,Expr(:ref,:nt,i),i-1)) for i=1:N]
    args1=Any[symbol("n$i") for i=1:N]
    args1[1]=Expr(:call,:-,:n1,1)
    args2=Any[symbol("n$i") for i=1:N]
    args2[1]=Expr(:call,:-,:n1,2)
    rec1=Expr(:call,:fibrec,Expr(:tuple,args1...))
    rec2=Expr(:call,:fibrec,Expr(:tuple,args2...))

    ex=Expr(:if,:(n1<2),:(return n1),Expr(:return,Expr(:call,:+,rec1,rec2)))
    return Expr(:block,statements...,ex)
end

and for something like N=3 this returned expression is

    n1 = nt[1] + 0
    n2 = nt[2] + 1
    n3 = nt[3] + 2
    if n1 < 2
        return n1
    else 
        return fibrec((n1 - 1,n2,n3)) + fibrec((n1 - 2,n2,n3))
    end

which looks ok.

However, trying to run gives rise to

ERROR: stack overflow
 in _methods at ./reflection.jl:100
 in _methods at /usr/local/julia/usr/lib/julia/sys.dylib
 in _methods at ./reflection.jl:117
 in _methods at /usr/local/julia/usr/lib/julia/sys.dylib
 in _methods at ./reflection.jl:117
 in _methods at /usr/local/julia/usr/lib/julia/sys.dylib
 in _methods at ./reflection.jl:97
 in _methods at /usr/local/julia/usr/lib/julia/sys.dylib
 in abstract_call_gf at ./inference.jl:664
 in abstract_call at ./inference.jl:831
 in abstract_eval_call at ./inference.jl:923
 in abstract_eval at ./inference.jl:948
 in abstract_interpret at ./inference.jl:1103
 in typeinf at ./inference.jl:1432
 in typeinf at /usr/local/julia/usr/lib/julia/sys.dylib
 in typeinf at ./inference.jl:1238
 in abstract_call_gf at ./inference.jl:718
 in abstract_call at ./inference.jl:831
 in abstract_eval_call at ./inference.jl:923
 in abstract_eval at ./inference.jl:948
 in abstract_eval_arg at ./inference.jl:887
 in abstract_eval_call at ./inference.jl:899
 in abstract_eval at ./inference.jl:948
 in abstract_eval_arg at ./inference.jl:887
 in typeinf at ./inference.jl:1495
 in typeinf at /usr/local/julia/usr/lib/julia/sys.dylib
 in typeinf at ./inference.jl:1238
...

and from there on it repeats this last block starting at in abstract_call_gf at ./inference.jl:718 many times.

@mlubin
Copy link
Member

mlubin commented Oct 30, 2014

👍 Would be cool to see this working.

@Jutho
Copy link
Contributor Author

Jutho commented Nov 4, 2014

I can confirm that a similar thing occurs when just building a recursive staged function, but where the return value is not the recursive call and so the return type can be inferred without having to go through the recursion, as for the function permutedimsblock! in the following gist:
https://gist.github.com/Jutho/e10e39a91e1a473315ab

In fact, running

A=randn(10,10);
B=randn(10,10);
permutedimsnew!(B,A,[2,1])

just hangs julia. I didn't get it to throw a stack overflow in any reasonable time and had to force it to give up with Ctrl+C.

@Jutho
Copy link
Contributor Author

Jutho commented Dec 21, 2014

Bump.
I think it would be nice to fix this and could lead to some great examples of using stagedfunction. I checked what is going wrong. Essentially, the recursion check in typinf is failing, the reason being is(f.ast,ast0) on line https://github.com/JuliaLang/julia/blob/master/base/inference.jl#L1289 returning false. This is caused because every inference run on a staged function calls func_for_method, which for a staged function calls jl_instantiate_staged ( https://github.com/JuliaLang/julia/blob/master/base/inference.jl#L604 ) every time. While this function always returns the same AST, it does so with a new vector, such that is(f.ast,ast0) fails because it only checks whether the addresses of these objects are the same.

I've checked and simply replacing is(f.ast,ast0) by f.ast==ast0 ( line https://github.com/JuliaLang/julia/blob/master/base/inference.jl#L1289 ) fixes this problem. However, this is probably not the recommended solution? I have no idea about possible side effects of this ...

@timholy
Copy link
Member

timholy commented Dec 22, 2014

I can't comment on the acceptability of your proposed fix (though it seems correct/unavoidable), but a tiny tip: when linking to line numbers, hit 'y' on the source-code page and it will specialize the URL to the particular commit you're looking at. That way your line numbers don't go out-of-date as the codebase changes.

@Jutho
Copy link
Contributor Author

Jutho commented Dec 22, 2014

Thanks for the tip, I'll try to remember. If I don't get any response by tonight, I will just transform this fix into a pull request and see what that gives...

@timholy
Copy link
Member

timholy commented Dec 22, 2014

I guess the only problem with it might be performance. But if == (or a custom function to perform this particular comparison) first checks addresses and quits if those pass, then you won't even have a performance hit.

@Jutho
Copy link
Contributor Author

Jutho commented Dec 22, 2014

I guess (is(f.ast,ast0) || f.ast==ast0) could do?

@jakebolewski
Copy link
Member

In my tests (is(f.ast,ast0) || f.ast==ast0) is actually slightly faster than what we have currently (counterintuitively). The real fix is to cache the output of jl_instantiate_statged properly, but that is a nontrivial change.

@Jutho
Copy link
Contributor Author

Jutho commented Jan 18, 2015

This seems to be still an issue. The fix in #9440 fixed the example above, but the following still fails:

using Base.Cartesian
stagedfunction copy_strided_recursive!{N}(D,startD,stridesD::CartesianIndex{N},S,startS,stridesS::CartesianIndex{N},dims::CartesianIndex{N})
    return quote
        if prod(dims)<2048
            # copy_strided!(D,startD,stridesD,S,startS,stridesS,dims)
        else
            d = $N
            minstrides = min(stridesD,stridesS)
            valued = (dims[d]-1)*minstrides[d]
            @nif $N n->(valued<(dims[n]-1)*minstrides[n]) n->begin
                d = n
                valued = (dims[d]-1)*minstrides[d]
            end

            newdim = dims[d]>>1
            newdims = @ncall $N $dims n->(n==d ? newdim : dims[n])
            copy_strided_recursive!(D,startD,stridesD,S,startS,stridesS,newdims)

            startD += newdim*stridesD[d]
            startS += newdim*stridesS[d]
            newdim = dims[d]-newdim
            newdims = @ncall $N $dims n->(n==d ? newdim : dims[n])
            copy_strided_recursive!(D,startD,stridesD,S,startS,stridesS,newdims)
        end
        return D
    end
end

@Jutho
Copy link
Contributor Author

Jutho commented Jan 18, 2015

Strangely enough, acting with code_typed on this example returns the correct, type inferred code. However, actually trying to run it still triggers an infinite recursion in type inference.

@Jutho
Copy link
Contributor Author

Jutho commented Jan 18, 2015

Maybe somebody with some better knowledge of how type inference can briefly explain me if code_typed and actually calling a function take the same code path through type inference or not.

@Jutho Jutho added the bug Indicates an unexpected problem or unintended behavior label Jan 18, 2015
@Jutho
Copy link
Contributor Author

Jutho commented Jan 18, 2015

This might be related to #8504 . The reason that code_typed worked while running it did not, was because this was being called from within a function where N was an argument and hence its value could not be inferred. Rewriting the function such that N is inferred correctly using a ::Val{N} argument allows to run the function without problems.

Maybe somebody who understands inference.jl knows how this influences the rest of type inference and why recursion is recognized in one case and not in the other case.

@Jutho
Copy link
Contributor Author

Jutho commented Jan 25, 2015

Current status:
using the following two test functions:

function test1(N)
    A=Array(Float64,ntuple(N,n->2)::NTuple{N,Int});
    stridesA=CartesianIndex(strides(A))
    dims=CartesianIndex(size(A))
    copy_strided_recursive!(A,1,stridesA,A,1,stridesA,dims)
end
function test2{N}(::Val{N})
    A=Array(Float64,ntuple(N,n->2)::NTuple{N,Int});
    stridesA=CartesianIndex(strides(A))
    dims=CartesianIndex(size(A))
    copy_strided_recursive!(A,1,stridesA,A,1,stridesA,dims)
end

gives rise to the following behavior

  • @code_typed test1(3) -> N cannot be inferred, stagedfunction compilation is tried twice, but fails since N is unknown, code_typed finished with Any argument
  • test1(3)-> hangs in infinite inference recursion, the stagedfunction keeps getting compiled with the correct types, so it's the detection of recursion which fails
  • @code_typed test2(Val{3}()) -> on fresh start, returns fully inferred function after compiling the stagedfunction several times, sometimes with correct and sometimes with Any argument types. This depends on the value of N and on the precise statements inside the function. However, all of this is only after a fresh start; after running something like @code_typed test1(3) first, it hangs and keeps recompiling the stagedfunction with the correct types, so the recursion detection fails.

I tried debugging inference.jl but failed to detect the problem (or get any wiser about what all of this code is doing), despite the useful HackThatBase tool of @ihnorton .

However, caching the stagedfunction generation as suggested by @jakebolewski seems to fix the problem completely, at least in the following naive way as I implemented it. I just cached it in fun_for_method by running the following (just in REPL, doesn't even require to recompile or use HackThatBase):

let stagedcache=Dict{Any,Any}()
    function Base.func_for_method(m::Method, tt, env)
        if !m.isstaged
            return m.func.code
        elseif haskey(stagedcache,(m,tt,env))
            return stagedcache[(m,tt,env)].code
        else
            f=ccall(:jl_instantiate_staged,Any,(Any,Any,Any),m,tt,env)
            stagedcache[(m,tt,env)]=f
            return f.code
        end
    end
end

fixes all of the above problems. Now, since this is probably not the best way or even place to cache the stagedfunction generation, I'll leave it up to somebody who knows the julia internals to implement this in a proper way, but I would like to stress that this is really pertinent. I would not consider the stagedfunction implementation complete as a 0.4-project without fixing this (and #8504 ).

@timholy
Copy link
Member

timholy commented Jan 25, 2015

Totally agreed that the lingering bugs in stagedfunctions desperately need fixing; there's quite a lot of cleanup of base that should become possible when that happens. (E.g., #9098, needed for #9150, etc).

However, your caching implementation doesn't look so bad to me---indeed, this might be the right way to do it. Care to submit it as a PR?

@timholy
Copy link
Member

timholy commented Jan 25, 2015

One small concern: stagedcache might become really, really big.

@Jutho
Copy link
Contributor Author

Jutho commented Jan 25, 2015

That's why I wanted some input, I don't know what I need to store, which part of the keys are really necessary, where else this could be used, etc. But maybe a PR with RFC is the best way to get the discussion going.

@timholy
Copy link
Member

timholy commented Jan 25, 2015

I'm not sure about the env part (others will have to comment, I don't even remember what env is), but m and tt are clearly essential. I would say the best solution to the size problem---if it is a problem---would be an LRU-type design. Rather than using a dict, perhaps you could have a cyclic array with fixed capacity of 100 or so.

But it's also not entirely obvious this is a problem: after all, julia has to cache the compiled function, so maybe there isn't a real problem with having a separate cache of the AST.

@Jutho
Copy link
Contributor Author

Jutho commented Feb 12, 2015

I forgot to close this issue after PR #9921 got merged, but it now seems we may need to revert that PR because of #10178, so I'll leave it up to somebody else to decide whether this can be closed.

@timholy
Copy link
Member

timholy commented Feb 12, 2015

I don't think it should be reverted, because redefinition is a less important feature than the fixes this provides. Agreed it's a troublesome situation, however.

I suspect a lot of things would be fixable if we added a version number to methods.

@Jutho
Copy link
Contributor Author

Jutho commented Feb 12, 2015

I agree, but the solution to this issue should definitely be revisited by somebody with a better knowledge/understanding of inference.jl.

@Jutho
Copy link
Contributor Author

Jutho commented May 6, 2015

Yesterday, I ran into a case where the fix of #9921 was not sufficient and this problem popped up again; i.e. a recursive generated function giving rise to an infinite recursion in type inference.

The expression body of my generated function contained a macro, which defined a few local variables, giving rise gensym calls. Indeed, putting a @show in front of the generated expression showed that the @generated function was compiled over and over again, always yielding the same expression up to the name of the variables introduced by the macro. However, the whole point of #9921 was that the @generated function would not be compiled a second time for the same arguments, so I don't know where it goes wrong. Nevertheless, this was clearly the issue since there were two ways to stop type inference from getting stuck:

  1. rewrite the function such that there were no random variable names
  2. add dummy arguments to the function such that the total number of arguments is bigger than 8, I guess at that point type inference gives up?

On a more general note, I was wondering about the following. All of my recursive functions (generated or not) are of the form

function modify_recursive!(arg1, more args ...)
    if some condition
        modify_base!(arg1, more args ...)
    else
        modify_recursive!(arg1, more args for smaller case)
    end
    return arg1
end

I am actually wondering why type inference even bothers trying to compute the return type of the recursive call in the else block of the above definition. It is not used anywhere. So if type inference would only investigate those lines where it needs to infer the types involved to generate optimised code, it could even skip that call. Function calls of which the output is discarded could be ignored in the inference proces, no?

@timholy
Copy link
Member

timholy commented May 6, 2015

Without having any immediate answers to your questions, I'll advertise that my new favorite way to understand what's going inside inference.jl is @ihnorton's amazingly-useful HackThatBase, especially in conjunction with @toivoh's Debug.

@Jutho
Copy link
Contributor Author

Jutho commented May 6, 2015

I've used both before and they are indeed very useful. I see that you have been committing to those recently to bring them up to date with the latest master. So thanks and 👍 to all three of you!

@Jutho
Copy link
Contributor Author

Jutho commented May 6, 2015

It seems like the stagedfunction caching fails because it compares the method, the argument types and the env. Now, consider the following test case:

@generated function f{N}(a::NTuple{N,Int})
#code
end
tt, env, m = Base._methods(f,Tuple{Tuple{Int,Int,Int}},-1)[1]
tt2, env2, m2 = Base._methods(f,Tuple{Tuple{Int,Int,Int}},-1)[1]
@show env  #->   svec(N,3)
@show env2 #-> svec(N,3)
@show m==m2, tt==tt2, env==env2 #->  true, true, false

The problem is thus with comparing SimpleVector types. I guess this needs to be fixed, but wouldn't know where to start. I hope @JeffBezanson doesn't mind I involve him in this discussion?

@Jutho
Copy link
Contributor Author

Jutho commented May 6, 2015

Well, it wasn't actually that hard, just adding the appropriate definition in base.jl. My apologies for the spam @JeffBezanson , especially in light of # 8839 (deliberately broken link).

@vtjnash
Copy link
Member

vtjnash commented Mar 25, 2016

this seems to be fixed now

@vtjnash vtjnash closed this as completed Mar 25, 2016
@tkelman tkelman added the needs tests Unit tests are required for this change label Mar 29, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior needs tests Unit tests are required for this change
Projects
None yet
Development

No branches or pull requests

6 participants