Inference of view may take very long #20848

Closed
martinholters opened this issue Mar 1, 2017 · 6 comments
Labels
compiler:inference Type inference

Comments

@martinholters (Member)

I couldn't find an issue specific to this, but ref #20832 (especially #20832 (comment)) and #20334 (comment).

Basic example:

julia> @time view(1:5,[2 3 4 1]);
 21.413390 seconds (90.22 M allocations: 4.049 GiB, 3.75% gc time)

julia> @time view(1:5,[2 3 4 1]);
  0.000080 seconds (17 allocations: 592 bytes)

Note that it is important here that the call happens in global scope; wrapped in a function, it is fast:

# in a new session
julia> foo() = view(1:5,[2 3 4 1])
foo (generic function with 1 method)

julia> @time foo();
  0.086676 seconds (60.93 k allocations: 2.811 MiB)

julia> @time foo();
  0.000005 seconds (8 allocations: 352 bytes)

Does the REPL force inference of view for types other than the actual argument types, or what else explains this?

Also note that this is what makes an arbitrary restriction in abstract_apply necessary. Running make test-subarray on master, I get

Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
subarray (1)  |  257.68  |  8.81  |  3.4 | 14845.79   | 748.68  

Limiting that restriction just to view with

--- a/base/inference.jl
+++ b/base/inference.jl
@@ -1474,7 +1474,7 @@ function abstract_apply(af::ANY, fargs, aargtypes::Vector{Any}, vtypes::VarTable
     splitunions = countunionsplit(aargtypes) <= sv.params.MAX_APPLY_UNION_ENUM
     ctypes = Any[Any[]]
     for i = 1:nargs
-        if aargtypes[i] === Any
+        if aargtypes[i] === Any && istopfunction(_topmod(sv), af, :view)
             # bail out completely and infer as f(::Any...)
             # instead could keep what we got so far and just append a Vararg{Any} (by just
             # using the normal logic from below), but that makes the time of the subarray

I get

Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
subarray (1)  |  267.43  | 10.15  |  3.8 | 17928.75   | 742.89  

A little longer, so a little more time is spent in inference that is not compensated for by better run time from improved type stability. Not too bad, and not too surprising. But compare with removing that restriction altogether with

--- a/base/inference.jl
+++ b/base/inference.jl
@@ -1474,14 +1474,6 @@ function abstract_apply(af::ANY, fargs, aargtypes::Vector{Any}, vtypes::VarTable
     splitunions = countunionsplit(aargtypes) <= sv.params.MAX_APPLY_UNION_ENUM
     ctypes = Any[Any[]]
     for i = 1:nargs
-        if aargtypes[i] === Any
-            # bail out completely and infer as f(::Any...)
-            # instead could keep what we got so far and just append a Vararg{Any} (by just
-            # using the normal logic from below), but that makes the time of the subarray
-            # test explode
-            ctypes = Any[Any[Vararg{Any}]]
-            break
-        end
         ctypes´ = []
         for ti in (splitunions ? uniontypes(aargtypes[i]) : Any[aargtypes[i]])
             cti = precise_container_type(fargs[i], ti, vtypes, sv)

Now that gives... [drumroll]...

Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
subarray (1)  | 2274.95  | 66.29  |  2.9 | 43535.07   | 1424.08 

So inferring view for argument types that are more specific than just Any... (but still end in Any...) has a dramatic impact.

I conjecture it is worthwhile to figure out the root cause here, either to (preferably) improve inference (perhaps by adding some less arbitrary limitations) or to learn some lessons on how to write inference-friendly code and apply them to the view machinery.

@ararslan added the compiler:inference label on Mar 1, 2017
@martinholters (Member Author) commented Mar 2, 2017

OK, I think I have confirmed that inference is the bottleneck here, not some other part of codegen. But there is a twist I don't understand:

julia> @time Base.return_types(view, (UnitRange{Int64}, Array{Int64, 2}));
  0.284807 seconds (31.38 k allocations: 1.549 MiB)

julia> @time Base.return_types(view, (UnitRange{Int64}, Vararg{Array{Int64, 2}, N} where N));
 28.149748 seconds (92.16 M allocations: 4.107 GiB, 3.27% gc time)

julia> @time view(1:5, [2 3 4 1]);
  0.220726 seconds (239.41 k allocations: 11.738 MiB)

So inference of view(::UnitRange{Int64}, ::Array{Int64, 2}) is reasonably fast, but calling view(1:5, [2 3 4 1]) at the REPL apparently leads to inference of view(::UnitRange{Int64}, ::Vararg{Array{Int64, 2}}). Anyone venture an explanation here?

EDIT: Issue at #20867.

@martinholters changed the title from "Inference/codegen of view may take very long" to "Inference of view may take very long" on Mar 2, 2017
@martinholters (Member Author) commented Mar 2, 2017

Chasing the source of the slowness a bit further (namely, which method is the problematic one), I went through _maybe_reshape_parent{N}(A::AbstractArray, ::NTuple{N, Bool}) and reshape{N}(parent::AbstractArray, ndims::Type{Val{N}}) and found the following oddity for rdims:

julia> Base.return_types(Base.rdims, (Tuple{}, Tuple{Base.OneTo{Int64}}, Type{Val{1}}))
1-element Array{Any,1}:
 Tuple{Base.OneTo{Int64}}

julia> Base.return_types(Base.rdims, (Tuple{}, Tuple{Base.OneTo{Int64}}, Type{Val{2}}))
1-element Array{Any,1}:
 Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}

julia> Base.return_types(Base.rdims, (Tuple{}, Tuple{Base.OneTo{Int64}}, Type{Val{N}} where N))
1-element Array{Any,1}:
 Tuple{Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Vararg{Base.OneTo{Int64},N}} where N

For the last one, I think the answer should be Tuple{Vararg{Base.OneTo{Int64},N}} where N. Not sure whether this causes the slowness, but certainly worth fixing.

EDIT: issue at #20869.

@timholy (Member) commented Mar 2, 2017

@martinholters, really glad you're chasing this down. I've seen a couple of cases where inference slowness is preventing us from merging things that would otherwise be improvements (e.g., JuliaArrays/AxisArrays.jl#56).

Possibly worth cross-referencing #20714.

@martinholters (Member Author)

After patching rdims to use Val{N}() and ::Val{N} instead of Val{N} and ::Type{Val{N}}, inference produces just Tuple for the case of unknown N. That is not optimal, but at least correct. So, chasing further, I come to

julia> @time Base.return_types(Base.reshape, (UnitRange{Int}, Tuple));
 29.530436 seconds (92.13 M allocations: 4.105 GiB, 2.76% gc time)

(Putting the whole long Tuple{...} in there is equivalent, BTW.)
Now (in a new session)

julia> @time Base.return_types(Base.reshape, (UnitRange{Int}, Tuple{Vararg{Base.OneTo{Int64},N}} where N));
  0.395138 seconds (67.24 k allocations: 3.090 MiB)

so tighter inference of rdims might help here, but it would be more of a workaround for this particular case, not a fix for the root cause.
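
For readers following along, the change being described is dispatching on a Val instance rather than on the type Val{N} itself, the same pattern the band-aid in the next comment uses. A minimal, purely illustrative comparison (hypothetical functions, not from Base):

f(::Type{Val{N}}) where {N} = N   # current rdims style: called as f(Val{3})
g(::Val{N}) where {N} = N         # patched style: called as g(Val{3}())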

@martinholters (Member Author)

Looks like this band-aid does a good job:

--- a/base/reshapedarray.jl
+++ b/base/reshapedarray.jl
@@ -115,18 +115,19 @@ _throw_reshape_colon_dimmismatch(A, pre, post) =
 
 reshape{T,N}(parent::AbstractArray{T,N}, ndims::Type{Val{N}}) = parent
 function reshape{N}(parent::AbstractArray, ndims::Type{Val{N}})
-    reshape(parent, rdims((), indices(parent), Val{N}))
+    reshape(parent, rdims((), indices(parent), Val{N}()))
 end
 # Move elements from inds to out until out reaches the desired
 # dimensionality N, either filling with OneTo(1) or collapsing the
 # product of trailing dims into the last element
-@pure rdims{N}(out::NTuple{N,Any}, inds::Tuple{}, ::Type{Val{N}}) = out
-@pure function rdims{N}(out::NTuple{N,Any}, inds::Tuple{Any, Vararg{Any}}, ::Type{Val{N}})
+@pure rdims{N}(out::NTuple{N,Any}, inds::Tuple{}, ::Val{N}) = out
+@pure function rdims{N}(out::NTuple{N,Any}, inds::Tuple{Any, Vararg{Any}}, ::Val{N})
     l = length(last(out)) * prod(map(length, inds))
     (front(out)..., OneTo(l))
 end
-@pure rdims{N}(out::Tuple, inds::Tuple{}, ::Type{Val{N}}) = rdims((out..., OneTo(1)), (), Val{N})
-@pure rdims{N}(out::Tuple, inds::Tuple{Any, Vararg{Any}}, ::Type{Val{N}}) = rdims((out..., first(inds)), tail(inds), Val{N})
+@pure rdims{N}(out::Tuple, inds::Tuple{}, ::Val{N}) = rdims((out..., OneTo(1)), (), Val{N}())
+@pure rdims{N}(out::Tuple, inds::Tuple{Any, Vararg{Any}}, ::Val{N}) = rdims((out..., first(inds)), tail(inds), Val{N}())
+@pure rdims{N}(out::Tuple{}, inds::Tuple{Vararg{OneTo}}, ::Val{N}) = rdims((out..., first(inds)), tail(inds), Val{N}())::NTuple{N,OneTo}
 
 # _reshape on Array returns an Array
 _reshape(parent::Vector, dims::Dims{1}) = parent

With that, I get

julia> @time view(1:5,[2 3 4 1]);
  0.285394 seconds (272.48 k allocations: 13.268 MiB, 26.40% gc time)

Further, it reduces runtime of the subarray test by a fair amount:

Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
subarray (1)  |  208.88  |  6.19  |  3.0 | 11532.36   | 685.11  

And with the bail-out code removed from abstract_apply, which previously made the test time explode:

Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
subarray (1)  |  222.55  |  7.35  |  3.3 | 14031.39   | 694.66  

This would solve the OP case, but before opening a PR, I'd like to further chase the root cause here. Inferring reshape(::UnitRange{Int}, ::Tuple) is still inexplicably slow.

@martinholters (Member Author)

Drilling deeper, I arrive at

julia> @time Base.return_types(Base._reshape_uncolon, (UnitRange{Int}, Tuple{}, Void, Tuple{}, Tuple{Vararg{Union{Colon, Int64},N} where N}))
 22.490008 seconds (93.57 M allocations: 4.173 GiB, 3.56% gc time)

And I think this is where the problem lies: _reshape_uncolon dispatches on the first element of its last argument and recurses on its tail, leading to a combinatorial explosion of argument types to infer (roughly, each recursion level can split the Union{Colon, Int64} head into two cases, multiplying the number of signatures to consider). In fact, this also shows for known but largish N:

julia> [@elapsed Base.return_types(Base._reshape_uncolon, (UnitRange{Int}, Tuple{}, Void, Tuple{}, Tuple{Vararg{Union{Colon, Int64},n}})) for n in 1:16]
16-element Array{Float64,1}:
 0.56456   
 0.0803624 
 0.109848  
 0.257287  
 0.337236  
 0.381101  
 0.512093  
 0.654735  
 0.939945  
 1.12032   
 1.42799   
 1.7748    
 2.39793   
 3.40035   
 9.49273   
 0.00232926

Rewriting to a simpler recursion seems to be beneficial. Will open a PR assuming tests pass locally...
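
For concreteness, the kind of rewrite I have in mind might look roughly like the following sketch (purely illustrative; the name _reshape_uncolon_sketch and the details are mine, not necessarily what the PR will contain). The idea is to compute the product of the known dimensions in a single pass and substitute the Colon afterwards, instead of peeling off one element of the Union{Colon,Int} tuple per method call:

# Purely illustrative sketch (hypothetical helper, not Base._reshape_uncolon):
# avoid head/tail recursion over a Tuple{Vararg{Union{Colon,Int}}} by taking
# the product of the known dimensions in one pass and then replacing the Colon.
# Assumes at most one Colon, as reshape itself does.
function _reshape_uncolon_sketch(A, dims::Tuple{Vararg{Union{Colon,Int}}})
    pre = 1
    for d in dims
        if !isa(d, Colon)
            pre *= d
        end
    end
    len = length(A)
    len % pre == 0 ||
        throw(DimensionMismatch("array of length $len cannot be reshaped to size $dims"))
    # replace the Colon with the computed dimension length
    newdims = map(d -> isa(d, Colon) ? div(len, pre) : d, dims)
    return reshape(A, newdims)
end

For example, _reshape_uncolon_sketch(1:12, (3, Colon())) would produce a 3×4 reshape.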
