Inference of view may take very long #20848

Closed
martinholters opened this issue Mar 1, 2017 · 6 comments
Labels
compiler:inference Type inference

Comments

@martinholters (Member)

I couldn't find an issue specific to this, but ref #20832 (especially #20832 (comment)) and #20334 (comment).

Basic example:

julia> @time view(1:5,[2 3 4 1]);
 21.413390 seconds (90.22 M allocations: 4.049 GiB, 3.75% gc time)

julia> @time view(1:5,[2 3 4 1]);
  0.000080 seconds (17 allocations: 592 bytes)

Note that it is important here that the call happens in global scope; wrapped in a function, it is fast:

# in a new session
julia> foo() = view(1:5,[2 3 4 1])
foo (generic function with 1 method)

julia> @time foo();
  0.086676 seconds (60.93 k allocations: 2.811 MiB)

julia> @time foo();
  0.000005 seconds (8 allocations: 352 bytes)

Does the REPL force inference of view for types other than the actual argument types, or what else explains this?

Also note that this is what makes an arbitrary restriction in abstract_apply necessary. Running make test-subarray on master, I get

Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
subarray (1)  |  257.68  |  8.81  |  3.4 | 14845.79   | 748.68  

Limiting that restriction just to view with

--- a/base/inference.jl
+++ b/base/inference.jl
@@ -1474,7 +1474,7 @@ function abstract_apply(af::ANY, fargs, aargtypes::Vector{Any}, vtypes::VarTable
     splitunions = countunionsplit(aargtypes) <= sv.params.MAX_APPLY_UNION_ENUM
     ctypes = Any[Any[]]
     for i = 1:nargs
-        if aargtypes[i] === Any
+        if aargtypes[i] === Any && istopfunction(_topmod(sv), af, :view)
             # bail out completely and infer as f(::Any...)
             # instead could keep what we got so far and just append a Vararg{Any} (by just
             # using the normal logic from below), but that makes the time of the subarray

I get

Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
subarray (1)  |  267.43  | 10.15  |  3.8 | 17928.75   | 742.89  

A little longer, so a little more time is spent in inference that is not compensated for by better run time from improved type stability. Not too bad, and not too surprising. But compare with removing that restriction altogether with

--- a/base/inference.jl
+++ b/base/inference.jl
@@ -1474,14 +1474,6 @@ function abstract_apply(af::ANY, fargs, aargtypes::Vector{Any}, vtypes::VarTable
     splitunions = countunionsplit(aargtypes) <= sv.params.MAX_APPLY_UNION_ENUM
     ctypes = Any[Any[]]
     for i = 1:nargs
-        if aargtypes[i] === Any
-            # bail out completely and infer as f(::Any...)
-            # instead could keep what we got so far and just append a Vararg{Any} (by just
-            # using the normal logic from below), but that makes the time of the subarray
-            # test explode
-            ctypes = Any[Any[Vararg{Any}]]
-            break
-        end
         ctypes´ = []
         for ti in (splitunions ? uniontypes(aargtypes[i]) : Any[aargtypes[i]])
             cti = precise_container_type(fargs[i], ti, vtypes, sv)

Now that gives... [drumroll]...

Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
subarray (1)  | 2274.95  | 66.29  |  2.9 | 43535.07   | 1424.08 

So inferring view for argument types that are more specific than just Any... (but still end in Any...) has a dramatic impact.

I conjecture it is worthwhile to figure out the root cause here, either to (preferably) improve inference (perhaps by adding some less arbitrary limitations) or to learn some lessons on how to write inference-friendly code and apply them to the view machinery.

@ararslan added the compiler:inference label on Mar 1, 2017
@martinholters (Member Author) commented Mar 2, 2017

OK, I think I have confirmed that inference is the bottleneck here, not some other part of codegen. But there is a twist I don't understand:

julia> @time Base.return_types(view, (UnitRange{Int64}, Array{Int64, 2}));
  0.284807 seconds (31.38 k allocations: 1.549 MiB)

julia> @time Base.return_types(view, (UnitRange{Int64}, Vararg{Array{Int64, 2}, N} where N));
 28.149748 seconds (92.16 M allocations: 4.107 GiB, 3.27% gc time)

julia> @time view(1:5, [2 3 4 1]);
  0.220726 seconds (239.41 k allocations: 11.738 MiB)

So inference of view(::UnitRange{Int64}, ::Array{Int64, 2}) is reasonably fast, but calling view(1:5, [2 3 4 1]) at the REPL apparently leads to inference of view(::UnitRange{Int64}, ::Vararg{Array{Int64, 2}}). Anyone venture an explanation here?

EDIT: Issue at #20867.

@martinholters changed the title from "Inference/codegen of view may take very long" to "Inference of view may take very long" on Mar 2, 2017
@martinholters (Member Author) commented Mar 2, 2017

Chasing the source of the slowness a bit further (namely, which method is the problematic one), I went through _maybe_reshape_parent{N}(A::AbstractArray, ::NTuple{N, Bool}) and reshape{N}(parent::AbstractArray, ndims::Type{Val{N}}) and found the following oddity for rdims:

julia> Base.return_types(Base.rdims, (Tuple{}, Tuple{Base.OneTo{Int64}}, Type{Val{1}}))
1-element Array{Any,1}:
 Tuple{Base.OneTo{Int64}}

julia> Base.return_types(Base.rdims, (Tuple{}, Tuple{Base.OneTo{Int64}}, Type{Val{2}}))
1-element Array{Any,1}:
 Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}

julia> Base.return_types(Base.rdims, (Tuple{}, Tuple{Base.OneTo{Int64}}, Type{Val{N}} where N))
1-element Array{Any,1}:
 Tuple{Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Vararg{Base.OneTo{Int64},N}} where N

For the last one, I think the answer should be Tuple{Vararg{Base.OneTo{Int64},N}} where N. Not sure whether this causes the slowness, but certainly worth fixing.

EDIT: issue at #20869.

@timholy (Member) commented Mar 2, 2017

@martinholters, really glad you're chasing this down. I've seen a couple of cases where inference slowness is preventing us from merging things that would otherwise be improvements (e.g., JuliaArrays/AxisArrays.jl#56).

Possibly worth cross-referencing #20714.

@martinholters (Member Author)

After patching rdims to use Val{N}() and ::Val{N} instead of Val{N} and ::Type{Val{N}}, inference produces just Tuple for the case of unknown N. That is not optimal, but at least correct. So, chasing further, I come to

julia> @time Base.return_types(Base.reshape, (UnitRange{Int}, Tuple));
 29.530436 seconds (92.13 M allocations: 4.105 GiB, 2.76% gc time)

(Putting the whole long Tuple{...} in there is equivalent, BTW.)
Now (in a new session)

julia> @time Base.return_types(Base.reshape, (UnitRange{Int}, Tuple{Vararg{Base.OneTo{Int64},N}} where N));
  0.395138 seconds (67.24 k allocations: 3.090 MiB)

so tighter inference of rdims might help here, but it would be more of a workaround for this particular case, not a fix for the root cause.
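
For readers following along, the change being described is dispatching on a Val instance rather than on the type Val{N} itself, the same pattern the band-aid in the next comment uses. A minimal, purely illustrative comparison (hypothetical functions, not from Base):

f(::Type{Val{N}}) where {N} = N   # current rdims style: called as f(Val{3})
g(::Val{N}) where {N} = N         # patched style: called as g(Val{3}())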

@martinholters (Member Author)

Looks like this band-aid does a good job:

--- a/base/reshapedarray.jl
+++ b/base/reshapedarray.jl
@@ -115,18 +115,19 @@ _throw_reshape_colon_dimmismatch(A, pre, post) =
 
 reshape{T,N}(parent::AbstractArray{T,N}, ndims::Type{Val{N}}) = parent
 function reshape{N}(parent::AbstractArray, ndims::Type{Val{N}})
-    reshape(parent, rdims((), indices(parent), Val{N}))
+    reshape(parent, rdims((), indices(parent), Val{N}()))
 end
 # Move elements from inds to out until out reaches the desired
 # dimensionality N, either filling with OneTo(1) or collapsing the
 # product of trailing dims into the last element
-@pure rdims{N}(out::NTuple{N,Any}, inds::Tuple{}, ::Type{Val{N}}) = out
-@pure function rdims{N}(out::NTuple{N,Any}, inds::Tuple{Any, Vararg{Any}}, ::Type{Val{N}})
+@pure rdims{N}(out::NTuple{N,Any}, inds::Tuple{}, ::Val{N}) = out
+@pure function rdims{N}(out::NTuple{N,Any}, inds::Tuple{Any, Vararg{Any}}, ::Val{N})
     l = length(last(out)) * prod(map(length, inds))
     (front(out)..., OneTo(l))
 end
-@pure rdims{N}(out::Tuple, inds::Tuple{}, ::Type{Val{N}}) = rdims((out..., OneTo(1)), (), Val{N})
-@pure rdims{N}(out::Tuple, inds::Tuple{Any, Vararg{Any}}, ::Type{Val{N}}) = rdims((out..., first(inds)), tail(inds), Val{N})
+@pure rdims{N}(out::Tuple, inds::Tuple{}, ::Val{N}) = rdims((out..., OneTo(1)), (), Val{N}())
+@pure rdims{N}(out::Tuple, inds::Tuple{Any, Vararg{Any}}, ::Val{N}) = rdims((out..., first(inds)), tail(inds), Val{N}())
+@pure rdims{N}(out::Tuple{}, inds::Tuple{Vararg{OneTo}}, ::Val{N}) = rdims((out..., first(inds)), tail(inds), Val{N}())::NTuple{N,OneTo}
 
 # _reshape on Array returns an Array
 _reshape(parent::Vector, dims::Dims{1}) = parent

With that, I get

julia> @time view(1:5,[2 3 4 1]);
  0.285394 seconds (272.48 k allocations: 13.268 MiB, 26.40% gc time)

Further, it reduces runtime of the subarray test by a fair amount:

Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
subarray (1)  |  208.88  |  6.19  |  3.0 | 11532.36   | 685.11  

And with the bail-out code removed from abstract_apply, which previously made the test time explode:

Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
subarray (1)  |  222.55  |  7.35  |  3.3 | 14031.39   | 694.66  

This would solve the OP case, but before opening a PR, I'd like to further chase the root cause here. Inferring reshape(::UnitRange{Int}, ::Tuple) is still inexplicably slow.

@martinholters (Member Author)

Drilling deeper, I arrive at

julia> @time Base.return_types(Base._reshape_uncolon, (UnitRange{Int}, Tuple{}, Void, Tuple{}, Tuple{Vararg{Union{Colon, Int64},N} where N}))
 22.490008 seconds (93.57 M allocations: 4.173 GiB, 3.56% gc time)

And I think this is where the problem lies: _reshape_uncolon dispatches on the first element of its last argument and recurses on its tail, leading to a combinatorial explosion of argument types to infer (roughly, each recursion level can split the Union{Colon, Int64} head into two cases, multiplying the number of signatures to consider). In fact, this also shows for known but largish N:

julia> [@elapsed Base.return_types(Base._reshape_uncolon, (UnitRange{Int}, Tuple{}, Void, Tuple{}, Tuple{Vararg{Union{Colon, Int64},n}})) for n in 1:16]
16-element Array{Float64,1}:
 0.56456   
 0.0803624 
 0.109848  
 0.257287  
 0.337236  
 0.381101  
 0.512093  
 0.654735  
 0.939945  
 1.12032   
 1.42799   
 1.7748    
 2.39793   
 3.40035   
 9.49273   
 0.00232926

Rewriting to a simpler recursion seems to be beneficial. Will open a PR assuming tests pass locally...
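
For concreteness, the kind of rewrite I have in mind might look roughly like the following sketch (purely illustrative; the name _reshape_uncolon_sketch and the details are mine, not necessarily what the PR will contain). The idea is to compute the product of the known dimensions in a single pass and substitute the Colon afterwards, instead of peeling off one element of the Union{Colon,Int} tuple per method call:

# Purely illustrative sketch (hypothetical helper, not Base._reshape_uncolon):
# avoid head/tail recursion over a Tuple{Vararg{Union{Colon,Int}}} by taking
# the product of the known dimensions in one pass and then replacing the Colon.
# Assumes at most one Colon, as reshape itself does.
function _reshape_uncolon_sketch(A, dims::Tuple{Vararg{Union{Colon,Int}}})
    pre = 1
    for d in dims
        if !isa(d, Colon)
            pre *= d
        end
    end
    len = length(A)
    len % pre == 0 ||
        throw(DimensionMismatch("array of length $len cannot be reshaped to size $dims"))
    # replace the Colon with the computed dimension length
    newdims = map(d -> isa(d, Colon) ? div(len, pre) : d, dims)
    return reshape(A, newdims)
end

For example, _reshape_uncolon_sketch(1:12, (3, Colon())) would produce a 3×4 reshape.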
