Inference of `view` may take very long (#20848)
Ok, I think I have confirmed that inference is the bottleneck here and not some other part of codegen. But there is a twist I don't understand:

```julia
julia> @time Base.return_types(view, (UnitRange{Int64}, Array{Int64, 2}));
  0.284807 seconds (31.38 k allocations: 1.549 MiB)

julia> @time Base.return_types(view, (UnitRange{Int64}, Vararg{Array{Int64, 2}, N} where N));
 28.149748 seconds (92.16 M allocations: 4.107 GiB, 3.27% gc time)

julia> @time view(1:5, [2 3 4 1]);
  0.220726 seconds (239.41 k allocations: 11.738 MiB)
```

So inference of `view` is only slow for the `Vararg` signature; both the concrete signature and the actual call are fast.

EDIT: Issue at #20867.
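For a sense of why the `Vararg` query is so much harder, here is a minimal subtyping sketch (illustrative only, not part of the original measurement):

```julia
# Every concrete arity is a subtype of the Vararg signature, so an
# inference result for it must hold for all arities at once.
sig = Tuple{typeof(view), UnitRange{Int64}, Vararg{Array{Int64,2}}}

Tuple{typeof(view), UnitRange{Int64}, Array{Int64,2}} <: sig                  # true
Tuple{typeof(view), UnitRange{Int64}, Array{Int64,2}, Array{Int64,2}} <: sig  # true
```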
Chasing the source of the slowness a bit further, namely which method is the problematic one, I went through the methods involved and arrived at `Base.rdims`:

```julia
julia> Base.return_types(Base.rdims, (Tuple{}, Tuple{Base.OneTo{Int64}}, Type{Val{1}}))
1-element Array{Any,1}:
 Tuple{Base.OneTo{Int64}}

julia> Base.return_types(Base.rdims, (Tuple{}, Tuple{Base.OneTo{Int64}}, Type{Val{2}}))
1-element Array{Any,1}:
 Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}

julia> Base.return_types(Base.rdims, (Tuple{}, Tuple{Base.OneTo{Int64}}, Type{Val{N}} where N))
1-element Array{Any,1}:
 Tuple{Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Base.OneTo{Int64},Vararg{Base.OneTo{Int64},N}} where N
```

For the last one, I think the answer should be `Tuple{Vararg{Base.OneTo{Int64},N}} where N`.

EDIT: issue at #20869.
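A toy analogue of this widening behavior (a sketch, not code from Base; `grow` is a made-up function):

```julia
# Toy analogue of rdims: pad a tuple with 1s up to length n. With n only
# known as an Int, inference cannot bound the recursion depth and widens
# the result to an open-ended Vararg tuple after enumerating many
# intermediate tuple lengths.
grow(out::Tuple, n::Int) = length(out) >= n ? out : grow((out..., 1), n)

grow((), 3)                              # (1, 1, 1)
Base.return_types(grow, (Tuple{}, Int))  # widened Vararg{Int} tuple type
```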
@martinholters, really glad you're chasing this down. I've seen a couple of cases where inference slowness is preventing us from merging things that would otherwise be improvements (e.g., JuliaArrays/AxisArrays.jl#56). Possibly worth cross-referencing #20714.
The same slowness shows up one level down, for `reshape` with an abstract dims tuple:

```julia
julia> @time Base.return_types(Base.reshape, (UnitRange{Int}, Tuple));
 29.530436 seconds (92.13 M allocations: 4.105 GiB, 2.76% gc time)
```

Compare with putting the whole tuple type in explicitly:

```julia
julia> @time Base.return_types(Base.reshape, (UnitRange{Int}, Tuple{Vararg{Base.OneTo{Int64},N}} where N));
  0.395138 seconds (67.24 k allocations: 3.090 MiB)
```

so a tighter inference of the dims tuple passed to `reshape` should help here.
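A practical note when reproducing these timings (not from the original comment): inference results are cached per session, so only the first query in a fresh session shows the full cost.

```julia
# Run in a fresh Julia session; the second call hits the inference cache.
@time Base.return_types(Base.reshape, (UnitRange{Int}, Tuple));  # slow, cold cache
@time Base.return_types(Base.reshape, (UnitRange{Int}, Tuple));  # fast, cached
```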
Looks like this band-aid does a good job:

```diff
--- a/base/reshapedarray.jl
+++ b/base/reshapedarray.jl
@@ -115,18 +115,19 @@ _throw_reshape_colon_dimmismatch(A, pre, post) =
 reshape{T,N}(parent::AbstractArray{T,N}, ndims::Type{Val{N}}) = parent
 function reshape{N}(parent::AbstractArray, ndims::Type{Val{N}})
-    reshape(parent, rdims((), indices(parent), Val{N}))
+    reshape(parent, rdims((), indices(parent), Val{N}()))
 end
 
 # Move elements from inds to out until out reaches the desired
 # dimensionality N, either filling with OneTo(1) or collapsing the
 # product of trailing dims into the last element
-@pure rdims{N}(out::NTuple{N,Any}, inds::Tuple{}, ::Type{Val{N}}) = out
-@pure function rdims{N}(out::NTuple{N,Any}, inds::Tuple{Any, Vararg{Any}}, ::Type{Val{N}})
+@pure rdims{N}(out::NTuple{N,Any}, inds::Tuple{}, ::Val{N}) = out
+@pure function rdims{N}(out::NTuple{N,Any}, inds::Tuple{Any, Vararg{Any}}, ::Val{N})
     l = length(last(out)) * prod(map(length, inds))
     (front(out)..., OneTo(l))
 end
-@pure rdims{N}(out::Tuple, inds::Tuple{}, ::Type{Val{N}}) = rdims((out..., OneTo(1)), (), Val{N})
-@pure rdims{N}(out::Tuple, inds::Tuple{Any, Vararg{Any}}, ::Type{Val{N}}) = rdims((out..., first(inds)), tail(inds), Val{N})
+@pure rdims{N}(out::Tuple, inds::Tuple{}, ::Val{N}) = rdims((out..., OneTo(1)), (), Val{N}())
+@pure rdims{N}(out::Tuple, inds::Tuple{Any, Vararg{Any}}, ::Val{N}) = rdims((out..., first(inds)), tail(inds), Val{N}())
+@pure rdims{N}(out::Tuple{}, inds::Tuple{Vararg{OneTo}}, ::Val{N}) = rdims((out..., first(inds)), tail(inds), Val{N}())::NTuple{N,OneTo}
 
 # _reshape on Array returns an Array
 _reshape(parent::Vector, dims::Dims{1}) = parent
```

With that, I get

```julia
julia> @time view(1:5,[2 3 4 1]);
  0.285394 seconds (272.48 k allocations: 13.268 MiB, 26.40% gc time)
```

Further, it reduces the runtime of the subarray test by a fair amount, and it still does so with the bail-out code removed from `abstract_apply`.

This would solve the OP case, but before opening a PR, I'd like to further chase the root cause here. Inferring `reshape` for an abstract dims tuple still takes far too long.
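The core of the band-aid is switching dispatch from the type `Type{Val{N}}` to an instance `::Val{N}`. A minimal standalone sketch of the two forms (`f` and `g` are made-up names, not from the patch):

```julia
# Dispatch on the type: the argument is a Type object (Type{Val{3}}).
f(::Type{Val{N}}) where {N} = N
# Dispatch on an instance: the argument is the concrete singleton Val{3},
# which carries the same static N but is a final, concrete type.
g(::Val{N}) where {N} = N

f(Val{3})    # 3
g(Val{3}())  # 3
```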
Drilling deeper, I arrive at `Base._reshape_uncolon`:

```julia
julia> @time Base.return_types(Base._reshape_uncolon, (UnitRange{Int}, Tuple{}, Void, Tuple{}, Tuple{Vararg{Union{Colon, Int64},N} where N}))
 22.490008 seconds (93.57 M allocations: 4.173 GiB, 3.56% gc time)
```

And I think this is where the problem lies: the time grows super-linearly with the number of `Union{Colon, Int64}` elements (the sudden drop at 16 presumably being an inference bail-out):

```julia
julia> [@elapsed Base.return_types(Base._reshape_uncolon, (UnitRange{Int}, Tuple{}, Void, Tuple{}, Tuple{Vararg{Union{Colon, Int64},n}})) for n in 1:16]
16-element Array{Float64,1}:
 0.56456
 0.0803624
 0.109848
 0.257287
 0.337236
 0.381101
 0.512093
 0.654735
 0.939945
 1.12032
 1.42799
 1.7748
 2.39793
 3.40035
 9.49273
 0.00232926
```

Rewriting to a simpler recursion seems to be beneficial. Will open a PR assuming tests pass locally...
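A sketch of what "a simpler recursion" can look like in this style (illustrative only; `_prod_dims` is a made-up helper, not the code from the eventual PR): peel one element per call with a single accumulator, so inference sees a self-similar call chain rather than one whose intermediate types branch on `Union{Colon, Int}` in complicated ways.

```julia
# Toy example: product of the Int entries in a Tuple{Vararg{Union{Colon,Int}}},
# handled one element per recursive call.
_prod_dims(dims::Tuple{}) = 1
_prod_dims(dims::Tuple) = _prod_dims1(first(dims)) * _prod_dims(Base.tail(dims))
_prod_dims1(d::Int) = d
_prod_dims1(::Colon) = 1

_prod_dims((2, :, 3))  # 6
```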
Original report:

I couldn't find an issue specific to this, but ref #20832 (especially #20832 (comment)) and #20334 (comment).
Basic example:
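(The example snippet is reconstructed here from the call quoted in the comments above; the original timing output is not preserved.)

```julia
# On a fresh Julia session, the first run of this takes a surprisingly
# long time, dominated by inference rather than by the actual work.
@time view(1:5, [2 3 4 1]);
```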
In this case, it is important that this happens in global scope:
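(The original contrast snippet is not preserved; the following is a plausible illustration, assuming the function-local case is the fast one.)

```julia
# Wrapped in a function, the call is compiled only for the concrete
# argument types (UnitRange{Int}, Matrix{Int}) and the slowdown does
# not appear.
wrapped() = view(1:5, [2 3 4 1])
@time wrapped();
```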
Does the REPL force inference of `view` for other types than the actual parameter types, or why is that?

Also note that this is responsible for an arbitrary restriction in `abstract_apply` being necessary. Running `make test-subarray` on master gives a baseline timing. Limiting that restriction to just `view`, the test runs a little longer, so a little more time is spent in inference that is not compensated for by better run-time from improved type-stability. Not too bad, not too surprising. But removing that restriction altogether gives... [drumroll]... a dramatically longer run.

So inferring `view` more often for more specific argument types than just `Any...` (but still ending with `Any...`) has a dramatic impact. I conjecture it is worthwhile figuring out the root cause here, to either (preferably) improve inference (add some less arbitrary limitations, maybe) or obtain some lessons on how to write inference-friendly code and apply those to the `view` machinery.