Faster reverse for tuples #16604
The proposed change in `reverse(t::Tuple)` avoids recursion and, in particular, avoids calling `revargs` with a varying number of arguments, which especially slows down the first execution of calls like `@time reverse(tuple(1:1000...));`. The change makes the definition of `revargs` unnecessary.
👍. The original code was a cute Scheme analogy, but I don't think it's right for Julia. My benchmarking indicates that `reverse(t::Tuple) = ([t[i] for i in length(t):-1:1]...)` is slightly faster. It's a bit messier though.
#15695 note that this is not type stable and will be very bad for small tuples |
Yes, unfortunately this definition will be a disaster for small tuples. |
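The type-stability concern can be checked directly. The following is a minimal sketch, assuming a recent Julia 1.x release; `rev_splat` is a hypothetical name for the comprehension-and-splat definition discussed above, written with the trailing comma that Julia 1.x requires when splatting into a tuple.

```julia
# `rev_splat` is a hypothetical name; it mirrors the comprehension-and-splat idea.
rev_splat(t::Tuple) = ([t[i] for i in length(t):-1:1]...,)

# The values come out reversed.
rev_splat((1, 2, 3))  # (3, 2, 1)

# The intermediate Vector has no statically known length, so the inferred
# return type is expected to be a tuple type of unknown length.
inferred = first(Base.return_types(rev_splat, (NTuple{3,Int},)))
isconcretetype(inferred)  # expected to be false
```

A non-concrete return type is what "not type stable" refers to here: callers cannot know the length of the result at compile time.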
For (heterogenous and homogenous) tuples of length 5 or greater, the current |
This seems to work:
reverse(t::Tuple{}) = t
reverse{T}(t::Tuple{T}) = t
reverse{T,U}(t::Tuple{T,U}) = t[2], t[1]
reverse{T,U,V}(t::Tuple{T,U,V}) = t[3], t[2], t[1]
reverse{T,U,V,W}(t::Tuple{T,U,V,W}) = t[4], t[3], t[2], t[1]
reverse(t::Tuple) = ([t[i] for i in length(t):-1:1]...)
Is this considered too messy?
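For readers on current Julia, the unrolled definitions above can be sketched without the old `reverse{T}(...)` syntax. This is only an illustration under assumptions: `myreverse` is a hypothetical name chosen to avoid overwriting `Base.reverse`, and the type parameters are replaced by `Any` because they are not used in the method bodies.

```julia
# `myreverse` is a hypothetical stand-in for `reverse`.
myreverse(t::Tuple{}) = t
myreverse(t::Tuple{Any}) = t
myreverse(t::Tuple{Any,Any}) = (t[2], t[1])
myreverse(t::Tuple{Any,Any,Any}) = (t[3], t[2], t[1])
myreverse(t::Tuple{Any,Any,Any,Any}) = (t[4], t[3], t[2], t[1])

# Generic fallback for longer tuples; Julia 1.x needs the trailing comma
# when splatting a collection into a tuple literal.
myreverse(t::Tuple) = ([t[i] for i in length(t):-1:1]...,)

myreverse((1, "a"))         # ("a", 1)
myreverse((1, 2, 3, 4, 5))  # (5, 4, 3, 2, 1)
```

Dispatch picks the unrolled method whenever the tuple length is 4 or less, so the allocation in the fallback is only paid for longer tuples.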
Yes, we will probably need to go with something like that. A slightly better trick can be seen in https://github.com/JuliaLang/julia/pull/16460/files#diff-fd99c9a15dadc23d9200069dadafb8bbR90 |
That is indeed a nice trick. So only one additional line is needed: reverse(t::Tuple{Any,Any,Any,Any,Any,Vararg{Any}}) = ([t[i] for i in length(t):-1:1]...) This seems to have the best performance. I wonder why the size of tuple to reach performance parity is so much smaller in this case than with |
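The signature `Tuple{Any,Any,Any,Any,Any,Vararg{Any}}` matches exactly the tuples with five or more elements, regardless of element types. A small check of that behaviour, where `AtLeastFive` is a hypothetical alias introduced only for this illustration:

```julia
# Hypothetical alias for the "five or more elements" signature.
const AtLeastFive = Tuple{Any,Any,Any,Any,Any,Vararg{Any}}

(1, 2, 3, 4) isa AtLeastFive               # false
(1, 2, 3, 4, 5) isa AtLeastFive            # true
(1, 2.0, "c", :d, 'e', 6) isa AtLeastFive  # true
```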
It seems that the definition: reverse{N}(t::NTuple{N,Any}) = tuple(reverse(collect(t))...)::NTuple{N,Any} is at least type-stable for homogeneous tuples. It infers the type from the |
Thank you all for dedication to squeeze maximum performance out of every part of Julia 👍. Having learned again how little I know about Julia I have filed an issue #16631, and not PR :), for two other cases that have performance issues related to large tuples. |
The proposals in this thread seem to be an improvement over the original. Should someone update this PR?
👍. I did not update PR myself as I feel that someone with better understanding of Julia should choose the best approach to fix |
The idea here looks good to me, since we just have to add one line: #16604 (comment) |
@JeffBezanson There is still the type stability problem to deal with though, which I didn't measure in my benchmarks (since it only affects downstream consumers) and is harder to measure in general. Perhaps we should keep the same cutoff as |
There is also @Jutho's suggestion which preserves type stability for homogenous tuples reverse{N}(t::NTuple{N,Any}) = tuple(reverse(collect(t))...)::NTuple{N,Any} which may perhaps be useful to work in. |
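A sketch of this `collect`-based variant in current syntax, assuming Julia 1.x; `rev_collect` is a hypothetical name. The trailing type assertion tells the compiler the length of the result. How much element-type information inference recovers may depend on the Julia version, so no inferred type is claimed here.

```julia
# `rev_collect` is a hypothetical name for the collect-and-assert variant.
# `NTuple{N,Any}` matches every tuple of length N, homogeneous or not.
rev_collect(t::NTuple{N,Any}) where {N} =
    tuple(reverse(collect(t))...)::NTuple{N,Any}

rev_collect((1, 2, 3))  # (3, 2, 1)
rev_collect((1, "a"))   # ("a", 1)

# One way to inspect what inference concludes for a homogeneous input:
Base.return_types(rev_collect, (NTuple{3,Int},))
```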
I think it will work to add |
How about @generated reverse(x::Tuple) = Expr(:tuple, (:(x[$i]) for i in nfields(x):-1:1)...) ? Fast and type-stable. |
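A sketch of the generated-function idea for current Julia follows. It adapts the one-liner above in one respect, which is an assumption on my part: inside the generator `x` is bound to the tuple type rather than a value, so the sketch uses `fieldcount(x)` where the original used `nfields(x)`. `gen_reverse` is a hypothetical name.

```julia
using Test  # standard library, used only for @inferred

# `gen_reverse` is a hypothetical name. The generator body runs once per
# concrete tuple type and returns the expression (x[n], ..., x[2], x[1]).
@generated function gen_reverse(x::Tuple)
    n = fieldcount(x)  # `x` is the tuple *type* here, not a value
    return Expr(:tuple, (:(x[$i]) for i in n:-1:1)...)
end

gen_reverse((1, 2.0, "three"))            # ("three", 2.0, 1)
@inferred gen_reverse((1, 2.0, "three"))  # passes if the return type is inferred
```

Because the indices are literal constants in the generated expression, each element type is known to the compiler, which is where the type stability comes from. The compile-time cost is still paid once per distinct tuple type.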
This issue is mostly about compile time, and it's highly doubtful that has better compile time. |
Oh. Really? With the OP's example:
julia> @benchmark reverse($(tuple(1:1000...))) # master
BenchmarkTools.Trial:
memory estimate: 18.17 mb
allocs estimate: 373638
--------------
minimum time: 18.886 ms (0.00% GC)
median time: 20.347 ms (0.00% GC)
mean time: 23.854 ms (14.99% GC)
maximum time: 105.331 ms (81.43% GC)
--------------
samples: 210
evals/sample: 1
time tolerance: 5.00%
memory tolerance: 1.00%
julia> @benchmark myreverse($(tuple(1:1000...))) # as proposed above
BenchmarkTools.Trial:
memory estimate: 0.00 bytes
allocs estimate: 0
--------------
minimum time: 709.319 ns (0.00% GC)
median time: 714.773 ns (0.00% GC)
mean time: 727.437 ns (0.00% GC)
maximum time: 1.767 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 141
time tolerance: 5.00%
memory tolerance: 1.00%
Of course, this does not include compile time, but the execution time is improved dramatically.
Yikes, that is a big difference. |
This reminds me of: #15702 (comment). Here is another benchmark calling the reverse function for the first time (so runtime + compilation time):
v = Float64[]
for i in 1:100
t = (rand(i)...)
push!(v, @elapsed reverse(t))
end
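The same first-call measurement can be sketched for current Julia, where splatting into a tuple needs a trailing comma. The variable name and the smaller range of 30 lengths are choices made for this illustration, not part of the original benchmark.

```julia
# Each tuple length is a new type, so every call below includes the
# compilation cost for that type as well as the run time.
first_call_times = Float64[]
for i in 1:30
    t = (rand(i)...,)  # homogeneous Float64 tuple of length i
    push!(first_call_times, @elapsed reverse(t))
end
```

Plotting `first_call_times` against the tuple length is one way to see where the first-call cost starts to grow.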
I wanted to go back to this PR and resolve it.
As an additional issue I would like to ask what is the recommended place to document |
base/tuple.jl
reverse(t::Tuple) = revargs(t...)
reverse(t::Tuple{}) = t
reverse(t::Tuple{T}) where {T} = t
Or more simply `t::NTuple{1}` and so on.
If I understand it correctly, `Tuple{T}` is equivalent to `NTuple{1}`, but for a larger number of arguments it would not be equivalent. I added a specialized function for `NTuple{N}` as it ensures type stability for large `N`.
Ah, right, I think it needs to be `NTuple{N, Any}` to allow for heterogeneous tuples. I think that should be equivalent.
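The difference between `NTuple{N}` and `NTuple{N, Any}` discussed in this review thread can be checked at the REPL. A small illustration, assuming current Julia semantics:

```julia
# For a single element, NTuple{1} matches any one-element tuple.
(1,) isa NTuple{1}          # true
("a",) isa NTuple{1}        # true

# For two or more elements, NTuple{N} requires all element types to agree.
(1, 2) isa NTuple{2}        # true
(1, 2.0) isa NTuple{2}      # false: Int and Float64 differ

# NTuple{N, Any} only fixes the length.
(1, 2.0) isa NTuple{2,Any}  # true
```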
As @KristofferC pointed out, the cutoff here should probably be something bigger than 18 and not 4. |
@TotalVerb I agree that the break-point is at ~20 for single execution of Of course it is simple to extend the definitions to generation of a larger number predefined reverses - would you judge it worthwhile? (and the question is if having very many such definitions would not have a negative impact on Julia method dispatch code speed in general - here I do not know) |
@bkamins As Jeff mentioned, we may be able to keep the existing definition and simply add a new one, like reverse(t::Tuple{Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Vararg{Any}}) = ([t[i] for i in length(t):-1:1]...)::NTuple{nfields(t), Any} to cover long cases. However this doesn't preserve the type stability of homogenous tuples. |
@TotalVerb I have now unrolled the definitions for tuples of length 20 or less. For longer ones, the code is type stable for homogeneous tuples. Is this what you had in mind?
Looks much better. @JeffBezanson and others more familiar with this should probably have a look. |
Why not follow the example of
revargs() = ()
revargs(x, r...) = (revargs(r...)..., x)
reverse(t::Tuple) = revargs(t...)
function reverse(t::Tuple{Any,Any,Any,Any,Any,Any,Any,Any,
                          Any,Any,Any,Any,Any,Any,Any,Any,Vararg{Any}})
    ([t[i] for i in length(t):-1:1]...)::NTuple{nfields(t)}
end
In fact, I would argue that there is one too many This seems much better than code generation for generating all cases up to
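A runnable sketch of this combined approach for current Julia follows, under some assumptions: `my_revargs` and `rec_reverse` are hypothetical names used to avoid overwriting Base methods, the splat uses the trailing comma required in Julia 1.x, and the assertion is written as `NTuple{nfields(t),Any}` so that heterogeneous tuples pass it (a bare `NTuple{N}` only matches homogeneous tuples).

```julia
# Recursive definition, used for short tuples.
my_revargs() = ()
my_revargs(x, r...) = (my_revargs(r...)..., x)
rec_reverse(t::Tuple) = my_revargs(t...)

# More specific method for tuples with 16 or more elements: build the
# result through a temporary Vector instead of recursing.
function rec_reverse(t::Tuple{Any,Any,Any,Any,Any,Any,Any,Any,
                              Any,Any,Any,Any,Any,Any,Any,Any,Vararg{Any}})
    # The assertion fixes the length of the result; `Any` keeps it valid
    # for mixed element types.
    ([t[i] for i in length(t):-1:1]...,)::NTuple{nfields(t),Any}
end

rec_reverse((1, 2.0, "c"))                # ("c", 2.0, 1)
rec_reverse(ntuple(identity, 20))[1:3]    # (20, 19, 18)
```

The cutoff of 16 is taken from the signature quoted above; the thread suggests the best value depends on measurements.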
Small comment - you need The reason is performance. This is a test I use (
which produces:
The proposed method is uniformly faster for mixed tuples and for I agree that additional constant |
I am not sure if anyone is willing to go back to this very old PR but I would add adding
(I can make a separate PR for |
Honestly, at this point, I think the best approach might be to close this PR and remake it. Performance characteristics change over time, so it's probably good to separate the new correct data from the original. |
I would just close it then. The issue should probably be resolved when someone hits this problem in practice (given that this PR has been open for so long and no one has complained, this is probably not a serious practical issue). Regarding reverse for