Faster reverse for tuples #16604

bkamins · 2016-05-26T23:32:46Z

The proposed change in reverse(t::Tuple) avoids recursion and in particular calling revargs for varying number of arguments which especially slows down the first execution of calls like @time reverse(tuple(1:1000...));. The change makes definition of revargs unnecessary.

The proposed change in `reverse(t::Tuple)` avoids recursion and in particular calling `revargs` for varying number of arguments which especially slows down the first execution of calls like `@time reverse(tuple(1:1000...));`. The change makes definition of `revargs` unnecessary.

TotalVerb · 2016-05-26T23:47:27Z

👍. The original code was a cute Scheme analogy, but I don't think it's right for Julia.

My benchmarking indicates that

reverse(t::Tuple) = ([t[i] for i in length(t):-1:1]...)

is slightly faster. It's a bit messier though.

yuyichao · 2016-05-26T23:50:25Z

#15695 note that this is not type stable and will be very bad for small tuples

JeffBezanson · 2016-05-26T23:53:56Z

Yes, unfortunately this definition will be a disaster for small tuples.

TotalVerb · 2016-05-26T23:54:20Z

For (heterogenous and homogenous) tuples of length 5 or greater, the current reverse is worse than the modified proposed one. You're right that it's better for small tuples, by quite a bit for size 1 and 2 tuples. Presumably those tuples are the most important ones.

TotalVerb · 2016-05-27T04:56:02Z

This seems to work:

reverse(t::Tuple{}) = t
reverse{T}(t::Tuple{T}) = t
reverse{T,U}(t::Tuple{T,U}) = t[2], t[1]
reverse{T,U,V}(t::Tuple{T,U,V}) = t[3], t[2], t[1]
reverse{T,U,V,W}(t::Tuple{T,U,V,W}) = t[4], t[3], t[2], t[1]
reverse(t::Tuple) = ([t[i] for i in length(t):-1:1]...)

Is this considered too messy?

JeffBezanson · 2016-05-27T05:02:04Z

Yes, we will probably need to go with something like that. A slightly better trick can be seen in https://github.com/JuliaLang/julia/pull/16460/files#diff-fd99c9a15dadc23d9200069dadafb8bbR90
It's messy but I don't know a better way to do it so far.

TotalVerb · 2016-05-27T05:17:12Z

That is indeed a nice trick. So only one additional line is needed:

reverse(t::Tuple{Any,Any,Any,Any,Any,Vararg{Any}}) = ([t[i] for i in length(t):-1:1]...)

This seems to have the best performance. I wonder why the size of tuple to reach performance parity is so much smaller in this case than with map.

Jutho · 2016-05-27T12:20:34Z

It seems that the definition:

reverse{N}(t::NTuple{N,Any}) = tuple(reverse(collect(t))...)::NTuple{N,Any}

is at least type-stable for homogeneous tuples. It infers the type from the reverse(collect(t)) construction and combines it with the length in the type assertion.

bkamins · 2016-05-28T18:38:30Z

Thank you all for dedication to squeeze maximum performance out of every part of Julia 👍.

Having learned again how little I know about Julia I have filed an issue #16631, and not PR :), for two other cases that have performance issues related to large tuples.

musm · 2016-12-04T19:35:22Z

the proposals in this thread seem to be an improvement over the original, should someone update this PR?

bkamins · 2016-12-04T22:05:32Z

👍. I did not update PR myself as I feel that someone with better understanding of Julia should choose the best approach to fix reverse.

JeffBezanson · 2016-12-06T21:45:18Z

The idea here looks good to me, since we just have to add one line: #16604 (comment)

TotalVerb · 2016-12-06T21:59:24Z

@JeffBezanson There is still the type stability problem to deal with though, which I didn't measure in my benchmarks (since it only affects downstream consumers) and is harder to measure in general. Perhaps we should keep the same cutoff as map?

TotalVerb · 2016-12-06T22:00:15Z

There is also @Jutho's suggestion which preserves type stability for homogenous tuples

reverse{N}(t::NTuple{N,Any}) = tuple(reverse(collect(t))...)::NTuple{N,Any}

which may perhaps be useful to work in.

JeffBezanson · 2016-12-06T22:20:07Z

I think it will work to add ::NTuple{nfields(t),Any} to the comprehension version.

martinholters · 2016-12-07T08:02:20Z

How about

@generated reverse(x::Tuple) = Expr(:tuple, (:(x[$i]) for i in nfields(x):-1:1)...)

? Fast and type-stable.

TotalVerb · 2016-12-07T08:04:35Z

This issue is mostly about compile time, and it's highly doubtful that has better compile time.

martinholters · 2016-12-07T08:10:42Z

This issue is mostly about compile time, and it's highly doubtful that has better compile time.

Oh. Really? With the OP's example:

julia> @benchmark reverse($(tuple(1:1000...))) # master
BenchmarkTools.Trial: 
  memory estimate:  18.17 mb
  allocs estimate:  373638
  --------------
  minimum time:     18.886 ms (0.00% GC)
  median time:      20.347 ms (0.00% GC)
  mean time:        23.854 ms (14.99% GC)
  maximum time:     105.331 ms (81.43% GC)
  --------------
  samples:          210
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

julia> @benchmark myreverse($(tuple(1:1000...))) # as proposed above
BenchmarkTools.Trial: 
  memory estimate:  0.00 bytes
  allocs estimate:  0
  --------------
  minimum time:     709.319 ns (0.00% GC)
  median time:      714.773 ns (0.00% GC)
  mean time:        727.437 ns (0.00% GC)
  maximum time:     1.767 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     141
  time tolerance:   5.00%
  memory tolerance: 1.00%

Of course, this does not include compile time, but the execution time is improved dramatically.

TotalVerb · 2016-12-07T08:43:13Z

Yikes, that is a big difference.

KristofferC · 2016-12-07T08:50:42Z

This reminds me of: #15702 (comment).

Here is another benchmark calling the reverse function for the first time (so runtime + compilation time):

v = Float64[]
for i in 1:100
   t = (rand(i)...)    
   push!(v, @elapsed reverse(t))
end

bkamins · 2017-09-13T09:02:33Z

I wanted to go back to this PR and resolve it.
Given the above discussion my recommendation would be:

I opt against @generated version as it generates StackOverflow for very large tuples (and is unacceptably slow to compile for for large tuples below StackOverflow level)
Have specialized methods for small tuples (I chose up to 4 elements - as in recommendation above) which is type stable
Have a separate method for NTuple as it is type stable and the most probable use case for large tuples will be NTuple
Accept that for large tuples reverse might not be type stable

As an additional issue I would like to ask what is the recommended place to document reverse in general, as now ?reverse prints help only for AbstractString.

mbauman · 2017-09-15T22:01:09Z

base/tuple.jl

-
-reverse(t::Tuple) = revargs(t...)
+reverse(t::Tuple{}) = t
+reverse(t::Tuple{T}) where {T} = t


Or more simply t::NTuple{1} and so on.

If I understand it correctly Tuple{T} is equivalent to NTuple{1}, but for larger number of arguments it would not be equivalent. I add a specialized function for NTuple{N} as it ensures type stability for large N.

Ah, right, I think it needs to be NTuple{N, Any} to allow for heterogeneous tuples. I think that should be equivalent.

TotalVerb · 2017-09-15T22:06:45Z

As @KristofferC pointed out, the cutoff here should probably be something bigger than 18 and not 4.

bkamins · 2017-09-15T22:31:39Z

@TotalVerb I agree that the break-point is at ~20 for single execution of reverse, and even for 100 elements in a tuple the generation of specialized function is not prohibitively slow (which would give a benefit when reverse is called many times).
I thought that as we have to settle for a suboptimal solution then it is best to keep it relatively simple.

Of course it is simple to extend the definitions to generation of a larger number predefined reverses - would you judge it worthwhile? (and the question is if having very many such definitions would not have a negative impact on Julia method dispatch code speed in general - here I do not know)

TotalVerb · 2017-09-15T22:33:44Z

@bkamins As Jeff mentioned, we may be able to keep the existing definition and simply add a new one, like

reverse(t::Tuple{Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Vararg{Any}}) = ([t[i] for i in length(t):-1:1]...)::NTuple{nfields(t), Any}

to cover long cases. However this doesn't preserve the type stability of homogenous tuples.

bkamins · 2017-09-16T08:53:21Z

@TotalVerb I have now unrolled the definitions for tuples not greater than 20. For greater ones the code for homogeneous tuples is type stable. Is this is what you had in mind?

TotalVerb · 2017-09-16T15:49:46Z

Looks much better. @JeffBezanson and others more familiar with this should probably have a look.

Jutho · 2017-09-24T20:56:46Z

Why don't follow the example of map, i.e. just keep the old code and add your definition using the many Anys trick:

revargs() = ()
revargs(x, r...) = (revargs(r...)..., x)
reverse(t::Tuple) = revargs(t...)
function reverse(t::Tuple{Any,Any,Any,Any,Any,Any,Any,Any,
                         Any,Any,Any,Any,Any,Any,Any,Any,Vararg{Any}})
    ([t[i] for i in length(t):-1:1]...)::NTuple{nfields(t)}
end

In fact, I would argue that there is one too many Any in both the code above and in the code of map, since type inference stops working for the lisp-like definition beyond 14 arguments. But other than that, for the cases where the last definition kicks in, return type declaration gives the correct number of fields, whereas type inference is still able to infer the type of the tuple is homogeneous. So the return type will be concrete for all NTuple{N,T}s.

This seems much better than code generation for generating all cases up to N=20 explicitly, in addition to the act of introducing an arbitrary new constant (i.e. 20).

bkamins · 2017-09-24T22:07:02Z

Small comment - you need NTuple{nfields(t), Any} (as you proposed above) for it to work.

The reason is performance. This is a test I use (new_reverse is my proposed implementation):

function test(i)
    println("Testing tuple size ", i+1)
    for (e, s) in [("a", "mixed"), (0, "homogeneous")]
        t = (e, 1:i...)
        print("\tOld $s:")
        @btime reverse($t)
        print("\tNew $s:")
        @btime new_reverse($t)
    end
end

foreach(test, 1:3:25)

which produces:

Testing tuple size 2
        Old mixed:  10.264 ns (1 allocation: 32 bytes)
        New mixed:  10.263 ns (1 allocation: 32 bytes)
        Old homogeneous:  1.866 ns (0 allocations: 0 bytes)
        New homogeneous:  1.866 ns (0 allocations: 0 bytes)
Testing tuple size 5
        Old mixed:  1.819 μs (9 allocations: 240 bytes)
        New mixed:  12.596 ns (1 allocation: 48 bytes)
        Old homogeneous:  1.773 μs (9 allocations: 240 bytes)
        New homogeneous:  2.332 ns (0 allocations: 0 bytes)
Testing tuple size 8
        Old mixed:  2.955 μs (14 allocations: 480 bytes)
        New mixed:  18.681 ns (1 allocation: 80 bytes)
        Old homogeneous:  2.903 μs (13 allocations: 400 bytes)
        New homogeneous:  4.198 ns (0 allocations: 0 bytes)
Testing tuple size 11
        Old mixed:  5.443 μs (21 allocations: 816 bytes)
        New mixed:  21.525 ns (1 allocation: 96 bytes)
        Old homogeneous:  5.521 μs (21 allocations: 816 bytes)
        New homogeneous:  6.531 ns (0 allocations: 0 bytes)
Testing tuple size 14
        Old mixed:  6.881 μs (26 allocations: 1.17 KiB)
        New mixed:  22.860 ns (1 allocation: 128 bytes)
        Old homogeneous:  6.881 μs (25 allocations: 1.05 KiB)
        New homogeneous:  8.864 ns (0 allocations: 0 bytes)
Testing tuple size 17
        Old mixed:  3.791 μs (13 allocations: 1.08 KiB)
        New mixed:  22.393 ns (1 allocation: 144 bytes)
        Old homogeneous:  1.120 μs (2 allocations: 368 bytes)
        New homogeneous:  11.196 ns (0 allocations: 0 bytes)
Testing tuple size 20
        Old mixed:  4.332 μs (13 allocations: 1.14 KiB)
        New mixed:  27.992 ns (1 allocation: 176 bytes)
        Old homogeneous:  1.260 μs (2 allocations: 416 bytes)
        New homogeneous:  13.529 ns (0 allocations: 0 bytes)
Testing tuple size 23
        Old mixed:  4.199 μs (13 allocations: 1.34 KiB)
        New mixed:  3.674 μs (13 allocations: 1.34 KiB)
        Old homogeneous:  1.399 μs (2 allocations: 528 bytes)
        New homogeneous:  884.458 ns (2 allocations: 528 bytes)
Testing tuple size 26
        Old mixed:  4.332 μs (13 allocations: 1.38 KiB)
        New mixed:  3.791 μs (13 allocations: 1.38 KiB)
        Old homogeneous:  1.540 μs (2 allocations: 560 bytes)
        New homogeneous:  979.700 ns (2 allocations: 560 bytes)

The proposed method is uniformly faster for mixed tuples and for NTuples of small, moderate and large sizes. For tuples of moderate size (~10) the difference is very significant.

I agree that additional constant N=20 is arbitrary. I have chosen it so that compile time is still low, but it can be changed if such a high N is an issue. I am not qualified to fully understand if having too many methods for such a function is problematic (as I believe that reverse for tuples is not a top priority functionality).

bkamins · 2021-11-11T15:11:26Z

I am not sure if anyone is willing to go back to this very old PR but I would add adding reverse to NamedTuple to the list, as currently:

julia> reverse((a=1,b=2))
ERROR: MethodError: no method matching reverse(::NamedTuple{(:a, :b), Tuple{Int64, Int64}})

(I can make a separate PR for reverse for NamedTuple if you feel the tuple issue that was originally raised can be left implemented as it is now and then I would close this PR)

oscardssmith · 2021-11-11T17:22:53Z

Honestly, at this point, I think the best approach might be to close this PR and remake it. Performance characteristics change over time, so it's probably good to separate the new correct data from the original.

bkamins · 2021-11-11T19:18:09Z

I would just close it then. Probably the issue should be resolved when someone hits this problem in practice (given that this PR is open for so long and no one complains probably this is not a serous practical issue).

Regarding reverse for NamedTuples I think it should be added but first an issue I have just opened #43045 probably should be resolved.

yuyichao added performance Must go faster potential benchmark Could make a good benchmark in BaseBenchmarks labels May 26, 2016

improved reverse for tuples

6854bf4

mbauman reviewed Sep 15, 2017

View reviewed changes

make reverse type stable for larger tuples

9518e9f

martinholters mentioned this pull request Jun 22, 2020

Generate reverse for StaticVector types JuliaArrays/StaticArrays.jl#802

Merged

bkamins mentioned this pull request Nov 11, 2021

Add iterate for Reverse{<:NamedTuple} #43038

Merged

bkamins closed this Nov 11, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Faster reverse for tuples #16604

Faster reverse for tuples #16604

bkamins commented May 26, 2016

TotalVerb commented May 26, 2016 •

edited

Loading

yuyichao commented May 26, 2016

JeffBezanson commented May 26, 2016

TotalVerb commented May 26, 2016 •

edited

Loading

TotalVerb commented May 27, 2016 •

edited

Loading

JeffBezanson commented May 27, 2016

TotalVerb commented May 27, 2016

Jutho commented May 27, 2016

bkamins commented May 28, 2016

musm commented Dec 4, 2016

bkamins commented Dec 4, 2016

JeffBezanson commented Dec 6, 2016

TotalVerb commented Dec 6, 2016

TotalVerb commented Dec 6, 2016

JeffBezanson commented Dec 6, 2016

martinholters commented Dec 7, 2016

TotalVerb commented Dec 7, 2016

martinholters commented Dec 7, 2016

TotalVerb commented Dec 7, 2016

KristofferC commented Dec 7, 2016 •

edited

Loading

bkamins commented Sep 13, 2017 •

edited

Loading

mbauman Sep 15, 2017 •

edited

Loading

bkamins Sep 15, 2017

mbauman Sep 15, 2017

TotalVerb commented Sep 15, 2017

bkamins commented Sep 15, 2017

TotalVerb commented Sep 15, 2017

bkamins commented Sep 16, 2017

TotalVerb commented Sep 16, 2017

Jutho commented Sep 24, 2017

bkamins commented Sep 24, 2017

bkamins commented Nov 11, 2021

oscardssmith commented Nov 11, 2021

bkamins commented Nov 11, 2021

Faster reverse for tuples #16604

Faster reverse for tuples #16604

Conversation

bkamins commented May 26, 2016

TotalVerb commented May 26, 2016 • edited Loading

yuyichao commented May 26, 2016

JeffBezanson commented May 26, 2016

TotalVerb commented May 26, 2016 • edited Loading

TotalVerb commented May 27, 2016 • edited Loading

JeffBezanson commented May 27, 2016

TotalVerb commented May 27, 2016

Jutho commented May 27, 2016

bkamins commented May 28, 2016

musm commented Dec 4, 2016

bkamins commented Dec 4, 2016

JeffBezanson commented Dec 6, 2016

TotalVerb commented Dec 6, 2016

TotalVerb commented Dec 6, 2016

JeffBezanson commented Dec 6, 2016

martinholters commented Dec 7, 2016

TotalVerb commented Dec 7, 2016

martinholters commented Dec 7, 2016

TotalVerb commented Dec 7, 2016

KristofferC commented Dec 7, 2016 • edited Loading

bkamins commented Sep 13, 2017 • edited Loading

mbauman Sep 15, 2017 • edited Loading

Choose a reason for hiding this comment

bkamins Sep 15, 2017

Choose a reason for hiding this comment

mbauman Sep 15, 2017

Choose a reason for hiding this comment

TotalVerb commented Sep 15, 2017

bkamins commented Sep 15, 2017

TotalVerb commented Sep 15, 2017

bkamins commented Sep 16, 2017

TotalVerb commented Sep 16, 2017

Jutho commented Sep 24, 2017

bkamins commented Sep 24, 2017

bkamins commented Nov 11, 2021

oscardssmith commented Nov 11, 2021

bkamins commented Nov 11, 2021

TotalVerb commented May 26, 2016 •

edited

Loading

TotalVerb commented May 26, 2016 •

edited

Loading

TotalVerb commented May 27, 2016 •

edited

Loading

KristofferC commented Dec 7, 2016 •

edited

Loading

bkamins commented Sep 13, 2017 •

edited

Loading

mbauman Sep 15, 2017 •

edited

Loading