map on tuples is prohibitively slow #15695

Closed
shashi opened this issue Mar 30, 2016 · 16 comments
Labels
performance Must go faster potential benchmark Could make a good benchmark in BaseBenchmarks

@shashi
Contributor

shashi commented Mar 30, 2016

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.5.0-dev+3333 (2016-03-30 03:42 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 4a13500* (0 days old master)
|__/                   |  x86_64-linux-gnu

julia> t = ntuple(x->x,1000);

julia> @time map(-, t)
488.491589 seconds (4.79 M allocations: 227.712 MB, 0.01% gc time)

julia> @time map(-, t)
  0.065904 seconds (620.06 k allocations: 26.750 MB, 6.09% gc time)

Two alternative implementations (for tuples with > 3 elements) would be:

julia> function map1(f, t::Tuple)
           ([f(x) for x in t]...)
       end
map1 (generic function with 1 method)

julia> @time map1(-, t);
  0.028094 seconds (9.86 k allocations: 449.154 KB)

julia> @time map1(-, t);
  0.000096 seconds (498 allocations: 39.500 KB)

# this one is by @SimonDanisch
@generated function map2(f, x::Tuple)
    len = length(x.parameters)
    expr = [:(f(x[$i])) for i=1:len]
    :(tuple($(expr...)))
end

julia> @time map2(-, t);
 10.158751 seconds (233.77 k allocations: 10.337 MB)

julia> @time map2(-, t);
  0.000030 seconds (6 allocations: 8.047 KB)
@simonster simonster added the performance Must go faster label Mar 30, 2016
@SimonDanisch
Contributor

Is that a case for this tail call optimization everyone's talking about?!

@JeffBezanson
Member

No.

@Keno
Member

Keno commented Mar 30, 2016

+1. I had one map on tuples in JuliaParser and it slowed down the whole thing by about 2-3 orders of magnitude.

@SimonDanisch
Contributor

Oh I see, constructing the tuple is the last action, not the call to map...

@JeffBezanson
Member

We could probably get the equivalent of the generated map2 function by adding inline declarations, but at some point we'll still want an iterative version.
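A sketch of what such an inline-annotated definition might look like (the name `tmap` and the exact form are illustrative, not Base's actual code):

```julia
# Hypothetical sketch: recursion over the tuple, with @inline so the
# compiler unrolls the calls, giving much the same effect as the
# @generated map2 above.
@inline tmap(f, t::Tuple{}) = ()
@inline tmap(f, t::Tuple) = (f(t[1]), tmap(f, Base.tail(t))...)
```

For long tuples this still recurses deeply at compile time, which is why an iterative version would eventually be wanted.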

@SimonDanisch
Contributor

So a stack-allocated Ref plus setindex!(::Ref{Tuple})? Is that still the trick we want to apply for this kind of problem?

@Keno
Member

Keno commented Mar 30, 2016

Yes
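In later Julia syntax, that Ref-based trick might be sketched roughly like this (hypothetical name `rmap`, not Base's implementation; it assumes f preserves the element type, and uses raw pointer stores since Ref{Tuple} has no indexed setindex!):

```julia
# Hypothetical sketch: build the result by overwriting a Ref-backed copy
# of the tuple in place. Assumes f(x)::T for a homogeneous NTuple{N,T}.
function rmap(f, t::NTuple{N,T}) where {N,T}
    r = Ref(t)  # mutable buffer holding the tuple
    GC.@preserve r begin
        p = convert(Ptr{T}, Base.unsafe_convert(Ptr{NTuple{N,T}}, r))
        for i in 1:N
            unsafe_store!(p, f(t[i]), i)  # overwrite slot i
        end
    end
    return r[]
end
```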

@JeffBezanson
Member

Yes, but the comprehension version might have to do in the meantime.

related: #14126 #15516

Probably all the various definitions for map on tuples should go away, and be replaced by #15516 with a generator. That way we can consolidate a clever implementation in one place.
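A generator-based consolidation might look roughly like this (illustrative only; `gmap` is a hypothetical name, and the splat-into-tuple syntax shown is the later form):

```julia
# Hypothetical sketch: one definition that splats a generator back into a
# tuple, so a single clever implementation covers all tuple lengths.
gmap(f, t::Tuple) = ((f(x) for x in t)...,)
```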

shashi added a commit to JuliaParallel/Dagger.jl that referenced this issue Mar 30, 2016
@shashi
Contributor Author

shashi commented Mar 30, 2016

Similarly, the map(f, ::Tuple...) method seems to be slower than the comprehension alternative:

function map2(f, ts::Tuple...)
    ([f([t[i] for t in ts]...) for i in 1:length(ts[1])]...)
end
julia> function foo(f, n)
           t = ntuple(x->x, n)
           @time f(+, t,t)
           @time f(+, t,t,t)
           @time f(+, t,t,t,t)
       end
julia> foo(map, 2)
  0.003090 seconds (725 allocations: 35.893 KB)
  1.004426 seconds (1.73 M allocations: 66.629 MB, 1.60% gc time)
  0.002223 seconds (162 allocations: 7.375 KB)
(4,8)

julia> foo(map, 2)
  0.000001 seconds (1 allocation: 32 bytes) # this has a special case
  0.000133 seconds (86 allocations: 3.516 KB)
  0.000146 seconds (117 allocations: 4.375 KB)
(4,8)
julia> foo(map2, 2)
  0.024458 seconds (23.03 k allocations: 997.798 KB)
  0.000011 seconds (4 allocations: 352 bytes)
  0.000006 seconds (4 allocations: 352 bytes)
(4,8)

julia> foo(map2, 2)
  0.000023 seconds (4 allocations: 320 bytes)
  0.000006 seconds (4 allocations: 352 bytes)
  0.000005 seconds (4 allocations: 352 bytes)
(4,8)
julia> foo(map, 3)
  0.008495 seconds (4.66 k allocations: 232.674 KB)
  0.450469 seconds (699.39 k allocations: 27.190 MB, 2.75% gc time)
  0.000150 seconds (186 allocations: 7.500 KB)
(4,8,12)

julia> foo(map, 3)
  0.000039 seconds (11 allocations: 672 bytes)
  0.000144 seconds (132 allocations: 5.500 KB)
  0.000132 seconds (182 allocations: 7.125 KB)
(4,8,12)
julia> foo(map2, 3)
  0.025736 seconds (23.19 k allocations: 1007.829 KB)
  0.000010 seconds (5 allocations: 480 bytes)
  0.000005 seconds (5 allocations: 480 bytes)
(4,8,12)

julia> foo(map2, 3)
  0.000017 seconds (5 allocations: 432 bytes)
  0.000006 seconds (5 allocations: 480 bytes)
  0.000005 seconds (5 allocations: 480 bytes)
(4,8,12)

@JeffBezanson
Member

Out of curiosity --- how did this come up? Was there an interesting use case?

@tkelman tkelman added the potential benchmark Could make a good benchmark in BaseBenchmarks label Mar 30, 2016
@shashi
Contributor Author

shashi commented Mar 30, 2016

In ComputeFramework there's a type called Thunk which holds a function to run and the arguments to that function, stored in a tuple: https://github.com/shashi/ComputeFramework.jl/blob/07fd1a415f666a04ae431862405c660df525aab6/src/thunk.jl#L10-L11

The arguments can themselves be thunks. The scheduler runs a thunk once all of its input thunks have been computed: JuliaParallel/Dagger.jl@93c7e79 (it was using map to look up the results, for no particular reason).

Sometimes the inputs tuple can become pretty big. For example, sum on a matrix of size 10^5x10^4, split up into 10^3x10^3 chunks, produces 10^3 inputs to a single thunk - so it would just get stuck at the map in the last step of consolidating the sums.

I picked tuples because I could do things like this: https://github.com/shashi/ComputeFramework.jl/blob/master/src/thunk.jl#L28-L44 (fuse two thunks if the second thunk is the only input to the first one). But I don't really use that, because it subverts other optimizations by the scheduler which might use the second thunk.

Do you think inputs as Array{Any} would be a better alternative? If tuples are faster in the most common cases (1 or 2 inputs), I should keep them. I'll need to take a closer look before changing anything.

tl;dr: I didn't need to use map here.

@timholy
Member

timholy commented Mar 31, 2016

There's a tension between compilation efficiency and runtime efficiency. Currently there's an easy test case:

julia> @time ntuple(identity, 200);
  0.010773 seconds (4.50 k allocations: 324.949 KB)

julia> @time ntuple(identity, Val{200});
  4.913740 seconds (359.06 k allocations: 23.602 MB, 0.48% gc time)

The Val version uses recursive compilation, so this is just hammering the compiler.
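The recursive scheme behind the Val-based path can be sketched like so (hypothetical name `val_ntuple`, written in the later Val(N) calling syntax): every distinct N triggers compilation of a fresh specialization, which is what makes the first call at N = 200 so expensive.

```julia
# Hypothetical sketch of Val-driven recursion: a new method instance is
# compiled for every N, giving fast runtime but slow first-call compilation.
val_ntuple(f, ::Val{0}) = ()
val_ntuple(f, ::Val{N}) where {N} = (val_ntuple(f, Val(N - 1))..., f(N))
```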

However, for those of us who work on core array code, tuple length tends to match the dimensionality of an array rather than hold its contents, so our tuples are typically of length 2-4. In that case, this is informative:

julia> sum3(t::NTuple{3}) = t[1] + t[2] + t[3]
sum3 (generic function with 1 method)

julia> function tuple1(n)
           s = 0
           for i = 1:n
               t = ntuple(identity, 3)
               s += sum3(t)
           end
           s
       end
tuple1 (generic function with 1 method)

julia> function tuple2(n)
           s = 0
           for i = 1:n
               t = ntuple(identity, Val{3})
               s += sum3(t)
           end
           s
       end
tuple2 (generic function with 1 method)

julia> tuple1(1)
6

julia> tuple2(1)
6

julia> @time tuple1(10^6)
  0.145727 seconds (2.00 M allocations: 30.515 MB, 12.31% gc time)
6000000

julia> @time tuple2(10^6)
  0.005430 seconds (7 allocations: 208 bytes)
6000000

In other words, there is currently a tension between the needs for users of small tuples and those hoping to use large tuples.

@shashi
Contributor Author

shashi commented Mar 31, 2016

Just to be clear, I didn't need to use large tuples in my case. I just happened to because I didn't know this would be a problem.

I agree that for smaller tuples there should be specific methods (or generated ones) so that core array code dealing with small tuples is really fast, but if a user does (maybe inadvertently) use a bigger tuple, something like map should not be so surprisingly slow.

@timholy
Member

timholy commented Mar 31, 2016

Not a bad plan. I was hoping to get away from "breaks" in performance at n=5 (for example), but maybe the alternative is worse.

@timholy
Member

timholy commented Mar 31, 2016

I should acknowledge that one valid criticism of my benchmark above is that in tuple1, t is not type-stable. So I tested this:

julia> ntuple2{N}(f, ::Type{Val{N}}) = ([f(i) for i = 1:N]...)::NTuple{N,typeof(f(1))}
ntuple2 (generic function with 1 method)

and verified that the return type is predictable, switched tuple1 to use it, and verified that tuple1 and tuple2 produce machine code that looks very similar upon casual inspection. However, the same benchmark ran in 0.5 seconds, actually worse than the type-unstable version (100-fold slower than tuple2), presumably because of the intermediate array creation.

@shashi
Contributor Author

shashi commented Jun 15, 2016

#16460 fixes this...

@shashi shashi closed this as completed Jun 15, 2016