
n-arg map performance #17321

Closed
davidagold opened this issue Jul 7, 2016 · 6 comments
Labels: arrays, fold, performance, regression

Comments

@davidagold
Contributor

While working on map for NullableArrays (JuliaStats/NullableArrays.jl#128 (comment)), I came across some evidence that the current n-arg map implementation in Base might be leaving performance on the table. Here's a simple benchmark for the current Base implementation:

using BenchmarkTools
n = 1_000_000
As = [ rand(n) for i in 1:5 ]
f(xs...) = prod(xs)

julia> @benchmark map(f, As...)
BenchmarkTools.Trial: 
  samples:          155
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  7.63 mb
  allocs estimate:  10
  minimum time:     27.25 ms (0.00% GC)
  median time:      30.81 ms (0.00% GC)
  mean time:        32.41 ms (2.55% GC)
  maximum time:     67.57 ms (0.00% GC)

Here's the same benchmark for an alternative implementation (mymap in https://gist.github.com/davidagold/d7088aae22f23d383e5bf1f26aa1a045) that (1) avoids using ith_all to index into the As and (2) avoids zipping the As together when constructing a Generator(f, As...) object:

julia> @benchmark mymap(f, As...)
BenchmarkTools.Trial: 
  samples:          814
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  7.63 mb
  allocs estimate:  8
  minimum time:     4.58 ms (0.00% GC)
  median time:      5.44 ms (0.00% GC)
  mean time:        6.15 ms (13.94% GC)
  maximum time:     13.49 ms (22.69% GC)

julia> mymap(f, As...) == map(f, As...)
true

In this case, mymap is 5x faster. However, its implementation relies on a macro and generated functions in place of ith_all. Is the speedup here worth introducing such changes into the Base implementation?
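For readers who don't want to follow the gist link, here is a hedged sketch of the kind of approach described above: a generated function that unrolls indexing over the argument tuple, so each iteration reads As[1][i], ..., As[N][i] directly instead of materializing tuples through zip. The name mymap_sketch is illustrative and this is not the gist's actual code:

```julia
# Sketch: index every argument array directly via an unrolled @generated body.
# Assumes all arrays are non-empty and have matching shapes (eachindex checks
# the latter). This is a minimal illustration, not the gist's implementation.
@generated function mymap_sketch(f, As::Vararg{AbstractArray,N}) where {N}
    quote
        # Eagerly compute the output element type from the first elements;
        # a production version would need to handle empty/heterogeneous cases.
        T = typeof(f(map(first, As)...))
        dest = similar(As[1], T)
        for i in eachindex(As...)
            # Unrolled at compile time into f(As[1][i], As[2][i], ..., As[N][i])
            dest[i] = f($([:(As[$j][i]) for j in 1:N]...))
        end
        dest
    end
end
```

The generated body avoids building an intermediate tuple per element via zip, which is the property the benchmark above appears to reward.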

cc @nalimilan

@Sacha0
Member

Sacha0 commented Jul 7, 2016

broadcast performance appears much better:

julia> @benchmark map(f, As...)
BenchmarkTools.Trial:
  samples:          165
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  7.63 mb
  allocs estimate:  10
  minimum time:     27.52 ms (0.00% GC)
  median time:      30.26 ms (0.00% GC)
  mean time:        30.37 ms (2.57% GC)
  maximum time:     39.38 ms (7.84% GC)

julia> @benchmark broadcast(f, As...)
BenchmarkTools.Trial:
  samples:          887
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  7.63 mb
  allocs estimate:  83
  minimum time:     4.57 ms (0.00% GC)
  median time:      4.89 ms (0.00% GC)
  mean time:        5.63 ms (13.98% GC)
  maximum time:     9.60 ms (26.82% GC)

Again I wonder whether map and broadcast should be collapsed (#4883 (comment)). Best!

@JeffBezanson
Member

Is the speedup here worth introducing such changes into the Base implementation?

No. I'm convinced we can optimize zip and generators sufficiently. See for example #15648, which discusses more general problems with iterating over multiple arrays. There are also already significant improvements in #16622 just from adding some inline declarations.

In any case, performance hacks should be directed at zip. For example, we might need a specialization that shares an index among multiple arrays, as LLVM (understandably) seems to have a hard time proving that two counters always have the same value.

Also noting that ith_all is only used by map!, not map.

@davidagold
Contributor Author

Got it. Thank you for the resources, Jeff!

@KristofferC
Member

This regressed quite a lot:

julia> @benchmark map(f, As...)
BenchmarkTools.Trial:
  memory estimate:  1.02 GiB
  allocs estimate:  30998464
  --------------
  minimum time:     1.932 s (5.44% GC)

@KristofferC KristofferC added the regression label Oct 18, 2018
@oscardssmith
Member

Bumping this. Anything we can do here?

@oscardssmith oscardssmith added the fold label Dec 31, 2020
@vtjnash
Member

vtjnash commented Jan 8, 2021

Is there anything to do here? I just ran the above on my machine, and saw 2-4 ms on everything posted above.

@vtjnash vtjnash closed this as completed Jan 30, 2021