Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance of nested functions #4428

Closed
magistere opened this issue Oct 3, 2013 · 12 comments
Closed

Performance of nested functions #4428

magistere opened this issue Oct 3, 2013 · 12 comments
Assignees
Labels
performance Must go faster

Comments

@magistere
Copy link
Contributor

I've found that nested functions usage cause significant performance degrade. In the following sample script I got increasing run time from 0.4s to 2.6s on my machine.

g(n) = 1.0/n

function f(n)
  sum = 0.0
  g1(n) = 1.0/n
  for i = 1:n
    sum += g(i) # change to g1 to use nested function
  end
  sum
end

tic()
println(f(10_000_000))
toc()
@johnmyleswhite
Copy link
Member

Isn't part of the execution time compiling the nested function?

@magistere
Copy link
Contributor Author

I think it's not the time of compiling nested function. I checked generated code and it is very different for g and g1 functions.

@johnmyleswhite
Copy link
Member

Hopefully @JeffBezanson will chime in then. It's beyond my depth at that point.

@Staross
Copy link

Staross commented Oct 9, 2013

Putting everything in a function gives me a bit different results:

function test(n,g)

function f1(n)
  sum = 0.0
  g1(k) = 1.0/k
  for i = 1:n
    sum += g1(i) 
  end
  sum
end

function f2(n)
  sum = 0.0
  for i = 1:n
    sum += g(i) 
  end
  sum
end

function f3(n)
  sum = 0.0
  for i = 1:n
    sum += 1.0/i 
  end
  sum
end

@time f1(n)
@time f2(n)
@time f3(n)

end

julia> g(k) = 1.0/k

julia> test(1e7,g)
elapsed time: 1.732656711 seconds
elapsed time: 1.798725878 seconds
elapsed time: 0.116713095 seconds
16.695311365857272

Hard-coded inline function wins by a factor >10.

@magistere
Copy link
Contributor Author

That's because you pass function g as parameter and it cannot be inlined. Try the following script:

g(k) = 1.0/k

function test(n)

  function f1(n)
    sum = 0.0
    g1(k) = 1.0/k
    for i = 1:n
      sum += g1(i) 
    end
    sum
  end

  function f2(n)
    sum = 0.0
    for i = 1:n
      sum += g(i) 
    end
    sum
  end

  function f3(n)
    sum = 0.0
    for i = 1:n
      sum += 1.0/i 
    end
    sum
  end

  @time f1(n)
  @time f2(n)
  @time f3(n)

end

test(1e7)

My results:

elapsed time: 2.515007634 seconds (320032508 bytes allocated)
elapsed time: 0.182064481 seconds (16712 bytes allocated)
elapsed time: 0.177499096 seconds (16712 bytes allocated)

Also allocated memory is much more in the first case.

@illerucis
Copy link
Contributor

I ran the script above, doubling n each time:

---------------- f(n)
elapsed time: 1.181162964 seconds (480060496 bytes allocated)
elapsed time: 0.107616771 seconds (27620 bytes allocated)
elapsed time: 0.111685732 seconds (27620 bytes allocated)

----------------- f(2n)
elapsed time: 2.318789561 seconds (959992304 bytes allocated)
elapsed time: 0.215161285 seconds (32 bytes allocated)
elapsed time: 0.216431597 seconds (32 bytes allocated)

----------------- f(4n)
elapsed time: 4.671037716 seconds (1919992304 bytes allocated)
elapsed time: 0.426870211 seconds (32 bytes allocated)
elapsed time: 0.445708897 seconds (32 bytes allocated)

----------------- f(8n)
elapsed time: 9.218067983 seconds (383999230 bytes allocated)
elapsed time: 0.883522286 seconds (32 bytes allocated)
elapsed time: 0.875873957 seconds (32 bytes allocated)

If the overhead were due to compiling the nested function, I think there would be some constant baseline for each iteration of f1...but here the elapsed time doubles each time n is doubled.

@illerucis
Copy link
Contributor

I ran the script removing the sum variable (just making calls to the inline function g1 in f1) for n = 10^7. The memory allocated was ~ (10^7)_32 bytes. For n = 2_10^7, memory allocated was (2_10^7)_32 bytes. It looks like the compiler is not inlining g1, resulting in 32 extra bytes per call to the function.

@ghost ghost assigned JeffBezanson Oct 23, 2013
@magistere
Copy link
Contributor Author

Related issue #1864

@sebastiang
Copy link

I ran into this issue doing some benchmarking. In many cases anonymous or nested functions can either be lifted to global scope, i.e. close over no variables, or don't ever escape as values, i.e. could be re-written as global functions with extra arguments passed at the call site. Does Julia have a lambda lifting pass in its compiler? (My benchmarking is here: http://www.palladiumconsulting.com/2014/09/little-performance-explorations-julia/)

@eschnett
Copy link
Contributor

Very nice series of articles. Thanks for sharing.

-erik

On Tue, Sep 30, 2014 at 11:38 AM, Sebastian Good [email protected]
wrote:

I ran into this issue doing some benchmarking. In many cases anonymous or
nested functions can either be lifted to global scope, i.e. close over no
variables, or don't ever escape as values, i.e. could be re-written as
global functions with extra arguments passed at the call site. Does Julia
have a lambda lifting pass in its compiler? (My benchmarking is here:
http://www.palladiumconsulting.com/2014/09/little-performance-explorations-julia/
)

Reply to this email directly or view it on GitHub
#4428 (comment).

Erik Schnetter [email protected]
http://www.perimeterinstitute.ca/personal/eschnetter/

@timholy
Copy link
Member

timholy commented Sep 30, 2014

For the short term, see https://github.com/timholy/FastAnonymous.jl.

Nice post, by the way; glad you were persistent and figured out how to get the performance you wanted.

@jakebolewski
Copy link
Member

closing as a dup of #1864

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

No branches or pull requests

9 participants