
Default Thread settings (threads=true) cause bad performance on CPU #45

Closed

roflmaostc opened this issue Oct 23, 2020 · 3 comments

@roflmaostc

Hey,

I'm on a 4-core machine. I started Julia with 4 threads and observe relatively poor performance with the default Tullio settings:

using Zygote, Tullio
x = randn((2000, 2000));
f(x) = (@tullio y =  (x[i+1, j] - x[i-1, j])^2 + (x[i, j+1] - x[i, j-1])^2)
ft(x) = (@tullio threads=false y = (x[i+1, j] - x[i-1, j])^2 + (x[i, j+1] - x[i, j-1])^2)

@btime Zygote.gradient(f, x)[1];
@btime Zygote.gradient(ft, x)[1]

producing:

  100.746 ms (173 allocations: 30.53 MiB)
  11.788 ms (26 allocations: 30.52 MiB)

What is actually the reason for that? I observed that none of the 4 Julia threads reached 100% CPU usage: rather 50-80% in the main thread, and around 20% in each of the other three.

Thanks,

Felix

@mcabbott
Owner

mcabbott commented Oct 23, 2020

That isn't good. Here's what I see (on a 2-core machine, Julia 1.5.2, everything updated):

julia> @btime Zygote.gradient(f, x)[1];
  9.570 ms (163 allocations: 30.53 MiB)

julia> @btime Zygote.gradient(ft, x)[1];
  13.836 ms (26 allocations: 30.52 MiB)

# forward only
julia> @btime f($x);
  949.564 μs (92 allocations: 6.03 KiB)

julia> @btime ft($x);
  1.825 ms (0 allocations: 0 bytes)

I wonder what's causing it to be so different? Do you see a slowdown on the forward-only evaluation too?

@roflmaostc
Author

Hey,

I've got the same versions on my machine. I'm not sure what exactly happens, but in a fresh REPL I get similar results.

using Zygote, Tullio, BenchmarkTools
x = abs.(randn((500, 500)));
f(x) = (@tullio y =  abs2(x[i+1, j] - x[i-1, j]) + abs2(x[i, j+1] - x[i, j-1]))
ft(x) = (@tullio threads=false y = abs2(x[i+1, j] - x[i-1, j]) + abs2(x[i, j+1] - x[i, j-1]))

@btime f($x);
@btime ft($x); 

@btime Zygote.gradient($f, $x)[1];
@btime Zygote.gradient($ft, $x)[1];

returns

  40.286 μs (92 allocations: 6.36 KiB)
  77.808 μs (0 allocations: 0 bytes)
  574.786 μs (162 allocations: 1.92 MiB)
  783.779 μs (11 allocations: 1.91 MiB)

However, I initially tested this in a larger Jupyter notebook where additional packages were loaded. After loading each package separately, I found the source:

using Zygote, Tullio, BenchmarkTools, ImageView
x = abs.(randn((500, 500)));
f(x) = (@tullio y =  abs2(x[i+1, j] - x[i-1, j]) + abs2(x[i, j+1] - x[i, j-1]))
ft(x) = (@tullio threads=false y = abs2(x[i+1, j] - x[i-1, j]) + abs2(x[i, j+1] - x[i, j-1]))

@btime f($x);
@btime ft($x); 

@btime Zygote.gradient($f, $x)[1];
@btime Zygote.gradient($ft, $x)[1];

returns

  8.161 ms (93 allocations: 6.38 KiB)
  74.629 μs (0 allocations: 0 bytes)
  13.679 ms (164 allocations: 1.92 MiB)
  750.308 μs (11 allocations: 1.91 MiB)

Note that with threading we now see milliseconds rather than microseconds.

Inspecting a profile (with a for loop around the call to increase total computing time) suggests that some Gtk functions are involved.

Without Threads:

  Samples (%)  File
  106 (35 %)   nothing
   80 (26 %)   /usr/bin/../share/julia/base/./fastmath.jl
   70 (23 %)   /usr/bin/../share/julia/base/./array.jl
   42 (14 %)   /home/fxw/.julia/packages/IJulia/rWZ9e/src/execute_request.jl

With Threads:

  Samples (%)  File
  1002 (70 %)  /home/fxw/.julia/packages/Gtk/C22jV/src/events.jl
   144 (10 %)  nothing
    90 (6 %)   /usr/bin/../share/julia/base/./fastmath.jl
    85 (6 %)   /usr/bin/../share/julia/base/./array.jl
    52 (4 %)   /home/fxw/.julia/packages/IJulia/rWZ9e/src/execute_request.jl
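For reference, a flat listing like the ones above can be collected with the standard-library Profile module. This is only a sketch; `f_loops` is a made-up plain-loop stand-in for the @tullio kernel here, so the snippet runs without Tullio installed:

```julia
using Profile

# Plain-Julia stand-in for the @tullio kernel above (hypothetical name)
function f_loops(x)
    y = zero(eltype(x))
    @inbounds for j in 2:size(x, 2)-1, i in 2:size(x, 1)-1
        y += abs2(x[i+1, j] - x[i-1, j]) + abs2(x[i, j+1] - x[i, j-1])
    end
    return y
end

x = abs.(randn(500, 500))

Profile.clear()
@profile for _ in 1:200          # repeat to accumulate enough samples
    f_loops(x)
end
Profile.print(format = :flat, sortedby = :count)   # flat listing by sample count
```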

Searching for Gtk performance issues brings me to: JuliaGraphics/Gtk.jl#503, JuliaLang/julia#35552
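For what it's worth, the mechanism can be illustrated without Gtk at all: a task that never blocks, in the style of a polling GUI event loop, stays runnable and takes scheduler time whenever the compute tasks hit a yield point (e.g. at every fetch/wait). This is a hypothetical sketch, not Gtk's actual event loop; `spawned_sum` and `poller` are made-up names:

```julia
using Base.Threads

# Split a reduction across two tasks, roughly as a threaded kernel does.
function spawned_sum(x)
    mid = length(x) ÷ 2
    t = @spawn sum(@view x[1:mid])   # half the work on another task
    s = sum(@view x[mid+1:end])      # other half on the current task
    return fetch(t) + s
end

# A polling task: it yields immediately but stays runnable, so it
# competes for scheduler time with @spawn'ed work.
stop = Ref(false)
poller = @async while !stop[]
    yield()
end

x = rand(10^6)
r = spawned_sum(x)       # correctness is unaffected, only timing
@assert r ≈ sum(x)

stop[] = true            # shut the poller down when done
wait(poller)
```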

So I'm sorry for posting it here, since it turns out not to be caused by Tullio at all.
Still, in my opinion that's a disappointing issue.

Felix

@mcabbott
Owner

Ok, that sounds like a pretty thorny issue. Glad to hear it's not my fault though!
