Significant regression of groupby when threading #2735

Closed
bkamins opened this issue Apr 25, 2021 · 5 comments · Fixed by #2736

@bkamins
Member

bkamins commented Apr 25, 2021

Here are timings:

$ julia -e "using DataFrames, BenchmarkTools; n=100_000_000; df=DataFrame(passband=rand(Int8,n)); @btime groupby($df,:passband)"
  178.478 ms (80 allocations: 762.94 MiB)

$ julia -t 2 -e "using DataFrames, BenchmarkTools; n=100_000_000; df=DataFrame(passband=rand(Int8,n)); @btime groupby($df,:passband)"
  450.070 ms (681 allocations: 763.00 MiB)

$ julia -t 4 -e "using DataFrames, BenchmarkTools; n=100_000_000; df=DataFrame(passband=rand(Int8,n)); @btime groupby($df,:passband)"
  525.876 ms (682 allocations: 763.00 MiB)

$ julia -t 8 -e "using DataFrames, BenchmarkTools; n=100_000_000; df=DataFrame(passband=rand(Int8,n)); @btime groupby($df,:passband)"
  458.003 ms (681 allocations: 763.00 MiB)

I think we have a problem with your new macro, @nalimilan. Maybe it splits the data into chunks that are too small? (A rough illustration of that suspicion is sketched below.) This is hard 😞. Can you please look into it, since you implemented this part? Otherwise I can check - please let me know.
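
A rough, hypothetical illustration of the "too small chunks" suspicion (a standalone sketch, not the DataFrames.jl code): when each spawned task does very little work, task-scheduling overhead dominates the runtime.

using Base.Threads: @spawn

# Sum `x` by spawning one task per chunk of `chunksize` elements.
function sum_chunked(x, chunksize)
    tasks = [@spawn(sum(view(x, idx)))
             for idx in Iterators.partition(eachindex(x), chunksize)]
    return sum(fetch, tasks)
end

# sum_chunked(rand(10^7), 1_000)      # many tiny tasks: mostly overhead
# sum_chunked(rand(10^7), 2_500_000)  # a few large tasks: overhead is amortized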

@bkamins bkamins added this to the patch milestone Apr 25, 2021
@bkamins
Member Author

bkamins commented Apr 25, 2021

The same with Int64:

$ julia -e "using DataFrames, BenchmarkTools; n=100_000_000; df=DataFrame(passband=Int64.(rand(Int8,n))); @btime groupby($df,:passband)"
  270.989 ms (80 allocations: 762.94 MiB)

$ julia -t 2 -e "using DataFrames, BenchmarkTools; n=100_000_000; df=DataFrame(passband=Int64.(rand(Int8,n))); @btime groupby($df,:passband)"
  374.933 ms (682 allocations: 763.00 MiB)

$ julia -t 4 -e "using DataFrames, BenchmarkTools; n=100_000_000; df=DataFrame(passband=Int64.(rand(Int8,n))); @btime groupby($df,:passband)"
  593.278 ms (682 allocations: 763.00 MiB)

$ julia -t 8 -e "using DataFrames, BenchmarkTools; n=100_000_000; df=DataFrame(passband=Int64.(rand(Int8,n))); @btime groupby($df,:passband)"
  450.173 ms (682 allocations: 763.00 MiB)

@bkamins
Member Author

bkamins commented Apr 25, 2021

I have benchmarked hashrows first. The conclusions are:

  1. for small bits types (like Bool) threading leads to a slowdown (not a very big one, ~20%; changing the chunk size does not affect this)
  2. for things like String we get a speedup

So we could either leave things as they are, or add a check like isbitstype(eltype(v)) && sizeof(eltype(v)) <= 2 (whether the threshold should be 1, 2, 4, or 8 would need benchmarking) and skip threading in that case. A sketch of such a guard is below.
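
A minimal sketch of the guard suggested above (the helper name use_threading is hypothetical, and the 2-byte threshold is exactly the placeholder that would need benchmarking):

# Skip threading for narrow bits types, where hashing each element is too
# cheap to amortize the task-spawning overhead.
use_threading(v::AbstractVector) =
    Threads.nthreads() > 1 &&
    !(isbitstype(eltype(v)) && sizeof(eltype(v)) <= 2)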

@bkamins
Member Author

bkamins commented Apr 25, 2021

For row_group_slots the only scenario in which I could get a speedup from threading was a column containing missing combined with skipmissing=true, and even that speedup was small. In general, I currently cannot find a case where threading convincingly improves speed. Let us discuss what to do about it (maybe I am missing something).
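
The kind of case referred to above, as a hypothetical reproduction (not a measurement): a column containing missing values grouped with skipmissing=true.

using DataFrames, BenchmarkTools
n = 100_000_000
df = DataFrame(passband = ifelse.(rand(n) .< 0.1, missing, rand(Int8, n)))
@btime groupby($df, :passband; skipmissing=true)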

@bkamins
Member Author

bkamins commented Apr 26, 2021

First, a general problem: @inbounds annotations in our code do not propagate, because @spawn_for_chunks introduces a function barrier. This does not explain the regression by itself, but it slows down the code in general. A minimal illustration is below.
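
A minimal standalone sketch (unrelated to the DataFrames.jl internals) of why the function barrier matters: @inbounds at the call site does not propagate into a called function, so the inner loop keeps its own bounds checks.

function inner!(x)
    for i in eachindex(x)
        x[i] = i          # the caller's @inbounds has no effect on this access
    end
end

function outer!(x)
    @inbounds inner!(x)   # @inbounds stops at the function boundary
end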

Second, it seems that we do not distribute work among threads correctly. Here is an example with four threads (I have enabled printing of the thread id for each spawned chunk):

julia> n=20_000_000; df=DataFrame(passband=Int64.(rand(Int8,n)));

julia> groupby(df,:passband);
Threads.threadid() = 2
Threads.threadid() = 1
Threads.threadid() = 3
Threads.threadid() = 4
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2

julia> groupby(df,:passband);
Threads.threadid() = 1
Threads.threadid() = 2
Threads.threadid() = 3
Threads.threadid() = 4
Threads.threadid() = 1
Threads.threadid() = 3
Threads.threadid() = 4
Threads.threadid() = 1
Threads.threadid() = 2
Threads.threadid() = 1
Threads.threadid() = 3
Threads.threadid() = 4
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2

julia> groupby(df,:passband);
Threads.threadid() = 1
Threads.threadid() = 2
Threads.threadid() = 1
Threads.threadid() = 4
Threads.threadid() = 3
Threads.threadid() = 1
Threads.threadid() = 2
Threads.threadid() = 1
Threads.threadid() = 4
Threads.threadid() = 1
Threads.threadid() = 2
Threads.threadid() = 3
Threads.threadid() = 4
Threads.threadid() = 3
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2
Threads.threadid() = 2

As you can see, one thread (in this case thread 2) is overloaded, so the distribution of work among threads is not balanced.

Here is another example (printing is now disabled) showing the skew in the load distribution. Four threads:

julia> function f()
           x = zeros(Int8, 100_000_000)
           DataFrames.@spawn_for_chunks 1_000 for i in eachindex(x)
               @inbounds x[i] = Threads.threadid()
           end
           combine(groupby(DataFrame(x=x), :x), nrow)
       end
f (generic function with 1 method)

julia> f()
3×2 DataFrame
 Row │ x     nrow     
     │ Int8  Int64    
─────┼────────────────
   1 │    2  36481000
   2 │    3  31765000
   3 │    4  31754000

julia> f()
3×2 DataFrame
 Row │ x     nrow     
     │ Int8  Int64    
─────┼────────────────
   1 │    2  36296000
   2 │    3  31747000
   3 │    4  31957000

The same with 2 threads:

julia> f()
2×2 DataFrame
 Row │ x     nrow     
     │ Int8  Int64    
─────┼────────────────
   1 │    1  34189000
   2 │    2  65811000

julia> f()
2×2 DataFrame
 Row │ x     nrow     
     │ Int8  Int64    
─────┼────────────────
   1 │    1  27055000
   2 │    2  72945000

julia> f()
2×2 DataFrame
 Row │ x     nrow     
     │ Int8  Int64    
─────┼────────────────
   1 │    1  24047000
   2 │    2  75953000
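
For comparison, here is a minimal standalone sketch (not the @spawn_for_chunks implementation) that splits the index range into one roughly equal chunk per thread; it avoids the skew seen above at the cost of less dynamic scheduling:

using Base.Threads: @spawn, nthreads

# Fill `x` with the id of the thread that wrote each element,
# spawning exactly one task per roughly equal-sized chunk.
function fill_threadid!(x)
    chunksize = cld(length(x), nthreads())
    chunks = collect(Iterators.partition(eachindex(x), chunksize))
    tasks = map(chunks) do idx
        @spawn for i in idx
            @inbounds x[i] = Threads.threadid()
        end
    end
    foreach(wait, tasks)
    return x
end

# Usage mirroring f() above:
# combine(groupby(DataFrame(x = fill_threadid!(zeros(Int8, 100_000_000))), :x), nrow)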

@bkamins
Member Author

bkamins commented Apr 26, 2021

One more example of very strange behavior - note that repeated identical invocations give quite different timings:

~$ julia -e 'using DataFrames, BenchmarkTools; n=2_000_000; df=DataFrame(passband=rand(Int8,n)); @btime groupby($df,:passband)'
  2.029 ms (68 allocations: 15.26 MiB)
~$ julia -t 2 -e 'using DataFrames, BenchmarkTools; n=2_000_000; df=DataFrame(passband=rand(Int8,n)); @btime groupby($df,:passband)'
  7.255 ms (81 allocations: 15.26 MiB)
~$ julia -t 4 -e 'using DataFrames, BenchmarkTools; n=2_000_000; df=DataFrame(passband=rand(Int8,n)); @btime groupby($df,:passband)'
  6.597 ms (81 allocations: 15.26 MiB)
~$ julia -t 8 -e 'using DataFrames, BenchmarkTools; n=2_000_000; df=DataFrame(passband=rand(Int8,n)); @btime groupby($df,:passband)'
  2.813 ms (81 allocations: 15.26 MiB)
~$ julia -e 'using DataFrames, BenchmarkTools; n=2_000_000; df=DataFrame(passband=rand(Int8,n)); @btime groupby($df,:passband)'
  2.591 ms (68 allocations: 15.26 MiB)
~$ julia -t 2 -e 'using DataFrames, BenchmarkTools; n=2_000_000; df=DataFrame(passband=rand(Int8,n)); @btime groupby($df,:passband)'
  7.998 ms (81 allocations: 15.26 MiB)
~$ julia -t 4 -e 'using DataFrames, BenchmarkTools; n=2_000_000; df=DataFrame(passband=rand(Int8,n)); @btime groupby($df,:passband)'
  3.098 ms (81 allocations: 15.26 MiB)
~$ julia -t 8 -e 'using DataFrames, BenchmarkTools; n=2_000_000; df=DataFrame(passband=rand(Int8,n)); @btime groupby($df,:passband)'
  2.934 ms (81 allocations: 15.26 MiB)
