Despecialization part 2 #2709

Merged
merged 7 commits into from
Apr 13, 2021

bkamins
Member

@bkamins bkamins commented Apr 11, 2021

I changed grouped_reduce! by extracting the part of the code that takes fewer arguments (this is a relatively clean change and gives an improvement in compilation, but only a small one).

Further despecialization of combine_with_first! and combine_rows_with_first! (but not combine_tables_with_first!, as it contains a hot loop). This second change is heavy-handed, but gives the following benefits:

this PR

julia> using DataFrames

julia> df = DataFrame(a=[1,2],b=[3,4]);

julia> gdf = groupby(df, :a);

julia> @time combine(gdf, :a => x -> (b=1,) => AsTable);
  2.130731 seconds (6.10 M allocations: 353.595 MiB, 4.26% gc time, 99.98% compilation time)

julia> @time combine(gdf, :a => x -> (b=1,) => AsTable);
  0.101837 seconds (106.97 k allocations: 6.511 MiB, 99.33% compilation time)

julia> @time combine(gdf, :a => x -> (c=1,) => AsTable);
  0.229507 seconds (311.18 k allocations: 19.438 MiB, 99.48% compilation time)

julia> @time combine(gdf, :a => x -> (c=1,) => AsTable);
  0.115103 seconds (106.96 k allocations: 6.512 MiB, 10.16% gc time, 99.41% compilation time)

julia> @time combine(gdf, [:a, :b] => (x,y) -> (b=1, d=1) => AsTable);
  0.566155 seconds (988.28 k allocations: 60.440 MiB, 1.70% gc time, 99.65% compilation time)

julia> @time combine(gdf, [:a, :b] => (x,y) -> (b=1, d=1) => AsTable);
  0.097503 seconds (97.75 k allocations: 5.922 MiB, 99.25% compilation time)

julia> @time combine(gdf, [:a, :b] => (x,y) -> (c=1, d=1) => AsTable);
  0.260902 seconds (302.07 k allocations: 18.859 MiB, 9.65% gc time, 99.52% compilation time)

julia> @time combine(gdf, [:a, :b] => (x,y) -> (c=1, d=1) => AsTable);
  0.094826 seconds (97.75 k allocations: 5.920 MiB, 99.21% compilation time)

main

julia> using DataFrames

julia> df = DataFrame(a=[1,2],b=[3,4]);

julia> gdf = groupby(df, :a);

julia> @time combine(gdf, :a => x -> (b=1,) => AsTable);
  2.539910 seconds (7.96 M allocations: 462.058 MiB, 7.71% gc time, 23.15% compilation time)

julia> @time combine(gdf, :a => x -> (b=1,) => AsTable);
  0.179274 seconds (106.94 k allocations: 6.510 MiB, 34.09% gc time, 99.60% compilation time)

julia> @time combine(gdf, :a => x -> (c=1,) => AsTable);
  0.308809 seconds (462.22 k allocations: 28.670 MiB, 99.60% compilation time)

julia> @time combine(gdf, :a => x -> (c=1,) => AsTable);
  0.110407 seconds (106.94 k allocations: 6.512 MiB, 99.36% compilation time)

julia> @time combine(gdf, [:a, :b] => (x,y) -> (b=1, d=1) => AsTable);
  0.654508 seconds (1.15 M allocations: 70.638 MiB, 2.67% gc time, 99.71% compilation time)

julia> @time combine(gdf, [:a, :b] => (x,y) -> (b=1, d=1) => AsTable);
  0.107497 seconds (97.73 k allocations: 5.919 MiB, 99.30% compilation time)

julia> @time combine(gdf, [:a, :b] => (x,y) -> (c=1, d=1) => AsTable);
  0.318430 seconds (453.12 k allocations: 28.086 MiB, 2.48% gc time, 99.60% compilation time)

julia> @time combine(gdf, [:a, :b] => (x,y) -> (c=1, d=1) => AsTable);
  0.102344 seconds (97.73 k allocations: 5.919 MiB, 99.25% compilation time)
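
One likely reason the repeated calls above still report about 99% compilation time is that every anonymous function typed at the REPL is a new function type, so code specialized on it has to be compiled again. The sketch below only illustrates that property of Julia; `apply_once` is a hypothetical helper and not part of DataFrames.jl.

```julia
# Each anonymous-function expression defines a new function type,
# even when the source text is identical.
f = x -> (b=1,)
g = x -> (b=1,)

same_type = typeof(f) == typeof(g)   # two separate expressions, two types
same_value = f(0) == g(0)            # yet both compute the same result

# Hypothetical generic caller: because it calls `fun`, Julia compiles one
# specialization per concrete function type it is invoked with.
apply_once(fun, x) = fun(x)
apply_once(f, 0)   # compiled for typeof(f)
apply_once(g, 0)   # compiled again for typeof(g)
```

Despecializing an argument such as `fun` is one way to avoid paying that compilation cost for every new function.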

By despecializing, we incur the cost of a loop over threads that is not specialized and will use dynamic dispatch, but I assume that this loop is small (at most the number of available threads). Is this correct, @nalimilan?
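
The trade-off described here can be sketched as a function barrier: a despecialized outer function with a short loop, calling a specialized inner kernel that holds the hot loop. All names below (`kernel!`, `driver`) are hypothetical and the chunks run serially for simplicity; this is not the DataFrames.jl implementation.

```julia
# Hypothetical names throughout; this is not DataFrames.jl code.

# Inner kernel: specialized on the concrete type of `f`, so the hot loop
# over elements is compiled to fast code.
function kernel!(out::Vector{Float64}, f, v::Vector{Float64}, range::UnitRange{Int})
    for i in range
        out[i] = f(v[i])
    end
    return out
end

# Outer driver: `f` is marked @nospecialize, so the driver is compiled once
# rather than once per function type. Each call to `kernel!` is then a
# dynamic dispatch, but there are at most `nchunks` such calls.
function driver(@nospecialize(f), v::Vector{Float64}, nchunks::Int)
    out = similar(v)
    n = length(v)
    chunk = max(cld(n, nchunks), 1)   # guard against a zero step for empty input
    for start in 1:chunk:n
        kernel!(out, f, v, start:min(start + chunk - 1, n))
    end
    return out
end

result = driver(x -> 2x, collect(1.0:10.0), 4)
```

The dynamic dispatch cost is paid once per chunk, not once per element, which is why it stays small when the number of chunks is bounded by the number of threads.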

@bkamins bkamins added this to the 1.0 milestone Apr 11, 2021
@nalimilan
Member

By despecializing, we incur the cost of a loop over threads that is not specialized and will use dynamic dispatch, but I assume that this loop is small (at most the number of available threads). Is this correct, @nalimilan?

Sounds reasonable, but can you check that performance isn't affected for large tables?

src/groupeddataframe/fastaggregates.jl (outdated, resolved)
@assert f isa Base.Callable
@assert incols isa Union{Nothing, AbstractVector, Tuple, NamedTuple}
@assert first isa Union{NamedTuple, DataFrameRow, AbstractDataFrame}
@assert firstmulticol isa Union{Val{true}, Val{false}}
Member

A despecialized Val is quite paradoxical. How about making this a Bool?

Member Author

Good point, I will change it. The problem was that for Val the compiler generates methods for Val{true}, Val{false}, and Val{T} where T, and we do not need this last method.
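
To illustrate the difference being discussed, here is a minimal sketch with hypothetical functions (`describe_val`, `describe_bool`): a `Val` flag lives in the type domain, while a `Bool` flag is an ordinary runtime value with a single type.

```julia
# Hypothetical functions, for illustration only.

# The flag lives in the type domain: Val(true) and Val(false) have different
# types, so dispatch (and specialization) happens per value.
describe_val(::Val{true}) = "multi"
describe_val(::Val{false}) = "single"

# The flag is an ordinary runtime value: one argument type (Bool),
# one compiled body, with a branch inside.
describe_bool(multicol::Bool) = multicol ? "multi" : "single"

distinct_types = typeof(Val(true)) != typeof(Val(false))
```

Since the point of `Val` is to enable specialization on a value, passing it through a despecialized argument gives up that benefit, which is why a plain `Bool` was suggested.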

@bkamins
Member Author

bkamins commented Apr 11, 2021

Sounds reasonable, but can you check that performance isn't affected for large tables?

Yes. I will do some timings (these always take time) and report here (probably today in the evening or tomorrow).

@bkamins
Member Author

bkamins commented Apr 11, 2021

This is super strange.

Timings of the grouping tests on main, run twice, are:

417.883939 seconds (1.09 G allocations: 61.362 GiB, 5.79% gc time, 0.00% compilation time)
171.446700 seconds (574.79 M allocations: 31.722 GiB, 5.93% gc time, 0.10% compilation time)

With this PR, wrapping Val in Ref, I get:

394.539858 seconds (1.06 G allocations: 59.078 GiB, 5.72% gc time, 0.00% compilation time)
171.479467 seconds (575.52 M allocations: 31.758 GiB, 5.65% gc time, 0.10% compilation time)

Now, if I pass Bool and only switch it to Ref at the end, I get:

398.629919 seconds (1.08 G allocations: 60.430 GiB, 5.81% gc time, 0.00% compilation time)
177.364990 seconds (590.24 M allocations: 32.977 GiB, 5.87% gc time, 0.10% compilation time)

which is quite puzzling.

Therefore, I decided to go for a cleaner design and introduce types that signal whether we have a single-column or multi-column aggregation. With them the timing is:

393.826773 seconds (1.06 G allocations: 59.289 GiB, 5.70% gc time, 0.00% compilation time)
174.152069 seconds (578.06 M allocations: 31.944 GiB, 5.83% gc time, 0.10% compilation time)

So this is comparable to passing around Val wrapped in Ref, but I would go for it because I think the design is much cleaner. This is what I have pushed.
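
A design along these lines might look like the sketch below. The type and function names (`ColumnCount`, `SingleColumn`, `MultiColumn`, `column_count`, `wrap_result`) are assumptions made for illustration; the definitions actually pushed in this PR are not shown in the thread.

```julia
# Hypothetical type names; the PR's actual definitions are not shown here.
abstract type ColumnCount end
struct SingleColumn <: ColumnCount end
struct MultiColumn <: ColumnCount end

# Construct the marker once from a runtime Bool ...
column_count(multicol::Bool) = multicol ? MultiColumn() : SingleColumn()

# ... and dispatch on it only where the distinction matters.
wrap_result(::SingleColumn, value) = (x1 = value,)
wrap_result(::MultiColumn, value::NamedTuple) = value
```

Compared with `Val{true}` and `Val{false}`, dedicated marker types name the two cases explicitly and leave no open-ended `Val{T}` case to account for.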

Now I am benchmarking the performance on larger split-apply-combine operations.

@bkamins
Member Author

bkamins commented Apr 11, 2021

The performance benchmarks show that this PR does not regress against main, and in general we have a strict improvement over the 0.22.7 release. Could you quickly have a look at the 1 vs 4 threads timings to confirm that the cases where threads speed things up are as expected (my understanding is that currently only combining rows is threaded, but please confirm).

Here are the details:

# -------- setup code
julia> using DataFrames, BenchmarkTools
julia> df = DataFrame(a=repeat(1:10^6, 10), b=1:10^7);
julia> gdf = groupby(df, :a);
julia> f1(x) = sum(x);
julia> f2(x) = [sum(x), 1];
julia> f3(x) = (x=sum(x), y=1);
julia> f4(x) = (x=[sum(x), 1], y=[1, 1]);

# -------- PR 1 thread
julia> @btime combine($gdf, :b => sum);
  21.486 ms (210 allocations: 23.86 MiB)
julia> @btime combine($gdf, :b => f1);
  232.539 ms (1000289 allocations: 175.49 MiB)
julia> @btime combine($gdf, :b => f2);
  951.362 ms (15999252 allocations: 644.35 MiB)
julia> @btime combine($gdf, :b => f3 => AsTable);
  287.552 ms (1000337 allocations: 190.75 MiB)
julia> @btime combine($gdf, :b => f4 => AsTable);
  1.050 s (16999331 allocations: 798.68 MiB)

# -------- PR 4 threads
julia> @btime combine($gdf, :b => sum);
  21.261 ms (212 allocations: 23.86 MiB)
julia> @btime combine($gdf, :b => f1);
  98.041 ms (1000334 allocations: 175.49 MiB)
julia> @btime combine($gdf, :b => f2);
  937.994 ms (15999254 allocations: 644.35 MiB)
julia> @btime combine($gdf, :b => f3 => AsTable);
  112.330 ms (1000383 allocations: 190.76 MiB)
julia> @btime combine($gdf, :b => f4 => AsTable);
  1.069 s (16999334 allocations: 798.68 MiB)

# -------- main 1 thread
julia> @btime combine($gdf, :b => sum);
  21.530 ms (208 allocations: 23.86 MiB)
julia> @btime combine($gdf, :b => f1);
  289.125 ms (1000267 allocations: 175.49 MiB)
julia> @btime combine($gdf, :b => f2);
  925.847 ms (15999252 allocations: 644.35 MiB)
julia> @btime combine($gdf, :b => f3 => AsTable);
  292.836 ms (1000315 allocations: 190.75 MiB)
julia> @btime combine($gdf, :b => f4 => AsTable);
  1.058 s (16999331 allocations: 798.68 MiB)

# -------- main 4 threads
julia> @btime combine($gdf, :b => sum);
  21.382 ms (210 allocations: 23.86 MiB)
julia> @btime combine($gdf, :b => f1);
  96.502 ms (1000318 allocations: 175.49 MiB)
julia> @btime combine($gdf, :b => f2);
  951.696 ms (15999254 allocations: 644.35 MiB)
julia> @btime combine($gdf, :b => f3 => AsTable);
  119.321 ms (1000367 allocations: 190.76 MiB)
julia> @btime combine($gdf, :b => f4 => AsTable);
  1.093 s (16999334 allocations: 798.68 MiB)

# -------- 0.22.7 1 thread
julia> @btime combine($gdf, :b => sum);
  21.310 ms (169 allocations: 23.85 MiB)
julia> @btime combine($gdf, :b => f1);
  269.108 ms (9999177 allocations: 312.80 MiB)
julia> @btime combine($gdf, :b => f2);
  1.081 s (20999215 allocations: 842.71 MiB)
julia> @btime combine($gdf, :b => f3 => AsTable);
  388.781 ms (9999193 allocations: 404.36 MiB)
julia> @btime combine($gdf, :b => f4 => AsTable);
  1.268 s (22999248 allocations: 1.02 GiB)

# -------- 0.22.7 4 threads
julia> @btime combine($gdf, :b => sum);
  21.947 ms (169 allocations: 23.85 MiB)
julia> @btime combine($gdf, :b => f1);
  270.780 ms (9999177 allocations: 312.80 MiB)
julia> @btime combine($gdf, :b => f2);
  1.186 s (20999215 allocations: 842.71 MiB)
julia> @btime combine($gdf, :b => f3 => AsTable);
  385.580 ms (9999193 allocations: 404.36 MiB)
julia> @btime combine($gdf, :b => f4 => AsTable);
  1.371 s (22999248 allocations: 1.02 GiB)

In summary, this PR looks good.
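
To make the 1 vs 4 threads comparison easier to read, the speedups can be computed from the "PR 1 thread" and "PR 4 threads" timings reported above. This is only a small arithmetic sketch; the numbers are copied from the benchmark output (converted to milliseconds), and the variable names are arbitrary.

```julia
# Timings in milliseconds, copied from the "PR 1 thread" and
# "PR 4 threads" runs above.
timings = [
    (case = "sum", t1 = 21.486,  t4 = 21.261),
    (case = "f1",  t1 = 232.539, t4 = 98.041),
    (case = "f2",  t1 = 951.362, t4 = 937.994),
    (case = "f3",  t1 = 287.552, t4 = 112.330),
    (case = "f4",  t1 = 1050.0,  t4 = 1069.0),
]

# Speedup = single-threaded time divided by four-threaded time.
speedups = Dict(t.case => t.t1 / t.t4 for t in timings)

for t in timings
    println(rpad(t.case, 4), " speedup with 4 threads: ",
            round(speedups[t.case], digits = 2))
end
```

Only f1 and f3, which return a single row per group, show a clear speedup, which appears consistent with the understanding that only combining rows is threaded.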

Member

@nalimilan nalimilan left a comment

Benchmarks look good. Any failure to specialize in a place where it matters would have dramatic effects that would be hard to miss.

@bkamins bkamins merged commit 3b45c5b into main Apr 13, 2021
@bkamins bkamins deleted the split_methods branch April 13, 2021 10:05
@bkamins
Member Author

bkamins commented Apr 13, 2021

Thank you!
