Latency comparison between transform and combine with grouped data frame #2516

Closed
pdeffebach opened this issue Nov 3, 2020 · 9 comments · Fixed by #2691

Comments

@pdeffebach
Contributor

julia> df = DataFrame(a = [1, 1, 1, 2, 2, 2], b = [1, 2, 3, 100, 200, 300]);

julia> gd = groupby(df, :b);

julia> @time transform(gd, (d -> (fb = first(d.b),)));
  0.045820 seconds (27.15 k allocations: 1.378 MiB, 99.11% compilation time)

julia> @time transform(gd, (d -> (fb = first(d.b),)));
  0.046607 seconds (27.14 k allocations: 1.378 MiB, 98.77% compilation time)

julia> @time combine(gd, (d -> (fb = first(d.b),)));
  0.085262 seconds (211.92 k allocations: 12.443 MiB, 99.58% compilation time)

julia> @time combine(gd, (d -> (fb = first(d.b),)));
  0.092077 seconds (211.93 k allocations: 12.444 MiB, 99.41% compilation time)

Side note: I think we are compiling the anonymous function anew for every group. I can't quite prove it, but it seems consistent with, say:

julia> @time transform(df, (d -> (fb = first(d.b),)));
  0.028524 seconds (2.60 k allocations: 175.983 KiB, 99.14% compilation time)

julia> @time transform(df, (d -> (fb = first(d.b),)));
  0.013412 seconds (2.60 k allocations: 175.937 KiB, 98.18% compilation time)
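One way to probe the per-group recompilation hypothesis (a sketch, not from the thread; `firstb` is a hypothetical helper): a named function is compiled once per argument-type signature, whereas every `d -> ...` literal is a distinct anonymous type, so comparing repeated calls with a named function helps separate compilation cost from per-group work.

```julia
using DataFrames

df = DataFrame(a = [1, 1, 1, 2, 2, 2], b = [1, 2, 3, 100, 200, 300])
gd = groupby(df, :b)

# Named helper: compiled once per signature, reused for every group.
firstb(d) = (fb = first(d.b),)

@time transform(gd, firstb)  # first call pays the compilation cost
@time transform(gd, firstb)  # a cheap second call would argue against per-group recompilation
```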
@bkamins
Member

bkamins commented Nov 3, 2020

Can you please check what happens on, e.g., a data frame with 10^6 groups? (I do not have access to a computer on which I can check this now.)

The discrepancy between DataFrame and GroupedDataFrame is expected. I intentionally have a separate code for DataFrame as it is much easier so it can be compiled faster (the downside is that it is much harder to maintain it as we have two separate code bases that have to be checked to produce consistent results).

@pdeffebach
Contributor Author

pdeffebach commented Nov 3, 2020

Wow, combine is incredible! I'm not sure these results really speak to the problem at hand, but this is very impressive, and it shows the need for more optimizations for transform with a grouped data frame:

julia> df = DataFrame(a = rand(1:10^6, 10^8), b = randn(10^8));

julia> gd = groupby(df, :a);

julia> @time transform(gd, (d -> (fb = first(d.b),)));
 32.722459 seconds (312.20 M allocations: 11.534 GiB, 3.17% gc time, 0.63% compilation time)

julia> @time transform(gd, (d -> (fb = first(d.b),)));
 17.308288 seconds (312.03 M allocations: 10.763 GiB, 4.79% gc time, 0.24% compilation time)

julia> @time combine(gd, (d -> (fb = first(d.b),)));
  2.061366 seconds (12.21 M allocations: 1.096 GiB, 18.91% gc time, 4.40% compilation time)

julia> @time combine(gd, (d -> (fb = first(d.b),)));
  1.769705 seconds (12.21 M allocations: 1.096 GiB, 4.33% gc time, 5.64% compilation time)

EDIT: Never mind, it's all allocation time:

julia> @time combine(gd, (d -> (fb = fill(first(d.b), nrow(d)),)));
 33.456023 seconds (612.51 M allocations: 23.565 GiB, 11.19% gc time, 1.01% compilation time)

julia> @time combine(gd, (d -> (fb = fill(first(d.b), nrow(d)),)));
 34.060969 seconds (612.21 M allocations: 23.548 GiB, 11.24% gc time, 0.29% compilation time)

@pdeffebach
Contributor Author

With a smaller number of rows we have

julia> df = DataFrame(a = rand(1:10^6, 10^6), b = randn(10^6));

julia> gd = groupby(df, :a);

julia> @time transform(gd, (d -> (fb = first(d.b),)));
  1.201928 seconds (10.61 M allocations: 299.578 MiB, 20.88% gc time, 3.73% compilation time)

julia> @time transform(gd, (d -> (fb = first(d.b),)));
  1.049436 seconds (10.61 M allocations: 299.578 MiB, 2.67% gc time, 5.01% compilation time)

julia> @time combine(gd, (d -> (fb = first(d.b),)));
  0.900990 seconds (7.80 M allocations: 216.306 MiB, 2.14% gc time, 9.89% compilation time)

julia> @time combine(gd, (d -> (fb = first(d.b),)));
  1.110158 seconds (7.80 M allocations: 216.303 MiB, 12.39% gc time, 7.92% compilation time)

@bkamins
Member

bkamins commented Nov 3, 2020

Yes - but it shows that there is no dynamic dispatch per group. We simply need to nail down the biggest source of compilation cost.

@pdeffebach
Contributor Author

The cost is also coming from the groupby step:

julia> @time @pipe df |>
           @transform(_, y = 10 * :x) |>
           @where(_, :a .> 2) |>
           #groupby(_, :b) |>
           @transform(_, meanX = mean(:x), meanY = mean(:y)) |>
           @orderby(_, :meanX) |>
           @select(_, :meanX, :meanY, var = :b);
  0.054581 seconds (9.93 k allocations: 675.682 KiB, 98.58% compilation time)

julia> @time @pipe df |>
           @transform(_, y = 10 * :x) |>
           @where(_, :a .> 2) |>
           groupby(_, :b) |>
           @transform(_, meanX = mean(:x), meanY = mean(:y)) |>
           @orderby(_, :meanX) |>
           @select(_, :meanX, :meanY, var = :b);
  0.113268 seconds (59.50 k allocations: 3.096 MiB, 99.09% compilation time)

@pdeffebach
Contributor Author

I think this last one is the most instructive, because in theory there should be no compilation-cost difference between the first and the second. I'm worried this has to do with transform on a grouped data frame being type-unstable because of the ungroup option.
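One way to probe the type-stability worry (a sketch; the `:b => first => :fb` call is illustrative, not from the thread): ask inference directly what return type it concludes for a grouped transform call.

```julia
using DataFrames

df = DataFrame(a = 1:6, b = [1, 2, 3, 100, 200, 300])
gd = groupby(df, :b)

# Query the compiler for the inferred return type of this call;
# anything wider than DataFrame would be consistent with instability
# introduced by the ungroup keyword option.
inferred = Base.return_types(transform, (typeof(gd), typeof(:b => first => :fb)))
println(inferred)
```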

@bkamins
Member

bkamins commented Nov 3, 2020

You can annotate the return value in DataFramesMeta.jl to check, but I think it should not matter.

@pdeffebach
Contributor Author

Yeah that didn't help. This one is definitely mysterious.

@bkamins
Member

bkamins commented Mar 29, 2021

This is tested on main:

julia> @snoopi_deep transform(gd, (d -> (fb = first(d.b),)))
InferenceTimingNode: 0.071197/0.166758 on InferenceFrameInfo for Core.Compiler.Timings.ROOT() with 3 direct children

julia> @snoopi_deep combine(gd, (d -> (fb = first(d.b),)))
InferenceTimingNode: 0.068899/0.208943 on InferenceFrameInfo for Core.Compiler.Timings.ROOT() with 4 direct children

One of the reasons for the difference in inference is that for combine the following method is inferred:

 InferenceTiming: 0.022692/0.063976 on InferenceFrameInfo for DataFrames._combine_with_first(::NamedTuple{(:fb,), _A} where _A<:Tuple{Any}, #17::var"#17#18", ::GroupedDataFrame{DataFrame}, nothing::Nothing, Val{true}()::Val{true}, nothing::Nothing)

yet it is never called with this signature (I have made sure this signature is never used). So we spend 0.044 seconds out of 0.208 total seconds on inference of a method that never gets called with such a signature.

We call a method with this signature:

 InferenceTiming: 0.018361/0.018361 on InferenceFrameInfo for DataFrames._combine_with_first(::NamedTuple{(:fb,), _A} where _A<:Tuple{Any}, #17::var"#17#18", ::GroupedDataFrame{DataFrame}, nothing::Nothing, Val{true}()::Val{true}, ::Vector{Int64})

(note that the signature is identical except for the last argument). I have tried several things to disable compilation of the unused method (@nospecialize, method splitting, etc.), but I was unable to. I can change the signature of:

function _combine_with_first(first::Union{NamedTuple, DataFrameRow, AbstractDataFrame},
                             f::Base.Callable, gd::GroupedDataFrame,
                             incols::Union{Nothing, AbstractVector, Tuple, NamedTuple},
                             firstmulticol::Val, idx_agg::Union{Nothing, AbstractVector{<:Integer}})

to

function _combine_with_first(first::Union{NamedTuple, DataFrameRow, AbstractDataFrame},
                             f::Base.Callable, gd::GroupedDataFrame,
                             incols::Union{Nothing, AbstractVector, Tuple, NamedTuple},
                             firstmulticol::Val, idx_agg::AbstractVector{<:Integer})

(to remove the Nothing option), but the original implementation used Nothing because I thought such unions were "harmless" (apparently they are not).

@timholy - sorry for bothering you again, but do you see any way to solve the following issue? We have two methods:

f(::TypeA) = ...
f(::TypeB) = ...

Now, in the calling code, values of both types TypeA and TypeB can potentially be passed to f, but in practice only TypeA is passed, given the arguments passed to the calling code. Is there a way to avoid compiling f for TypeB?
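A minimal sketch of the pattern being asked about (TypeA, TypeB, f, pick, and the Val-based workaround are all illustrative placeholders, not DataFrames code): when the branch decision is lifted into the type domain, each specialization of the caller sees only one concrete argument type, so only the method of f that is actually reached needs to compile.

```julia
struct TypeA end
struct TypeB end

f(::TypeA) = :a
f(::TypeB) = :b

# With a runtime Bool, inference sees Union{TypeA, TypeB} at the call
# site and may compile both methods of f, even if only one is reached:
pick(flag::Bool) = flag ? TypeA() : TypeB()
caller(flag::Bool) = f(pick(flag))

# Lifting the flag into the type domain specializes the caller per
# branch, so compiling caller(Val(true)) only requires f(::TypeA):
pick(::Val{true})  = TypeA()
pick(::Val{false}) = TypeB()
caller(v::Val) = f(pick(v))
```

The trade-off is a dynamic dispatch at the point where the Val is constructed from a runtime value, which may or may not be acceptable here.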
