Latency comparison between transform and combine with grouped data frame #2516

Closed
pdeffebach opened this issue Nov 3, 2020 · 9 comments · Fixed by #2691

Comments

@pdeffebach
Contributor

julia> df = DataFrame(a = [1, 1, 1, 2, 2, 2], b = [1, 2, 3, 100, 200, 300]);

julia> gd = groupby(df, :b);

julia> @time transform(gd, (d -> (fb = first(d.b),)));
  0.045820 seconds (27.15 k allocations: 1.378 MiB, 99.11% compilation time)

julia> @time transform(gd, (d -> (fb = first(d.b),)));
  0.046607 seconds (27.14 k allocations: 1.378 MiB, 98.77% compilation time)

julia> @time combine(gd, (d -> (fb = first(d.b),)));
  0.085262 seconds (211.92 k allocations: 12.443 MiB, 99.58% compilation time)

julia> @time combine(gd, (d -> (fb = first(d.b),)));
  0.092077 seconds (211.93 k allocations: 12.444 MiB, 99.41% compilation time)

Side note: I think we are compiling the anonymous function anew for every group. I can't quite prove it, but it seems consistent with, say:

julia> @time transform(df, (d -> (fb = first(d.b),)));
  0.028524 seconds (2.60 k allocations: 175.983 KiB, 99.14% compilation time)

julia> @time transform(df, (d -> (fb = first(d.b),)));
  0.013412 seconds (2.60 k allocations: 175.937 KiB, 98.18% compilation time)
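One way to probe the per-group recompilation hypothesis (a sketch, not from the thread; `firstb` is a hypothetical helper): a named function is compiled once per argument-type signature, whereas every `d -> ...` literal is a distinct anonymous type, so comparing repeated calls with a named function helps separate compilation cost from per-group work.

```julia
using DataFrames

df = DataFrame(a = [1, 1, 1, 2, 2, 2], b = [1, 2, 3, 100, 200, 300])
gd = groupby(df, :b)

# Named helper: compiled once per signature, reused for every group.
firstb(d) = (fb = first(d.b),)

@time transform(gd, firstb)  # first call pays the compilation cost
@time transform(gd, firstb)  # a cheap second call would argue against per-group recompilation
```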
@bkamins
Member

bkamins commented Nov 3, 2020

Can you please check what happens on, e.g., a data frame with 10^6 groups? (I do not have access to a computer on which I can check this now.)

The discrepancy between DataFrame and GroupedDataFrame is expected. I intentionally have a separate code for DataFrame as it is much easier so it can be compiled faster (the downside is that it is much harder to maintain it as we have two separate code bases that have to be checked to produce consistent results).

@pdeffebach
Contributor Author

pdeffebach commented Nov 3, 2020

Wow, combine is incredible! I'm not sure these results really speak to the problem at hand, but this is very impressive, and it shows the need for more optimizations for transform with a grouped data frame:

julia> df = DataFrame(a = rand(1:10^6, 10^8), b = randn(10^8));

julia> gd = groupby(df, :a);

julia> @time transform(gd, (d -> (fb = first(d.b),)));
 32.722459 seconds (312.20 M allocations: 11.534 GiB, 3.17% gc time, 0.63% compilation time)

julia> @time transform(gd, (d -> (fb = first(d.b),)));
 17.308288 seconds (312.03 M allocations: 10.763 GiB, 4.79% gc time, 0.24% compilation time)

julia> @time combine(gd, (d -> (fb = first(d.b),)));
  2.061366 seconds (12.21 M allocations: 1.096 GiB, 18.91% gc time, 4.40% compilation time)

julia> @time combine(gd, (d -> (fb = first(d.b),)));
  1.769705 seconds (12.21 M allocations: 1.096 GiB, 4.33% gc time, 5.64% compilation time)

EDIT: Never mind, it's all allocation time:

julia> @time combine(gd, (d -> (fb = fill(first(d.b), nrow(d)),)));
 33.456023 seconds (612.51 M allocations: 23.565 GiB, 11.19% gc time, 1.01% compilation time)

julia> @time combine(gd, (d -> (fb = fill(first(d.b), nrow(d)),)));
 34.060969 seconds (612.21 M allocations: 23.548 GiB, 11.24% gc time, 0.29% compilation time)

@pdeffebach
Contributor Author

With a smaller number of rows we have

julia> df = DataFrame(a = rand(1:10^6, 10^6), b = randn(10^6));

julia> gd = groupby(df, :a);

julia> @time transform(gd, (d -> (fb = first(d.b),)));
  1.201928 seconds (10.61 M allocations: 299.578 MiB, 20.88% gc time, 3.73% compilation time)

julia> @time transform(gd, (d -> (fb = first(d.b),)));
  1.049436 seconds (10.61 M allocations: 299.578 MiB, 2.67% gc time, 5.01% compilation time)

julia> @time combine(gd, (d -> (fb = first(d.b),)));
  0.900990 seconds (7.80 M allocations: 216.306 MiB, 2.14% gc time, 9.89% compilation time)

julia> @time combine(gd, (d -> (fb = first(d.b),)));
  1.110158 seconds (7.80 M allocations: 216.303 MiB, 12.39% gc time, 7.92% compilation time)

@bkamins
Member

bkamins commented Nov 3, 2020

Yes - but it shows that there is no dynamic dispatch per group. We simply need to nail down the biggest source of compilation cost.

@pdeffebach
Contributor Author

The cost is also coming from the groupby step:

julia> @time @pipe df |>
           @transform(_, y = 10 * :x) |>
           @where(_, :a .> 2) |>
           #groupby(_, :b) |>
           @transform(_, meanX = mean(:x), meanY = mean(:y)) |>
           @orderby(_, :meanX) |>
           @select(_, :meanX, :meanY, var = :b);
  0.054581 seconds (9.93 k allocations: 675.682 KiB, 98.58% compilation time)

julia> @time @pipe df |>
           @transform(_, y = 10 * :x) |>
           @where(_, :a .> 2) |>
           groupby(_, :b) |>
           @transform(_, meanX = mean(:x), meanY = mean(:y)) |>
           @orderby(_, :meanX) |>
           @select(_, :meanX, :meanY, var = :b);
  0.113268 seconds (59.50 k allocations: 3.096 MiB, 99.09% compilation time)

@pdeffebach
Contributor Author

I think this last one is the most instructive, because in theory there should be no compilation-cost difference between the first and the second. I'm worried this has to do with transform on a grouped data frame being type-unstable because of the ungroup option.
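One way to probe the type-stability worry (a sketch; the `:b => first => :fb` call is illustrative, not from the thread): ask inference directly what return type it concludes for a grouped transform call.

```julia
using DataFrames

df = DataFrame(a = 1:6, b = [1, 2, 3, 100, 200, 300])
gd = groupby(df, :b)

# Query the compiler for the inferred return type of this call;
# anything wider than DataFrame would be consistent with instability
# introduced by the ungroup keyword option.
inferred = Base.return_types(transform, (typeof(gd), typeof(:b => first => :fb)))
println(inferred)
```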

@bkamins
Member

bkamins commented Nov 3, 2020

You can annotate the return value in DataFramesMeta.jl to check, but I think it should not matter.

@pdeffebach
Contributor Author

Yeah that didn't help. This one is definitely mysterious.

@bkamins
Member

bkamins commented Mar 29, 2021

This is tested on main:

julia> @snoopi_deep transform(gd, (d -> (fb = first(d.b),)))
InferenceTimingNode: 0.071197/0.166758 on InferenceFrameInfo for Core.Compiler.Timings.ROOT() with 3 direct children

julia> @snoopi_deep combine(gd, (d -> (fb = first(d.b),)))
InferenceTimingNode: 0.068899/0.208943 on InferenceFrameInfo for Core.Compiler.Timings.ROOT() with 4 direct children

One of the reasons for the difference in inference is that for combine the following method is inferred:

 InferenceTiming: 0.022692/0.063976 on InferenceFrameInfo for DataFrames._combine_with_first(::NamedTuple{(:fb,), _A} where _A<:Tuple{Any}, #17::var"#17#18", ::GroupedDataFrame{DataFrame}, nothing::Nothing, Val{true}()::Val{true}, nothing::Nothing)

yet it is never called with this signature (I have made sure this signature is never used). So we spend 0.044 seconds out of 0.208 total seconds on inference of a method that never gets called with such a signature.

We call a method with this signature:

 InferenceTiming: 0.018361/0.018361 on InferenceFrameInfo for DataFrames._combine_with_first(::NamedTuple{(:fb,), _A} where _A<:Tuple{Any}, #17::var"#17#18", ::GroupedDataFrame{DataFrame}, nothing::Nothing, Val{true}()::Val{true}, ::Vector{Int64})

(note that the signature is identical except for the last argument). I have tried several things to disable compilation of the unused method (@nospecialize, method splitting, etc.), but I was unable to. I can change the signature of:

function _combine_with_first(first::Union{NamedTuple, DataFrameRow, AbstractDataFrame},
                             f::Base.Callable, gd::GroupedDataFrame,
                             incols::Union{Nothing, AbstractVector, Tuple, NamedTuple},
                             firstmulticol::Val, idx_agg::Union{Nothing, AbstractVector{<:Integer}})

to

function _combine_with_first(first::Union{NamedTuple, DataFrameRow, AbstractDataFrame},
                             f::Base.Callable, gd::GroupedDataFrame,
                             incols::Union{Nothing, AbstractVector, Tuple, NamedTuple},
                             firstmulticol::Val, idx_agg::AbstractVector{<:Integer})

(to remove the Nothing option), but the original implementation used Nothing because I thought such unions were "harmless" (apparently they are not).

@timholy - sorry for bothering you again, but do you see any way to solve the following issue? We have two methods:

f(::TypeA) = ...
f(::TypeB) = ...

Now, in the calling code, values of both types TypeA and TypeB can potentially be passed to f, but in practice only TypeA is passed, given the arguments passed to the calling code. Is there a way to avoid compiling f for TypeB?
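A minimal sketch of the pattern being asked about (TypeA, TypeB, f, pick, and the Val-based workaround are all illustrative placeholders, not DataFrames code): when the branch decision is lifted into the type domain, each specialization of the caller sees only one concrete argument type, so only the method of f that is actually reached needs to compile.

```julia
struct TypeA end
struct TypeB end

f(::TypeA) = :a
f(::TypeB) = :b

# With a runtime Bool, inference sees Union{TypeA, TypeB} at the call
# site and may compile both methods of f, even if only one is reached:
pick(flag::Bool) = flag ? TypeA() : TypeB()
caller(flag::Bool) = f(pick(flag))

# Lifting the flag into the type domain specializes the caller per
# branch, so compiling caller(Val(true)) only requires f(::TypeA):
pick(::Val{true})  = TypeA()
pick(::Val{false}) = TypeB()
caller(v::Val) = f(pick(v))
```

The trade-off is a dynamic dispatch at the point where the Val is constructed from a runtime value, which may or may not be acceptable here.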
