Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make sure aggregation uses fast path if a single column is used #2423

Closed
bkamins opened this issue Sep 11, 2020 · 4 comments
Closed

make sure aggregation uses fast path if a single column is used #2423

bkamins opened this issue Sep 11, 2020 · 4 comments
Labels
non-breaking The proposed change is not breaking performance
Milestone

Comments

@bkamins
Copy link
Member

bkamins commented Sep 11, 2020

I will fix it. Currently :x => sum uses a fast path, but [:x] => sum does not.

@bkamins bkamins added this to the 1.0 milestone Sep 11, 2020
@bkamins bkamins added the non-breaking The proposed change is not breaking label Sep 11, 2020
@bkamins bkamins mentioned this issue Mar 4, 2021
19 tasks
@sa-
Copy link
Contributor

sa- commented Mar 25, 2021

Looking into this. Pasting some of your Slack messages here in case the messages get deleted:

fast path is implemented here: https://github.com/JuliaData/DataFrames.jl/blob/main/src/groupeddataframe/fastaggregates.jl

10:30
In the fast path you do not have to calculate grouping views. You can just pre-allocate the target container (whose eltype you know upfront) and just iterate groups and perform reduction.

@bkamins
Copy link
Member Author

bkamins commented Mar 25, 2021

The key thing to do here is to rewrite code like:

@chain df begin
    groupby(:a)
    combine([:b] => sum)
end

to:

@chain df begin
    groupby(:a)
    combine(:b => sum)
end

before executing it.

We should check in the syntax src => fun and src => fun => dst if src is Union{AbstractVector{<:AbstractString}}, AbstractVector{Symbol}, AbstractVector{<:Union{Signed, Unsigned}} of length 1 and if this is so its first element should be fetched with only before further processing.

Or even better - check that src after preprocessing the column list using a standard mechanism of src preprocessing is a single element vector (it then should be Vector{Int} as this is what Index should ensure).

@bkamins bkamins modified the milestones: 1.0, 1.x Mar 25, 2021
sa- added a commit to sa-/DataFrames.jl that referenced this issue Apr 10, 2021
@sa-
Copy link
Contributor

sa- commented Apr 14, 2021

Can this be closed now?

@bkamins
Copy link
Member Author

bkamins commented Apr 14, 2021

yes

@bkamins bkamins closed this as completed Apr 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
non-breaking The proposed change is not breaking performance
Projects
None yet
Development

No branches or pull requests

2 participants