Improve performance of by() using NamedTuples
Remove GroupApplied and deprecate combine in favor of map(f, ::GroupedDataFrame). This avoids
storing a copy of the per-group data returned by the user-provided function. Take advantage of
this by allowing that function to return a NamedTuple. Introduce two completely separate
code paths depending on whether the first returned object is a DataFrame or a NamedTuple, as
the latter allows for more efficient operation by assuming that it represents a single row.
Use the same progressive eltype widening approach as Base.map so that we fill column vectors
whose types are known inside the kernel functions. This does not eliminate the type instability
caused by the user-provided function taking a DataFrame, but ensuring type stability for
half of the operations still improves performance significantly.
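
As an illustration of the progressive eltype widening idea mentioned above, here is a minimal sketch in plain Julia. It is not the code added by this commit: the names `fill_column!` and `collect_column` are hypothetical, and it assumes that the non-exported `Base.promote_typejoin` is an acceptable way to pick the widened type.

```julia
# Hypothetical helpers for illustration only; not the functions from this commit.

# Kernel: fill `col` from index `start` onwards while values fit eltype T.
# Inside this method T is known, so the stores into `col` are type-stable.
function fill_column!(col::Vector{T}, rows, start::Int) where {T}
    for i in start:length(rows)
        val = rows[i]
        if val isa T
            col[i] = val
        else
            # Widen the eltype, copy what has been filled so far, and
            # re-enter the kernel so it is specialized on the new type.
            S = Base.promote_typejoin(T, typeof(val))  # assumption: non-exported Base helper
            newcol = Vector{S}(undef, length(col))
            copyto!(newcol, 1, col, 1, i - 1)
            newcol[i] = val
            return fill_column!(newcol, rows, i + 1)
        end
    end
    return col
end

# Entry point: the first value decides the initial eltype.
function collect_column(rows)
    first_val = rows[1]
    col = Vector{typeof(first_val)}(undef, length(rows))
    col[1] = first_val
    return fill_column!(col, rows, 2)
end

ints  = collect_column(Any[1, 2, 3])        # stays Vector{Int}
mixed = collect_column(Any[1, 2, missing])  # widened to Vector{Union{Missing, Int}}
```

The input is deliberately a `Vector{Any}` to mimic values whose types are not known to the compiler, as is the case for the results of a user-provided function. Only the type check and the occasional widening step pay for that uncertainty.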

Also parameterize GroupedDataFrame on the type of data frame it wraps, and make its column index
have a concrete type. Deprecate an old map method for SubDataFrame. Fix a type instability in hcat!.
nalimilan committed Sep 20, 2018
1 parent eb21906 commit 671e69a
Showing 8 changed files with 369 additions and 181 deletions.
2 changes: 1 addition & 1 deletion docs/src/lib/functions.md
@@ -28,7 +28,7 @@ meltdf
```@docs
allowmissing!
categorical!
-combine
+map
completecases
deleterows!
describe
2 changes: 1 addition & 1 deletion src/DataFrames.jl
@@ -7,6 +7,7 @@ module DataFrames
##############################################################################

using Reexport, StatsBase, SortingAlgorithms, Compat, Statistics, Unicode, Printf
+using Base.Iterators
@reexport using CategoricalArrays, Missings
using Base.Sort, Base.Order

@@ -28,7 +29,6 @@ export AbstractDataFrame,
by,
categorical!,
colwise,
-combine,
completecases,
deleterows!,
describe,
2 changes: 1 addition & 1 deletion src/dataframe/dataframe.jl
@@ -851,7 +851,7 @@ end

# definition required to avoid hcat! ambiguity
function hcat!(df1::DataFrame, df2::DataFrame; makeunique::Bool=false)
-    invoke(hcat!, Tuple{DataFrame, AbstractDataFrame}, df1, df2, makeunique=makeunique)
+    invoke(hcat!, Tuple{DataFrame, AbstractDataFrame}, df1, df2, makeunique=makeunique)::DataFrame
end

hcat!(df::DataFrame, x::AbstractVector; makeunique::Bool=false) =
5 changes: 5 additions & 0 deletions src/deprecated.jl
@@ -1370,3 +1370,8 @@ import Base: show
@deprecate showall(io::IO, df::GroupedDataFrame) show(io, df, allgroups=true)
@deprecate showall(df::GroupedDataFrame) show(df, allgroups=true)

+import Base: map
+@deprecate map(f::Function, sdf::SubDataFrame) f(sdf)
+
+@deprecate combine(f::Function, gd::GroupedDataFrame) map(f, gd)
+@deprecate combine(gd::GroupedDataFrame) map(identity, gd)