Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change the way grouped transforms work #101

Merged
merged 25 commits into from
Oct 18, 2018
Merged
Changes from 1 commit
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
1965c7b
Change the way grouped transforms work
pdeffebach Sep 17, 2018
15df9ff
Return to using promote_type for differing types
pdeffebach Sep 26, 2018
9c95239
Respond to Milan 2
pdeffebach Sep 28, 2018
1894c86
add length check for every iteration
pdeffebach Sep 28, 2018
e48c355
Fix indentation and add test
pdeffebach Sep 28, 2018
2226c96
Fix so that copyto! doesn't allocate
pdeffebach Sep 28, 2018
e08207b
Fix spacing and indentation
nalimilan Sep 28, 2018
bc72eb9
Test for function with different return types
pdeffebach Sep 28, 2018
9ee9d22
Merge branch 'group-transform-overhaul' of https://github.com/pdeffeb…
pdeffebach Sep 28, 2018
b39ab11
Do `using Tables` to fix CI failure
nalimilan Sep 28, 2018
105d846
More checks, better tests. Still catarray issue
pdeffebach Sep 29, 2018
0298c3d
Merge branch 'group-transform-overhaul' of https://github.com/pdeffeb…
pdeffebach Sep 29, 2018
e87647d
Change Ref to fill, more tests
pdeffebach Sep 30, 2018
3ae01a0
Fix tests, add @views
pdeffebach Sep 30, 2018
0db98f1
More test fixes. @inbounds, and throw arg error
pdeffebach Oct 1, 2018
cf79e39
Add type check for CategoricalVector
pdeffebach Oct 1, 2018
2dea7c4
Remove unneeded CategoricalArray call
nalimilan Oct 2, 2018
9a4d5c1
Fix throw
nalimilan Oct 2, 2018
26cac9c
Make everything recursive
pdeffebach Oct 2, 2018
28a775b
Improve recursion, need some comments
pdeffebach Oct 3, 2018
6839b9b
Do recursive correctly following #1520
pdeffebach Oct 16, 2018
dfc4a2f
Respond to recursion comments
pdeffebach Oct 16, 2018
df6503a
add @noinline, fix some names
pdeffebach Oct 17, 2018
bdf4a79
fix line breaks
pdeffebach Oct 17, 2018
6aea278
Final fixes
pdeffebach Oct 18, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
79 changes: 40 additions & 39 deletions src/DataFramesMeta.jl
Original file line number Diff line number Diff line change
Expand Up @@ -405,9 +405,29 @@ end

function _transform!(t::AbstractVector, first::AbstractVector, start::Int,
g::GroupedDataFrame, v::Function, starts::Vector, ends::Vector)
@inline function fill_column_vec!(t::AbstractVector, out, startpoint::Int, endpoint::Int, len::Int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean when I said you could move the code inside the function is that if you use @inline, you can just drop the function barrier and put the code directly in the parent function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should probably still be a function, since we call it twice per _transform! function. Once for the first case and once for the rest. We need to call it twice because we have to compute first in transform(::GroupedDataFrame,...) so not treating the first case separately would require calculating first twice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Should be OK as-is then.

if !(out isa AbstractVector)
throw(ArgumentError("Return value must be an `AbstractVector` for all groups or" *
"for none of them"))
elseif length(out) != len
throw(ArgumentError("If a function returns a vector, the result " *
"must have the same length as the groups it operates on"))
end
elout = eltype(out)
T = eltype(t)
newtype = promote_type(elout, T)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can make things slightly faster by moving this call after || below, that the type is only computed when it's needed. You could even try adding elout === T || as a first condition in case it's faster (not sure).

Also elout sounds like it's an element, not a type.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow thats really smart that julia allows for that. But alas there isn't any speed gain from this:

        if typout <: T || (newtype = promote_type(typout, T)) <: T
            t[startpoint:endpoint] .= Ref(out)
            return nothing
        else 
            return newtype
        end

Copy link
Member

@nalimilan nalimilan Oct 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, too bad.

if elout <: T || newtype <: T
t[startpoint:endpoint] = out
return nothing
else
return newtype
end
return nothing
end

# handle the first case
newtype = fill_column_vec!(t, first, starts[start], ends[start], size(g[start], 1))
@assert newtype === nothing
newtype_first = fill_column_vec!(t, first, starts[start], ends[start], size(g[start], 1))
#@assert newtype_first === nothing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't expect this check to be costly. Is it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry. no its not.

@inbounds for i in (start+1):length(g)
out = v(g[i])
newtype = fill_column_vec!(t, out, starts[i], ends[i], size(g[i], 1))
Expand All @@ -422,9 +442,24 @@ end

function _transform!(t::AbstractVector, first::Any, start::Int,
g::GroupedDataFrame, v::Function, starts::Vector, ends::Vector)
@inline function fill_column_any!(t::AbstractVector, out, startpoint::Int, endpoint::Int)
if out isa AbstractVector
throw(ArgumentError("Return value must be an `AbstractVector` for all groups or" *
"for none of them"))
end
typout = typeof(out)
T = eltype(t)
newtype = promote_type(typout, T)
if typout <: T || newtype <: T
t[startpoint:endpoint] .= Ref(out)
return nothing
else
return newtype
end
end
# handle the first case
newtype = fill_column_any!(t, first, starts[start], ends[start])
@assert newtype === nothing
newtype_first = fill_column_any!(t, first, starts[start], ends[start])
#@assert newtype_first === nothing
@inbounds for i in (start+1):length(g)
out = v(g[i])
newtype = fill_column_any!(t, out, starts[i], ends[i])
Expand All @@ -435,42 +470,8 @@ function _transform!(t::AbstractVector, first::Any, start::Int,
end
end
return t
end

function fill_column_vec!(t::AbstractVector, out, startpoint::Int, endpoint::Int, len::Int)
if !(out isa AbstractVector)
throw(ArgumentError("Return value must be an `AbstractVector` for all groups or" *
"for none of them"))
elseif length(out) != len
throw(ArgumentError("If a function returns a vector, the result " *
"must have the same length as the groups it operates on"))
end
elout = eltype(out)
T = eltype(t)
newtype = promote_type(elout, T)
if elout <: T || newtype <: T
t[startpoint:endpoint] = out
return nothing
else
return newtype
end
end

function fill_column_any!(t::AbstractVector, out, startpoint::Int, endpoint::Int)
if out isa AbstractVector
throw(ArgumentError("Return value must be an `AbstractVector` for all groups or" *
"for none of them"))
end
typout = typeof(out)
T = eltype(t)
newtype = promote_type(typout, T)
if typout <: T || newtype <: T
t[startpoint:endpoint] .= Ref(out)
return nothing
else
return newtype
end
end
end

function transform_helper(x, args...)
quote
Expand Down