Broadcasting in DataFrames #1804

Merged
merged 32 commits into from
Jun 8, 2019
Changes from 19 commits
Commits
aaa9b92
broadcasting in DataFrames
bkamins May 6, 2019
1477b5b
add two new files to the repo
bkamins May 6, 2019
43d0e2e
fix copyto!
bkamins May 6, 2019
428e219
try to fix Julia 1.0 error
bkamins May 6, 2019
554ea3b
small fixes
bkamins May 7, 2019
bf2a5f1
new design after the review
bkamins May 10, 2019
2f27a3d
Update src/other/broadcasting.jl
bkamins May 11, 2019
9efdb29
Update src/other/broadcasting.jl
bkamins May 11, 2019
d7ccdb3
corrections after code review
bkamins May 11, 2019
8fd53ba
fix tests using removed RHS broadcasting
bkamins May 11, 2019
9216a1b
Apply suggestions from code review
bkamins May 14, 2019
1ab4bfd
changes after the code review
bkamins May 14, 2019
ed85d26
error fixes
bkamins May 14, 2019
a80527b
clean up tests
bkamins May 18, 2019
1023e3c
Update test/broadcasting.jl
bkamins May 20, 2019
3f8c8ed
Reword
nalimilan May 29, 2019
c9a4960
Apply suggestions from code review
bkamins May 29, 2019
974466f
improve tests
bkamins May 29, 2019
0ea04e9
Merge remote-tracking branch 'origin/broadcasting_dataframe' into bro…
bkamins May 29, 2019
06d86ce
Update test/broadcasting.jl
bkamins May 31, 2019
1ce5d2f
tests of single column matrix in broadcasting
bkamins May 31, 2019
09af7d9
performance improvements
bkamins May 31, 2019
e0db92e
improved copyto!
bkamins Jun 5, 2019
4917012
add another fast branch
bkamins Jun 5, 2019
9e9aaad
some additional tests
bkamins Jun 6, 2019
181c5fb
fix typo
bkamins Jun 6, 2019
0c6843b
use new categoricalarrays broadcasting support
bkamins Jun 7, 2019
e3c5d57
minor code improvements after the review
bkamins Jun 7, 2019
7cdee94
fix similar
bkamins Jun 7, 2019
1c11e70
Update src/other/broadcasting.jl
bkamins Jun 7, 2019
43a383a
up CatArrays version
bkamins Jun 7, 2019
0caa3ca
Merge branch 'master' into broadcasting_dataframe
bkamins Jun 7, 2019
17 changes: 17 additions & 0 deletions docs/src/lib/indexing.md
@@ -79,3 +79,20 @@ For performance reasons, accessing, via `getindex` or `view`, a single `row` and
## `setindex!`

Under construction

## Broadcasting

Values can be assigned to `AbstractDataFrame` and `DataFrameRow` objects using the `.=` operator.
In such an operation an `AbstractDataFrame` is treated as two-dimensional and a `DataFrameRow` as single-dimensional.

!!! note

    The rule above means that, similar to single-dimensional objects in Base (e.g. vectors),
    `DataFrameRow` is considered to be column-oriented.
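
The column orientation described in the note can be seen with plain Base arrays. The following is a minimal sketch that uses only Base (no DataFrames types); the names `m`, `v`, and `r` are illustrative. A vector broadcast into a matrix is repeated across columns, while a one-row matrix is repeated across rows.

```julia
# Base-only illustration; `m`, `v`, and `r` are illustrative names.
m = zeros(Int, 3, 2)

v = [10, 20, 30]          # a vector behaves like a single column
m .= v
m_from_vector = copy(m)   # [10 10; 20 20; 30 30]

r = [1 2]                 # a 1×2 matrix behaves like a single row
m .= r
m_from_row = copy(m)      # [1 2; 1 2; 1 2]
```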

If columns are indexed using `Symbol` names, the order of columns in the operation is determined
by the order of the names.

`df[col] .= value` is allowed when `col` is a `Symbol` even if `col` is not present in the `DataFrame`,
provided that `df` is not empty: a new column will be created.
In contrast, `df.col .= value` is not allowed if `col` is not present in `df`.
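
One likely reason for this difference is given here as an assumption about the Julia versions current when this PR was written, not as something the PR text states. In `x.col .= value` the property is looked up first (by default via `getproperty`) and only then broadcast into, so a missing property fails before any broadcasting code runs. In contrast, `x[col] .= value` is routed through `Base.maybeview`, which a container type can customize. A Base-only sketch, with a `NamedTuple` standing in for a data frame:

```julia
# A NamedTuple stands in for a table here; this is not DataFrames code.
nt = (a = [1, 2, 3],)

# Existing property: `nt.a` is fetched, then overwritten in place.
nt.a .= 0

# Missing property: the lookup itself throws before any broadcasting happens.
missing_property_throws = try
    nt.b .= 0
    false
catch
    true
end

# Indexing syntax takes a different route: for arrays, `Base.maybeview`
# returns a view that the broadcast then writes into.
v = [1, 2, 3]
target = Base.maybeview(v, 2:3)
target .= 0
```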
2 changes: 2 additions & 0 deletions src/DataFrames.jl
@@ -83,6 +83,8 @@ include("dataframerow/dataframerow.jl")
include("groupeddataframe/grouping.jl")
include("dataframerow/utils.jl")

include("other/broadcasting.jl")

include("abstractdataframe/iteration.jl")
include("abstractdataframe/join.jl")
include("abstractdataframe/reshape.jl")
72 changes: 72 additions & 0 deletions src/other/broadcasting.jl
@@ -0,0 +1,72 @@
struct LazyNewColDataFrame
    df::DataFrame
    col::Symbol
end

# we allow LazyNewColDataFrame only for data frames with at least one column
Base.axes(x::LazyNewColDataFrame) = axes(x.df[1])

Base.maybeview(df::AbstractDataFrame, idxs...) = view(df, idxs...)

function Base.maybeview(df::AbstractDataFrame, idxs)
    if ncol(df) == 0
        throw(ArgumentError("Broadcasting into a data frame with no columns is not allowed"))
    end
    if idxs isa Symbol
        if !haskey(df, idxs)
            if !(df isa DataFrame)
                # this will throw an appropriate error message
                df[idxs]
            end
            return LazyNewColDataFrame(df, idxs)
        end
    end
    view(df, idxs)
end

function Base.copyto!(lazydf::LazyNewColDataFrame, bc::Base.Broadcast.Broadcasted)
    if isempty(lazydf.df)
        throw(ArgumentError("creating a column via broadcasting is not allowed on empty data frames"))
    end
    T = mapreduce(i -> typeof(bc[i]), promote_type, eachindex(bc); init=Union{})
    col = Tables.allocatecolumn(T, nrow(lazydf.df))
    copyto!(col, bc)
    lazydf.df[lazydf.col] = col
end
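
The element-type computation in the method above can be tried in isolation. This is a minimal Base-only sketch: `result_eltype` is a hypothetical helper name, and `Vector{T}(undef, n)` is swapped in for `Tables.allocatecolumn` so that no package is needed.

```julia
using Base.Broadcast: broadcasted, instantiate

# Hypothetical helper mirroring the promote_type reduction used above.
result_eltype(bc) =
    mapreduce(i -> typeof(bc[i]), promote_type, eachindex(bc); init=Union{})

bc_float = instantiate(broadcasted(+, [1, 2, 3], 0.5))
bc_mixed = instantiate(broadcasted(x -> x > 1 ? x : missing, [1, 2, 3]))

T_float = result_eltype(bc_float)   # Float64
T_mixed = result_eltype(bc_mixed)   # Union{Missing, Int}

# Allocate a destination of the computed type and let Base fill it.
# (Stand-in for Tables.allocatecolumn, which is not used in this sketch.)
col = Vector{T_mixed}(undef, 3)
copyto!(col, bc_mixed)
```

Note that this approach evaluates every element of the lazy broadcast once to find the type and a second time to fill the column.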

function _copyto_helper!(dfcol::AbstractVector, bc::Base.Broadcast.Broadcasted, col::Int)
    if axes(dfcol, 1) != axes(bc)[1]
        # this should never happen unless data frame is corrupted (has unequal column lengths)
        throw(ArgumentError("Dimension mismatch in broadcasting. " *
                            "The updated data frame is invalid and should not be used"))
    end
    @inbounds for row in eachindex(dfcol)
        dfcol[row] = bc[CartesianIndex(row, col)]
    end
end

function Base.copyto!(df::AbstractDataFrame, bc::Base.Broadcast.Broadcasted)
    for col in axes(df, 2)
        _copyto_helper!(df[col], bc, col)
    end
    df
end
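
The column-by-column fill performed by the two methods above can be sketched with Base alone. Here a vector of vectors named `cols` stands in for the columns of a data frame; that stand-in and the variable names are assumptions made for illustration. The sketch indexes an instantiated two-dimensional `Broadcasted` object with `CartesianIndex(row, col)`.

```julia
using Base.Broadcast: broadcasted, instantiate

# A lazy 3×2 broadcast: a column vector combined with a one-row matrix.
bc = instantiate(broadcasted(+, [1, 2, 3], [10 20]))

# Stand-in for data frame columns (illustrative, not DataFrames code).
cols = [zeros(Int, 3), zeros(Int, 3)]

for (j, column) in enumerate(cols)
    # Each column must match the first axis of the broadcast.
    axes(column, 1) == axes(bc)[1] || throw(DimensionMismatch("column length mismatch"))
    for i in eachindex(column)
        column[i] = bc[CartesianIndex(i, j)]
    end
end
```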

function Base.copyto!(df::AbstractDataFrame, bc::Base.Broadcast.Broadcasted{<:Base.Broadcast.AbstractArrayStyle{0}})
    # special case of fast approach when bc is providing an untransformed scalar
    if bc.f === identity && bc.args isa Tuple{Any} && Base.Broadcast.isflat(bc)
        for col in axes(df, 2)
            fill!(df[col], bc.args[1][])
        end
        df
    else
        # fall back to the generic method; Broadcasted is qualified like elsewhere in this file
        copyto!(df, convert(Base.Broadcast.Broadcasted{Nothing}, bc))
    end
end
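
The fast-path condition above can be explored without a data frame. In this minimal sketch `is_plain_scalar` is a hypothetical predicate name; for a single argument, checking that the argument is not itself a `Broadcasted` plays the role of the flatness check, which avoids relying on the internal `isflat` function.

```julia
using Base.Broadcast: Broadcasted, broadcasted

# Hypothetical predicate: true when `bc` merely wraps one untransformed value.
is_plain_scalar(bc::Broadcasted) =
    bc.f === identity && bc.args isa Tuple{Any} && !(bc.args[1] isa Broadcasted)

plain  = broadcasted(identity, 1)                     # the lazy object behind `x .= 1`
nested = broadcasted(identity, broadcasted(+, 1, 2))  # the argument is itself lazy
binary = broadcasted(+, 1, 2)                         # a real computation

# On the fast path the value is unwrapped once and written with fill!.
col = zeros(Int, 4)
if is_plain_scalar(plain)
    fill!(col, plain.args[1][])
end
```

Unwrapping the scalar once and calling `fill!` avoids indexing into the lazy broadcast object for every cell.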

function Base.copyto!(dfr::DataFrameRow, bc::Base.Broadcast.Broadcasted)
    for I in eachindex(bc)
        dfr[I] = bc[I]
    end
    dfr
end