Add broadcasting of AbstractDataFrame #1840

Merged
merged 30 commits into from
Jun 23, 2019
Changes from 2 commits
8bb061d
add broadcasting of AbstractDataFrame
bkamins Jun 8, 2019
6a6e954
switch to Tables.allocatecolumn
bkamins Jun 8, 2019
6f151ac
one more fix
bkamins Jun 8, 2019
628229b
revert similar and fix tests
bkamins Jun 8, 2019
8af3a0f
Apply suggestions from code review
bkamins Jun 9, 2019
c9a98bd
corrections after code review
bkamins Jun 9, 2019
5b4cc36
fix typo
bkamins Jun 9, 2019
24566e4
fix broadcasting assignment bug
bkamins Jun 20, 2019
2b82f79
fix SubDataFrame case
bkamins Jun 20, 2019
3810f75
add unaliasing of data frame against data frame
bkamins Jun 20, 2019
8e335f1
small fixes in legacy code
bkamins Jun 20, 2019
80a131e
optimized broadcasting
bkamins Jun 20, 2019
82e53a6
correct unaliasing
bkamins Jun 20, 2019
87206a2
small performance optimization
bkamins Jun 20, 2019
fba7cef
performance improvements
bkamins Jun 20, 2019
0e63fb8
add more broadcasting tests
bkamins Jun 20, 2019
bb4862b
more tests
bkamins Jun 20, 2019
5b8d2ec
Merge branch 'master' into new_dataframe_broadcasting
bkamins Jun 21, 2019
699cb6b
Merge branch 'master' into new_dataframe_broadcasting
bkamins Jun 21, 2019
6780b26
even more tests
bkamins Jun 21, 2019
432a530
getcolbc cleanup
bkamins Jun 21, 2019
3fdf733
fix after a code review
bkamins Jun 21, 2019
a67a3f5
unalias optimizations
bkamins Jun 21, 2019
5784100
more tests for common cases
bkamins Jun 21, 2019
b1813e1
improve helper signature
bkamins Jun 21, 2019
6a87c42
minor improvements
bkamins Jun 22, 2019
27c730a
minor improvements 2
bkamins Jun 22, 2019
9a68d30
Apply suggestions from code review
bkamins Jun 23, 2019
5d3ec40
fixes after code review
bkamins Jun 23, 2019
1f46086
Fix indentation
nalimilan Jun 23, 2019
4 changes: 4 additions & 0 deletions docs/src/lib/indexing.md
@@ -87,6 +87,10 @@ Under construction
The following broadcasting rules apply to `AbstractDataFrame` objects:
* `AbstractDataFrame` behaves in broadcasting like a two-dimensional collection compatible with matrices.
* If an `AbstractDataFrame` takes part in broadcasting then a `DataFrame` is always produced as a result.
  In this case the requested broadcasting operation must produce an object with exactly two dimensions.
  An exception is when an `AbstractDataFrame` is used only as a source of broadcast assignment into an object
  of dimensionality higher than two. In such a case this is allowed and the data frame is treated as a two-dimensional
  collection compatible with matrices.
* If multiple `AbstractDataFrame` objects take part in broadcasting then they have to have identical column names.

It is possible to assign a value to `AbstractDataFrame` and `DataFrameRow` objects using the `.=` operator.
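
The two-dimensional rule above can be illustrated without DataFrames.jl at all. The following is only a sketch using Base: a plain matrix stands in for a data frame, and `bc_ndims` is a hypothetical helper (not part of DataFrames.jl or Base) that counts the dimensions of a lazy broadcast expression via `length(axes(bc))`.

```julia
using Base.Broadcast: broadcasted

# Hypothetical helper (an assumption for illustration only): number of
# dimensions that a lazy broadcast expression would produce.
bc_ndims(bc) = length(axes(bc))

m = ones(4, 3)       # a plain matrix stands in for a data frame
y = zeros(4, 3, 2)   # a three-dimensional array

n2 = bc_ndims(broadcasted(+, m, 1))   # matrix with a scalar: two dimensions
n3 = bc_ndims(broadcasted(+, m, y))   # matrix with a 3-d array: three dimensions

# Broadcast assignment into a higher-dimensional destination is still valid:
# the matrix is reused along the third dimension of `y`.
y .+= m
```

With a real data frame in place of `m`, an expression shaped like the second one would be rejected because its result has three dimensions, while the final assignment corresponds to the documented exception.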
28 changes: 27 additions & 1 deletion src/other/broadcasting.jl
@@ -42,6 +42,10 @@ function getcolbc(bcf::Base.Broadcast.Broadcasted{Style}, colind) where {Style}
end

function Base.copy(bc::Base.Broadcast.Broadcasted{DataFrameStyle})
    ndim = length(axes(bc))
    if ndim != 2
        throw(DimensionMismatch("cannot broadcast a data frame into $ndim dimensions"))
    end
    bcf = Base.Broadcast.flatten(bc)
    colnames = unique([_names(df) for df in bcf.args if df isa AbstractDataFrame])
    if length(colnames) != 1
@@ -128,6 +132,29 @@ function Base.Broadcast.broadcast_unalias(dest::AbstractDataFrame, src)
    src
end

function Base.Broadcast.broadcast_unalias(dest, src::AbstractDataFrame)
    wascopied = false
    for (i, col) in enumerate(eachcol(src))
        if Base.mightalias(dest, col)
            if src isa SubDataFrame
                if !wascopied
                    src = SubDataFrame(copy(parent(src), copycols=false),
                                       index(src), rows(src))
                end
                parentidx = parentcols(index(src), i)
                parent(src)[parentidx] = Base.unaliascopy(parent(src)[parentidx])
            else
                if !wascopied
                    src = copy(src, copycols=false)
                end
                src[i] = Base.unaliascopy(col)
            end
            wascopied = true
        end
    end
    src
end

function _broadcast_unalias_helper(dest::AbstractDataFrame, scol::AbstractVector,
                                   src::AbstractDataFrame, col2::Int, wascopied::Bool)
    # col1 can be checked till col2 point as we are writing broadcasting
@@ -143,7 +170,6 @@ function _broadcast_unalias_helper(dest::AbstractDataFrame, scol::AbstractVector
end
parentidx = parentcols(index(src), col2)
parent(src)[parentidx] = Base.unaliascopy(parent(src)[parentidx])

else
if !wascopied
src = copy(src, copycols=false)
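
The copy-on-first-alias pattern used by the `broadcast_unalias` method added in this file can be sketched with Base only. This is an illustration under stated assumptions, not the package code: `unalias_columns` is a hypothetical helper, a vector of column vectors stands in for a data frame, and `Base.mightalias` and `Base.unaliascopy` are unexported Base functions.

```julia
# Hypothetical stand-in for the data frame case: `cols` is a vector of column
# vectors. Columns that may share memory with `dest` are replaced by copies;
# the container itself is shallow-copied at most once, and only when needed.
function unalias_columns(dest, cols::AbstractVector)
    out = cols
    wascopied = false
    for (i, col) in enumerate(cols)
        if Base.mightalias(dest, col)
            if !wascopied
                out = copy(cols)   # shallow copy: the columns are still shared
                wascopied = true
            end
            out[i] = Base.unaliascopy(col)
        end
    end
    return out
end

m = [1.0 2.0; 3.0 4.0]
# The first column is a view into `m`; the second is independent of it.
cols = AbstractVector{Float64}[view(m, :, 1), [10.0, 20.0]]
out = unalias_columns(m, cols)
```

Only the aliased column is copied, so the independent column is still shared between `cols` and `out`, and when nothing aliases `dest` the input container is returned unchanged.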
61 changes: 61 additions & 0 deletions test/broadcasting.jl
@@ -870,4 +870,65 @@ end
    @test df == DataFrame(m)
end

@testset "data frame only on left hand side broadcasting assignment" begin
    Random.seed!(1234)

    m = rand(3,4);
    m2 = copy(m);
    m3 = copy(m);
    df = DataFrame(a=view(m, :, 1), b=view(m, :, 1),
                   c=view(m, :, 1), d=view(m, :, 1), copycols=false);
    df2 = copy(df)
    mdf = Matrix(df)

    @test m .+ df == m2 .+ df
    @test Matrix(m .+ df) == m .+ mdf
    @test sin.(m .+ df) .+ 1 .+ m2 == sin.(m2 .+ df) .+ 1 .+ m
    @test Matrix(m .+ df ./ 2 .* df2) == m .+ mdf ./ 2 .* mdf

    m2 .+= df .+ 1 ./ df2
    m .+= df .+ 1 ./ df2
    @test m2 == m
    for col in eachcol(df)
        @test col == m[:, 1]
    end
    for col in eachcol(df2)
        @test col == m3[:, 1]
    end

    m = rand(3,4);
    m2 = copy(m);
    m3 = copy(m);
    df = view(DataFrame(a=view(m, :, 1), b=view(m, :, 1),
                        c=view(m, :, 1), d=view(m, :, 1), copycols=false),
              [3,2,1], :)
    df2 = copy(df)
    mdf = Matrix(df)

    @test m .+ df == m2 .+ df
    @test Matrix(m .+ df) == m .+ mdf
    @test sin.(m .+ df) .+ 1 .+ m2 == sin.(m2 .+ df) .+ 1 .+ m
    @test Matrix(m .+ df ./ 2 .* df2) == m .+ mdf ./ 2 .* mdf

    m2 .+= df .+ 1 ./ df2
    m .+= df .+ 1 ./ df2
    @test m2 == m
    for col in eachcol(df)
        @test col == m[3:-1:1, 1]
    end
    for col in eachcol(df2)
        @test col == m3[3:-1:1, 1]
    end
end

@testset "broadcasting with 3-dimensional object" begin
    y = zeros(4,3,2)
    df = DataFrame(ones(4,3))
    @test_throws DimensionMismatch df .+ y
    @test_throws DimensionMismatch y .+ df
    @test_throws DimensionMismatch df .+= y
    y .+= df
    @test all(y .== 1)
end

end # module