Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataFrameRow: fix iterate, ndims, iterate #1637

Merged
merged 8 commits into from
Dec 18, 2018
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 6 additions & 8 deletions src/dataframerow/dataframerow.jl
Original file line number Diff line number Diff line change
Expand Up @@ -80,21 +80,20 @@ index(r::DataFrameRow) = index(parent(r))
Base.size(r::DataFrameRow) = (size(parent(r), 2),)
Base.size(r::DataFrameRow, i) = size(r)[i]
Base.length(r::DataFrameRow) = size(parent(r), 2)
Base.ndims(r::DataFrameRow) = 1
Base.ndims(::Type{DataFrameRow{T}}) where {T<:AbstractDataFrame} = 1
bkamins marked this conversation as resolved.
Show resolved Hide resolved

Compat.lastindex(r::DataFrameRow) = size(parent(r), 2)
Base.lastindex(r::DataFrameRow) = size(parent(r), 2)

function Base.iterate(r::DataFrameRow)
Base.depwarn("iteration over DataFrameRow will return values in the future:" *
"use pairs(r::DataFrameRow) to get the current behavior",
:start)
iterate(r, 1)
end
Base.iterate(r::DataFrameRow) = iterate(r, 1)

function Base.iterate(r::DataFrameRow, st)
st > length(r) && return nothing
return (r[st], st + 1)
end

Base.eltype(r::DataFrameRow) = reduce(typejoin, eltypes(parent(r)))
nalimilan marked this conversation as resolved.
Show resolved Hide resolved

Base.convert(::Type{Array}, dfr::DataFrameRow) =
[dfr[i] for j in 1:1, i in 1:length(dfr)]
Base.convert(::Type{Vector}, dfr::DataFrameRow) =
Expand All @@ -104,7 +103,6 @@ Base.convert(::Type{Vector{T}}, dfr::DataFrameRow) where T =
Base.Vector(dfr::DataFrameRow) = convert(Vector, dfr)
Base.Vector{T}(dfr::DataFrameRow) where T = convert(Vector{T}, dfr)


Base.keys(r::DataFrameRow) = names(parent(r))
Base.values(r::DataFrameRow) = ntuple(col -> parent(r)[col][row(r)], length(r))

Expand Down
3 changes: 0 additions & 3 deletions src/deprecated.jl
Original file line number Diff line number Diff line change
Expand Up @@ -1328,9 +1328,6 @@ import Base: vcat
@deprecate showcols(df::AbstractDataFrame, all::Bool=false, values::Bool=true) describe(df, stats = [:eltype, :nmissing, :first, :last])
@deprecate showcols(io::IO, df::AbstractDataFrame, all::Bool=false, values::Bool=true) show(io, describe(df, stats = [:eltype, :nmissing, :first, :last]), all)

import Base: collect
@deprecate collect(r::DataFrameRow) collect(pairs(r))

import Base: show
@deprecate show(io::IO, df::AbstractDataFrame, allcols::Bool, rowlabel::Symbol, summary::Bool) show(io, df, allcols=allcols, rowlabel=rowlabel, summary=summary)
@deprecate show(io::IO, df::AbstractDataFrame, allcols::Bool, rowlabel::Symbol) show(io, df, allcols=allcols, rowlabel=rowlabel)
Expand Down
24 changes: 23 additions & 1 deletion test/dataframerow.jl
Original file line number Diff line number Diff line change
Expand Up @@ -89,14 +89,21 @@ module TestDataFrameRow
r = DataFrameRow(df, 1)
@test r[:] == r

# keys, values and iteration
# keys, values and iteration, size
@test keys(r) == names(df)
@test values(r) == (df[1, 1], df[1, 2], df[1, 3], df[1, 4])
@test collect(pairs(r)) == [:a=>df[1, 1], :b=>df[1, 2], :c=>df[1, 3], :d=>df[1, 4]]

@test haskey(r, :a)
@test !haskey(r, :zzz)

@test length(r) == 4
@test ndims(r) == 1
@test ndims(typeof(r)) == 1
@test size(r) == (4,)
@test size(r, 1) == 4
@test_throws BoundsError size(r, 2)

df = DataFrame(a=nothing, b=1)
io = IOBuffer()
show(io, DataFrameRow(df, 1))
Expand All @@ -116,4 +123,19 @@ module TestDataFrameRow
c=["A", "B", "C", "A", "B", missing])
@test copy(DataFrameRow(df, 1)) == (a = 1, b = 2.0, c = "A")
@test isequal(copy(DataFrameRow(df, 2)), (a = 2, b = missing, c = "B"))

# iteration and collect
ref = ["a", "b", "c"]
df = DataFrame(permutedims(ref))
dfr = df[1, :]
@test collect(dfr) == ref
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also test the type? When you say the eltype is Any, I guess this applies only to heterogeneous column types, right? It would be useful to cover that case too.

(BTW, Vector doesn't promote AFAICT, as it uses a comprehension.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix it and test it. Actually it should be narrowest type (as in comprehensions).

What is problematic (and confused me) that in Matrix(df) we use T = reduce(promote_type, eltypes(df)) and probably we should use T = reduce(typejoin, eltypes(df)). Should I fix it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added eltype for DataFrameRow and now it will be as you wanted (and it is OK to add it given the interface specification).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See discussion at #1641. Maybe we should use promotion here too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed 😄 in #1641, if someone wants a promotion to T one should use Vector{T}(dfr) in my opinion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So should we use promote_type here too for consistency with #1641?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that this was exactly the point we wanted to have a difference here.

Essentially this is the same as:

julia> collect((1, 1.0, true))
3-element Array{Real,1}:
    1
    1.0
 true

julia> [v for v in (1, 1.0, true)]
3-element Array{Real,1}:
    1
    1.0
 true

But I am not too attached to what we have now. But it we want to do the promotion here we should define eltype (and this is exactly what I have removed some 2 commits ago 😄).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should keep collect as-is for consistency with tuples, but have convert use promotion to be consistent with data frames?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what we do. But it is implemented in #1640.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, too many PRs... :-p

@test eltype(collect(dfr)) === String
for (v1, v2) in zip(ref, dfr)
@test v1 == v2
end
for (i, v) in enumerate(dfr)
@test v == ref[i]
end
dfr = DataFrame(a=1, b=true, c=1.0)[1,:]
@test eltype(collect(dfr)) === Real
end