Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve inference in vcat #2559

Merged
merged 6 commits into from
Nov 25, 2020
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
23 changes: 15 additions & 8 deletions src/abstractdataframe/abstractdataframe.jl
Original file line number Diff line number Diff line change
Expand Up @@ -1547,12 +1547,17 @@ Base.vcat(dfs::AbstractDataFrame...;
AbstractVector{<:AbstractString}}=:setequal) =
reduce(vcat, dfs; cols=cols)

Base.reduce(::typeof(vcat),
dfs::Union{AbstractVector{<:AbstractDataFrame},
Tuple{Vararg{AbstractDataFrame}}};
function Base.reduce(::typeof(vcat),
dfs::Union{AbstractVector{DF},
Tuple{Vararg{DF}}};
cols::Union{Symbol, AbstractVector{Symbol},
AbstractVector{<:AbstractString}}=:setequal) =
_vcat([df for df in dfs if ncol(df) != 0]; cols=cols)
AbstractVector{<:AbstractString}}=:setequal) where DF<:AbstractDataFrame
if @isdefined(DF) && isconcretetype(DF)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why calling @isfefined(DF) is required here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tuple inputs with 2 or more different types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then isconcretetype(DF) should return false - right? (or would it error)

Copy link
Contributor Author

@timholy timholy Nov 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I take the @isdefined check out and preface that conditional with a @show typeof(dfs), here's what I get from running the cat.jl test:

julia> include("cat.jl")
[ Info: Precompiling DataFrames [a93c6f00-e57d-5684-b7b6-d8193f3e46c0]
Test Summary: | Pass  Total
hcat          |   10     10
Test Summary: | Pass  Total
hcat: copying |   26     26
Test Summary:            | Pass  Total
hcat ::AbstractDataFrame |    2      2
Test Summary:            | Pass  Total
hcat ::AbstractDataFrame |    2      2
Test Summary:          | Pass  Total
hcat ::AbstractVectors |   11     11
Test Summary:  | Pass  Total
hcat: copycols |   76     76
typeof(dfs) = Tuple{DataFrames.DataFrame}
typeof(dfs) = Tuple{DataFrames.DataFrame, DataFrames.DataFrame}
typeof(dfs) = Tuple{DataFrames.DataFrame}
typeof(dfs) = Tuple{DataFrames.DataFrame, DataFrames.DataFrame}
typeof(dfs) = Tuple{DataFrames.DataFrame, DataFrames.DataFrame}
typeof(dfs) = Tuple{DataFrames.DataFrame, DataFrames.DataFrame}
typeof(dfs) = Tuple{DataFrames.DataFrame, DataFrames.DataFrame}
typeof(dfs) = Tuple{DataFrames.DataFrame, DataFrames.DataFrame}
typeof(dfs) = Tuple{DataFrames.DataFrame, DataFrames.DataFrame}
typeof(dfs) = Tuple{DataFrames.DataFrame, DataFrames.DataFrame, DataFrames.DataFrame}
typeof(dfs) = Tuple{DataFrames.DataFrame, DataFrames.DataFrame}
typeof(dfs) = Tuple{DataFrames.DataFrame, DataFrames.DataFrame}
typeof(dfs) = NTuple{4, DataFrames.DataFrame}
typeof(dfs) = Tuple{DataFrames.DataFrame, DataFrames.DataFrame}
typeof(dfs) = Tuple{DataFrames.SubDataFrame{DataFrames.DataFrame, DataFrames.Index, UnitRange{Int64}}, DataFrames.SubDataFrame{DataFrames.DataFrame, DataFrames.SubIndex{DataFrames.Index, Vector{Int64}, Vector{Int64}}, UnitRange{Int64}}}
vcat: Error During Test at /home/tim/.julia/dev/DataFrames/test/cat.jl:269
  Test threw exception
  Expression: vcat(view(df, 1:2, :), view(df, 3:5, [3, 2, 1])) == df
  UndefVarError: DF not defined
  Stacktrace:
   [1] reduce(::typeof(vcat), dfs::Tuple{DataFrames.SubDataFrame{DataFrames.DataFrame, DataFrames.Index, UnitRange{Int64}}, DataFrames.SubDataFrame{DataFrames.DataFrame, DataFrames.SubIndex{DataFrames.Index, Vector{Int64}, Vector{Int64}}, UnitRange{Int64}}}; cols::Symbol)
     @ DataFrames ~/.julia/dev/DataFrames/src/abstractdataframe/abstractdataframe.jl:1556
...

So as soon as it gets a heterogeneous tuple (one has an Index and the other a SubIndex), it throws an UndefVarError.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what I do not understand is the fact that you use DF <: AbstractDataFrame (instead of the original implementation). And if I try to do a MWE I get:

julia> using DataFrames

julia> f(::Tuple{Vararg{DF}}) where {DF<:AbstractDataFrame} = "matched"
f (generic function with 1 method)

julia> df = DataFrame(a=1,b=2)
1×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      2

julia> df1 = @view df[:, :]
1×2 SubDataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      2

julia> df2 = @view df[:, 1:1]
1×1 SubDataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1

julia> f((df, df))
"matched"

julia> f((df1, df1))
"matched"

julia> f((df2, df2))
"matched"

julia> f((df, df1))
ERROR: MethodError: no method matching f(::Tuple{DataFrame,SubDataFrame{DataFrame,DataFrames.Index,Base.OneTo{Int64}}})

julia> f((df1, df2))
ERROR: MethodError: no method matching f(::Tuple{SubDataFrame{DataFrame,DataFrames.Index,Base.OneTo{Int64}},SubDataFrame{DataFrame,DataFrames.SubIndex{DataFrames.Index,UnitRange{Int64},UnitRange{Int64}},Base.OneTo{Int64}}})

julia> h(::Tuple{Vararg{AbstractDataFrame}}) = "matched"
h (generic function with 1 method)

julia> h((df, df))
"matched"

julia> h((df, df1))
"matched"

julia> h((df2, df1))
"matched"

so it seems that the pattern Tuple{Vararg{DF}} where DF <: AbstractDataFrame is not the same as the original and intended Tuple{Vararg{AbstractDataFrame}}

Copy link
Contributor Author

@timholy timholy Nov 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this seems to exploit a relatively recent & subtle change in how type parameters. Basically the Tuple{Vararg{DF}} is requiring a single concrete value to set DF, but signature matching is the same as <:AbstractDataFrame. EDIT: but your example shows the matching is incomplete.

Another way we could do this is nominal-runtime:

DF = typeof(first(dfs))
if isconcretetype(DF) && all(df->typeof(df) === DF, dfs) ... end

Do you like that better?

My workflow for these things isn't very deliberative; I typically write it first without the @isdefined, and then when I get an error like the one above I add @isdefined. If you grep this in Base you'll see quite a few similar uses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh dear, now I see your point with the two different dfs. Very interesting. I'll do the runtime version perhaps.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would separate AbstractVector and Tuple implementations then.
Also for Tuple please make sure that in the run-time version a case when the tuple is empty is correctly handled.

I have now noticed a subtle bug that should be fixed:

julia> reduce(vcat, ())
ERROR: ArgumentError: reducing over an empty collection is not allowed

julia> using DataFrames

julia> reduce(vcat, ())
0×0 DataFrame

To be decided if we keep:

julia> reduce(vcat, AbstractDataFrame[])
0×0 DataFrame

or also error here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is weird. It does match in situ, even though your example fails for me. 😕

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you show me the output of the same commands in the fresh Julia session (and what Julia are you using I am on 1.5.3). Thank you! (extending functions from Base is hard)

return _vcat(DF[df for df in dfs if ncol(df) != 0]; cols=cols)
else
return _vcat(AbstractDataFrame[df for df in dfs if ncol(df) != 0]; cols=cols)
end
end

function _vcat(dfs::AbstractVector{<:AbstractDataFrame};
cols::Union{Symbol, AbstractVector{Symbol},
Expand Down Expand Up @@ -1586,13 +1591,15 @@ function _vcat(dfs::AbstractVector{<:AbstractDataFrame};

if !isempty(coldiff)
# if any DataFrames are a full superset of names, skip them
filter!(u -> !issetequal(u, header), uniqueheaders)
let header=header # julia #15276
filter!(u -> !issetequal(u, header), uniqueheaders)
end
estrings = map(enumerate(uniqueheaders)) do (i, head)
matching = findall(h -> head == h, allheaders)
headerdiff = setdiff(coldiff, head)
cols = join(headerdiff, ", ", " and ")
badcols = join(headerdiff, ", ", " and ")
args = join(matching, ", ", " and ")
return "column(s) $cols are missing from argument(s) $args"
return "column(s) $badcols are missing from argument(s) $args"
end
throw(ArgumentError(join(estrings, ", ", ", and ")))
end
Expand Down
2 changes: 1 addition & 1 deletion src/other/index.jl
Original file line number Diff line number Diff line change
Expand Up @@ -422,7 +422,7 @@ Base.@propagate_inbounds SubIndex(parent::AbstractIndex, cols) =
SubIndex(parent, parent[cols])

Base.length(x::SubIndex) = length(x.cols)
Base.names(x::SubIndex) = string.(_names(x))
Base.names(x::SubIndex) = string.(_names(x))::Vector{String}
bkamins marked this conversation as resolved.
Show resolved Hide resolved
_names(x::SubIndex) = view(_names(x.parent), x.cols)

function Base.haskey(x::SubIndex, key::Symbol)
Expand Down