`push!` which promotes type #1716

pdeffebach · 2019-02-11T17:56:04Z

Currently, push!ing a row to a DataFrame just calls push! on each individual column vector. This means that if a new row contains missing and the corresponding column does not allow missing values, it will throw an error.

Obviously the user can call allowmissing! on columns before they start using push!, but I wonder if we should have a function that does this type promotion automatically.
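A minimal sketch of the failure and the manual workaround, using a plain Base vector to stand in for a DataFrame column (an assumption made for illustration; in DataFrames the widening step corresponds to what allowmissing! does for a column):

```julia
# A plain Vector{Int} stands in for a DataFrame column here (illustrative assumption).
col = [1, 2, 3]

# push! never widens the element type, so a missing value is rejected.
failed = try
    push!(col, missing)
    false
catch err
    err isa MethodError   # there is no convert(Int, missing) method
end

# Widening the element type first makes the same push! succeed.
wide = convert(Vector{Union{Int, Missing}}, col)
push!(wide, missing)
```

The conversion allocates a new vector, which is the cost an automatically promoting push! would have to pay whenever the element type changes.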

Maybe we should deprecate push! and use only append! and vcat? Then have both functions work with NamedTuples, DataFrameRows etc.

bkamins · 2019-02-11T18:02:15Z

Actually I like push! exactly because it does not do any magic 😄 - and it helps me catch bugs.
But we can have different functions with different behaviors.

Does append! do auto promotion (as I do not remember if it does)?

nalimilan · 2019-02-11T18:11:16Z

append! doesn't promote either. I've already wondered whether that was a good idea. The main advantage I can find is that if you push/append repeatedly, growing existing vectors is more efficient than making a copy (since push! grows exponentially to avoid copying on each call).

Related to #1695 (maybe even a duplicate).

pdeffebach · 2019-02-11T18:19:26Z

My impression is that append! is kind of like a performant vcat, and not promoting types is part of why it's performant.
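The trade-off can be seen on plain Base vectors (used here as a stand-in for columns, which is an assumption of this sketch): vcat allocates a new vector and promotes the element type, while append! grows the existing vector in place and keeps its element type.

```julia
a = [1, 2]

# vcat allocates a fresh vector whose eltype is promoted to fit both inputs.
b = vcat(a, [missing])

# append! mutates `a` in place and returns the same object; its eltype is unchanged.
c = append!(a, [3, 4])
```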

Deprecating push! makes sense because I think that append!(v1::Vector, a::Any) covers all the behavior we would want from push!

The behavior of push! and append! differs for vectors of vectors.

t = Any[[1, 2], [3, 4]]
s = copy(t)

julia> push!(t, [5, 6])
3-element Array{Any,1}:
 [1, 2]
 [3, 4]
 [5, 6]

julia> append!(s, [5, 6])
4-element Array{Any,1}:
  [1, 2]
  [3, 4]
 5
 6

We don't have heterogeneous element types in DataFrames (every element is a row), so there is no ambiguity about what to add.

bkamins · 2019-02-11T18:29:25Z

Actually I would leave push! as is because it does exactly what the contract for push! in Base specifies.

But append! could be extended to allow passing it an iterable and it would try to push the elements of this iterable as rows to the DataFrame. We do not support this fully now, but it would be consistent with the contract for append! in Base.

nalimilan · 2019-02-11T20:45:50Z

I agree we need to keep both append! and push! for consistency with Base, as they have very clear definitions. An example of a possible ambiguity between them would be, if we follow the Tables.jl approach, appending a vector of named tuples should add one row per tuple, while pushing it should add one row with named tuples as entries (not saying we need to support this, but that's a possibility).

pdeffebach · 2019-02-11T20:51:17Z

What about adding a method for vcat to accept a named tuple or even just plain vector? I want a way to add a new row without worrying about errors due to missing.

My intuition is that for missing-heavy data manipulation, which is what I work with regularly and which I would bet is the main use case for the largest audience of DataFrames, it's reasonable to push people towards a super flexible vcat where they don't have to worry about types.

bkamins · 2019-02-11T21:01:55Z

> while pushing it should add one row with named tuples as entries

This is exactly what we support now. And your example is in general the reason why they need to stay different (we have the same duality in the setindex! discussion in #1646).

> appending a vector of named tuples should add one row per tuple

I am OK with this (again along the setindex! rule of thumb that if something could be converted to DataFrame using the constructor we could support it also without calling the constructor to get the same result)

@pdeffebach If I understand what @nalimilan said in #1695 correctly we could consider adding push!! and append!! methods that would perform autopromotion.

However, if this is only about missing, then I think the reasonable thing is to tell people to use eltypes that allow missing (I guess this is the reason why CSV.jl does this by default) and to learn allowmissing! in general.

nalimilan · 2019-02-11T21:20:52Z

> What about adding a method for vcat to accept a named tuple or even just plain vector? I want a way to add a new row without worrying about errors due to missing.

Yes, why not. A vector of named tuples would have to be interpreted as several rows I guess (in practice I don't think it matters a lot).

> @pdeffebach If I understand what @nalimilan said in #1695 correctly we could consider adding push!! and append!! methods that would perform autopromotion.

Actually that's the opposite. :-)

pdeffebach · 2019-02-11T21:22:54Z

Okay I will see if this works with my current open PR.

I wonder if all this stuff should be just pushed off to Tables.jl since it's about the relationship between arrays of named tuples and tables.

bkamins · 2019-02-11T21:28:10Z

> Actually that's the opposite. :-)

So is your proposal to make push! do autopromotion (and create a new vector) and push!! mutate the vector in place? Then I guess we should list out what would change (but let us move this discussion to #1695), because I find it counter-intuitive, as push! and append! in Base do not perform autopromotion.

bkamins · 2019-07-26T20:10:21Z

This is what I propose:

- leave append! and push! as is (as they conform to the API specification from Base and the notion of row-orientation of data frames)
- allow append! to take any Tables.jl conforming second argument (there is no ambiguity here)
- allow vcat with first argument being a DataFrame to take any Tables.jl conforming second argument (similarly to append!)

If we are OK with this I can propose a PR doing this. Please let me know what you think.

bkamins mentioned this issue Feb 11, 2019

Policy regarding in-place operations #1695

Closed

bkamins added the non-breaking The proposed change is not breaking label Feb 12, 2020

bkamins added this to the 2.0 milestone Feb 12, 2020

kleinschmidt mentioned this issue Mar 11, 2020

add missing columns when push! ing? #2150

Closed

bkamins linked a pull request Mar 21, 2020 that will close this issue

allow :union as cols kwarg in push! and append! #2152

Merged

bkamins closed this as completed in #2152 Apr 16, 2020
