Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Functions taking collections of column names always require them to be in AbstractVectors #1769

Closed
oxinabox opened this issue Apr 8, 2019 · 11 comments
Labels
non-breaking The proposed change is not breaking
Milestone

Comments

@oxinabox
Copy link
Contributor

oxinabox commented Apr 8, 2019

For example: disallow_missing,

function disallowmissing!(df::DataFrame, cols::AbstractVector{<: ColumnIndex}=1:size(df, 2))

I wanted to use a tuple of column names,
because tuples are immutable, and have much better performance,
and I knew all the columns when i wrote the code.

cc: @glennmoy

@bkamins
Copy link
Member

bkamins commented Apr 8, 2019

We have recently switched DataFrame constructor to accept tuples in some cases. I guess we could widen this approach to other functions. Let us keep this open to be decided before 1.0 release.

The only question is whether it should be AbstractVector or Tuple or a general iterator and we dynamically check what eltype it has (in DataFrame constructor we strictly require AbstractVector or a Tuple).

@oxinabox
Copy link
Contributor Author

oxinabox commented Apr 8, 2019

I think it is fine to do a general iterator, and implictly check the type.
It is not like we would ever want to dispatch on this, would we?

@bkamins
Copy link
Member

bkamins commented Apr 8, 2019

@nalimilan - we need a general decision here (I am OK with accepting a general iterator and doing a type check, which will be tricky but doable). The downside is that the code will not be type stable (but I think it will not be the end of the world as DataFrames.jl are not type stable anyway).

@nalimilan
Copy link
Member

I agree we should support any iterator where possible (even if there won't probably be any performance gain). In the present case we could probably just remove the type annotation and rely on the fact that the one for ::ColumnIndex is more specific and will therefore be called when passed a symbol or integer, right?

@bkamins
Copy link
Member

bkamins commented Apr 8, 2019

The trick is that normally we require homogeneous type in other methods (or in short: that symbols and numbers are not mixed). That is why I have said that we have do do a type check anyway.

I think we should retain this guarantee. I do not see much sense to allow selection of [1, :some_col, 2, :some_other_col. Agreed?

@bkamins
Copy link
Member

bkamins commented Apr 8, 2019

Although we currently allow it for AbstractVector with the signature we have now (but we could clean it up if we make the change how the method works).

@nalimilan
Copy link
Member

Yeah, but it sounds weird to make the code much more complex just to prevent that. With some functions things can go wrong if duplicates are passed, but that's not the case here.

@bkamins
Copy link
Member

bkamins commented Apr 8, 2019

OK (in normal code it will probably not happen anyway).

@bkamins bkamins added the non-breaking The proposed change is not breaking label Feb 12, 2020
@bkamins bkamins added this to the 2.0 milestone Feb 12, 2020
@bkamins
Copy link
Member

bkamins commented Mar 21, 2020

As discussed in #2156 we can start allowing passing tuple instead of a vector. Just note that you should not expect a huge performance gain here (anything that happens with the selector will allocate much more and take much more time) - so this would be just a convenience functionality to allow tuples. Is the request then still valid, or vectors are enough?

@oxinabox
Copy link
Contributor Author

I guess there is always static vectors.
Since this is non-breaking its not worth thinking about this til after 1.0

@bkamins
Copy link
Member

bkamins commented Dec 4, 2022

This is now supported with Cols, e.g. Cols(:a, :b, :c).

@bkamins bkamins closed this as completed Dec 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
non-breaking The proposed change is not breaking
Projects
None yet
Development

No branches or pull requests

3 participants