Add Tables.jl interface for DataFrame(Rows|Columns) #2055
In general, it should not be problematic to add these definitions. Can you please elaborate on the cases in which you would find this useful (the issue is that you can always call …
Thanks for the review!
I need this interface to make this test for … I can't use …
Looks good, thank you. Let us just wait for @quinnj to confirm that all is OK with the usage of the Tables.jl interface.
@@ -182,7 +182,7 @@ end

 df2 = DataFrame!(eachrow(df))
 @test df == df2
-@test !any(((a,b),) -> a === b, zip(eachcol(df), eachcol(df2)))
+@test all(((a,b),) -> a === b, zip(eachcol(df), eachcol(df2)))
With this PR, df2 = DataFrame!(eachrow(df)) no longer copies columns. Is this change OK?
I would say that it is even better. @nalimilan, are you OK with this? If someone writes DataFrame!, an explicit opt-out from copying (where possible) is assumed.
Yes, makes sense.
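The copy semantics agreed on here can be illustrated with a minimal sketch. It assumes a recent DataFrames.jl release, in which DataFrame! has since been deprecated and the explicit opt-out from copying is requested with the copycols=false keyword argument instead; the vector v is a hypothetical input. As in the test above, column identity is checked with ===.

```julia
using DataFrames

v = [1, 2, 3]  # hypothetical input column

# Default: the constructor copies the column, so the result does not alias v.
copied = DataFrame(a = v)

# Explicit opt-out (the role DataFrame! played when this PR was written):
# the vector is stored as-is, so later mutations of v are visible through the DataFrame.
aliased = DataFrame(a = v; copycols = false)

println(copied.a === v)   # false
println(aliased.a === v)  # true
```

Opting out avoids an allocation per column, at the price of the caller having to remember that the source vectors are now shared.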
src/other/tables.jl (outdated)
@@ -48,6 +48,18 @@ DataFrame!(x::Vector{<:NamedTuple}) =
        "`$(typeof(x))` without allocating new columns: use " *
        "`DataFrame(x)` instead"))

for T in [DataFrameRows, DataFrameColumns]
    @eval begin
Do you really need a loop? Isn't ::Union{DataFrameRows, DataFrameColumns} enough?
Ah, I hadn't seen @bkamins's comment above. I'd just repeat the Union without defining a custom type alias.
I'm OK with both approaches. I wrote it this way because @bkamins preferred it. Ref: #2055 (comment)
I am OK with both: the @eval approach is often used in Base. But using Union without defining the alias is also OK (I just prefer not to introduce the alias here).
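A self-contained sketch of the two styles being compared. It uses two hypothetical wrapper types, RowsView and ColsView, in place of DataFrameRows and DataFrameColumns so that no DataFrames.jl methods are redefined; wrapped_length and wrapped_length_union are likewise illustrative names.

```julia
# Hypothetical stand-ins for DataFrameRows / DataFrameColumns.
struct RowsView{T}
    parent::T
end
struct ColsView{T}
    parent::T
end

# Style 1 (loop + @eval): generate one method definition per type.
for T in (RowsView, ColsView)
    @eval wrapped_length(x::$T) = length(x.parent)
end

# Style 2 (Union): a single method whose signature covers both types.
wrapped_length_union(x::Union{RowsView, ColsView}) = length(x.parent)

# Both accept either wrapper; the loop produced two methods, the Union one.
println(wrapped_length(RowsView([1, 2, 3])))     # 3
println(wrapped_length_union(ColsView([1, 2])))  # 2
println(length(methods(wrapped_length)))         # 2
println(length(methods(wrapped_length_union)))   # 1
```

The loop pays off when the method bodies must differ per type (for example, eachrow for one wrapper and eachcol for the other, spliced in with $); when the body is identical, the Union signature expresses the same thing with a single method.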
Does 22e32db look good?
Shouldn't …
My understanding is that …
Maybe it doesn't have to, but should it? Or why shouldn't it? :-)
Actually …
Well …
As I have commented, whether it makes sense depends on the contract @quinnj wants …
@bkamins I just noticed your comment after writing a patch to do (something like) … The reason why I thought returning a …
src/other/tables.jl (outdated)
Tables.materializer(itr::DataFrameRows) =
    eachrow ∘ prefer_singleton_callable(Tables.materializer(parent(itr)))
Tables.materializer(itr::DataFrameColumns) =
    eachcol ∘ prefer_singleton_callable(Tables.materializer(parent(itr)))
If @quinnj comments that we should return DataFrameColumns here and not DataFrame, then we should inherit from itr whether it was created with the names positional argument set to true or false.
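For readers unfamiliar with Tables.materializer: it returns a callable that builds a table of the source's preferred type from any Tables.jl input. The minimal sketch below, assuming a recent DataFrames.jl and Tables.jl, mimics the composition in the diff; prefer_singleton_callable from the PR is left out, and rows_materializer is a hypothetical name.

```julia
using DataFrames, Tables

df = DataFrame(a = [1, 2], b = [3.0, 4.0])
src = Tables.columntable(df)  # a NamedTuple of vectors, itself a Tables.jl source

# The materializer of a DataFrame rebuilds a DataFrame from any Tables.jl source.
mat = Tables.materializer(df)
rebuilt = mat(src)

# Hypothetical composition in the spirit of the diff:
# materialize a DataFrame first, then wrap it in a row iterator with eachrow.
rows_materializer = eachrow ∘ mat
rows = rows_materializer(src)
```

A plain composition like this always creates a fresh view with default settings, which is why the comment above raises inheriting the names setting from itr.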
We don't currently have strict requirements on … These changes seem fine by me, though I will note that I recently tried to use … Anyway, I'm good w/ this.
It cannot, because it inherits from …
Actually, can we go back to …
@bkamins I just realized that your point totally makes sense. I needed something I can … Sorry, I should've tried to re-implement JuliaFolds/Transducers.jl#107 after the API was changed...
It is a simple rule: both objects are … So I understand the change should be made to make … Also, just to confirm: this new …
I guess returning a …
The "rule" sounds arbitrary to me, in the sense that you made it, so you can change it. I can't find any explicit API contracts defined for …
I see your point, so let us wait for others to comment on their preference. My thinking was that … Also note that …
@quinnj Do you think …
I think …
Yeah, it's a tough choice. It's not ideal that if you passed a table which iterates rows (…
@bkamins @nalimilan @quinnj Thanks for the discussion! I opened #2058 …
In the future, I would prefer to squash-merge PRs, as otherwise it is hard for me to write appropriate release notes. Thank you!
Huh... I think I did a rebase-merge on another repo, and it must remember your latest preference regardless of repo? I almost always squash, but there was a case on another repo where I wanted to rebase-merge, and then I must have just hit the default here. Sorry about that.
This PR adds the Tables.jl interface for DataFrameRows and DataFrameColumns. It is useful for defining data manipulation functions that expect iterators. Example: …
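A minimal sketch of such a function (an illustration, not the PR's own example), assuming a recent DataFrames.jl and Tables.jl in which eachrow(df) and eachcol(df) are Tables.jl sources; colsum is a hypothetical name. Because it only relies on the Tables.jl interface, the same code accepts a DataFrame, its row or column iterators, or a plain NamedTuple of vectors.

```julia
using DataFrames, Tables

# Hypothetical helper: sums one column of any Tables.jl-compatible source.
# Tables.columntable materializes the source as a NamedTuple of column vectors.
function colsum(table, col::Symbol)
    nt = Tables.columntable(table)
    return sum(getproperty(nt, col))
end

df = DataFrame(a = [1, 2, 3], b = [10, 20, 30])

println(colsum(eachrow(df), :b))  # 60, via the row iterator
println(colsum(eachcol(df), :a))  # 6, via the column iterator
```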