Add select, select! and deletecols #1772
I get a strange error on Julia 0.7 - for some reason DataFrames.jl does not recognize the |
OK - I see why this fails on Julia 0.7. It defined (although deprecated) @nalimilan - so the decision is whether we want to keep supporting 0.7. If we drop it (which I would do), I simply change the REQUIRE and testset specification. If you think we should keep supporting 0.7, could you hint at the proper way to handle such a case using Compat.jl (especially as old
Thanks. I'm hesitant about whether The advantage of returning a data frame is that
Let's drop 0.7, it doesn't seem very useful at this point.
Co-Authored-By: bkamins <[email protected]>
That is why I like to make a PR as then things are on the table. My position is that we should return a
@piever - could you please comment on the I also envision that in the long run Regarding dropping 0.7 - I will make a separate PR (to keep things atomic) and then we can rebase this one (as probably we need some more discussion on this PR with the community). |
Regarding
In the IndexedTables case, I think design-wise IndexedTables generally prefers named tuples to many keyword arguments, for example I'd tend to agree that the trade-off is that if one wants the "many args or many kwargs" version, then I'll list what I believe may be reasons for the current IndexedTables design:
julia> t = table((a=1:10, b=rand(10)));
julia> select(t, (:a, :b => log, :c => rand(10)))
Table with 10 rows, 3 columns:
a   b          c
────────────────────────
1   -2.02031   0.854743
2   -0.67662   0.726542
3   -0.318742  0.793672
4   -1.0202    0.0611947
5   -0.54456   0.987932
6   -0.312883  0.766104
7   -0.627879  0.0454886
8   -1.12356   0.907651
9   -1.42856   0.864524
10  -1.23728   0.734384
But I agree that these rules require some getting used to and maybe DataFrames users prefer a simpler:
@piever - thank you for a detailed explanation (actually many of these things (or similar) in the long run should land in However, as you have commented at the end - I feel that having a guarantee that |
Thanks @piever. I see the logic behind the design of At any rate, it would be nice if we could have at least one common method to access a column vector in JuliaDB and DataFrames. If that's not
We chose |
The simplest JuliaDB syntax for selecting just one column is I'm not sure |
We could support But |
I am ok to add |
Let's go with that then unless somebody objects.
I will wait a bit for comments and add |
I have added it to show the changes (if we decide to drop it, we can remove the last 2 commits).
In terms of "working with columns" API, JuliaDB uses the singular ( |
Actually we chose the plural at #1514 (comment) (and following comments), and we already use it for |
Given the discussion in #1695 we might drop adding |
Yeah, let's merge this PR without |
Reading https://discourse.julialang.org/t/common-api-for-tabular-data-backends/21546 again, I recalled that Then these three functions would cover all interesting combinations of these two dimensions: 1) keep or drop remaining columns, 2) recycle scalars or not. The missing function is the one which keeps columns but doesn't recycle, and that combination doesn't make sense, since remaining columns have by definition one value per row. Does that make sense?
@nalimilan - could you please expand on this, as I am not 100% clear which functions in which use cases you mean. |
Force-pushed from 31c35ed to 7686bfd.
OK - I have reverted the |
I was referring to #1727 (comment). |
So you mean the triplet |
If there are no more comments on this, I am going to merge this PR.
A follow up of #1753.
Adds select, select! and deletecols. In this way we have a flexible range of functions for column subsetting in a DataFrame - in-place and creating a new DataFrame, with column copying or without it.
The functions always return a DataFrame (as opposed to df[col] vs df[cols]).
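The behaviour described above can be sketched roughly as follows. This is only an illustration, not the code from this PR. It assumes a DataFrames.jl version in which select and select! accept a column selector and select takes a copycols keyword; the example data frame and variable names are made up. deletecols appears only in a comment because, as far as I know, later releases removed it in favour of select(df, Not(cols)), so check the version you have installed.

```julia
using DataFrames

# Hypothetical example data, not taken from the PR.
df = DataFrame(a = [1, 2, 3], b = [4, 5, 6], c = [7, 8, 9])

# select returns a new DataFrame even when a single column is requested,
# unlike column indexing, which returns the column vector itself.
only_a = select(df, :a)
println(only_a isa DataFrame)

# With copycols = false (assumed keyword) the new DataFrame reuses the
# parent's column vectors instead of copying them.
original_a = df.a
shared = select(df, [:a, :b], copycols = false)
println(shared.a === original_a)

# select! keeps only the listed columns, modifying df in place.
select!(df, [:a, :b])
println(ncol(df))

# In the version proposed in this PR, dropping a column while creating a
# new DataFrame would presumably be written as:
#     deletecols(df, :b)
```

The design point is that the return type does not depend on whether one column or several are selected, so code written against select can rely on getting a DataFrame back.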