Transposing DataFrame #2743
Comments
I'll try this out in my next project and let you know what I think! |
@sl-solution - just to give you a perspective. We are aware that our current design (and also in other ecosystems) is not ideal. However, it will take some time to test/give recommendations for your proposal, as this requires field testing. |
@bkamins that seems OK. I am currently working on its performance and will keep updating it. |
This is a long-standing issue, #1181, which I hope to resolve in the future. There has simply been no agreement about the best design for it yet. However, I expect that the requirement:
will be the hardest to handle in DataFrames.jl (I have not thought about it much yet though). |
Modifying PrettyTables to have multi-column headers is very high on my wishlist of data utilities, particularly for latex, which may be the easiest case. |
I was thinking about this issue and came to the conclusion that a good API for all we need here is as follows (the arg/kwarg order/naming can be changed, but I want to convey the general idea). It generally follows what @sl-solution proposed but extends it and builds on a basic intuition that people have and on our API in data frames:
I assume that With Now the
The only challenge is to decide on two things: how do we want to handle when multiple rows are returned (probably as currently - just populate several rows with the same
This approach - if I understand things correctly - extends the proposal of @sl-solution and essentially covers everything that a pivot table, e.g. in Excel, provides (and even more, as we would allow not doing aggregation - pivot tables normally enforce aggregation, but in What do you think about it (and if you like it, what do you think about the key design decisions I have outlined)? |
Maybe it needs a little more thinking (BTW I think something like
|
Just a comment on the API here: I would like as many keyword arguments as possible instead of positional arguments. Given that all arguments will be of the same type (column selectors) and the order of inputs is not obvious, keyword arguments are best.
I think people use pivots for more than just displaying. I think cell-based transformations can be handled separately from reshaping. |
For me this functionality is mostly for transforming data. If we want a flexible data structure that supports tabulating with sub-aggregations, multi-level row and column names, possibly a number of dimensions other than just two etc., this should be a separate package I think. FreqTables.jl is an example of such a package, where
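To make the separation mentioned above concrete, here is a minimal sketch using functions that already exist in DataFrames.jl (`groupby`, `combine`, `unstack`). The data and the column names `region`, `period` and `sales` are made up for illustration; this is not the API proposed in this thread, only one way to keep the aggregation step apart from the reshaping step.

```julia
using DataFrames
using Statistics: mean

# Toy long-format data; all column names here are illustrative assumptions.
df = DataFrame(region = ["N", "N", "N", "S", "S"],
               period = ["Q1", "Q1", "Q2", "Q1", "Q2"],
               sales  = [1.0, 3.0, 5.0, 2.0, 4.0])

# Step 1: cell-level aggregation, giving one row per (region, period) pair.
# sort=true makes the group order deterministic.
agg = combine(groupby(df, [:region, :period]; sort=true),
              :sales => mean => :sales)

# Step 2: pure reshaping; the values of `period` become column names.
wide = unstack(agg, :region, :period, :sales)
```

Because step 1 leaves exactly one row per cell, step 2 does not need to know anything about how the values were computed.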
This is what I assumed you would say 😄. Do you have good names in mind?
Yes - they could. The point is that reshaping is just |
Ah, sorry. I think I am confused. Is this a replacement for |
The point is that, as noted by @sl-solution, we could have one function that would cover what |
Ah okay. So the key benefit here is the nesting based on the This is interesting, but I'm not sure it's conceptually similar enough to be the same function as |
yes - and the ability to:
And all these three features are something that users ask about. But what I want to avoid is:
Finally, if more than one values column is passed we need to decide how they are shown (Excel shows them as additional columns, but I am not sure it is the best approach if we do not support a multi-level column index) |
It is indeed just aggregation and transposing in some sense. Aggregation and the proposed df_transpose() can handle it already, but it would not be displayed properly and it would lack some subtle features. I guess this is a very interesting feature (and the main computation parts are already available inside
I can't think of any other scenario. Could you give some examples?
Those bold numbers are also part of this. My draft idea is: The tabulation of data is nothing more than using
This proposal can probably handle the above scenario; however, the following scenarios don't fit into it
Maybe we need to gather information about the variables first. Roughly: the user selects which variables define categories and which variables are going to be used for aggregation, and then arranges them in whatever order s/he likes. For example, let's call the names of the variables in our example V1, V2, etc.; then [V1, V2,[sum mean]] ,[V3, V4] creates something like the first output above. Here [V1, V2] means they are nested in the order given, while [V1 V2] means they are at the same level. Thus, [V1, V2, [sum mean]] means sum and mean are at the same level, but they are nested in V2, which is nested in V1. Sub-aggregations may have special names, but their placement can follow the same rule. Something like [V1, V2, [V3 V4], [mean sum], V6] is also valid. |
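As a rough illustration of the [V1, V2, [sum mean]] idea in terms of what DataFrames.jl already offers, the sketch below groups by V1 and V2 (so V2 is nested in V1) and computes sum and mean side by side within each cell. The data, the value column `x` and the output names `x_sum` and `x_mean` are assumptions made for the example; the result is a long table, not the multi-level display discussed above.

```julia
using DataFrames
using Statistics: mean

# Hypothetical data: V1 and V2 define categories, x is the aggregated value.
df = DataFrame(V1 = ["a", "a", "a", "b"],
               V2 = ["p", "p", "q", "p"],
               x  = [1.0, 3.0, 5.0, 7.0])

# Grouping by [:V1, :V2] nests V2 inside V1; sum and mean are at the same
# level, both computed within each (V1, V2) cell.
nested = combine(groupby(df, [:V1, :V2]; sort=true),
                 :x => sum  => :x_sum,
                 :x => mean => :x_mean)
```

How such a result would be laid out with nested row or column headers is exactly the open display question in this discussion.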
Closing it as I feel this will not get ported to DataFrames.jl. Feel free to re-open if you think otherwise. |
I developed a package to explore the idea of implementing a function to transpose DataFrames. The README.md file of the package contains some examples for reference and can be accessed from https://github.com/sl-solution/DFTranspose.jl
Transposing data has been discussed previously, e.g. in #2732, #1181, #2698, #2422, #2215, #2205, #2148, #1839, and the implemented function tries to tackle some of these.
I think it would be a good idea to include this functionality in DataFrames.jl.