Feature request: unstack multiple :values columns #2215
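naïve implementation (updated for DataFrames.jl 1.x, where join(...; on=...) was replaced by innerjoin/outerjoin):

    using DataFrames

    # Unstack each value column separately. renameCols(c, valueCol) receives a value of
    # colKey and the value column's name, so the resulting column names do not clash.
    function DataFrames.unstack(df::DataFrame, rowKeys::AbstractVector{Symbol}, colKey::Symbol,
                                valueCols::AbstractVector{Symbol}, renameCols)
        unstacked = [
            unstack(df, rowKeys, colKey, valueCol; renamecols = c -> renameCols(c, valueCol))
            for valueCol in valueCols
        ]
        length(unstacked) == 1 && return only(unstacked)
        # All pieces share the same row-key combinations, so the join places them side by side.
        return innerjoin(unstacked...; on = rowKeys)
    end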
This is a duplicate of #2148, so I am closing this issue. Please comment there if you have anything to add. Thank you for reporting this.
I don't think it's a duplicate: I read that issue as wanting to pass multiple columns as
OK, you are right.
+1, I've wanted to do this many times.
#2743 tries to provide a solution for some similar problems.
This would be very useful. Has there been any progress on this in the meantime?
No. But let us start by defining exactly what we want, using a working example. Is this what you want:
Almost. Instead of taking the data from a "stacked column" (
modulo the order of the columns, maybe. I'll describe that in the next comment so that people can give separate feedback.
Ideally, I would say:
All but the default for (3.) could be implemented later, though.
Having some sort of "multi-level headers" or indices would be nice, but I don't know what a good user interface for that would be. Simply concatenating the column names would suffice for me (I guess I would write the data to CSV and format the table header directly in LaTeX). Maybe this could be added on top later.
"Multi-level headers" cannot be supported in the near future (unlike in pandas), so we need to generate column names. I have not rushed to implement this request because we first need to make the following design decisions. In general, users want:
This requires us to decide on:
All of these decisions need to be made before we change anything, to make sure we will not have to make breaking changes later.
Example: unstack multiple value columns (please remove if not helpful)
should work like
producing
Though, as mentioned by others, the column suffix should be given by the value field that provides the column names.
I think the duplicate handling in the single-value-column implementation should work equally well for multiple value columns.
I am closing this in favor of #3237 (to have a single place to discuss all related issues).
In pandas:
Given this, you can unstack on multiple value columns by just passing e.g.
df.set_index(['paddockId', 'color']).unstack('color')
The equivalent operation seems pretty hard in DataFrames.jl at the moment (unstack n times, then join the results?). API-wise, it would be good if the values argument could take a vector of value columns. Result formatting could be difficult; this particular operation is one where MultiIndexes shine. I guess renameCols could become a function of (colKey value, valueColumnName), so in this instance (color, valueCol) -> ..., e.g. ("red", :count) -> ...?
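A minimal sketch of the proposed two-argument renaming, built only from unstack and innerjoin and assuming DataFrames.jl 1.x. The helper name unstack_many, its rename keyword, and the sample data are hypothetical; this is not an existing DataFrames.jl API.

```julia
using DataFrames

# Hypothetical helper (not part of DataFrames.jl): unstack several value columns at once.
# rename(key, valuecol) receives a value of colkey and the name of the value column,
# i.e. the (colKey value, valueColumnName) signature suggested in the issue.
function unstack_many(df::AbstractDataFrame, rowkeys, colkey, valuecols::AbstractVector;
                      rename = (key, valuecol) -> string(valuecol, "_", key))
    wides = [unstack(df, rowkeys, colkey, v; renamecols = key -> rename(key, v))
             for v in valuecols]
    # Every block has the same row-key combinations, so the inner join
    # aligns the blocks on the row keys and concatenates their columns.
    return length(wides) == 1 ? only(wides) : innerjoin(wides...; on = rowkeys)
end

# Hypothetical data modelled on the pandas example (paddockId, color, two measures).
df = DataFrame(paddockId = [1, 1, 2, 2],
               color = ["red", "blue", "red", "blue"],
               count = [3, 5, 7, 9],
               area = [0.5, 1.0, 1.5, 2.0])

wide = unstack_many(df, :paddockId, :color, [:count, :area])
```

With the default rename this yields columns such as count_red and area_blue; a custom rename such as (key, col) -> "$(key)_$(col)" would put the color first instead.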