Accepting array elements in rows specified by named tuples, in combine
#3335
Comments
Please correct me if I am wrong, but this would imply that |
@bkamins I guess there are implicitly two suggestions here for the output of the function argument to
The rationale for 2 is that it is easy to explicitly wrap the output (of the function argument to My preference is shaped by my usage of |
It was discussed on Slack.
rather wrap all in additional vectors
or equivalently:
(or directly in definition of I am aware that it is a bit inconvenient. We will not adopt option 2 because it is a breaking change. Option 1 could be acceptable, but it is problematic, as the rule determining how a named tuple is handled would become even more complex than it is now (in general: users already find Indeed, in the code we have these two errors thrown because we wanted to leave the freedom in the future to decide what to do in that case (@nalimilan style 😄). So the conclusion is that we can discuss it, but we want to reach a consensus before making such a fundamental change. One of the reasons for this is that typically users could expect that |
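For readers following along, here is a minimal sketch of the "wrap everything in an additional vector" workaround described above. The data frame `df` and the column names `g`, `x`, `n` and `sorted_x` are made up for illustration and do not come from the original report. It assumes that a named tuple whose fields are all vectors of equal length is treated as a table, so one-element vectors produce a single row per group.

```julia
using DataFrames

# Hypothetical example data: two groups with a few values each.
df = DataFrame(g = [1, 1, 2, 2, 2], x = [5, 3, 9, 7, 8])

# Every field is wrapped in a one-element vector, so the named tuple is
# read as a one-row table. The array-valued cell is a vector inside a vector.
res = combine(groupby(df, :g)) do sdf
    (n = [nrow(sdf)], sorted_x = [sort(sdf.x)])
end
```

Each group then contributes exactly one row, and `sorted_x` holds a whole vector in each cell.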
So if I understand you correctly @bkamins: the best way of having a single-row output from the combining function, which is robust against the potential content of that row (in the above-mentioned case: arrays), is to wrap that row into a DataFrame with a single row, or to wrap it into a named tuple which would be expanded into a single-row DataFrame in the DataFrame constructor? If that is the case: wouldn't it be more natural to have a convention for explicitly returning a row? If named tuples are out of the question, could we have a |
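A small sketch of the "wrap the row into a DataFrame with a single row" variant asked about here, again with made-up data and column names (`df`, `g`, `x`, `n`, `sorted_x` are assumptions for illustration only). It relies on the `DataFrame` keyword constructor broadcasting a scalar against a one-element vector.

```julia
using DataFrames

# Hypothetical example data.
df = DataFrame(g = [1, 1, 2, 2, 2], x = [5, 3, 9, 7, 8])

# Return a one-row DataFrame per group. The scalar `n` is broadcast to the
# length of the only vector column, whose single element is itself a vector.
res = combine(groupby(df, :g)) do sdf
    DataFrame(n = nrow(sdf), sorted_x = [sort(sdf.x)])
end
```

This is robust against array-valued cells, at the cost of constructing a full `DataFrame` for every group.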
Indeed, though, it would be useful to add support for:
where your Do you think it would be useful to add something like this? CC @nalimilan, @pdeffebach, @ararslan The change it would require in the minilanguage is to add support for I think it would be OK to add it. |
I wrote a package for this purpose at some point. I didn't make it very far clearly, but the bones are here in DataRows.jl. But part of the reason it was abandoned was precisely because I mentioned this on Slack, but I do think the long term solution is 2.0, where As for the current behavior, I think it's not a big deal to require the use of arrays. If performance is important, |
The reason is that
Still, I think recognizing My question is, for 1.x release branch of DataFrames.jl if we feel that adding |
@pdeffebach - to expand. |
@bkamins I'm not sure I understand the implications of your suggestion. How I would use Also, perhaps a bit tangential, why do you consider a trait based approach to be fragile? |
Just as I have written above:
passing return value of
For two reasons:
(but this is unrelated to your issue, as table-row is not a table, so |
sorry for being an idiot, but I don't get what |
|
@bkamins Thanks, and thanks for elaborating on the trait issue. I think your proposal would be very useful. |
My only comment is that a DataFrames-specific type, like |
What would be the benefit of The definition would be |
I did not understand if such a workaround was considered and deemed unsatisfactory for some reason.
|
OP wants |
I get that you're looking for this, isn't it?
|
@sprmnt21 : Yes. This issue is a request to avoid having to construct a full |
Ah - I missed
and (this is a different output, but the same intention)
What we would need to change is the "legacy" syntax (i.e. taking a single function), where the current rule is that we only recognize:
(and this is a closed list of recognized objects) |
If I'm not getting too off topic, I'd like to try to put the question in a more general way. Here are some fictitious examples:
The "expansion" of a NamedTuple or a DataFrame or any other "complex" structure should be explicitly required/constructed by the user. |
combine(gdf, src => AsRow() do x
    # some great stuff happens here
end => AsTable)
With
combine(gdf, src => Tables.Row ∘ (x -> begin
    # some other stuff happens here
end) => AsTable)
which is a bit more awkward to read and to write. |
Adding support for |
Creating a DataFrame from a list of named tuples works fine, even when the named tuple elements contain arrays.
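The original snippet did not survive on this page, so the following is only a sketch of the kind of construction meant here. The names `rows`, `id` and `vals` are hypothetical and not taken from the report.

```julia
using DataFrames

# A vector of named tuples is read as a table of rows, so an array-valued
# field simply becomes one cell of the resulting column.
rows = [(id = 1, vals = [1, 2]), (id = 2, vals = [3, 4, 5])]
df = DataFrame(rows)
```

Here `df` has two rows, and each element of `df.vals` is itself a vector.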
But that is not the case if the DataFrame is created by a combine operation, which throws an error.
The above example would work as expected if one removes these two lines:
DataFrames.jl/src/groupeddataframe/complextransforms.jl
Line 52 in fcd98c6
DataFrames.jl/src/groupeddataframe/callprocessing.jl
Line 69 in fcd98c6
An obvious use-case for needing this is when you want some ordered indices or values from the group, in a single row for each group.
Would you be willing to reconsider these exceptions, and allow arrays as named tuple elements when using combine?