add AsTable wrapper, disallow NamedTuple in ByRow #2183
@nalimilan - the PR should be good to review. Thank you! |
As usual, only the Coveralls check fails.
Thanks. My main remark is about showing more useful examples. Otherwise people may not understand the point of this selector.
I'd also like to be sure we won't want to change the behavior when the function returns a named tuple.
Let me start with the example on v0.20:
in other words now we are reproducing the behavior we already have. Also on master
In summary If we disallowed this then there would be no possibility to return a produce a vector of Note that we disallow returning In summary: we currently treat a But if you feel that |
I will try to think of something, but I really believe that the default API is more useful most of the time (except for row aggregations - I will try to come up with some better cases). @pdeffebach - can you please suggest something as I know that you had many good ideas here. |
Regarding
|
Sum and mean by row are the two use-cases, I think. A cool thing about
This returns a vector of the symbols equal to the max. Also, I agree with Milan that unwrapping a named tuple would be convenient. A whole workflow around named-tuple to named-tuple functions seems very useful. I think I understand the logic of your objections though and they are valid points. |
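To make the row-wise use cases above concrete, here is a minimal sketch of `AsTable` combined with `ByRow`. It is not the snippet from the comment above, which did not survive in this thread. The data and the output column names `rowsum` and `maxcols` are made up for illustration, and the sketch assumes a DataFrames.jl release that includes the `AsTable` selector, where each row reaches the function as a `NamedTuple`.

```julia
using DataFrames

# Hypothetical data; row 3 has a tie for the maximum
df = DataFrame(a = [1, 5, 3], b = [4, 2, 3], c = [2, 2, 1])

# AsTable(:) passes every row to the function as a NamedTuple
out = transform(df,
    # row-wise sum over all columns
    AsTable(:) => ByRow(sum) => :rowsum,
    # names of the columns whose value equals the row maximum
    AsTable(:) => ByRow(nt -> [k for (k, v) in pairs(nt) if v == maximum(values(nt))]) => :maxcols)
```

Because the function receives a `NamedTuple`, it can use the column names as well as the values, which is what makes the second transformation possible.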
Thank you for the suggestion. I will think of something along these lines and add it.
It will be allowed in the future. I just did not plan for it in |
With the kinds of surveys I worked with the past two years, maybe something like
I can't say with complete confidence how much I would use this (I am not analyzing survey data on a day-to-day basis any more). But it seems like a natural workflow that is consistent with using |
Yeah, so it is essentially for when you want to filter out some columns conditionally on each row and then produce multiple outputs based on the remaining columns. This is a valid use case. The question is how often it is needed (the normal way to do such things would be to transform from wide to long format and then aggregate). As a side note a |
Added such an example to getting started section. |
Thanks for the explanation. I'm not sure the consistency is so visible for users, as they may also expect that returning a I wonder what kind of solution we could use to allow both behaviors in the future if needed. Wrapping named tuples in Anyway for now I'd err on the safe side and throw an error (as usual), returning named tuples from |
OK. Later you will be able to pass |
Co-Authored-By: Milan Bouchet-Valat <[email protected]>
CI passes here (nightly fails due to #2184). |
Thank you! |
I have a use case where I'm doing something like a coordinate transformation (it can't be treated as an aggregation). In v0.20, I was able to do something like the following (boiled down considerably):

using Statistics, DataFrames, Polynomials
function forcecalc(df, weight_offsets)
    α = mean(df.α) # angle of attack
    AF = mean(df.AF) - weight_offsets[1](α) # axial force
    NF = mean(df.NF) - weight_offsets[2](α) # normal force
    (L = -NF*cos(α) - AF*sin(α), # lift
     D = -NF*sin(α) + AF*cos(α)) # drag
end
α = vcat([fill(v, 30) for v in -2:2:16]...)
AF = 10sind.(α) .+ rand(length(α))
NF = 100cosd.(α) .+ rand(length(α))
df = DataFrame(α = α, AF = AF, NF = NF)
wt = [Polynomial(rand(3)), Polynomial(rand(3))] # weight offsets
forces = by(df, :α, [:α, :AF, :NF] => df -> forcecalc(df, wt))

I'm subtracting a weight offset polynomial from raw force measurements (:AF and :NF), then rotating the force vector. I haven't found an elegant way to do this in v0.21, but my desired syntax is something like the following:

function forcecalc(nt::NamedTuple, weight_offsets)
    α = nt.α # angle of attack
    AF = nt.AF - weight_offsets[1](α) # axial force
    NF = nt.NF - weight_offsets[2](α) # normal force
    (L = -NF*cos(α) - AF*sin(α), # lift
     D = -NF*sin(α) + AF*cos(α)) # drag
end
varnames = [:AF, :NF]
gdf = groupby(df, :α)
measurements = combine(gdf, varnames .=> mean .=> varnames)
forces = transform(measurements, [:α, :AF, :NF] => ByRow(v -> forcecalc(v, wt)))

Note that in the former case, I'm both averaging measurements and applying the transformation in the 'apply' step of split-apply-combine, whereas in the latter case, I'm applying the transformation after split-apply-combine.
I have not analyzed your code in detail, but given I know the design assumptions :) the code that should work without changing anything in your 0.20 code except the last line is:
Can you please confirm that it works as expected? (still maybe it is possible to write it in a cleaner way - I am just doing code transformation that should be equivalent between 0.20 and 0.21) |
Save for a typo (should be
That said, I would prefer to perform reduction and transformation in two separate steps (as in my "v0.22" example), since the broader workflow involves applying the same reduction operation, followed by a variety of transformations. |
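For readers who want the two-step workflow described above, the following is a hedged sketch of how it might look. It assumes a DataFrames.jl release newer than this PR (0.22 or later, as far as I know) in which `AsTable` is also accepted as the destination, so that a `NamedTuple` returned from `ByRow` is expanded into columns. The plain functions in `weight_offsets` are hypothetical stand-ins for the `Polynomial` objects, and the data is invented.

```julia
using DataFrames, Statistics

# Hypothetical weight offsets; plain functions replace the Polynomial objects
weight_offsets = (α -> 0.1 + 0.01α, α -> 0.2 + 0.02α)

function forcecalc(nt::NamedTuple, weight_offsets)
    α = nt.α                                # angle of attack
    AF = nt.AF - weight_offsets[1](α)       # axial force
    NF = nt.NF - weight_offsets[2](α)       # normal force
    (L = -NF*cos(α) - AF*sin(α),            # lift
     D = -NF*sin(α) + AF*cos(α))            # drag
end

# Invented measurements: three samples at each of two angles
df = DataFrame(α = repeat([0.0, 0.1], inner = 3),
               AF = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
               NF = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0])

# Step 1: reduce each group to its mean measurements
varnames = [:AF, :NF]
measurements = combine(groupby(df, :α), varnames .=> mean .=> varnames)

# Step 2: row-wise transformation; each row arrives as a NamedTuple,
# and the returned NamedTuple is spread into the columns L and D
forces = transform(measurements,
    AsTable([:α, :AF, :NF]) => ByRow(nt -> forcecalc(nt, weight_offsets)) => AsTable)
```

Keeping the reduction and the transformation separate means `measurements` can be reused with other row-wise functions without recomputing the group means.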
This is indeed allowed and has a different behavior:
|
Does the second behavior suffer the same performance impact for |
No for |
Fixes #2121
I still need to write tests and update manual and documentation