Fix groupreduce with var and std for Unitful types #2601
I think adding Unitful.jl as [extras] dependency is not a problem.
@nalimilan - when we merge this we should backport it to https://github.com/JuliaData/DataFrames.jl/tree/0.22_patches and make a release. Therefore can you please change the version to 0.22.3 in Project.toml in this PR? (I can do the backport and the release later, unless you would prefer to do it)
So now we have:
On Julia 1.0 😢.
Actually that's a legitimate failure that must be fixed on other versions too. The random values just happened to hit that case on Julia 1.0. I've pushed a fix and an adaptation of tests to more reliably cover the situation where a group contains a single row.
test/grouping.jl
df = DataFrame(a = [rand([1:4;missing], 19); 5],
               x1 = rand(1:100, 20),
               x2 = rand(1:100, 20) + im*rand(1:100, 20),
               x4 = rand(1:100, 20) .* u"m")
Can we also add an `x5` column that is like `x4` but also contains `missing` values, and then below also test `var∘skipmissing` etc. (making sure that we have a group that has only `missing` values in it)?
In general I would want to test what happens if fast aggregation functions get a vector of zero length (which is only possible with `missing`s, as currently our groups must have a positive length).
This is tested below. Do you mean we should also test it for Unitful? I'm afraid that will require duplicating the whole block, as `prod` doesn't apply to these so we can't just loop over different types of columns. And then there's also `Complex` and `CategoricalArray`...
Yes - I meant a test with `Unitful`. Maybe then just add a small test with e.g. `var∘skipmissing` on a concrete test DataFrame like: one group only, column is union of `missing` and `Unitful` and contains only `missing` values?
OK, let's bite the bullet and add systematic tests like for floats. I've pushed a commit.
Wow - that was fast and precise. I would just add one more test as commented (hopefully it will pass so I approve the PR).
For `Number`s for which squaring changes the type, we need to use different types for the mean, the variance and the standard deviation.
Co-authored-by: Bogumił Kamiński <[email protected]>
# Test reduction over group with only missing values
gd = groupby_checked(df, :a, skipmissing=skip, sort=sort)
indices && @test gd.idx !== nothing # Trigger computation of indices
gd[1][:, :x2m] .= missing
nice 😄
will you make a patch release or should I do it?
Please go ahead. :-)
For `Number`s for which squaring changes the type, we need to use different types for the mean, the variance and the standard deviation.
Fixes #2600.
@bkamins Do you think we should add Unitful as a test dependency to test this? I wonder whether there are types in Base that would allow reproducing the problem. Otherwise we could create a custom type just for this.
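For illustration, a custom type of the kind mentioned above might look roughly like the following sketch. The names `Meters`, `SquareMeters` and `mean_var_std` are hypothetical and exist only for this example; they are not part of DataFrames.jl or Unitful.jl, and the function is a plain two-pass reduction rather than the package's actual `groupreduce` code. It only aims to show why the mean, the variance and the standard deviation cannot all share one accumulator type when squaring changes the type.

```julia
# Hypothetical toy types for illustration only (not from DataFrames.jl or Unitful.jl).
# Multiplying two Meters yields SquareMeters, so squaring changes the type.
struct Meters <: Number
    val::Float64
end
struct SquareMeters <: Number
    val::Float64
end

Base.:+(a::Meters, b::Meters) = Meters(a.val + b.val)
Base.:-(a::Meters, b::Meters) = Meters(a.val - b.val)
Base.:/(a::Meters, n::Int) = Meters(a.val / n)
Base.:*(a::Meters, b::Meters) = SquareMeters(a.val * b.val)
Base.:+(a::SquareMeters, b::SquareMeters) = SquareMeters(a.val + b.val)
Base.:/(a::SquareMeters, n::Int) = SquareMeters(a.val / n)
Base.sqrt(a::SquareMeters) = Meters(sqrt(a.val))

# Two-pass mean/variance/std. The sum of values accumulates in Meters,
# the sum of squared deviations must accumulate in SquareMeters,
# and the square root brings the result back to Meters.
# Assumes length(xs) > 1; a single-element input gives a NaN variance here.
function mean_var_std(xs::Vector{Meters})
    n = length(xs)
    total = Meters(0.0)
    for x in xs
        total = total + x
    end
    m = total / n
    ss = SquareMeters(0.0)          # a different accumulator type than `total`
    for x in xs
        d = x - m
        ss = ss + d * d
    end
    v = ss / (n - 1)
    return m, v, sqrt(v)
end

m, v, s = mean_var_std([Meters(1.0), Meters(2.0), Meters(3.0)])
```

With a single preallocated result type for all three quantities, the `ss` accumulator above would have nowhere to live, which appears to be the situation the commit message describes.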