Fix groupreduce with var and std for Unitful types #2601

nalimilan · 2021-01-16T11:53:12Z

For `Number`s for which squaring changes the type, we need to use different types for the mean, the variance and the standard deviation.

Fixes #2600.

@bkamins Do you think we should add Unitful as a test dependency to test this? I wonder whether there are types in Base that would allow reproducing the problem. Otherwise we could create a custom type just for this.
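
As a rough illustration of the point in the description (a sketch using plain `Statistics` calls, not the code in `fastaggregates.jl`), the three statistics of a Unitful vector do not share one element type. The vector `x` is made up for this example, and Unitful.jl is assumed to be installed.

```julia
using Statistics
using Unitful

# Illustrative lengths in metres
x = [1.0, 2.0, 4.0] .* u"m"

m = mean(x)  # a length
v = var(x)   # built from squared deviations, so an area
s = std(x)   # square root of the variance, a length again

@assert unit(m) == u"m"
@assert unit(v) == u"m^2"
@assert unit(s) == u"m"

# A container with the input's element type cannot hold the variances
@assert typeof(v) != eltype(x)
```

Any preallocated result vector therefore has to be typed per statistic rather than reusing the element type of the input column.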

bkamins · 2021-01-16T11:55:28Z

I think adding Unitful.jl as an [extras] dependency is not a problem.

src/groupeddataframe/fastaggregates.jl

bkamins · 2021-01-16T13:34:35Z

@nalimilan - when we merge this we should backport it to https://github.com/JuliaData/DataFrames.jl/tree/0.22_patches and make a release. Therefore can you please change the version to 0.22.3 in Project.toml in this PR?

(I can do the backport and the release later, unless you would prefer to do it)

bkamins · 2021-01-16T16:10:28Z

So now we have:

m^2 and NaN are not dimensionally compatible.

On Julia 1.0 😢.

nalimilan · 2021-01-16T17:00:42Z

Actually that's a legitimate failure that must be fixed on other versions too. The random values just happened to hit that case on Julia 1.0. I've pushed a fix and an adaptation of tests to more reliably cover the situation where a group contains a single row.
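
The single-row case can be reproduced outside DataFrames. The sketch below relies only on `Statistics` and Unitful.jl; `nan_like` is a hypothetical helper introduced here for illustration and is not a function from the PR. It shows that the corrected variance of one observation is `NaN` in squared units, so a unitless `NaN` is not a valid stand-in for it.

```julia
using Statistics
using Unitful

# A "group" containing a single observation (illustrative value)
single = [3.0u"m"]

v = var(single)  # corrected variance divides by n - 1 == 0
@assert isnan(ustrip(v))
@assert unit(v) == u"m^2"  # the NaN still carries squared units

# Hypothetical helper: a NaN with the same units as `x`
nan_like(x::Number) = x * NaN

@assert isnan(ustrip(nan_like(2.0u"m^2")))
@assert unit(nan_like(2.0u"m^2")) == u"m^2"
```

Multiplying by `NaN` keeps the units attached, which is one way to avoid mixing a dimensionless `NaN` with dimensional values.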

bkamins · 2021-01-16T23:06:31Z

test/grouping.jl

+    df = DataFrame(a = [rand([1:4;missing], 19); 5],
+                   x1 = rand(1:100, 20),
+                   x2 = rand(1:100, 20) + im*rand(1:100, 20),
+                   x4 = rand(1:100, 20) .* u"m")


Can we also add an `x5` column that is like `x4` but also contains missing values, and then test it below, in particular with `var∘skipmissing` etc. (making sure that we have a group that contains only missing values)?

In general, I would like to test what happens when the fast aggregation functions get a vector of zero length (which is only possible with missings, as currently our groups must have a positive length).

nalimilan · 2021-01-17

This is tested below. Do you mean we should also test it for Unitful? I'm afraid that will require duplicating the whole block, as `prod` doesn't apply to these, so we can't just loop over different types of columns. And then there's also Complex and CategoricalArray...

bkamins · 2021-01-17

Yes - I meant a test with Unitful. Maybe then just add a small test with e.g. `var∘skipmissing` on a concrete test DataFrame like: one group only, the column is a union of missing and Unitful and contains only missing values?

nalimilan · 2021-01-21

OK, let's bite the bullet and add systematic tests like for floats. I've pushed a commit.
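
For readers following the thread, here is a small sketch of the kind of check being discussed. It is not the test added in the PR: the data frame and column values are invented, it assumes a DataFrames.jl release that includes this fix (0.22.3 or later, going by the discussion above), and it deliberately leaves out the group with only missing values.

```julia
using DataFrames
using Statistics
using Unitful

# Illustrative data: group 1 contains a missing value, group 2 is complete
df = DataFrame(a = [1, 1, 1, 2, 2],
               x5 = [1.0u"m", missing, 3.0u"m", 2.0u"m", 6.0u"m"])

gd = groupby(df, :a)
res = combine(gd,
              :x5 => var∘skipmissing => :v,
              :x5 => std∘skipmissing => :s)

# Group 1: values 1 and 3, variance 2 m^2; group 2: values 2 and 6, variance 8 m^2
@assert ustrip.(u"m^2", res.v) ≈ [2.0, 8.0]
@assert ustrip.(u"m", res.s) ≈ sqrt.([2.0, 8.0])
```

Comparing against `ustrip` values keeps the assertions independent of how approximate equality is defined for arrays of quantities.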

bkamins

Wow - that was fast and precise. I would just add one more test as commented (hopefully it will pass so I approve the PR).

bkamins · 2021-01-21T15:00:39Z

test/grouping.jl

+        # Test reduction over group with only missing values
+        gd = groupby_checked(df, :a, skipmissing=skip, sort=sort)
+        indices && @test gd.idx !== nothing # Trigger computation of indices
+        gd[1][:, :x2m] .= missing


bkamins · 2021-01-21T15:11:57Z

Will you make a patch release or should I do it?

nalimilan · 2021-01-21T15:43:14Z

Please go ahead. :-)

nalimilan mentioned this pull request Jan 16, 2021

combine() error with std() on GroupedDataFrame with Unitful columns #2600

Closed

bkamins reviewed Jan 16, 2021

src/groupeddataframe/fastaggregates.jl (outdated, resolved)

bkamins added the bug label Jan 16, 2021

bkamins added this to the 1.0 milestone Jan 16, 2021

bkamins added grouping backport labels Jan 16, 2021

nalimilan marked this pull request as ready for review January 16, 2021 17:42

bkamins reviewed Jan 16, 2021

test/grouping.jl (resolved)

bkamins reviewed Jan 16, 2021

bkamins approved these changes Jan 16, 2021

nalimilan and others added 6 commits January 21, 2021 09:52

Fix groupreduce with var and std for Unitful types

dfa189e

For `Number`s for which squaring changes the type, we need to use different types for the mean, the variance and the standard deviation.

Add tests

3d67a1c

Bump version

b688aa4

More precise eltype

6ce6c0f

Co-authored-by: Bogumił Kamiński <[email protected]>

Fix NaN

ba5ed69

Add tests

86b43ab

nalimilan force-pushed the nl/agg branch from 268fc47 to 86b43ab January 21, 2021 08:53

nalimilan requested a review from bkamins January 21, 2021 09:56

bkamins reviewed Jan 21, 2021

bkamins approved these changes Jan 21, 2021

nalimilan merged commit 34e53e0 into main Jan 21, 2021

nalimilan deleted the nl/agg branch January 21, 2021 15:05

bkamins pushed a commit that referenced this pull request Jan 21, 2021

Fix groupreduce with var and std for Unitful types (#2601)

89b4b89

For `Number`s for which squaring changes the type, we need to use different types for the mean, the variance and the standard deviation.
