Fix float grouping #2791

bkamins · 2021-06-18T12:02:21Z

This reverts commit 289fadf.

nalimilan · 2021-06-18T15:40:56Z

Whoops. Though maybe it would be worth checking whether the data contains -0.0, and using the fast path if it does not? Is there also a problem with NaN and Inf?

bkamins · 2021-06-18T19:56:34Z

> Is there also a problem with NaN and Inf?

No, it is OK, because:

julia> isinteger(Inf)
false

julia> isinteger(NaN)
false

We could add a check against -0.0. However, I feel that grouping by floats is rare (and in general discouraged), so I thought we could skip the fast path in this case.

If you would prefer to add special handling of -0.0 instead please let me know and I can add this instead of what I propose now.
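
To make the -0.0 concern concrete, here is a minimal sketch using only Base Julia. It assumes the issue is that -0.0 passes an integer-valued check but loses its sign when converted to an integer, so it would be merged with 0.0. The helper `group_indices` is hypothetical and only mimics grouping with a `Dict`; it is not how DataFrames.jl groups rows.

```julia
# Facts about negative zero in Base Julia.
@show isinteger(-0.0)      # true: -0.0 passes an integer-valued check
@show -0.0 == 0.0          # true under IEEE comparison
@show isequal(-0.0, 0.0)   # false: isequal distinguishes the two zeros
@show Int(-0.0)            # 0: the sign is lost on conversion

# Hypothetical helper: collect row indices per distinct value.
# Dict keys are compared with isequal, so 0.0 and -0.0 are different keys.
function group_indices(v)
    groups = Dict{eltype(v), Vector{Int}}()
    for (i, x) in pairs(v)
        push!(get!(groups, x, Int[]), i)
    end
    return groups
end

v = [0.0, -0.0, 1.0, 0.0]
by_float = group_indices(v)        # three groups: 0.0, -0.0, 1.0
by_int = group_indices(Int.(v))    # two groups: both zeros collapse into 0
```

Grouping the raw floats keeps 0.0 and -0.0 apart, while grouping after conversion to `Int` merges them, which is the kind of discrepancy a float-as-integer fast path has to guard against.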

nalimilan · 2021-06-19T19:51:04Z

As you prefer, but I tend to think that it's good to support the fast path in as many cases as possible unless that's hard to do. For example, R uses floats by default when reading CSV files, so people could end up with float columns containing only integer values, instead of integer columns, when transferring data from R or loading Arrow files written in R. That may even end up in a blog post comparing DataFrames.jl's performance against R. It's hard to anticipate what weird things people might do...

bkamins · 2021-06-19T20:35:18Z

> As you prefer, but I tend to think that it's good to support the fast path in as many cases as possible unless that's hard to do.

I have implemented it and added tests showing the consequences. Let us decide which approach is better.

bkamins · 2021-06-24T09:19:22Z

I will finalize this PR with the current design (treat a float as an integer when possible). On reflection, this is the less breaking option, because we only have to add one condition to the current rules:

-0.0 is considered not to be an integer
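
The rule above can be sketched as a small predicate. This is an illustration under stated assumptions, not the code merged in this PR: the names `is_int_like` and `can_group_as_int` are hypothetical, and any further conditions the real fast path imposes are not modeled here.

```julia
# Hypothetical helpers; not the DataFrames.jl implementation.
# A float counts as integer-like if it is integer-valued and is not -0.0.
# isinteger already returns false for NaN and Inf, so they need no special case.
is_int_like(x::AbstractFloat) = isinteger(x) && !(iszero(x) && signbit(x))

# A float column qualifies for the assumed integer fast path
# only if every value in it is integer-like.
can_group_as_int(col::AbstractVector{<:AbstractFloat}) = all(is_int_like, col)

ok_col = [1.0, 2.0, 3.0, 0.0]    # integer-valued, no negative zero
neg_zero_col = [1.0, -0.0]       # contains -0.0
frac_col = [1.5, 2.0]            # contains a non-integer value
nan_col = [1.0, NaN]             # contains NaN

@show can_group_as_int(ok_col)
@show can_group_as_int(neg_zero_col)
@show can_group_as_int(frac_col)
@show can_group_as_int(nan_col)
```

Only the first column passes; the other three would fall back to the general grouping path under this sketch.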

bkamins · 2021-06-26T18:58:01Z

@nalimilan - I have made the requested cleanups. This should be good to be merged. Thank you!

bkamins · 2021-06-28T11:43:38Z

Thank you!

bkamins added 2 commits June 11, 2021 19:50

add setindex! rules

289fadf

fix float grouping bug

c6bee1b

bkamins added the bug label Jun 18, 2021

bkamins added this to the patch milestone Jun 18, 2021

bkamins added 2 commits June 18, 2021 14:03

Revert "add setindex! rules"

9c9ed11

This reverts commit 289fadf.

fix version

d57eff8

bkamins requested a review from nalimilan June 18, 2021 12:05

bkamins added 2 commits June 18, 2021 15:25

Merge branch 'main' into fix_float_grouping

af484e3

fix tests

3641c99

allow fast grouping of floats

9b7e9a6

bkamins commented Jun 19, 2021

test/grouping.jl (resolved)

bkamins mentioned this pull request Jun 19, 2021

Clean up precompile statements #2792

Closed

fix typo

df6f3cd

nalimilan reviewed Jun 20, 2021

src/groupeddataframe/utils.jl (outdated, resolved)

src/groupeddataframe/utils.jl (outdated, resolved)

bkamins mentioned this pull request Jun 22, 2021

Use standard Tables.Schema constructor instead of constructing directly #2797

Merged

pdeffebach approved these changes Jun 25, 2021

bkamins and others added 4 commits June 26, 2021 20:47

Update src/groupeddataframe/utils.jl

96febcf

Co-authored-by: Milan Bouchet-Valat

update news

b3ab07d

Merge branch 'fix_float_grouping' of https://github.com/JuliaData/DataFrames.jl into fix_float_grouping

cd1d7e6

code cleanup after the review

c5173f1

nalimilan reviewed Jun 27, 2021

src/groupeddataframe/groupeddataframe.jl (outdated, resolved)

src/groupeddataframe/utils.jl (outdated, resolved)

bkamins and others added 2 commits June 27, 2021 22:27

Update src/groupeddataframe/utils.jl

bdd4b1c

Co-authored-by: Milan Bouchet-Valat

improve docstring

2b7e9f6

nalimilan approved these changes Jun 28, 2021

src/groupeddataframe/groupeddataframe.jl (outdated, resolved)

Update src/groupeddataframe/groupeddataframe.jl

aee506a

Co-authored-by: Milan Bouchet-Valat

bkamins merged commit f0b5a57 into main Jun 28, 2021

bkamins deleted the fix_float_grouping branch June 28, 2021 11:43
