Add mode #453
This gets particularly hairy in a groupby scenario where each group may have different lengths
I think this is fine, because I think the computation is going to occur inside the group's context. Maybe adding a test (or actually changing an existing one to use another column) for summarise or mutate can clarify this a little bit. WDYT?
Yes, between groups, but not within groups. So take the example where you summarise down to mean and mode. If mode is length 0 or >1, then you have a problem where within groups the aggregations are different lengths. Adding a test or two now to clarify as suggested 👍.
Well it clarified in a different direction: summarise w/ mode returns a single value per group -- a list! We'll have to pick up #401 anyway.
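To make the length mismatch concrete, here is a minimal sketch in plain Elixir. It does not use Explorer at all: `ModeSketch`, `modes/1`, and `mean/1` are hypothetical names for illustration, and returning every tied value is an assumption of the sketch, not a statement about how the PR resolves ties.

```elixir
defmodule ModeSketch do
  # Returns every value that occurs with the highest frequency.
  # An empty list has no mode, so the result is [].
  def modes([]), do: []

  def modes(values) do
    freqs = Enum.frequencies(values)
    max_count = freqs |> Map.values() |> Enum.max()

    freqs
    |> Enum.filter(fn {_value, count} -> count == max_count end)
    |> Enum.map(fn {value, _count} -> value end)
    |> Enum.sort()
  end

  # The mean always collapses a non-empty group to exactly one value.
  def mean(values), do: Enum.sum(values) / length(values)
end

# Two groups: "a" has a single mode, "b" has a tie.
groups = %{"a" => [1, 1, 2], "b" => [3, 4]}

summary =
  Map.new(groups, fn {key, values} ->
    {key, %{mean: ModeSketch.mean(values), modes: ModeSketch.modes(values)}}
  end)

IO.inspect(summary)
```

Group "a" ends up with one mode (`[1]`) while group "b" ends up with two (`[3, 4]`), yet each group has exactly one mean. Wrapping the modes in a list per group is what keeps the aggregation at one value per group.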
LGTM although it seems we are waiting on lists?
Change map helper functions' arguments
@doc """
Gets the most common value of the series.

## Supported dtypes
Let’s just say all except lists? Otherwise it is easy for this to go out of date!
Yep fair 😂
non_list_dtypes = [
non_list_dtypes = non_list_types()
list_dtypes = for dtype <- non_list_dtypes, do: {:list, dtype}
non_list_dtypes ++ list_dtypes
List types are recursive so this is theoretically incomplete. Instead of doing this list, what if we just append {:list, :any} instead?
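A small sketch of why the generated list is incomplete, assuming an illustrative subset of dtypes. `DtypeSketch`, `one_level/0`, and the three dtypes below are hypothetical stand-ins, not the library's actual helper or its full dtype list:

```elixir
defmodule DtypeSketch do
  # Hypothetical stand-in for the non-list helper in the diff;
  # the real set of dtypes is larger.
  def non_list_types, do: [:integer, :float, :string]

  # Mirrors the comprehension from the diff: wraps each non-list
  # dtype in exactly one level of {:list, _}.
  def one_level do
    non_list = non_list_types()
    list_dtypes = for dtype <- non_list, do: {:list, dtype}
    non_list ++ list_dtypes
  end
end

dtypes = DtypeSketch.one_level()

# A flat list dtype is covered.
IO.inspect({:list, :integer} in dtypes)

# A nested list dtype is not, because the comprehension never recurses.
IO.inspect({:list, {:list, :integer}} in dtypes)
```

The first check is true and the second is false, which is the gap the comment above points at: no finite enumeration covers arbitrarily nested list dtypes.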
Sure, fine by me 👍
So it looks like this is used for tests? I'm going to leave it how it was (I just wrapped the non-list ones into a separate function) and then I think we should revisit it in a future PR.
Woo! It's cool to see this PR get across the finish line!
I just had one suggestion about the docs: I think it's worth calling out ties.
Co-authored-by: Billy Lanchantin <[email protected]>
Ship it and I can look at the dtypes stuff. :)
See: #452. This is a WIP because I'm not sure how to handle the fact that you can have no mode or multiple modes. This gets particularly hairy in a groupby scenario where each group may have different lengths (and thus would cause problems with other summary statistics like mean and median, where you know the length will be 1).
Pandas:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.mode.html
https://pandas.pydata.org/docs/reference/api/pandas.Series.mode.html#pandas.Series.mode
https://stackoverflow.com/a/54304691
R:
https://stackoverflow.com/questions/66972590/how-to-find-mean-median-mode-based-on-distinctive-groups-in-r
https://cran.r-project.org/web/packages/modeest/modeest.pdf
Okay! Thanks to #725 this is good to go. Closes #452.