-
Notifications
You must be signed in to change notification settings - Fork 922
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add groupby::replace_nulls(replace_policy)
api
#7118
Conversation
Requires NVIDIA/thrust#1374 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach seems sub-optimal to me as it requires deep-copying the group labels just to do a groupby fillna. If instead you could request the fillna directly from the groupby object you could avoid that deep copy.
I thought about both - current impl seems less intrusive to the |
replace_nulls
with key
parameter, segmented null replacementsgroupby::replace_nulls
, segmented null replacements
@isVoid can you add more clarity to the PR description? In particular, I don't understand what "segmented null replacement" means. Are you replacing nulls in only some groups? If it is all groups, then I would think a regular "flat" null replacement would work. |
groupby::replace_nulls
, segmented null replacementsgroupby::replace_nulls
, null value replacements within groups
@harrism Yes, it includes all groups. Does this look good? |
Since the documentation is not written yet, I still don't understand, but I'll wait. |
Codecov Report
@@ Coverage Diff @@
## branch-21.06 #7118 +/- ##
===============================================
Coverage ? 82.89%
===============================================
Files ? 105
Lines ? 17875
Branches ? 0
===============================================
Hits ? 14818
Misses ? 3057
Partials ? 0 Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of very tiny things.
Co-authored-by: nvdbaranec <[email protected]>
@@ -202,6 +207,32 @@ cdef class GroupBy: | |||
|
|||
return Table(data=result_data, index=grouped_keys) | |||
|
|||
def replace_nulls(self, Table values, object method): | |||
cdef table_view val_view = values.view() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still scratching my head to make sure but I think this might unnecessarily materialize a RangeIndex
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might, but it depends on whether the upstream passes in a RangeIndex
in values
. Here the meaning of values
are the value columns in the groupby.replace_nulls
operation, RangeIndex
shouldn't exist. (Python interface is yet to be added so it is a bit hard to see).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question otherwise cython LGTM
@gpucibot merge |
Part 1 of #4896, follow up of #6907
This PR provides a groupby version of the
replace_nulls(replace_policy)
function. A regularreplace_nulls(replace_policy)
operation updates the nulls with the first non-null value that precedes/follows the null. The groupby version is similar, with an exception that the non-null value look-up is bounded by groups.Here is an example to illustrate the API input/output behavior: