-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix handling of values=None in pylibcudf GroupBy.get_groups #14998
Fix handling of values=None in pylibcudf GroupBy.get_groups #14998
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we're at it, can we save a reference to the Table in the GroupBy object?
The columns to get group labels for. If not specified, the group | ||
labels for the group keys are returned. | ||
The columns to get group labels for. If not specified, | ||
an empty table is returned for the group values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we return an empty table or should we not return anything at all? On one hand I tend to prefer the return type not changing based on arguments, on the other I would prefer not creating an extra object for no reason and there is precedent for this kind of thing in numpy APIs (like np.unique).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No preference. I changed t oreturn None
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also saved a reference to keys table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thought for your consideration, but I don't have a strong opinion so feel free to address however you want, just wanted to raise the question. Thanks for the fix!
- A table of group values or None | ||
- A list of integer offsets into the tables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last thought: if we returned a pair instead of a triple when values is None, we would probably reorder these outputs to be keys, offsets, values so that the first two outputs would always be the same. Do we still want to do that even if we do return None, such that it's the last return value that gets discarded? keys, offset, _ = ...
seems more natural than keys, _, offsets = ...
if we do have to discard an object (or if you're indexing, using a contiguous index set res[0]
and res[1]
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. I've reordered to (offsets, keys, values)
as it feels awkward to separate keys
and values
.
/merge |
A small bug in our previous implementation leads to a segfault when calling
.get_groups()
with novalues
. Thankfully, the cuDF Python API always calls this function with a value, but it's possiblepylibcudf
consumers will not.