-
Notifications
You must be signed in to change notification settings - Fork 917
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix nunique for MultiIndex
, DataFrame
, and all NA case with dropna=False
#15962
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will have to investigate 1+% decrease in pass rate too..
data = [np.nan, np.nan] | ||
pd_ser = pd.Series(data) | ||
cudf_ser = cudf.Series(data, nan_as_null=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we are actually testing nan
case here or do you want to test only NA
cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping to just test the NA case here i.e. where dropna
would impact the nunique count
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, okay. Can we use None
then instead of np.nan
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing. Changed to use None
@@ -2162,3 +2162,14 @@ def test_multi_index_contains_hashable(): | |||
lfunc_args_and_kwargs=((),), | |||
rfunc_args_and_kwargs=((),), | |||
) | |||
|
|||
|
|||
@pytest.mark.parametrize("array", [[1, 2], [1, np.nan], [np.nan] * 2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same concern as below
As of 72b7ac3, it looks like the diff job is reporting a 0% change. https://github.com/rapidsai/cudf/actions/runs/9486926113 (Looks like there's still some noise in test failures rate between commits e.g. https://github.com/rapidsai/cudf/actions/runs/9473554707 vs https://github.com/rapidsai/cudf/actions/runs/9486926113) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python approval. I think we need two cpp approvals for the libcudf changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor suggestions, generally looks good.
Co-authored-by: David Wendt <[email protected]>
Co-authored-by: Bradley Dice <[email protected]>
Co-authored-by: Bradley Dice <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
/merge |
Description
Fixes 3 bugs with
nunique
MultiIndex.nunique
returning adict
instead of anint
.nunique(dropna=False)
with allNA
s returning 0 instead of 1DataFrame.nunique
preserving column class and type in the resultingSeries.index
Checklist