-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow fillna
to validate for CategoricalColumn.fillna
#15683
Conversation
/merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I forgot to hit go on this one. There are no really substantive comments here though.
# Validation of `fill_value` will have to be performed | ||
# before returning self. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this comment suggests that validation of the value must still be performed. But I think it has been in the code above.
So perhaps
Even if there are no nulls, we must validate that the provided value is valid (done above) to match pandas
WDYT?
) | ||
or method is not None | ||
or ( | ||
isinstance(col, cudf.core.column.CategoricalColumn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This special-cases categorical columns for fillna both to not care about whether the column has nulls and whether the column name is in the provided dict-like (?) of values.
It so happens that the _is_null_host_scalar
check will be true in the latter case, but it's somewhat confusing.
Do you want to just provide a carve-out for categorical columns in the first condition as:
col_name in value
and (col.has_nulls(include_nan=True) or isinstance(col, CategoricalColumn))
and not libcudf.scalar._is_null_host_scalar(replace_val)
?
…fillna()` based on latest cuDF changes (#4408) This handles a [recent cuDF change](rapidsai/cudf#15683) by applying non-dict and non-Series values for a `fillna()` call on `PropertyGraph` instances only to the user-defined columns, with the assumption that savvy users that intend to update the "internal" columns, or users that are aware of their own categorical dtype columns, will use a dict or Series value to properly apply dtypes as needed. This also updates code in `cugraph-service` that serializes dataframes to numpy bytes to properly convert NA values when categoricals are present. Notes: * This is only applied to the SG `PropertyGraph` class. The MG class needs further review as to how to best apply the same policy (and because there are other MG failing tests that need addressed). Since this is blocking CI for the SG case only, this PR is being submitted now and MG will be addressed later, which should be okay since `PropertyGraph` is experimental. * This could be considered a breaking change if `PropertyGraph` was not experimental. Authors: - Rick Ratzel (https://github.com/rlratzel) Approvers: - Alex Barghi (https://github.com/alexbarghi-nv) URL: #4408
Description
Fixes: #15666
This PR validates values passed to
fillna
even if there are no null values in a categorical column.Forks from #14534
Checklist