-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix nan_as_null
not being respected when passing cudf object
#14687
Changes from 3 commits
46a61c3
324376e
45bb9e3
65cbaa1
64b5a32
b569b62
f3cf614
5289adf
8855f40
79c318f
4c7e3a5
9456db9
302df52
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
# Copyright (c) 2018-2023, NVIDIA CORPORATION. | ||
# Copyright (c) 2018-2024, NVIDIA CORPORATION. | ||
|
||
from __future__ import annotations | ||
|
||
|
@@ -1939,20 +1939,18 @@ def as_column( | |
* pyarrow array | ||
* pandas.Categorical objects | ||
""" | ||
if isinstance(arbitrary, ColumnBase): | ||
if dtype is not None: | ||
return arbitrary.astype(dtype) | ||
if isinstance(arbitrary, (ColumnBase, cudf.Series, cudf.BaseIndex)): | ||
if isinstance(arbitrary, cudf.Series): | ||
column = arbitrary._column | ||
elif isinstance(arbitrary, cudf.BaseIndex): | ||
column = arbitrary._values | ||
else: | ||
return arbitrary | ||
|
||
elif isinstance(arbitrary, cudf.Series): | ||
data = arbitrary._column | ||
column = arbitrary | ||
if column.dtype.kind == "f" and (nan_as_null is None or nan_as_null): | ||
column = column.nans_to_nulls() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @mroeschke - The CI failure is a result of us hitting this check when we call compute() in the dask-cudf test. When a dask collection is "computed", the partitions need to be concatenated together. It seems that calling One possible fix may be to modify this line to something like: return cls(data=col, index=index, name=name, nan_as_null=False) However, I'm not immediately sure if there are other corner cases that this doesn't catch. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also, I suppose the best way to test this outside of dask is to add a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the insight! The |
||
if dtype is not None: | ||
data = data.astype(dtype) | ||
elif isinstance(arbitrary, cudf.BaseIndex): | ||
data = arbitrary._values | ||
if dtype is not None: | ||
data = data.astype(dtype) | ||
column = column.astype(dtype) | ||
return column | ||
|
||
elif hasattr(arbitrary, "__cuda_array_interface__"): | ||
desc = arbitrary.__cuda_array_interface__ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually I tend to prefer some parameter handling at the start of the function, such as: