[FEA] Instantiate cuDF objects from python objects containing cudf.NA #8287

Closed
brandon-b-miller opened this issue May 19, 2021 · 0 comments · Fixed by #8442
Assignees
Labels
feature request New feature or request Python Affects Python cuDF API.

Comments

@brandon-b-miller
Contributor

Is your feature request related to a problem? Please describe.
When trying to construct a series, index, or other cuDF object, we should be able to include cudf.NA in the data and end up with the right series.

Describe the solution you'd like
Plumb cuDF so that it masks according to cudf.NA as opposed to None:

x = cudf.Series([1, cudf.NA, 3])
# [1, <NA>, 3]

Additional context

>>> x = cudf.Series([1, cudf.NA, 3])
pyarrow.lib.ArrowInvalid: Could not convert <NA> with type _NAType: did not recognize Python value type when inferring an Arrow data type
@brandon-b-miller brandon-b-miller added feature request New feature or request Python Affects Python cuDF API. labels May 19, 2021
@brandon-b-miller brandon-b-miller self-assigned this May 19, 2021
rapids-bot bot pushed a commit that referenced this issue Jun 11, 2021
Closes #8287

PyArrow knows how to handle the `pd.NA` singleton and treats it as null when `from_pandas=True` is passed during array construction. However, there is no option to choose which sentinel or value represents null, and the detection of this exact object is implemented at the C++ level in pyarrow, which limits our options for 'tricking' pyarrow into seeing `cudf.NA` as null.

As such, it is probably best that our `NA` be identical to the pandas `NA`. This also makes `cudf.NA is pd.NA` return `True`, which is probably what we want as well.

Authors:
  - https://github.com/brandon-b-miller

Approvers:
  - Marlene  (https://github.com/marlenezw)
  - Michael Wang (https://github.com/isVoid)

URL: #8442