Allow `None` when `nan_as_null=False` in column constructor #15709

Merged

rapids-bot merged 17 commits into rapidsai:branch-24.06 from galipremsagar:15708

May 15, 2024

Contributor

galipremsagar commented May 9, 2024

Description

Fixes: #15708

This PR fixes a bug where the column constructor raised an error when the input contained None and nan_as_null=False. The cause was the use of pd.isna, which returns True for np.nan, None, and pd.NA, whereas the check should only detect np.nan and ignore None and pd.NA.
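The difference described above can be illustrated with a small sketch. The helper name `is_float_nan` is hypothetical and is not taken from the cudf code base; it only shows one way to detect a floating-point NaN without also matching None or pd.NA.

```python
import numpy as np
import pandas as pd

# The three "missing" markers discussed in the description.
candidates = {"np.nan": np.nan, "None": None, "pd.NA": pd.NA}


def is_float_nan(x):
    """Hypothetical helper: True only for floating-point NaN values."""
    return isinstance(x, (float, np.floating)) and bool(np.isnan(x))


# pd.isna treats all three values as missing.
isna_results = {name: bool(pd.isna(v)) for name, v in candidates.items()}

# The NaN-only check matches np.nan and nothing else.
nan_only_results = {name: is_float_nan(v) for name, v in candidates.items()}

print(isna_results)
print(nan_only_results)
```

The `isinstance` guard runs first, so `np.isnan` is never called on None or pd.NA, which it does not accept.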

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.


          allow None values when nan_as_null=False

4beec73

galipremsagar added bug non-breaking labels

galipremsagar added this to the cudf.pandas API coverage milestone

galipremsagar requested a review from mroeschke

May 9, 2024 19:55

galipremsagar self-assigned this

galipremsagar requested a review from a team as a code owner

May 9, 2024 19:55

galipremsagar requested a review from isVoid

May 9, 2024 19:55

galipremsagar changed the title ~~Allow None when nan_as_null=False~~ Allow None when nan_as_null=False in column constructor

github-actions bot added the Python label

galipremsagar commented

python/cudf/cudf/core/column/column.py Outdated

Comment on lines 1959 to 1963

                           elif nan_as_null is False and (
                               pd.isna(arbitrary).any()
-                              and inferred_dtype not in ("decimal", "empty")
+                              and inferred_dtype not in ("decimal", "empty", "string")
                           ):
                               # Decimal can hold float("nan")

Contributor Author

galipremsagar May 9, 2024

@mroeschke After the changes in this PR, do you think this block might be redundant or does it still capture some error scenarios?


          Merge branch 'branch-24.06' into 15708

d737cd2

bdice reviewed

galipremsagar and others added 2 commits

May 9, 2024 15:16


          Update column.py

8499dac

Co-authored-by: Bradley Dice <[email protected]>


          Update column.py

Co-authored-by: Bradley Dice <[email protected]>

bdice reviewed


          Update column.py

4ed3f05

mroeschke reviewed

galipremsagar added 2 commits

May 10, 2024 00:34


          improve if condition


          Merge remote-tracking branch 'upstream/branch-24.06' into 15708

383ea3d

galipremsagar requested review from mroeschke and bdice

May 10, 2024 13:16

galipremsagar commented

galipremsagar added 2 commits

May 10, 2024 08:17


          Update python/cudf/cudf/core/column/column.py

aeba17b


          Update column.py

a1d11a2

mroeschke reviewed

mroeschke reviewed

python/cudf/cudf/core/column/column.py Outdated

-                              pd.isna(arbitrary).any()
+                              any(
+                                  (isinstance(x, (np.floating, float)) and np.isnan(x))
+                                  or (inferred_dtype == "boolean" and pd.isna(arbitrary))

Contributor

mroeschke May 10, 2024

It would be good to evaluate (inferred_dtype == "boolean" and pd.isna(arbitrary)) outside the loop, since it does not depend on the loop variable x.

Also what case is this condition trying to catch?

Contributor Author

galipremsagar May 14, 2024

Done.

Also what case is this condition trying to catch?

It is trying to catch this case:

pd.Series(["a", "b", np.nan], dtype='object')
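A minimal sketch of how that case differs from one containing None, assuming a per-element NaN check along the lines of the diff above. The function `has_any_float_nan` is a hypothetical stand-in and is not claimed to match the merged `_has_any_nan` implementation.

```python
import numpy as np
import pandas as pd


def has_any_float_nan(values):
    """Hypothetical helper: True if any element is a floating-point NaN.

    None and pd.NA are deliberately not counted as NaN here.
    """
    return any(
        isinstance(x, (float, np.floating)) and np.isnan(x)
        for x in np.asarray(values, dtype=object)
    )


# Object-dtype strings mixed with a float NaN: the case to catch.
with_nan = pd.Series(["a", "b", np.nan], dtype="object")
# Object-dtype strings mixed with None: should be allowed.
with_none = pd.Series(["a", "b", None], dtype="object")

nan_detected = has_any_float_nan(with_nan)
none_detected = has_any_float_nan(with_none)
# pd.isna cannot tell the two inputs apart: it flags None as well.
isna_flags_none = bool(pd.isna(with_none).any())

print(nan_detected, none_detected, isna_flags_none)
```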

galipremsagar added 3 commits

May 10, 2024 21:35


          Merge remote-tracking branch 'upstream/branch-24.06' into 15708

dffae45


          Merge remote-tracking branch 'upstream/branch-24.06' into 15708

e881370


          separate booleans


          Simplify

d88c23a

galipremsagar requested a review from mroeschke

May 14, 2024 18:46


          Merge branch 'branch-24.06' into 15708

1d46749

mroeschke approved these changes

galipremsagar added the 5 - Ready to Merge label

bdice reviewed

python/cudf/cudf/core/column/column.py

+                                      f"Cannot have mixed values with {inferred_dtype}"
+                                  )
+                          elif (
+                              nan_as_null is False

Contributor

bdice May 14, 2024

Don't compare to booleans with `is`.

Suggested change

      
-                            nan_as_null is False
+                            not nan_as_null

Contributor Author

galipremsagar May 14, 2024

Same as https://github.com/rapidsai/cudf/pull/15709/files#r1600749515

python/cudf/cudf/core/column/column.py

+                                      raise MixedTypeError(
+                                          f"Cannot have mixed values with {inferred_dtype}"
+                                      )
+                              elif nan_as_null is False and _has_any_nan(arbitrary):

Contributor

bdice May 14, 2024

Suggested change

      
-                            elif nan_as_null is False and _has_any_nan(arbitrary):
+                            elif not nan_as_null and _has_any_nan(arbitrary):

Contributor Author

galipremsagar May 14, 2024

We need the `is False` comparison because None is also a supported value for nan_as_null, and it behaves like True.
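A short sketch of why the two spellings differ for a tri-state flag. Both function names are hypothetical and exist only for illustration; the only assumption carried over from the thread is that None should behave like True.

```python
def keeps_nan_identity(nan_as_null):
    """Identity check: only the literal False disables NaN-to-null conversion."""
    return nan_as_null is False


def keeps_nan_truthiness(nan_as_null):
    """Truthiness check: None is falsy, so it is treated the same as False."""
    return not nan_as_null


flags = [True, False, None]
identity = [keeps_nan_identity(f) for f in flags]
truthiness = [keeps_nan_truthiness(f) for f in flags]

print(identity)    # [False, True, False]
print(truthiness)  # [False, True, True]
```

The two checks agree for True and False and disagree only for None, which is the case the author's reply is about.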

galipremsagar and others added 2 commits

May 14, 2024 17:40


          Apply suggestions from code review

dd60e3e

Co-authored-by: Bradley Dice <[email protected]>


          Simplify

61e95bc

bdice approved these changes


          Merge branch 'branch-24.06' into 15708

a4dce14

Contributor Author

galipremsagar commented May 15, 2024

/merge

rapids-bot bot merged commit fa9d028 into rapidsai:branch-24.06

70 checks passed

Labels

5 - Ready to Merge, bug, non-breaking, Python