Fix Series comparison vs scalars (#12519)
Fixes an issue where the following comparison raises an error:

```python
import cudf
cudf.Series(['a','b','c']) == 1
```

```
  File "/raid/brmiller/anaconda/envs/cudf_dev/lib/python3.9/site-packages/cudf/core/mixins/mixin_factory.py", line 11, in wrapper
    return method(self, *args1, *args2, **kwargs1, **kwargs2)
  File "/raid/brmiller/anaconda/envs/cudf_dev/lib/python3.9/site-packages/cudf/core/indexed_frame.py", line 3278, in _binaryop
    ColumnAccessor(type(self)._colwise_binop(operands, op)),
  File "/raid/brmiller/anaconda/envs/cudf_dev/lib/python3.9/site-packages/cudf/core/column_accessor.py", line 124, in __init__
    column_length = len(data[next(iter(data))])
TypeError: object of type 'bool' has no len()
```

It turns out this happens because `StringColumn`'s `normalize_binop_value` method returns `NotImplemented` for scalars that are not of dtype `object`. This eventually causes Python to dispatch to the Python scalar class's `__eq__`, which returns the scalar `False` when encountering a cuDF object. cuDF expects a column object at this point but has a scalar. This in turn causes cuDF to try to construct a `ColumnAccessor` around a dict that looks like `{'name': False}`, ultimately throwing the error.

This PR proposes to stop this behavior early, following the rules for comparing Python string scalars with other objects:

- Always return `False` for `__eq__`, even if the character in the string is equivalent to whatever is being compared
- Always return `True` for `__ne__`, for the same reason
- Copy the input mask

This should align us with pandas behavior for this case:

```python
>>> pd.Series(['a','b', 'c'], dtype='string') == 1
0    False
1    False
2    False
dtype: boolean
>>> pd.Series(['a','b', 'c'], dtype='string') != 1
0    True
1    True
2    True
dtype: boolean
```

EDIT: Updating this PR to handle a similar issue that results in the same error when comparing datetime series to strings that contain valid datetimes, such as `20110101`.
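The dispatch failure and the proposed rules can be illustrated without cuDF. The following is a minimal pure-Python sketch, not the actual cuDF implementation: `NaiveStringColumn` and `FixedStringColumn` are hypothetical stand-ins, plain lists stand in for columns, and `None` entries stand in for nulls under the input mask.

```python
class NaiveStringColumn:
    """Hypothetical stand-in: gives up on non-string scalars."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        if not isinstance(other, str):
            # Python then tries the reflected comparison on `other` and
            # finally falls back to an identity check, yielding a bare bool.
            return NotImplemented
        return [None if v is None else v == other for v in self.values]


class FixedStringColumn(NaiveStringColumn):
    """Hypothetical stand-in applying the rules listed above."""

    def __eq__(self, other):
        if not isinstance(other, str):
            # Always False, with nulls (the "mask") carried over.
            return [None if v is None else False for v in self.values]
        return super().__eq__(other)

    def __ne__(self, other):
        if not isinstance(other, str):
            # Always True, with nulls (the "mask") carried over.
            return [None if v is None else True for v in self.values]
        return [None if v is None else v != other for v in self.values]


naive_result = NaiveStringColumn(["a", "b", "c"]) == 1
fixed_eq = FixedStringColumn(["a", None, "c"]) == 1
fixed_ne = FixedStringColumn(["a", None, "c"]) != 1
```

Here `naive_result` is a single scalar `bool` rather than a column-shaped result, which is the kind of value that later breaks `len()` in the traceback above, while the fixed variant returns one entry per row and keeps nulls in place.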
Authors:
- https://github.com/brandon-b-miller

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)
- Lawrence Mitchell (https://github.com/wence-)

URL: #12519
1 parent 048f936, commit c4a1389
Showing 4 changed files with 104 additions and 23 deletions.