-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Series
comparison vs scalars
#12519
Changes from 9 commits
a56cc27
9b25ba1
a694b2b
8fb8c11
671e7e5
8c30b11
1783753
2394d84
12bdb88
cfee35b
aa1c6a5
7172f04
06cf9af
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -5665,7 +5665,7 @@ def normalize_binop_value( | |||||||||||||||||||
and other.dtype == "object" | ||||||||||||||||||||
): | ||||||||||||||||||||
return other | ||||||||||||||||||||
if isinstance(other, str): | ||||||||||||||||||||
if is_scalar(other): | ||||||||||||||||||||
return cudf.Scalar(other) | ||||||||||||||||||||
return NotImplemented | ||||||||||||||||||||
|
||||||||||||||||||||
|
@@ -5701,6 +5701,24 @@ def _binaryop( | |||||||||||||||||||
return NotImplemented | ||||||||||||||||||||
|
||||||||||||||||||||
if isinstance(other, (StringColumn, str, cudf.Scalar)): | ||||||||||||||||||||
if isinstance(other, cudf.Scalar) and other.dtype != "O": | ||||||||||||||||||||
if op in { | ||||||||||||||||||||
"__eq__", | ||||||||||||||||||||
"__lt__", | ||||||||||||||||||||
"__le__", | ||||||||||||||||||||
"__gt__", | ||||||||||||||||||||
"__ge__", | ||||||||||||||||||||
"__ne__", | ||||||||||||||||||||
}: | ||||||||||||||||||||
val = False | ||||||||||||||||||||
if op == "__ne__": | ||||||||||||||||||||
val = True | ||||||||||||||||||||
return column.full(len(self), val, dtype="bool").set_mask( | ||||||||||||||||||||
self.mask | ||||||||||||||||||||
) | ||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||||||||||||
else: | ||||||||||||||||||||
return NotImplemented | ||||||||||||||||||||
|
||||||||||||||||||||
if op == "__add__": | ||||||||||||||||||||
if isinstance(other, cudf.Scalar): | ||||||||||||||||||||
other = cast( | ||||||||||||||||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
# Copyright (c) 2018-2022, NVIDIA CORPORATION. | ||
# Copyright (c) 2018-2023, NVIDIA CORPORATION. | ||
|
||
import decimal | ||
import operator | ||
|
@@ -320,13 +320,35 @@ def test_series_compare_nulls(cmpop, dtypes): | |
utils.assert_eq(expect, got) | ||
|
||
|
||
def string_series_compare_test_cases(): | ||
cases = [] | ||
pd_sr = pd.Series(["a", "b", None, "d", "e", None], dtype="string") | ||
all_cmpop_cases = [ | ||
(pd_sr, pd_sr), | ||
(pd_sr, "a"), | ||
("a", pd_sr), | ||
] | ||
|
||
for op in _cmpops: | ||
for case in all_cmpop_cases: | ||
cases.append((*case, op)) | ||
|
||
eq_neq_cases = ( | ||
(pd_sr, 1), | ||
(1, pd_sr), | ||
(pd_sr, 1.5), | ||
(1.5, pd_sr), | ||
(pd_sr, True), | ||
(True, pd_sr), | ||
) | ||
for case in eq_neq_cases: | ||
cases += [(*case, operator.eq), (*case, operator.ne)] | ||
|
||
return cases | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please use a fixture to generate these cases, something like:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you help me understand a bit more why this approach is better? I've worked it up locally a few different ways and it seems like this forces some kind of programmatic skipping inside the test body since we're not dealing with a cartesian product of the parameterization, whereas the current approach just generates the exact tests we need. Is it the test output that you're concerned about here? (I think this would show up as There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We're trying to move away from a setup where test collection creates heavyweight objects (e.g. the series here). I see that you don't actually need the cartesian product. Does it work to separate the eq/neq cases from the other? You can then have two test functions that handle the general case and then the eq/neq case taking different parameters. There's some discussion here https://docs.rapids.ai/api/cudf/nightly/developer_guide/testing.html There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Got it, thanks for the context. An attempt at 7172f04 |
||
|
||
|
||
@pytest.mark.parametrize( | ||
"obj", [pd.Series(["a", "b", None, "d", "e", None], dtype="string"), "a"] | ||
) | ||
@pytest.mark.parametrize("cmpop", _cmpops) | ||
@pytest.mark.parametrize( | ||
"cmp_obj", | ||
[pd.Series(["b", "a", None, "d", "f", None], dtype="string"), "a"], | ||
"obj, cmp_obj, cmpop", string_series_compare_test_cases() | ||
) | ||
def test_string_series_compare(obj, cmpop, cmp_obj): | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems wrong. Python, and pandas raise
TypeError
for ordering between str and not-str:Which I think is achievable by
return NotImplemented
instead?