feat: add Series.__contains__ #1480

Merged
merged 10 commits into from
Dec 3, 2024
7 changes: 7 additions & 0 deletions narwhals/_arrow/series.py
@@ -1017,6 +1017,13 @@ def __iter__(self: Self) -> Iterator[Any]:
            for x in self._native_series.__iter__()
        )

    def __contains__(self: Self, other: Any) -> bool:
        import pyarrow.compute as pc  # ignore-banned-imports

        return maybe_extract_py_scalar(  # type: ignore[no-any-return]
            pc.is_in(other, self._native_series), return_py_scalar=True
        )

    @property
    def shape(self: Self) -> tuple[int]:
        return (len(self._native_series),)
5 changes: 5 additions & 0 deletions narwhals/_pandas_like/series.py
@@ -851,6 +851,11 @@ def rolling_mean(
    def __iter__(self: Self) -> Iterator[Any]:
        yield from self._native_series.__iter__()

    def __contains__(self: Self, other: Any) -> bool:
        if other is None:
            return self._native_series.isna().any()  # type: ignore[no-any-return]
        return other in self._native_series
Member Author

I fell for the example contained in our why section!

There are two ways of doing this:

  • return other in self.to_numpy(), though I don't think this is a good idea for cudf
  • return self._native_series.isin({other}).any()
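
To illustrate the pitfall behind this comment: in pandas, `in` on a Series tests the index labels rather than the values, which is why a plain `other in self._native_series` is not enough. The sketch below is stdlib-only and uses a hypothetical `LabelledSeries` class (an assumption for illustration, not a pandas or narwhals API) to mimic that behaviour alongside the `isin(...).any()` alternative.

```python
class LabelledSeries:
    """Hypothetical stand-in for a pandas-like series (not a real pandas class)."""

    def __init__(self, values):
        self.values = list(values)
        self.index = range(len(self.values))  # default integer labels 0..n-1

    def __contains__(self, other):
        # Assumption: mirrors pandas, where `in` tests the index labels.
        return other in self.index

    def isin(self, candidates):
        # Element-wise membership test against the values.
        return [value in candidates for value in self.values]


s = LabelledSeries([10, 20, 30])

label_hit = 1 in s              # 1 is a label, not a value
value_miss = 10 in s            # 10 is a value, not a label
value_hit = any(s.isin({10}))   # tests the values instead of the labels
```

Here `label_hit` is `True` and `value_miss` is `False`, which is the opposite of what a value-membership check should report; only `value_hit` answers the intended question.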

Contributor

@AlessandroMiola, Dec 1, 2024

May I ask a general, possibly silly, question, just for the sake of better understanding? 😇

I see the point in implementing __contains__ for pandas-like and arrow and also what you mention here. Instead, I haven't properly got the details behind #1443 (comment), as I don't see func(1, pa.chunked_array([data])) returning False.

Should we implement __contains__ at narwhals/series.py to ensure the above behaviour is explicitly triggered via __contains__ (provided that what I'm stating above is correct)?

Member Author

> as I don't see func(1, pa.chunked_array([data])) returning False.

Damn that's true... but now I don't know why, and I wonder what is being called. The following returns false:

1 in pa.chunked_array([[1,2,3]])
False

> Should we implement __contains__ at narwhals/series.py to ensure the above behaviour is explicitly triggered via __contains__?

I am addressing it in the PR

Contributor

> Damn that's true... but now I don't know why, and I wonder what is being called.

Me neither, had issues in trying to figure it out

> I am addressing it in the PR

Yeah, I see it; I was just asking to confirm my understanding, as I couldn't see where this was coming from without the explicit reference to __contains__ 😃

Member Author

@FBruzzesi, Dec 1, 2024

Ok, since we implement __iter__ the check works. Example:

class A:
    pass

class B:
    def __iter__(self):
        yield from [1,2,3]

0 in B(), 1 in B()
> (False, True)

1 in A()
> TypeError: argument of type 'A' is not iterable

When I wrote the comment, the result was False; it now returns True because of the changes in #1471 (lines). Before that change, iteration returned arrow scalars.
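
A rough sketch of why the result flipped, assuming (as stated above) that iteration used to yield backend scalars. `WrappedScalar`, `ScalarIter`, and `PyValueIter` are hypothetical stand-ins, not pyarrow classes; the only point is that the `__iter__` fallback for `in` compares the needle against each yielded item.

```python
class WrappedScalar:
    """Hypothetical backend scalar; it defines no __eq__ against Python values."""

    def __init__(self, value):
        self.value = value

    def as_py(self):
        return self.value


class ScalarIter:
    """Iterates over wrapped scalars (the assumed 'before' behaviour)."""

    def __init__(self, data):
        self._data = [WrappedScalar(x) for x in data]

    def __iter__(self):
        yield from self._data


class PyValueIter(ScalarIter):
    """Iterates over plain Python values (the assumed 'after' behaviour)."""

    def __iter__(self):
        yield from (x.as_py() for x in self._data)


# Neither class defines __contains__, so `in` falls back to __iter__.
before = 1 in ScalarIter([1, 2, 3])   # 1 never equals a WrappedScalar
after = 1 in PyValueIter([1, 2, 3])   # 1 equals the unwrapped value 1
```

With wrapped scalars `before` is `False`; once iteration unwraps to Python values, `after` is `True`, with no `__contains__` involved in either case.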

Contributor

Nice, thanks for the clarification! :)


    def is_finite(self: Self) -> Self:
        s = self._native_series
        return self._from_native_series((s > float("-inf")) & (s < float("inf")))
3 changes: 3 additions & 0 deletions narwhals/series.py
@@ -3324,6 +3324,9 @@ def rolling_mean(
    def __iter__(self: Self) -> Iterator[Any]:
        yield from self._compliant_series.__iter__()

    def __contains__(self: Self, other: Any) -> bool:
        return other in self._compliant_series

    @property
    def str(self: Self) -> SeriesStringNamespace[Self]:
        return SeriesStringNamespace(self)
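
A small sketch of why the explicit delegation in narwhals/series.py matters, assuming a toy backend that stores missing values as NaN. All class names here are hypothetical and only mirror the shape of the change: without `__contains__` on the wrapper, Python falls back to `__iter__`, so the backend's `None` handling is never reached.

```python
import math


class ToyBackend:
    """Hypothetical backend series; missing values are stored as NaN."""

    def __init__(self, values):
        self._values = [math.nan if v is None else v for v in values]

    def __iter__(self):
        yield from self._values

    def __contains__(self, other):
        if other is None:
            # Treat None as "is any value missing?"
            return any(isinstance(v, float) and math.isnan(v) for v in self._values)
        return other in self._values


class IterOnlyWrapper:
    """Wrapper exposing only __iter__; `in` falls back to iteration."""

    def __init__(self, backend):
        self._backend = backend

    def __iter__(self):
        yield from self._backend


class DelegatingWrapper(IterOnlyWrapper):
    """Wrapper that forwards `in` to the backend's own __contains__."""

    def __contains__(self, other):
        return other in self._backend


backend = ToyBackend([1, 2, None])
via_iter = None in IterOnlyWrapper(backend)        # iteration never yields None
via_delegate = None in DelegatingWrapper(backend)  # backend's None branch runs
```

In this toy setup `via_iter` is `False` while `via_delegate` is `True`, so the two wrappers disagree only on the missing-value case.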
23 changes: 23 additions & 0 deletions tests/series_only/__contains___test.py
@@ -0,0 +1,23 @@
from __future__ import annotations

from typing import TYPE_CHECKING

import pytest

import narwhals.stable.v1 as nw

if TYPE_CHECKING:
    from tests.utils import ConstructorEager

data = [1, 2, None]


@pytest.mark.parametrize(("other", "expected"), [(1, True), (None, True), (3, False)])
def test_contains(
    constructor_eager: ConstructorEager,
    other: int | None,
    expected: bool,  # noqa: FBT001
) -> None:
    s = nw.from_native(constructor_eager({"a": data}), eager_only=True)["a"]

    assert (other in s) == expected
1 change: 1 addition & 0 deletions utils/check_api_reference.py
@@ -32,6 +32,7 @@
    "value_counts",
    "zip_with",
    "__iter__",
    "__contains__",
}
BASE_DTYPES = {
    "NumericType",