Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests: Add test for Expr.is_in() with Series input #1524

Closed
CangyuanLi opened this issue Dec 6, 2024 · 8 comments · Fixed by #1616
Closed

tests: Add test for Expr.is_in() with Series input #1524

CangyuanLi opened this issue Dec 6, 2024 · 8 comments · Fixed by #1616
Assignees
Labels
good first issue Good for newcomers, but anyone is welcome to submit a pull request! tests

Comments

@CangyuanLi
Copy link

Describe the bug

Hi, thanks for the awesome package! I noticed a bug with .is_in() when the Narwhals DataFrame is Polars-backed. When .si_in() is passed a Narwhals Polars-backed Series, it errors with

ValueError: can only call '.item()' if the Series is of length 1, or an explicit index is provided

You can see the behavior in the code below, which shows that it works when passed a list or a Polars Series. The code also shows the same bug does not happen with a Narwhals Pandas-backed Series.

Steps or code to reproduce the bug

import narwhals as nw
import narwhals.typing as nwt
import pandas as pd
import polars as pl

# Pandas

df = pd.DataFrame({"a": [1, 2, 3]})
df2 = pd.DataFrame({"b": [1, 0, 2]})
nw_df = nw.from_native(df)
nw_df2 = nw.from_native(df2)

print(nw_df.filter(nw.col("a").is_in(df2["b"])).to_native())
print(nw_df.filter(nw.col("a").is_in(nw_df2["b"])).to_native())

# Polars

df = pl.DataFrame({"a": [1, 2, 3]})
df2 = pl.DataFrame({"b": [1, 0, 2]})
nw_df = nw.from_native(df)
nw_df2 = nw.from_native(df2)

print(nw_df.filter(nw.col("a").is_in(df2["b"])).to_native())
print(nw_df.filter(nw.col("a").is_in(nw_df2["b"].to_list())).to_native())
print(nw_df.filter(nw.col("a").is_in(nw_df2["b"])).to_native())  # doesn't work!

Expected results

I expect the Pandas and Polars-backed frames to be the same.

Actual results

ValueError: can only call '.item()' if the Series is of length 1, or an explicit index is provided (Series is of length 3)

Please run narwhals.show_version() and enter the output below.

System:
    python: 3.10.6 (main, Oct 24 2022, 16:07:47) [GCC 11.2.0]
executable: /opt/conda/envs/Python-3.10-Premium/bin/python
   machine: Linux-4.18.0-372.98.1.el8_6.x86_64-x86_64-with-glibc2.28

Python dependencies:
     narwhals: 1.15.1
       pandas: 2.2.3
       polars: 1.16.0
         cudf: 
        modin: 
      pyarrow: 18.1.0
        numpy: 1.26.4

Relevant log output

No response

@MarcoGorelli
Copy link
Member

thanks @CangyuanLi for your kind works, and for your report

I just tried this on the main branch, and it looks like it works - just gonna see when it might have got fixed

@MarcoGorelli
Copy link
Member

In fact, it also works with 1.15.2 - could you try upgrading your Narwhals version please? pip install -U narwhals

@CangyuanLi
Copy link
Author

Oh it does, thanks so much!

@MarcoGorelli
Copy link
Member

MarcoGorelli commented Dec 6, 2024

We don't have an explicit test case for this though, it looks like it got fixed as a positive side effect of #1480, good one @FBruzzesi !

Looks like it's because Polars explicitly check for Collection:

https://github.com/pola-rs/polars/blob/3985573cd1b2aedce18e243140e6b0b57696e9ba/py-polars/polars/expr/expr.py#L5778-L5784

Reopening then as a "needs test" issue

The task is, in is_in_test.py, to add a test which tests df.filter(nw.col("a").is_in(df_other["b"])). It should use the constructor_eager fixture. Please see the contributing guide for how to get started https://github.com/narwhals-dev/narwhals/blob/main/CONTRIBUTING.md

@MarcoGorelli MarcoGorelli reopened this Dec 6, 2024
@MarcoGorelli MarcoGorelli added good first issue Good for newcomers, but anyone is welcome to submit a pull request! tests reserved for sprint PyData Global 2024, PyLadiesCon 2024 labels Dec 6, 2024
@MarcoGorelli MarcoGorelli changed the title [Bug]: .is_in() fails when passing in a Polars-backed Narwhals column tests: Add test for Expr.is_in() with Series input Dec 6, 2024
@FBruzzesi
Copy link
Member

FBruzzesi commented Dec 6, 2024

Cool to see a nice side effect 😇
Long term plan would be to add support for expr's as well, similar to what's mentioned in #730 💪🏼

@jddelaune
Copy link

I'll give this a shot!

@MarcoGorelli MarcoGorelli removed the reserved for sprint PyData Global 2024, PyLadiesCon 2024 label Dec 8, 2024
@skritsotalakis
Copy link
Contributor

Hi!
Is this still available?

@MarcoGorelli
Copy link
Member

yup!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers, but anyone is welcome to submit a pull request! tests
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants