API: handling of missing values in Index.__contains__ #59765
Comments
I'm not aware of a dedicated issue for this either. I think at one point I made a PR trying to make more of the EA subclasses use [...] For the datetimelike cases I think/hope that mismatched NaTs will return False (i.e. [...])
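To make "mismatched NaT" concrete, here is a minimal sketch that classifies a NaT sentinel by its NumPy kind and compares it with the kind of a datetime or timedelta index. The helpers `nat_kind` and `is_mismatched_nat` are hypothetical names for illustration, not pandas API, and they say nothing about what `Index.__contains__` actually returns.

```python
import numpy as np
import pandas as pd


def nat_kind(val):
    """Return "M" for a datetime64 NaT, "m" for a timedelta64 NaT,
    "any" for pd.NaT (which carries no kind), and None otherwise."""
    if val is pd.NaT:
        return "any"
    if isinstance(val, (np.datetime64, np.timedelta64)) and np.isnat(val):
        return val.dtype.kind
    return None


def is_mismatched_nat(val, index):
    """True when `val` is a typed NaT whose kind differs from the kind of
    a datetime64 or timedelta64 index (hypothetical helper)."""
    kind = nat_kind(val)
    index_kind = index.dtype.kind
    return kind in ("M", "m") and index_kind in ("M", "m") and kind != index_kind


dti = pd.DatetimeIndex(["2024-01-01", "NaT"])
for val in [pd.NaT, np.datetime64("NaT"), np.timedelta64("NaT")]:
    # Print the classification next to the actual membership result,
    # which may differ between pandas versions.
    print(repr(val), is_mismatched_nat(val, dti), val in dti)
```

Under this definition, `np.timedelta64("NaT")` is mismatched for a DatetimeIndex, while `pd.NaT` never is.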
Indeed, the
In the sense that it is not matched in general (again, except for categorical ..). But it also seems not to be matched for object dtype containing such a Decimal. Expanded table:
import numpy as np
import pandas as pd
from decimal import Decimal

# from conftest.py
# Each index holds one regular value and one missing value
indices_dict = {
    "object-none": pd.Index(["a", None], dtype=object),
    "object-nan": pd.Index(["a", np.nan], dtype=object),
    "object-NA": pd.Index(["a", pd.NA], dtype=object),
    "object-decimal-NaN": pd.Index(["a", Decimal("NaN")], dtype=object),
    "datetime": pd.DatetimeIndex(["2024-01-01", "NaT"]),
    "period": pd.PeriodIndex(["2024-01-01", None], freq="D"),
    "timedelta": pd.TimedeltaIndex(["1 days", "NaT"]),
    "float64": pd.Index([2.0, np.nan], dtype="float64"),
    "categorical": pd.CategoricalIndex(["a", None]),
    "interval": pd.IntervalIndex.from_tuples([(1, 2), np.nan]),
    "nullable_int": pd.Index([2, None], dtype="Int64"),
    "nullable_float": pd.Index([2.0, None], dtype="Float32"),
    "string-python": pd.Index(["a", None], dtype="string[python]"),
    "string-pyarrow": pd.Index(["a", None], dtype="string[pyarrow]"),
    "str-python": pd.Index(["a", None], dtype=pd.StringDtype("pyarrow", na_value=np.nan)),
}

# Test every missing-value sentinel against every index
results = []
for dtype, data in indices_dict.items():
    for val in [
        None, np.nan, pd.NA, pd.NaT,
        np.datetime64("NaT"), np.timedelta64("NaT"), Decimal("NaN"),
    ]:
        res = val in data
        results.append((dtype, repr(val), res))

df = pd.DataFrame(results, columns=["dtype", "val", "result"])
# One row per index dtype, one column per sentinel, both in their original order
df_overview = df.pivot(columns="val", index="dtype", values="result").reindex(
    columns=df["val"].unique(), index=df["dtype"].unique()
)
# to_markdown() requires the optional tabulate dependency
print(df_overview.astype(str).to_markdown())
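One thing to keep in mind for the Decimal case is that the script builds a fresh `Decimal("NaN")` for each lookup, and a Decimal NaN compares unequal to every value, including itself. The sketch below shows that Python-level behaviour and then prints, without assuming the outcome, what pandas does when the same object versus a different NaN object is looked up. Whether this explains the pandas result is an assumption; the names `nan_a`, `nan_b` and `idx` are illustrative only.

```python
from decimal import Decimal

import pandas as pd

nan_a = Decimal("NaN")
nan_b = Decimal("NaN")  # a second, distinct NaN object

# A quiet Decimal NaN never compares equal, not even to itself.
print(nan_a == nan_a)

# Python containers check identity before equality, so the very same
# object is found in a list while a different NaN object is not.
print(nan_a in [nan_a])
print(nan_b in [nan_a])

# What pandas does with the same object versus a fresh one is printed
# here rather than assumed; it may depend on the pandas version.
idx = pd.Index(["a", nan_a], dtype=object)
print(nan_a in idx, nan_b in idx)
print(pd.isna(nan_b))
```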
The table below gives an overview of the result value, i.e. how Index.__contains__ handles various missing value sentinels as input for the different data types.
The last three rows, with not a single True, are specifically problematic; this seems to be a bug with the StringDtype.
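While `in` behaves differently per dtype and per sentinel, code that only needs to know whether an index holds any missing value can ask the index directly instead. A minimal sketch, assuming a flat (non-MultiIndex) index; `has_missing` is a hypothetical helper name, not pandas API, and it does not replicate the semantics of `Index.__contains__`.

```python
import numpy as np
import pandas as pd


def has_missing(index: pd.Index) -> bool:
    """Return True if the index holds at least one missing value,
    regardless of which sentinel represents it (hypothetical helper)."""
    # Index.isna() returns a boolean NumPy array for flat indexes.
    return bool(index.isna().any())


print(has_missing(pd.Index([2.0, np.nan], dtype="float64")))
print(has_missing(pd.Index([1.0, 2.0], dtype="float64")))
print(has_missing(pd.Index(["a", None], dtype=object)))
print(has_missing(pd.DatetimeIndex(["2024-01-01", "NaT"])))
```

This sidesteps the question of which sentinel to pass, at the cost of no longer distinguishing between sentinels.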
But more generally, this is quite inconsistent:
The code to generate the table above:
cc @jbrockmendel I would have expected we had issues about this, but didn't directly find anything