Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: __get_item__ accepts more types than it's type signature says #592

Closed
3 tasks done
anilbey opened this issue Mar 27, 2023 · 6 comments · Fixed by #596
Closed
3 tasks done

BUG: __get_item__ accepts more types than it's type signature says #592

anilbey opened this issue Mar 27, 2023 · 6 comments · Fixed by #596
Labels
Bug good first issue Indexing Related to indexing on series/frames, not to indexes themselves

Comments

@anilbey
Copy link
Contributor

anilbey commented Mar 27, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

from enum import Enum
import pandas as pd

class MyEnum(Enum):
    TAYYAR = "tayyar"
    HAYDAR = "haydar"

df = pd.DataFrame(data = [[12.2, 10], [8.8, 15], [22.1, 14]], columns=[MyEnum.TAYYAR, MyEnum.HAYDAR])
print(df[MyEnum.TAYYAR])

Issue Description

The example code above takes an Enum as column to a DataFrame and uses that enum in __get_item__.

It works fine, which creates an inconsistency with the signature of __get_item__ method.

Here is the signature of __get_item__. According to this signature, enums are not accepted as index.

def __getitem__(self, Union[str, bytes, date, datetime, timedelta, bool, int, float, complex, Timestamp, Timedelta], /) -> Series[Any]

When you use a static type checker e.g. mypy on the code example above, it fails.

mypy_enum.py:12: error: No overload variant of "__getitem__" of "DataFrame" matches argument type "MyEnum"  [call-overload]
mypy_enum.py:12: note: Possible overload variants:
mypy_enum.py:12: note:     def __getitem__(self, Union[str, bytes, date, datetime, timedelta, bool, int, float, complex, Timestamp, Timedelta], /) -> Series[Any]
mypy_enum.py:12: note:     def __getitem__(self, slice, /) -> DataFrame
mypy_enum.py:12: note:     def [ScalarT] __getitem__(self, Union[Tuple[Any, ...], Series[bool], DataFrame, List[str], List[ScalarT], Index, ndarray[Any, dtype[str_]], ndarray[Any, dtype[bool_]], Sequence[Tuple[Union[str, bytes, date, datetime, timedelta, bool, int, float, complex, Timestamp, Timedelta], ...]]], /) -> DataFrame
Found 1 error in 1 file (checked 1 source file)

More info: https://stackoverflow.com/questions/75614405/mypy-indexing-pd-dataframe-with-an-enum-raises-no-overload-variant-error

Expected Behavior

__getitem__'s signature should be updated to support the column type itself. The column type is often not limited to primitive types. In can be custom objects as well.

Installed Versions

assert '_distutils' in core.__file__, core.__file__

@anilbey anilbey added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 27, 2023
@twoertwein twoertwein transferred this issue from pandas-dev/pandas Mar 27, 2023
@twoertwein
Copy link
Member

Moved to pandas-stubs as pandas's DataFrame.__getitem__ has actually no annotations.

@anilbey
Copy link
Contributor Author

anilbey commented Mar 27, 2023

Does the annotation come from pandas-stubs?

@twoertwein
Copy link
Member

Does the annotation come from pandas-stubs?

I assume so. It definitely doesn't come from pandas itself (unless you created an empty file called py.typed inside the pandas folder).

@anilbey
Copy link
Contributor Author

anilbey commented Mar 27, 2023

I see. No I definitely haven't created that file :)

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Mar 28, 2023

I think if we add Hashable to the __getitem__() declaration here, that should fix this.

def __getitem__(self, idx: Scalar | tuple[Hashable, ...]) -> Series: ...

PR and tests welcome

@Dr-Irv Dr-Irv added good first issue Indexing Related to indexing on series/frames, not to indexes themselves and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 28, 2023
anilbey added a commit to anilbey/pandas-stubs that referenced this issue Mar 29, 2023
anilbey added a commit to anilbey/pandas-stubs that referenced this issue Mar 29, 2023
@anilbey
Copy link
Contributor Author

anilbey commented Mar 29, 2023

Thanks @Dr-Irv I just make a PR with a test.

Dr-Irv pushed a commit that referenced this issue Mar 29, 2023
* add Hashable type to __get_item__ #592

* add assert_type to test_types_getitem_with_hashable

* code style: formatting

* remove tuple[Hashable, ...] from get_item overload1

* added more assert_type to assure other overloads are ok
twoertwein pushed a commit to twoertwein/pandas-stubs that referenced this issue Apr 1, 2023
* add Hashable type to __get_item__ pandas-dev#592

* add assert_type to test_types_getitem_with_hashable

* code style: formatting

* remove tuple[Hashable, ...] from get_item overload1

* added more assert_type to assure other overloads are ok
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug good first issue Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants