Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TYP Series and DataFrame currently type-check as hashable #41283

Merged
merged 13 commits into from
Jun 29, 2021

Conversation

MarcoGorelli
Copy link
Member

@MarcoGorelli MarcoGorelli commented May 3, 2021

@MarcoGorelli MarcoGorelli added the Typing type annotations, mypy/pyright type checking label May 3, 2021
@mzeitlin11
Copy link
Member

@MarcoGorelli does this also take care of #40013?

@MarcoGorelli
Copy link
Member Author

Thanks @mzeitlin11 - indeed, it looks like they're duplicates, I'll close the new one I'd opened

@MarcoGorelli MarcoGorelli marked this pull request as ready for review May 3, 2021 15:04
@MarcoGorelli MarcoGorelli marked this pull request as draft May 3, 2021 16:04
@MarcoGorelli MarcoGorelli marked this pull request as ready for review May 3, 2021 16:56
@MarcoGorelli MarcoGorelli requested a review from jbrockmendel May 3, 2021 16:57
pandas/core/generic.py Outdated Show resolved Hide resolved
pandas/core/generic.py Outdated Show resolved Hide resolved
@MarcoGorelli MarcoGorelli marked this pull request as draft May 8, 2021 10:48
@MarcoGorelli MarcoGorelli marked this pull request as ready for review May 8, 2021 13:30
@MarcoGorelli
Copy link
Member Author

MarcoGorelli commented May 8, 2021

@jbrockmendel @jreback I've updated to just use __hash__: None, which is what they do in typeshed (https://github.com/python/typeshed/blob/3fa5988a2ac57cf8ce56e2de171a62b39952f96c/stdlib/builtins.pyi#L602)

The builtin error message will now be shown instead, so I've updated a few tests accordingly. Now, this is what'll be shown:

In [1]: hash(pd.Series([1, 2, 3]))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-1f0c9bc1928b> in <module>
----> 1 hash(pd.Series([1, 2, 3]))

TypeError: unhashable type: 'Series'

instead of

In [1]: hash(pd.Series([1, 2, 3]))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-1f0c9bc1928b> in <module>
----> 1 hash(pd.Series([1, 2, 3]))

~/pandas-marco/pandas/core/generic.py in __hash__(self)
   1874 
   1875     def __hash__(self) -> int:
-> 1876         raise TypeError(
   1877             f"{repr(type(self).__name__)} objects are mutable, "
   1878             f"thus they cannot be hashed"

TypeError: 'Series' objects are mutable, thus they cannot be hashed

@MarcoGorelli MarcoGorelli marked this pull request as draft May 8, 2021 15:09
@MarcoGorelli MarcoGorelli marked this pull request as ready for review May 8, 2021 16:55
@jbrockmendel
Copy link
Member

no objection here

@jreback jreback added this to the 1.3 milestone May 12, 2021
@jreback
Copy link
Contributor

jreback commented May 12, 2021

ok this seems reasonable. can you add a release note about the changing error message & rebase. ping on green.

@MarcoGorelli MarcoGorelli force-pushed the non-hashable-series-or-frame branch 2 times, most recently from 55a9dc8 to dd54dc9 Compare May 12, 2021 18:53
@MarcoGorelli MarcoGorelli marked this pull request as draft May 12, 2021 19:06
@MarcoGorelli MarcoGorelli force-pushed the non-hashable-series-or-frame branch from dd54dc9 to 89b56a4 Compare May 12, 2021 19:33
@MarcoGorelli MarcoGorelli marked this pull request as ready for review May 12, 2021 19:33
@MarcoGorelli
Copy link
Member Author

cool - @jreback done, green

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @MarcoGorelli lgtm. do we need tests for Index and EA (or do we have tests and not check the message)?

@@ -612,6 +612,7 @@ Other API changes
- Partially initialized :class:`CategoricalDtype` (i.e. those with ``categories=None`` objects will no longer compare as equal to fully initialized dtype objects.
- Accessing ``_constructor_expanddim`` on a :class:`DataFrame` and ``_constructor_sliced`` on a :class:`Series` now raise an ``AttributeError``. Previously a ``NotImplementedError`` was raised (:issue:`38782`)
- Added new ``engine`` and ``**engine_kwargs`` parameters to :meth:`DataFrame.to_sql` to support other future "SQL engines". Currently we still only use ``SQLAlchemy`` under the hood, but more engines are planned to be supported such as ``turbodbc`` (:issue:`36893`)
- Calling ``hash`` on non-hashable pandas objects will now raise ``TypeError`` with the built-in error message (e.g. ``unhashable type: 'Series'``). Previously it would raise a custom message such as ``"'Series' objects are mutable, thus they cannot be hashed"`` (:issue:`40013`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit.

Suggested change
- Calling ``hash`` on non-hashable pandas objects will now raise ``TypeError`` with the built-in error message (e.g. ``unhashable type: 'Series'``). Previously it would raise a custom message such as ``"'Series' objects are mutable, thus they cannot be hashed"`` (:issue:`40013`)
- Calling ``hash`` on non-hashable pandas objects will now raise ``TypeError`` with the built-in error message (e.g. ``unhashable type: 'Series'``). Previously it would raise a custom message such as ``'Series' objects are mutable, thus they cannot be hashed`` (:issue:`40013`)

Comment on lines 6154 to 6155
if subset is None:
subset = self.columns
subset_iterable: Iterable = self.columns
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another nit, preference for

        subset_iterable: Iterable
        if subset is None:
            subset_iterable = self.columns
        elif (

to help distinguish cases were we need the variable type annotation (normally where the call to a function returns Any and would be redundant in the future when the return type of the called function is typed) vs a wider type used for a variable than the inferred type( i.e. the return type of the initial assignment to a variable)

@jreback
Copy link
Contributor

jreback commented May 26, 2021

@MarcoGorelli some comment and can you rebase

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u rebase and ping on green

@mwaskom
Copy link
Contributor

mwaskom commented Jun 19, 2021

This is more than a message change because previously isinstance(x, collections.abc.Hashable) would return True and now it will return False, right? (That said, the existing behavior is wrong, so IMO fixing this sooner rather than later would be helpful ... I currently have an awkward workaround to deal with it in a way that mypy understands).

@MarcoGorelli
Copy link
Member Author

MarcoGorelli commented Jun 19, 2021

This is more than a message change because previously isinstance(x, collections.abc.Hashable) would return True and now it will return False, right?

Indeed, you are correct (I wasn't aware that this would affect that):

# here
In [4]: isinstance(Series([1,2,3]), collections.abc.Hashable)
Out[4]: False

#master
In [2]: isinstance(Series([1,2,3]), collections.abc.Hashable)
Out[2]: True

IMO fixing this sooner rather than later would be helpful

Agreed

This was referenced Jun 22, 2021
@simonjayhawkins simonjayhawkins added this to the 1.3 milestone Jun 25, 2021
@@ -703,6 +703,7 @@ Other API changes
- Added new ``engine`` and ``**engine_kwargs`` parameters to :meth:`DataFrame.to_sql` to support other future "SQL engines". Currently we still only use ``SQLAlchemy`` under the hood, but more engines are planned to be supported such as `turbodbc <https://turbodbc.readthedocs.io/en/latest/>`_ (:issue:`36893`)
- Removed redundant ``freq`` from :class:`PeriodIndex` string representation (:issue:`41653`)
- :meth:`ExtensionDtype.construct_array_type` is now a required method instead of an optional one for :class:`ExtensionDtype` subclasses (:issue:`24860`)
- Calling ``hash`` on non-hashable pandas objects will now raise ``TypeError`` with the built-in error message (e.g. ``unhashable type: 'Series'``). Previously it would raise a custom message such as ``'Series' objects are mutable, thus they cannot be hashed`` (:issue:`40013`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcoGorelli do we need to add the change of isintance of collections.abc.Hashable?

@simonjayhawkins
Copy link
Member

@MarcoGorelli can you merge master to resolve conflicts

@jreback
Copy link
Contributor

jreback commented Jun 25, 2021

@MarcoGorelli if you can merge master

@simonjayhawkins
Copy link
Member

simonjayhawkins commented Jun 26, 2021

mypy will fail.

I think undo all the changes to pandas/core/frame.py

and just do

        if subset is None:
            subset = cast(Sequence[Hashable], self.columns)
        elif (

could maybe add link to #28770 as related.

@@ -707,6 +707,7 @@ Other API changes
- Added new ``engine`` and ``**engine_kwargs`` parameters to :meth:`DataFrame.to_sql` to support other future "SQL engines". Currently we still only use ``SQLAlchemy`` under the hood, but more engines are planned to be supported such as `turbodbc <https://turbodbc.readthedocs.io/en/latest/>`_ (:issue:`36893`)
- Removed redundant ``freq`` from :class:`PeriodIndex` string representation (:issue:`41653`)
- :meth:`ExtensionDtype.construct_array_type` is now a required method instead of an optional one for :class:`ExtensionDtype` subclasses (:issue:`24860`)
- Calling ``hash`` on non-hashable pandas objects will now raise ``TypeError`` with the built-in error message (e.g. ``unhashable type: 'Series'``). Previously it would raise a custom message such as ``'Series' objects are mutable, thus they cannot be hashed``. Furthermore, ``isinstance(Series, abc.collections.Hashable)`` will now return ``False`` (:issue:`40013`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Calling ``hash`` on non-hashable pandas objects will now raise ``TypeError`` with the built-in error message (e.g. ``unhashable type: 'Series'``). Previously it would raise a custom message such as ``'Series' objects are mutable, thus they cannot be hashed``. Furthermore, ``isinstance(Series, abc.collections.Hashable)`` will now return ``False`` (:issue:`40013`)
- Calling ``hash`` on non-hashable pandas objects will now raise ``TypeError`` with the built-in error message (e.g. ``unhashable type: 'Series'``). Previously it would raise a custom message such as ``'Series' objects are mutable, thus they cannot be hashed``. Furthermore, ``isinstance(Series, collections.abc.Hashable)`` will now return ``False`` (:issue:`40013`)

Series is a type so isinstance(Series, collections.abc.Hashable) is True

maybe isinstance(<Series>, collections.abc.Hashable)?

@simonjayhawkins
Copy link
Member

simonjayhawkins commented Jun 26, 2021

and just do

        if subset is None:
            subset = cast(Sequence[Hashable], self.columns)
        elif (

or undo all the changes to frame.py and could simply ignore pandas/core/frame.py:6183: error: Incompatible types in assignment (expression has type "Index", variable has type "Union[Hashable, Sequence[Hashable], None]") [assignment]

and we can remove the ignore once #28770 is fixed (for all pandas objects)

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comment. ping on green.

@@ -6180,7 +6180,10 @@ def f(vals) -> tuple[np.ndarray, int]:
return labels.astype("i8", copy=False), len(shape)

if subset is None:
subset = self.columns
# Incompatible types in assignment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can eliminate these by either casting (as you do on L6195) or better just to assign to a new variable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonjayhawkins do you have a preference here? In #41283 (comment) you'd suggested to put this instead of the cast and to remove it after #28770

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my preference is for the "fix later" ignore rather than changing code for a potential false positive

@@ -6180,7 +6180,10 @@ def f(vals) -> tuple[np.ndarray, int]:
return labels.astype("i8", copy=False), len(shape)

if subset is None:
subset = self.columns
# Incompatible types in assignment
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my preference is for the "fix later" ignore rather than changing code for a potential false positive

@jreback jreback merged commit cd96ad8 into pandas-dev:master Jun 29, 2021
@jreback
Copy link
Contributor

jreback commented Jun 29, 2021

thanks @MarcoGorelli

@jreback
Copy link
Contributor

jreback commented Jun 29, 2021

@meeseeksdev backport 1.3.x

@lumberbot-app
Copy link

lumberbot-app bot commented Jun 29, 2021

Something went wrong ... Please have a look at my logs.

@MarcoGorelli MarcoGorelli deleted the non-hashable-series-or-frame branch June 29, 2021 12:31
simonjayhawkins pushed a commit that referenced this pull request Jun 29, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
pckSF added a commit to pckSF/pandas that referenced this pull request Jul 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Issues with return type hint and/or definition of pandas.core.generic.NDFrame.__hash__
6 participants