TYP Series and DataFrame currently type-check as hashable #41283
@MarcoGorelli does this also take care of #40013?
Thanks @mzeitlin11 - indeed, it looks like they're duplicates, I'll close the new one I'd opened
@jbrockmendel @jreback I've updated to just use The builtin error message will now be shown instead, so I've updated a few tests accordingly. Now, this is what'll be shown:

In [1]: hash(pd.Series([1, 2, 3]))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-1f0c9bc1928b> in <module>
----> 1 hash(pd.Series([1, 2, 3]))
TypeError: unhashable type: 'Series'

instead of

In [1]: hash(pd.Series([1, 2, 3]))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-1f0c9bc1928b> in <module>
----> 1 hash(pd.Series([1, 2, 3]))
~/pandas-marco/pandas/core/generic.py in __hash__(self)
   1874
   1875     def __hash__(self) -> int:
-> 1876         raise TypeError(
   1877             f"{repr(type(self).__name__)} objects are mutable, "
   1878             f"thus they cannot be hashed"
TypeError: 'Series' objects are mutable, thus they cannot be hashed
no objection here |
ok this seems reasonable. can you add a release note about the changing error message & rebase. ping on green. |
cool - @jreback done, green |
Thanks @MarcoGorelli lgtm. do we need tests for Index and EA (or do we have tests and not check the message)?
doc/source/whatsnew/v1.3.0.rst
@@ -612,6 +612,7 @@ Other API changes
- Partially initialized :class:`CategoricalDtype` (i.e. those with ``categories=None`` objects will no longer compare as equal to fully initialized dtype objects.
- Accessing ``_constructor_expanddim`` on a :class:`DataFrame` and ``_constructor_sliced`` on a :class:`Series` now raise an ``AttributeError``. Previously a ``NotImplementedError`` was raised (:issue:`38782`)
- Added new ``engine`` and ``**engine_kwargs`` parameters to :meth:`DataFrame.to_sql` to support other future "SQL engines". Currently we still only use ``SQLAlchemy`` under the hood, but more engines are planned to be supported such as ``turbodbc`` (:issue:`36893`)
- Calling ``hash`` on non-hashable pandas objects will now raise ``TypeError`` with the built-in error message (e.g. ``unhashable type: 'Series'``). Previously it would raise a custom message such as ``"'Series' objects are mutable, thus they cannot be hashed"`` (:issue:`40013`)
nit.
- Calling ``hash`` on non-hashable pandas objects will now raise ``TypeError`` with the built-in error message (e.g. ``unhashable type: 'Series'``). Previously it would raise a custom message such as ``"'Series' objects are mutable, thus they cannot be hashed"`` (:issue:`40013`)
- Calling ``hash`` on non-hashable pandas objects will now raise ``TypeError`` with the built-in error message (e.g. ``unhashable type: 'Series'``). Previously it would raise a custom message such as ``'Series' objects are mutable, thus they cannot be hashed`` (:issue:`40013`)
pandas/core/frame.py
if subset is None:
subset = self.columns
subset_iterable: Iterable = self.columns
another nit, preference for
subset_iterable: Iterable
if subset is None:
subset_iterable = self.columns
elif (
to help distinguish cases where we need the variable type annotation (normally where the call to a function returns Any, so the annotation would be redundant in the future when the return type of the called function is typed) vs a wider type used for a variable than the inferred type (i.e. the return type of the initial assignment to a variable)
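As a rough illustration of the pattern suggested in this comment, the sketch below declares the wider type once, before any branch, and then assigns in each branch without repeating the annotation. The function `resolve_subset` and its parameters are hypothetical stand-ins, not the actual code in pandas/core/frame.py.

```python
from __future__ import annotations

from collections.abc import Hashable, Iterable, Sequence


def resolve_subset(columns: Sequence[Hashable], subset: object = None) -> list[Hashable]:
    """Hypothetical helper: normalise `subset` to a list of column labels."""
    # Bare declaration: states the intended (wider) type up front, so no
    # branch below has to carry its own annotation.
    subset_iterable: Iterable[Hashable]
    if subset is None:
        subset_iterable = columns
    elif isinstance(subset, (str, tuple)) or not isinstance(subset, Iterable):
        # A single label (strings and tuples count as one label here).
        subset_iterable = (subset,)
    else:
        subset_iterable = subset
    return list(subset_iterable)


print(resolve_subset(["a", "b"]))         # all columns
print(resolve_subset(["a", "b"], "a"))    # one label
print(resolve_subset(["a", "b"], ["b"]))  # explicit list of labels
```

Separating the declaration from the first assignment makes it visible that the annotation is a deliberate widening rather than a workaround for an untyped return value.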
@MarcoGorelli some comment and can you rebase |
can u rebase and ping on green
This is more than a message change because previously |
Indeed, you are correct (I wasn't aware that this would affect that):
# here
In [4]: isinstance(Series([1,2,3]), collections.abc.Hashable)
Out[4]: False
# master
In [2]: isinstance(Series([1,2,3]), collections.abc.Hashable)
Out[2]: True
Agreed |
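The behaviour difference discussed here can be reproduced without pandas. The sketch below assumes the change amounts to setting `__hash__` to `None` (the standard Python way to mark a class unhashable; the exact code is not shown in this thread). `RaisingHash` and `NoHash` are hypothetical stand-ins for the old and new behaviour.

```python
import collections.abc


class RaisingHash:
    """Stand-in for the old behaviour: __hash__ is defined but always raises."""

    def __hash__(self) -> int:
        raise TypeError(
            f"{type(self).__name__!r} objects are mutable, thus they cannot be hashed"
        )


class NoHash:
    """Stand-in for the new behaviour (assumed): __hash__ is set to None."""

    __hash__ = None


def looks_hashable(obj: object) -> bool:
    # collections.abc.Hashable only checks that the type has a non-None
    # __hash__ attribute; it never calls it.
    return isinstance(obj, collections.abc.Hashable)


print(looks_hashable(RaisingHash()))  # the method exists, so the check passes
print(looks_hashable(NoHash()))       # __hash__ is None, so the check fails

try:
    hash(NoHash())
except TypeError as err:
    print(err)  # Python's built-in "unhashable type" message
```

This is why the change is more than a message change: the `isinstance` check only looks at whether `__hash__` is `None`, so a `__hash__` that raises still passes it.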
doc/source/whatsnew/v1.3.0.rst
@@ -703,6 +703,7 @@ Other API changes
- Added new ``engine`` and ``**engine_kwargs`` parameters to :meth:`DataFrame.to_sql` to support other future "SQL engines". Currently we still only use ``SQLAlchemy`` under the hood, but more engines are planned to be supported such as `turbodbc <https://turbodbc.readthedocs.io/en/latest/>`_ (:issue:`36893`)
- Removed redundant ``freq`` from :class:`PeriodIndex` string representation (:issue:`41653`)
- :meth:`ExtensionDtype.construct_array_type` is now a required method instead of an optional one for :class:`ExtensionDtype` subclasses (:issue:`24860`)
- Calling ``hash`` on non-hashable pandas objects will now raise ``TypeError`` with the built-in error message (e.g. ``unhashable type: 'Series'``). Previously it would raise a custom message such as ``'Series' objects are mutable, thus they cannot be hashed`` (:issue:`40013`)
@MarcoGorelli do we need to add the change of isinstance of collections.abc.Hashable?
@MarcoGorelli can you merge master to resolve conflicts |
@MarcoGorelli if you can merge master |
mypy will fail. I think undo all the changes to pandas/core/frame.py and just do
could maybe add link to #28770 as related. |
doc/source/whatsnew/v1.3.0.rst
@@ -707,6 +707,7 @@ Other API changes
- Added new ``engine`` and ``**engine_kwargs`` parameters to :meth:`DataFrame.to_sql` to support other future "SQL engines". Currently we still only use ``SQLAlchemy`` under the hood, but more engines are planned to be supported such as `turbodbc <https://turbodbc.readthedocs.io/en/latest/>`_ (:issue:`36893`)
- Removed redundant ``freq`` from :class:`PeriodIndex` string representation (:issue:`41653`)
- :meth:`ExtensionDtype.construct_array_type` is now a required method instead of an optional one for :class:`ExtensionDtype` subclasses (:issue:`24860`)
- Calling ``hash`` on non-hashable pandas objects will now raise ``TypeError`` with the built-in error message (e.g. ``unhashable type: 'Series'``). Previously it would raise a custom message such as ``'Series' objects are mutable, thus they cannot be hashed``. Furthermore, ``isinstance(Series, abc.collections.Hashable)`` will now return ``False`` (:issue:`40013`)
- Calling ``hash`` on non-hashable pandas objects will now raise ``TypeError`` with the built-in error message (e.g. ``unhashable type: 'Series'``). Previously it would raise a custom message such as ``'Series' objects are mutable, thus they cannot be hashed``. Furthermore, ``isinstance(Series, abc.collections.Hashable)`` will now return ``False`` (:issue:`40013`)
- Calling ``hash`` on non-hashable pandas objects will now raise ``TypeError`` with the built-in error message (e.g. ``unhashable type: 'Series'``). Previously it would raise a custom message such as ``'Series' objects are mutable, thus they cannot be hashed``. Furthermore, ``isinstance(Series, collections.abc.Hashable)`` will now return ``False`` (:issue:`40013`)
Series is a type, so isinstance(Series, collections.abc.Hashable) is True. Maybe isinstance(<Series>, collections.abc.Hashable)?
or undo all the changes to frame.py and could simply ignore and we can remove the ignore once #28770 is fixed (for all pandas objects) |
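The class-versus-instance distinction raised in this review can be checked with a small stand-in. `Unhashable` below is a hypothetical class whose `__hash__` is set to `None`; it only mimics the behaviour under discussion and is not pandas code.

```python
import collections.abc


class Unhashable:
    """Hypothetical stand-in for a class whose instances cannot be hashed."""

    __hash__ = None


# The class object is an instance of `type`, and `type` defines __hash__,
# so the class itself passes the Hashable check.
class_is_hashable = isinstance(Unhashable, collections.abc.Hashable)

# An instance is checked against Unhashable, where __hash__ is None.
instance_is_hashable = isinstance(Unhashable(), collections.abc.Hashable)

print(class_is_hashable, instance_is_hashable)
```

So a release note written as `isinstance(Series, ...)` would describe the class, which stays hashable; the note needs to refer to a Series instance.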
small comment. ping on green.
@@ -6180,7 +6180,10 @@ def f(vals) -> tuple[np.ndarray, int]:
return labels.astype("i8", copy=False), len(shape)

if subset is None:
subset = self.columns
# Incompatible types in assignment
can eliminate these by either casting (as you do on L6195) or better just to assign to a new variable.
@simonjayhawkins do you have a preference here? In #41283 (comment) you'd suggested to put this instead of the cast and to remove it after #28770
my preference is for the "fix later" ignore rather than changing code for a potential false positive
thanks @MarcoGorelli |
@meeseeksdev backport 1.3.x |
Something went wrong ... Please have a look at my logs. |
…hashable (#42299) Co-authored-by: Marco Edward Gorelli <[email protected]>
pandas.core.generic.NDFrame.__hash__
#40013