Use _values_for_factorize by default for hashing ExtensionArrays #53475
@jbrockmendel are you OK with this?
No objections.
Sorry for the backport noise. The original PR was only for 2.1, so of course this fixup only needs to target 2.1 as well.
Does this need a whatsnew? This looks like only a regression on main.
Good catch, sorry! I only realized after merging that it was only a change on main and thus didn't need to be backported, but I forgot that I had actually added a whatsnew ...
PR #51319 added the `EA._hash_pandas_object` method to let ExtensionArrays override how they are hashed. But it also changed the default: hashing no longer uses the values returned by `EA._values_for_factorize()` and uses `EA.to_numpy()` instead. The previous behaviour was documented, and changing it can cause regressions or changes in behaviour or performance (depending on the return values of those two methods).

See https://github.com/pandas-dev/pandas/pull/51319/files#r1212106303 for some more details.
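
To illustrate why the choice of default matters, here is a minimal, pandas-free sketch. `ToyArray`, `CaseInsensitiveArray` and `hash_one` are hypothetical stand-ins, not pandas classes or the actual implementation, and the real `_hash_pandas_object` takes keyword arguments (`encoding`, `hash_key`, `categorize`) that are omitted here. The only point is that a default built on `_values_for_factorize()` picks up whatever normalisation an array does there, while a default built on `to_numpy()` would not.

```python
import hashlib


def hash_one(value):
    # Hypothetical helper: a deterministic 64-bit hash of one value's repr.
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


class ToyArray:
    """Hypothetical stand-in for an ExtensionArray (not the pandas class)."""

    def __init__(self, values):
        self._values = list(values)

    def _values_for_factorize(self):
        # Returns (values, na_value), the same shape as the pandas method.
        return self._values, None

    def to_numpy(self):
        # The raw values; may differ from what factorize uses.
        return list(self._values)

    def _hash_pandas_object(self):
        # Default discussed in this PR: hash the factorize values,
        # not the result of to_numpy().
        values, _ = self._values_for_factorize()
        return [hash_one(v) for v in values]


class CaseInsensitiveArray(ToyArray):
    """Toy array that treats 'A' and 'a' as the same value."""

    def _values_for_factorize(self):
        return [v.lower() for v in self._values], None


arr = CaseInsensitiveArray(["A", "a", "b"])
hashes = arr._hash_pandas_object()
hashes_via_to_numpy = [hash_one(v) for v in arr.to_numpy()]
```

With the `_values_for_factorize()` default, `"A"` and `"a"` hash identically in this toy subclass without it having to override `_hash_pandas_object`; hashing the `to_numpy()` values instead would give them different hashes.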