Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use _values_for_factorize by default for hashing ExtensionArrays #53475

Merged
merged 5 commits into from
Jun 19, 2023

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented May 31, 2023

PR #51319 added the EA._hash_pandas_object method to let ExtensionArrays override how they are hashed. But it also changed to no longer use the values returned by EA._values_for_factorize() by default for hashing, but change this to EA.to_numpy(). The previous behaviour was documented, and changing this can cause regressions / changes in behaviour or performance (depending on the return values of those two methods).

See https://github.com/pandas-dev/pandas/pull/51319/files#r1212106303 for some more details.

@jorisvandenbossche jorisvandenbossche added the ExtensionArray Extending pandas with custom dtypes or arrays. label May 31, 2023
@jorisvandenbossche jorisvandenbossche added this to the 2.0.3 milestone May 31, 2023
@jorisvandenbossche
Copy link
Member Author

@jbrockmendel are you OK with this?

@jbrockmendel
Copy link
Member

No objections.

@jorisvandenbossche jorisvandenbossche merged commit 2a1d2da into pandas-dev:main Jun 19, 2023
@jorisvandenbossche jorisvandenbossche deleted the EA-hash branch June 19, 2023 14:31
@lumberbot-app

This comment was marked as outdated.

@jorisvandenbossche
Copy link
Member Author

Sorry for the backport noise, the original PR was only for 2.1, so of course this fixup only needs to target 2.1 as well

@lithomas1
Copy link
Member

Does this need a whatsnew?

This looks like only a regression on main.
(If so, the whatsnew can be deleted)

@jorisvandenbossche
Copy link
Member Author

Good catch, sorry! I only realized after merging it was only a change on main and thus didn't need to be backported, but forgot that I actually added a whatsnew ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants