DEPR: values_for_argsort, values_for_factorize, from_factorized #53501
Comments
So what's the exact alternative that you are proposing? If we remove That doesn't really sound like "less stuff" for EA authors. Nor is it necessarily less for pandas, as we would need to expose some of the lower-level utility functions publicly to help EA authors implement those methods, which also increases the API surface on our side.
I think this could easily be solved by allowing
Let's focus on values_for_factorize (vff) for now. Let's also assume that something like #53696 gets rid of the I propose adding a deprecation to the base class
That would mean that we have to expose the To be clear, I have no problems with doing that. I am mostly not fully convinced we have much to gain here for the trouble going through this deprecation. If
An alternative would be for the suggested implementation to wrap the numpy values in PandasArray and call that object's factorize/_hash_pandas_object.
History: when originally designing EAs there was a hope/thought that many methods could be implemented in terms of a small number of core methods, of which values_for_factorize (vff) and values_for_argsort (vfa) were two of the main ones. Over time we found that many of the places we used these other than factorize/argsort were causing problems and they got pruned.
At this point we are down to only a few internal uses of each. _from_factorized is used only in EA.factorize. vfa is used in EA.argsort, EA.rank, and nargminmax (which in turn is used in EA.argmin/argmax). vff is used in EA.factorize and merge._factorize_keys. #53475 will restore it as being used in hash_pandas_object.
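To make the pattern concrete, here is a toy sketch of how a base class can derive factorize and argsort from the small hook methods. It is pure Python and not the pandas implementation; `ToyArray` and its list-based storage are assumptions for illustration only.

```python
class ToyArray:
    """Toy stand-in for an extension array; NOT the pandas base class."""

    def __init__(self, values):
        self._values = list(values)

    # --- hooks a subclass would customise ---
    def _values_for_factorize(self):
        # Return (hashable values, sentinel that marks a missing entry).
        return self._values, None

    @classmethod
    def _from_factorized(cls, uniques, original):
        # Rebuild an array of the original type from the unique values.
        return cls(uniques)

    def _values_for_argsort(self):
        # Return values that sort in the same order as the array itself.
        return self._values

    # --- derived methods built only from the hooks ---
    def factorize(self):
        values, na_value = self._values_for_factorize()
        codes, uniques, seen = [], [], {}
        for v in values:
            if v is na_value:
                codes.append(-1)  # missing entries get code -1
                continue
            if v not in seen:
                seen[v] = len(uniques)
                uniques.append(v)
            codes.append(seen[v])
        return codes, self._from_factorized(uniques, self)

    def argsort(self):
        values = self._values_for_argsort()
        return sorted(range(len(values)), key=values.__getitem__)
```

In this toy, a subclass only needs to override the hooks to get both derived methods, which is the convenience the original design hoped for.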
We should deprecate these patterns entirely.
2b) In factorize_keys we special-case MaskedDtype and ArrowDtype to avoid this performance hit. That special-casing is a code smell.
Implementation-wise, a deprecation could look like:
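A minimal sketch of one possibility, assuming the base class warns whenever a subclass still overrides the hook rather than factorize itself. `ToyBase`, `Legacy`, and the warning text are hypothetical; this is pure Python, not pandas code, and not necessarily what was proposed here.

```python
import warnings


class ToyBase:
    """Hypothetical base class; a stand-in, not pandas' ExtensionArray."""

    def __init__(self, values):
        self._values = list(values)

    def _values_for_factorize(self):
        return self._values, None

    def factorize(self):
        # A subclass that overrides the hook but not factorize() ends up here.
        overrides_hook = (
            type(self)._values_for_factorize
            is not ToyBase._values_for_factorize
        )
        if overrides_hook:
            warnings.warn(
                "_values_for_factorize is deprecated; "
                "override factorize() instead.",
                DeprecationWarning,
                stacklevel=2,
            )
        values, na_value = self._values_for_factorize()
        codes, uniques, seen = [], [], {}
        for v in values:
            if v is na_value:
                codes.append(-1)
                continue
            if v not in seen:
                seen[v] = len(uniques)
                uniques.append(v)
            codes.append(seen[v])
        return codes, uniques


class Legacy(ToyBase):
    """Hypothetical third-party array that still relies on the hook."""

    def _values_for_factorize(self):
        return [v.lower() for v in self._values], None
```

With this shape, arrays that already override factorize never reach the base implementation and so see no warning, while arrays like `Legacy` keep working but are told to migrate.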