-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN: use _values_for_argsort for join_non_unique, join_monotonic #32467
Conversation
I believe the only requirement on values_for_argsort is that it's a monotonic transformation (it preserves the ordering). I'm not sure why that copy was there. |
It isn't clear to me what distinguishes this from values_for_factorize, as the docstring there also says This also came up in #30673 (attempt to implement value_counts in terms of EA methods). See also #32412. Do we know of any 3rd-party EAs where _ndarray_values, values_for_argsort() and _values_for_factorize()[0] are meaningfully distinct? |
are there any user facing things that this now allows? e.g. joins on EA? |
behavior is unchanged |
thanks, pls followon with consoilidations when you can |
My feeling says that this should use |
I think you're right, but ATM _ndarray_values matches values_for_argsort for all our Index-backing EAs, but values_for_factorize()[0] is slightly different for DTA/TDA (_data vs asi8). If we determine that we can change DTA/TDA _values_for_factorize (possibly as part of the discussion in #32586) then ill switch over these usages. |
With the
.copy()
removed fromCategorical._values_for_argsort
,ea_backed_index._data._values_for_argsort()
matchesea_backed_index._ndarray_values
in all extant cases.cc @jorisvandenbossche @TomAugspurger need to confirm
a) this is an intended-adjacent use of _values_for_argsort, and not just a coincidence that it matches extant behavior
b) the
.copy()
this removes fromCategorical._values_for_argsort
is not important for some un-tested reasonxref #32452, #32426