CLN: use _values_for_argsort for join_non_unique, join_monotonic #32467

jbrockmendel · 2020-03-05T18:29:29Z

With the .copy() removed from Categorical._values_for_argsort, ea_backed_index._data._values_for_argsort() matches ea_backed_index._ndarray_values in all extant cases.

cc @jorisvandenbossche @TomAugspurger need to confirm

a) this is an intended-adjacent use of _values_for_argsort, and not just a coincidence that it matches extant behavior
b) the .copy() this removes from Categorical._values_for_argsort is not important for some un-tested reason

xref #32452, #32426
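For point (b), a minimal sketch of what the copy does and does not change. `TinyCategorical` and its two methods are hypothetical stand-ins written for this illustration, not pandas' `Categorical`. The assumption is that the returned array is only read (for example by `argsort`), in which case the copy has no effect on the result and only decides whether the caller holds a reference to the internal codes.

```python
import numpy as np


class TinyCategorical:
    """Hypothetical stand-in for a codes-backed array (not pandas' Categorical)."""

    def __init__(self, codes):
        self._codes = np.asarray(codes, dtype=np.int8)

    def values_for_argsort_copy(self):
        # Variant with the defensive copy
        return self._codes.copy()

    def values_for_argsort_nocopy(self):
        # Variant without the copy: hands out the internal array itself
        return self._codes


cat = TinyCategorical([2, 0, 1])
copied = cat.values_for_argsort_copy()
shared = cat.values_for_argsort_nocopy()

# Sorting reads the values only, so both variants give the same order
same_order = np.array_equal(np.argsort(copied), np.argsort(shared))

# The only observable difference is aliasing of the internal buffer
copied_is_independent = not np.shares_memory(copied, cat._codes)
shared_aliases_internal = np.shares_memory(shared, cat._codes)
```

Under this reading, the copy would only matter to a caller that mutates the returned array in place.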

TomAugspurger · 2020-03-05T22:08:31Z

I believe the only requirement on values_for_argsort is that it's a monotonic transformation (it preserves the ordering).

I'm not sure why that copy was there.
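To illustrate the ordering requirement described above, here is a minimal sketch in plain Python (no pandas). The category names and values are made up; the point is only that sorting by an order-preserving integer code gives the same positions as sorting by the original values under their category order.

```python
# Hypothetical ordered categories and data, for illustration only
categories = ["low", "medium", "high"]
values = ["high", "low", "medium", "low"]

# Order-preserving (monotonic) transformation: category -> integer rank
rank_of = {cat: i for i, cat in enumerate(categories)}
codes = [rank_of[v] for v in values]

# "argsort" of the transformed values (Python's sort is stable)
order_from_codes = sorted(range(len(codes)), key=codes.__getitem__)

# Reference: positions ordered directly by each value's category position
order_reference = sorted(
    range(len(values)), key=lambda i: categories.index(values[i])
)
```

Both lists of positions come out identical, which is all that an argsort-style consumer needs from the transformed values.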

jbrockmendel · 2020-03-06T00:21:32Z

> I believe the only requirement on values_for_argsort is that it's a monotonic transformation (it preserves the ordering).

It isn't clear to me what distinguishes this from values_for_factorize, as the docstring there also says "An array suitable for factorization. This should maintain order and be a supported dtype." (The "supported dtype" bit isn't in the values_for_argsort docstring, but seems implied since it returns an ndarray.)

This also came up in #30673 (attempt to implement value_counts in terms of EA methods). See also #32412.

Do we know of any 3rd-party EAs where _ndarray_values, values_for_argsort() and _values_for_factorize()[0] are meaningfully distinct?

pandas/core/indexes/base.py

jreback · 2020-03-08T15:55:32Z

are there any user facing things that this now allows? e.g. joins on EA?

jbrockmendel · 2020-03-08T16:34:04Z

> are there any user facing things that this now allows? e.g. joins on EA?

behavior is unchanged

jreback · 2020-03-11T02:13:36Z

thanks, pls follow on with consolidations when you can

jorisvandenbossche · 2020-03-13T13:24:35Z

My feeling says that this should use _values_for_factorize, as joining is a factorize-based algo?

jbrockmendel · 2020-03-13T15:53:19Z

> My feeling says that this should use _values_for_factorize, as joining is a factorize-based algo?

I think you're right, but ATM _ndarray_values matches values_for_argsort for all our Index-backing EAs, whereas values_for_factorize()[0] is slightly different for DTA/TDA (_data vs asi8). If we determine that we can change DTA/TDA _values_for_factorize (possibly as part of the discussion in #32586) then I'll switch over these usages.
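A rough sketch of the `_data` vs `asi8` distinction, using plain NumPy arrays as stand-ins for the two representations rather than the pandas internals themselves. The timestamps are made up, and `np.unique` is used here only as a substitute for factorization (pandas' own `factorize` orders uniques by first appearance by default, whereas `np.unique` sorts them). Missing values (NaT) are deliberately left out, since their treatment is a separate question.

```python
import numpy as np

# Stand-in for the datetime64[ns] representation
stamps = np.array(
    ["2020-03-08", "2020-03-05", "2020-03-11", "2020-03-05"],
    dtype="datetime64[ns]",
)

# Stand-in for the int64 representation: same memory, viewed as integers
as_int = stamps.view("i8")

# Sorting positions agree between the two representations
order_dt = np.argsort(stamps, kind="stable")
order_int = np.argsort(as_int, kind="stable")

# So do the integer codes from a sorted de-duplication
uniques_dt, codes_dt = np.unique(stamps, return_inverse=True)
uniques_int, codes_int = np.unique(as_int, return_inverse=True)
```

For NaT-free data like this, the two representations differ in dtype but not in the ordering or grouping they induce.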

jbrockmendel added 2 commits March 5, 2020 10:22

CLN: use _values_for_argsort for join_non_unique, join_monotonic

9dd8cad

revert unnecessary

d5485e3

jreback requested changes Mar 8, 2020

jreback added the Clean and ExtensionArray labels Mar 8, 2020

jbrockmendel mentioned this pull request Mar 10, 2020

EA: revisit interface #32586

Closed

jreback added this to the 1.1 milestone Mar 11, 2020

jreback approved these changes Mar 11, 2020

jreback merged commit d4815a5 into pandas-dev:master Mar 11, 2020

jbrockmendel mentioned this pull request Mar 11, 2020

REF: implement _get_engine_target #32611

Merged

jbrockmendel deleted the join_non_unique branch March 11, 2020 03:02

SeeminSyed pushed a commit to CSCD01-team01/pandas that referenced this pull request Mar 22, 2020

CLN: use _values_for_argsort for join_non_unique, join_monotonic (pan…

04b025f

…das-dev#32467)

jorisvandenbossche mentioned this pull request Apr 3, 2020

EA interface - requirements for "hashable, value+order-preserving ndarray" #33276

Open
