PERF: restore performance for unsorted CategoricalDtype comparison #27448

qwhelan · 2019-07-18T06:39:11Z

Fixes a performance regression introduced in #26403 very shortly before 0.25.0rc0 was cut, which can be seen here

When comparing CategoricalDtypes with ordered=False (the default), the comparison currently computes a hash, which is relatively slow even for a small number of categories. Checking first whether the categories happen to be the same and in the same order gives a significant speedup in the equal case. The .equals() function exits early, so overhead in the non-equal case should be minimal.
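For illustration only, here is a minimal pure-Python sketch of the fast-path idea. It is not the pandas implementation: `ToyUnorderedDtype` is a hypothetical stand-in class, and its slow path uses a set comparison in place of the hash-based comparison described above.

```python
class ToyUnorderedDtype:
    """Hypothetical stand-in for an unordered categorical dtype (not pandas)."""

    def __init__(self, categories):
        self.categories = tuple(categories)

    def __eq__(self, other):
        if not isinstance(other, ToyUnorderedDtype):
            return NotImplemented
        # Fast path: same categories in the same order.
        # Tuple comparison stops at the first mismatch, so the
        # non-equal case adds little overhead.
        if self.categories == other.categories:
            return True
        # Slow path: order-insensitive comparison. In this toy it is a
        # set comparison; it stands in for the slower hash-based check.
        return frozenset(self.categories) == frozenset(other.categories)

    def __hash__(self):
        # Order-insensitive hash, consistent with __eq__.
        return hash(frozenset(self.categories))


same_order = ToyUnorderedDtype(["a", "b", "c"]) == ToyUnorderedDtype(["a", "b", "c"])
reordered = ToyUnorderedDtype(["a", "b", "c"]) == ToyUnorderedDtype(["c", "b", "a"])
different = ToyUnorderedDtype(["a", "b", "c"]) == ToyUnorderedDtype(["a", "b", "d"])
```

Here `same_order` is decided by the fast path alone, while `reordered` and `different` fall through to the order-insensitive check, so results are unchanged and only the common equal case gets cheaper.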

asv results:

       before           after         ratio
     [a4c19e7a]       [6810ff2f]
     <unsorted_cats~1>       <unsorted_cats>
-       179±0.4μs       25.6±0.4μs     0.14  indexing.CategoricalIndexIndexing.time_getitem_slice('monotonic_decr')
-         179±2μs         25.2±1μs     0.14  indexing.CategoricalIndexIndexing.time_getitem_slice('monotonic_incr')
-       180±0.4μs       25.1±0.3μs     0.14  indexing.CategoricalIndexIndexing.time_getitem_slice('non_monotonic')

jreback · 2019-07-18T10:53:56Z

thanks @qwhelan

PERF: restore performance for unsorted CategoricalDtype comparison

28b7714

qwhelan force-pushed the unsorted_cats branch from 6810ff2 to 28b7714 Compare July 18, 2019 07:41

jreback added the Categorical (Categorical Data Type) and Performance (Memory or execution speed performance) labels Jul 18, 2019

jreback added this to the 0.25.0 milestone Jul 18, 2019

jreback merged commit 1857bff into pandas-dev:master Jul 18, 2019
