PERF: restore performance for unsorted CategoricalDtype comparison #27448

Merged Jul 18, 2019 (1 commit)

@qwhelan qwhelan commented Jul 18, 2019

Fixes a performance regression introduced in #26403 very shortly before 0.25.0rc0 was cut, which can be seen here:
[Image: Screenshot from 2019-07-17 23-29-08]

When comparing CategoricalDtypes with ordered=False (the default), the comparison currently computes a hash, which is relatively slow even for a small number of categories. Checking first whether the categories happen to be the same and in the same order gives a significant speedup in the equal case. Because .equals() exits early, the overhead in the non-equal case should be minimal.
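To make the idea concrete, here is a minimal pure-Python sketch of the fast path described above. It is not the pandas implementation: `ToyCategoricalDtype` is a hypothetical stand-in, plain tuples replace the pandas Index, and a `frozenset` comparison stands in for the hash-based fallback.

```python
from typing import Hashable, Sequence


class ToyCategoricalDtype:
    """Hypothetical stand-in for a categorical dtype (illustration only, not pandas code)."""

    def __init__(self, categories: Sequence[Hashable], ordered: bool = False):
        self.categories = tuple(categories)
        self.ordered = ordered

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, ToyCategoricalDtype):
            return NotImplemented
        if self.ordered or other.ordered:
            # If either side is ordered, the flags and the category order must both match.
            return self.ordered == other.ordered and self.categories == other.categories
        # Fast path: identical categories in identical order.
        # Tuple comparison stops at the first mismatch, so a miss is cheap.
        if self.categories == other.categories:
            return True
        # Slow path: order-insensitive comparison (assumed stand-in for the hash check).
        return frozenset(self.categories) == frozenset(other.categories)

    def __hash__(self) -> int:
        # Order-independent hash, kept consistent with __eq__.
        return hash((self.ordered, frozenset(self.categories)))


same_order = ToyCategoricalDtype(list("abc")) == ToyCategoricalDtype(list("abc"))
shuffled = ToyCategoricalDtype(list("abc")) == ToyCategoricalDtype(list("cba"))
different = ToyCategoricalDtype(list("abc")) == ToyCategoricalDtype(list("abd"))
print(same_order, shuffled, different)
```

In this sketch only the shuffled and different cases reach the slow path; the same-order case returns from the cheap check, which is the case the benchmarks below exercise.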

asv results:

       before           after         ratio
     [a4c19e7a]       [6810ff2f]
     <unsorted_cats~1>       <unsorted_cats>
-       179±0.4μs       25.6±0.4μs     0.14  indexing.CategoricalIndexIndexing.time_getitem_slice('monotonic_decr')
-         179±2μs         25.2±1μs     0.14  indexing.CategoricalIndexIndexing.time_getitem_slice('monotonic_incr')
-       180±0.4μs       25.1±0.3μs     0.14  indexing.CategoricalIndexIndexing.time_getitem_slice('non_monotonic')
  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@jreback added the Categorical and Performance labels on Jul 18, 2019
@jreback added this to the 0.25.0 milestone on Jul 18, 2019
@jreback merged commit 1857bff into pandas-dev:master on Jul 18, 2019
jreback commented Jul 18, 2019

thanks @qwhelan
