Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: special case numpy.dtype in is_extension_array_dtype #39678

Merged

Conversation

jorisvandenbossche
Copy link
Member

Master:

dtype = np.dtype(float)
dtype2 = pd.Int32Dtype()

In [2]: %timeit pd.api.types.is_extension_array_dtype(dtype)
645 ns ± 59.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [3]: %timeit pd.api.types.is_extension_array_dtype(dtype2)
251 ns ± 9.68 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

with this branch, it keeps the speed for ExtensionDtype the same, and improves for numpy.dtype (avoiding a extension dtype registry lookup):

In [2]: %timeit pd.api.types.is_extension_array_dtype(dtype)
328 ns ± 24.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [2]:  %timeit pd.api.types.is_extension_array_dtype(dtype2)
240 ns ± 19.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

@jorisvandenbossche jorisvandenbossche added Performance Memory or execution speed performance Dtype Conversions Unexpected or buggy dtype conversions labels Feb 8, 2021
@jorisvandenbossche jorisvandenbossche added this to the 1.3 milestone Feb 8, 2021
Copy link
Member

@jbrockmendel jbrockmendel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jreback
Copy link
Contributor

jreback commented Feb 8, 2021

sure (assume you found this from existing asvs), if not can you add the one from above.

@jreback
Copy link
Contributor

jreback commented Feb 8, 2021

also a whatsnew mention is prob worth it.

@jorisvandenbossche
Copy link
Member Author

assume you found this from existing asvs, if not can you add the one from above.

Yes, it came from profiling the concat benchmarks (#39612), gives a small improvement. Now, apparently we don't have specific benchmarks for our dtype checks, so added one for is_extension_array_dtype (that could be supplemented with benchmarks for some of the other is_ checkers, but going to leave that for another issue/PR).

also a whatsnew mention is prob worth it.

Not sure it is worth it, this only gives a tiny improvement anywhere it is used (it was just lowing hanging fruit to shave off a percent, and with ArrayManager it gets used more, so getting more worth fixing it)

@jreback jreback merged commit be68850 into pandas-dev:master Feb 10, 2021
@jreback
Copy link
Contributor

jreback commented Feb 10, 2021

thanks @jorisvandenbossche

@jorisvandenbossche jorisvandenbossche deleted the perf-is_extension_array_dtype branch February 10, 2021 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants