
perf: Make validation check in regressor_column_matrix 300x faster #2642

Open: wants to merge 2 commits into `main`


MarcoGorelli (Contributor)

This is something I've noticed on the way towards #2622 (which may not be that far off!)

The built-in Python `max` has to iterate over the Series element by element in the interpreter, which is much slower than calling the native `Series.max`, whose reduction runs in compiled code.
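A minimal sketch of the kind of change involved, assuming the validation check compares the maximum of a pandas Series against 1 (as in the timing below). `check_slow` and `check_fast` are hypothetical names used only for illustration; they are not functions from the codebase.

```python
import numpy as np
import pandas as pd


def check_slow(col: pd.Series) -> None:
    # Hypothetical "before": built-in max() pulls every element
    # through the Python interpreter one at a time.
    if max(col) > 1:
        raise ValueError("column contains values greater than 1")


def check_fast(col: pd.Series) -> None:
    # Hypothetical "after": Series.max() is a single vectorised
    # reduction executed in compiled code.
    if col.max() > 1:
        raise ValueError("column contains values greater than 1")


# Usage: values drawn from {0, 1}, so neither check raises.
s = pd.Series(np.random.default_rng(0).integers(0, 2, size=1_000))
check_slow(s)
check_fast(s)
```

The two are not identical on every edge case: on an empty Series the built-in `max` raises `ValueError`, whereas `Series.max()` returns NaN (so `NaN > 1` is `False` and the check passes), and `Series.max()` skips NaN values by default.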

Example (roughly 296x faster: 65.4 ms vs. 221 μs per loop):

```
In [43]: s = pd.Series(rng.integers(0, 10, size=1_000_000))

In [44]: %timeit max(s) > 1
65.4 ms ± 5.59 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [45]: %timeit s.max() > 1
221 μs ± 16.5 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
```
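To reproduce the comparison outside IPython, here is a sketch using the standard-library `timeit` module. It assumes `rng = np.random.default_rng(seed=0)`; the session above does not show how `rng` was created, and absolute timings will vary by machine.

```python
import timeit

import numpy as np
import pandas as pd

# Assumption: the original session's rng is not shown; a seeded default_rng is used here.
rng = np.random.default_rng(seed=0)
s = pd.Series(rng.integers(0, 10, size=1_000_000))

# Both expressions give the same answer...
assert (max(s) > 1) == (s.max() > 1)

# ...but at very different cost: average seconds per call over n runs.
n = 5
slow = timeit.timeit(lambda: max(s) > 1, number=n) / n
fast = timeit.timeit(lambda: s.max() > 1, number=n) / n

print(f"builtin max: {slow * 1e3:.1f} ms per call")
print(f"Series.max:  {fast * 1e6:.1f} us per call")
print(f"speed-up:    ~{slow / fast:.0f}x")
```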

MarcoGorelli marked this pull request as ready for review on December 2, 2024, 18:28.