PERF: Fix regression in datetime ops #17980
HEAD:

```
In [1]: import pandas as pd; import numpy as np

In [2]: s = pd.Series(pd.to_datetime(np.arange(100000), unit='ms'))

In [3]: %timeit s - s.shift()
2.73 ms ± 30.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

0.21.0rc1:

```
527 ms ± 11.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

0.20.3:

```
2.4 ms ± 57.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
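A minimal sketch for re-running the same benchmark as a plain script, using the standard-library `timeit` module in place of IPython's `%timeit` magic. The helper name `op` is illustrative only, and absolute timings will depend on the machine and pandas version.

```python
import statistics
import timeit

import numpy as np
import pandas as pd

# Same data as the IPython session: 100,000 timestamps one millisecond apart.
s = pd.Series(pd.to_datetime(np.arange(100000), unit="ms"))


def op():
    # The operation that regressed in 0.21.0rc1: a datetime64 Series minus its shift.
    return s - s.shift()


result = op()

# 7 runs of 100 loops each, mirroring the %timeit output above.
loops = 100
runs = timeit.repeat(op, number=loops, repeat=7)
per_loop_ms = [t / loops * 1e3 for t in runs]
print(f"{statistics.mean(per_loop_ms):.2f} ms ± "
      f"{statistics.stdev(per_loop_ms):.2f} ms per loop "
      f"(mean ± std. dev. of 7 runs, {loops} loops each)")
```

A time in the low milliseconds per loop corresponds to the fixed behaviour; hundreds of milliseconds would indicate the regression.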
This also seems to help with the head:
master:
Codecov Report
```diff
@@            Coverage Diff             @@
##           master   #17980      +/-   ##
==========================================
- Coverage   91.23%   91.22%   -0.02%
==========================================
  Files         163      163
  Lines       50113    50113
==========================================
- Hits        45723    45714       -9
- Misses       4390     4399       +9
```
Cool, this also fixes the slowdown in the Stata benchmarks.

head:
master:
Are the …

On master they were being called once per row. The change here skips that, since if we have a …
pandas/core/ops.py (outdated)

```diff
@@ -622,6 +622,10 @@ def _is_offset(self, arr_or_obj):
         """ check if obj or all elements of list-like is DateOffset """
         if isinstance(arr_or_obj, ABCDateOffset):
             return True
+        elif (is_datetime64_dtype(arr_or_obj) or
+              is_timedelta64_dtype(arr_or_obj)):
```
catch period here as well?
What binary ops is a `PeriodIndex` valid in (if any)?
`is_period`
Sorry, we have `is_period_dtype`, which is the correct one to use here.
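A small sketch of why a period dtype can be caught without looking at elements. `is_period_dtype` lives in `pandas.api.types`; recent pandas releases deprecate it in favour of `isinstance(dtype, pd.PeriodDtype)`, which is what this example uses. The example also shows that period arithmetic can yield an object array of offsets, the case where an element-wise check is still needed.

```python
import pandas as pd

# Monthly periods stored in a Series: the values carry a period dtype,
# not an object dtype.
s = pd.Series(pd.period_range("2000-01", periods=3, freq="M"))
print(s.dtype)  # period[M]

# A dtype-level check answers the question without touching any element.
print(isinstance(s.dtype, pd.PeriodDtype))

# Subtracting a Period, by contrast, produces an object array of offsets,
# which is the kind of input where a per-element DateOffset check applies.
diffs = s - s.iloc[0]
print(diffs.dtype)  # object
```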
pandas/core/ops.py (outdated)

```diff
+        elif (is_datetime64_dtype(arr_or_obj) or
+              is_timedelta64_dtype(arr_or_obj)):
+            # Don't want to check elementwise for Series / array of datetime
+            return False
         elif is_list_like(arr_or_obj) and len(arr_or_obj):
```
You should actually only do this check if `is_object_dtype` is True in the first place; otherwise there is no point in iterating at all (if it's not a scalar `ABCDateOffset`).
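A minimal sketch of the dtype-first check described above, written against the public `pandas.api.types` helpers rather than pandas internals. `is_offset` is a hypothetical stand-in for `_is_offset`, `pd.DateOffset` is assumed as a substitute for the internal `ABCDateOffset`, and this is not the code that landed in d0ea5dc.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_list_like, is_object_dtype


def is_offset(arr_or_obj):
    """Hypothetical stand-in for ops._is_offset: True if arr_or_obj is a
    DateOffset, or a non-empty list-like whose elements are all DateOffsets."""
    if isinstance(arr_or_obj, pd.DateOffset):
        return True
    if not is_list_like(arr_or_obj) or len(arr_or_obj) == 0:
        return False
    # Typed containers (datetime64, timedelta64, period, numeric, ...) cannot
    # hold DateOffset objects, so only object dtype is worth scanning.
    dtype = getattr(arr_or_obj, "dtype", None)
    if dtype is not None and not is_object_dtype(dtype):
        return False
    # Plain lists/tuples carry no dtype, so fall back to the element scan.
    return all(isinstance(x, pd.DateOffset) for x in arr_or_obj)


s = pd.Series(pd.to_datetime(np.arange(100000), unit="ms"))
print(is_offset(s))               # datetime64 dtype: answered from the dtype
print(is_offset(s - s.shift()))   # timedelta64 dtype: likewise
```

With this ordering, a 100,000-row datetime Series is rejected from its dtype alone, and the Python-level loop only runs for object arrays or plain lists.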
I think d0ea5dc has what you meant; much cleaner.
Thanks for hunting those down!
* PERF: Fix regression in datetime ops
* timedelta too
* Clean up the fix
xref #17861