
PERF: Fix regression in datetime ops #17980

Merged (3 commits) on Oct 26, 2017

Conversation

TomAugspurger (Contributor):

xref #17861

HEAD:

```
In [1]: import pandas as pd; import numpy as np
In [2]: s = pd.Series(pd.to_datetime(np.arange(100000), unit='ms'))
In [3]: %timeit s - s.shift()
2.73 ms ± 30.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

0.21.0rc1:

```
527 ms ± 11.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

0.20.3:

```
2.4 ms ± 57.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
@TomAugspurger (Contributor, Author):

This also seems to help with the timeseries.SeriesArithmetic.time_add_offset benchmarks:

head:

```
[ 33.33%] ··· Running timeseries.SeriesArithmetic.time_add_offset_delta    4.30ms
[ 66.67%] ··· Running timeseries.SeriesArithmetic.time_add_offset_fast     11.0ms
[100.00%] ··· Running timeseries.SeriesArithmetic.time_add_offset_slow      703ms
```

master:

```
[ 33.33%] ··· Running timeseries.SeriesArithmetic.time_add_offset_delta     499ms
[ 66.67%] ··· Running timeseries.SeriesArithmetic.time_add_offset_fast      489ms
[100.00%] ··· Running timeseries.SeriesArithmetic.time_add_offset_slow      1.05s
```

@TomAugspurger TomAugspurger added this to the 0.21.0 milestone Oct 25, 2017
@TomAugspurger TomAugspurger added the Performance Memory or execution speed performance label Oct 25, 2017
@codecov

codecov bot commented Oct 25, 2017

Codecov Report

Merging #17980 into master will decrease coverage by 0.01%.
The diff coverage is 100%.


```
@@            Coverage Diff             @@
##           master   #17980      +/-   ##
==========================================
- Coverage   91.23%   91.22%   -0.02%
==========================================
  Files         163      163
  Lines       50113    50113
==========================================
- Hits        45723    45714       -9
- Misses       4390     4399       +9
```

| Flag | Coverage Δ |
| --- | --- |
| #multiple | 89.03% <100%> (ø) ⬆️ |
| #single | 40.31% <100%> (-0.07%) ⬇️ |

| Impacted Files | Coverage Δ |
| --- | --- |
| pandas/core/ops.py | 91.77% <100%> (ø) ⬆️ |
| pandas/io/gbq.py | 25% <0%> (-58.34%) ⬇️ |
| pandas/core/frame.py | 97.75% <0%> (-0.1%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 36c309e...d0ea5dc.

@TomAugspurger (Contributor, Author):

Cool, this also fixes the slowdown in the stata benchmarks:

head:

```
[  0.00%] ·· Building for existing-py_Users_taugspurger_Envs_pandas-dev_bin_python3.6
[  0.00%] ·· Benchmarking existing-py_Users_taugspurger_Envs_pandas-dev_bin_python3.6
[ 25.00%] ··· Running packers.STATA.time_write_stata                                              33.2ms
[ 50.00%] ··· Running packers.STATA.time_write_stata_with_validation                              49.5ms
[ 75.00%] ··· Running packers.packers_read_stata.time_packers_read_stata                          39.6ms
[100.00%] ··· Running packers.packers_read_stata_with_validation.time_packers_read_stata_with_validation  48.0ms
```

master:

```
[  0.00%] ·· Building for existing-py_Users_taugspurger_Envs_pandas-dev_bin_python3.6
[  0.00%] ·· Benchmarking existing-py_Users_taugspurger_Envs_pandas-dev_bin_python3.6
[ 25.00%] ··· Running packers.STATA.time_write_stata                                               502ms
[ 50.00%] ··· Running packers.STATA.time_write_stata_with_validation                               529ms
[ 75.00%] ··· Running packers.packers_read_stata.time_packers_read_stata                           631ms
[100.00%] ··· Running packers.packers_read_stata_with_validation.time_packers_read_stata_with_validation  650ms
```

@jbrockmendel (Member):

Are the isinstance checks being called zillions of times? ABCDateOffset was introduced recently, and isinstance(x, ABCFoo) is about 2x slower than isinstance(x, Foo).
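The overhead described here can be illustrated with a small micro-benchmark. This is a hypothetical sketch, not the actual pandas machinery: `Foo` and `ABCFooType` are stand-in names, with the ABC check implemented via a metaclass `__instancecheck__` (roughly how pandas builds its `ABC*` classes), which forces `isinstance` off the C fast path.

```python
import timeit


class Foo:
    pass


class ABCFoo(type):
    # Duck-typed check via a metaclass __instancecheck__: every
    # isinstance() call dispatches through this Python-level method
    # instead of CPython's fast concrete-class path.
    def __instancecheck__(cls, obj):
        return getattr(obj, "_typ", None) == "foo"


class ABCFooType(metaclass=ABCFoo):
    pass


x = Foo()
x._typ = "foo"

concrete = timeit.timeit(lambda: isinstance(x, Foo), number=100_000)
abc = timeit.timeit(lambda: isinstance(x, ABCFooType), number=100_000)
print(f"concrete: {concrete:.4f}s  abc: {abc:.4f}s")
```

Even a modest per-call difference matters when the check runs once per element of a large array, which is what the comments below are getting at.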

@TomAugspurger (Contributor, Author):

> Are the isinstance checks being called zillions of times?

On master they were being called once per row. The change here skips that, since if we have a datetime64 or timedelta64 array, then we know none of them are offsets.
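A minimal sketch of that short-circuit, assuming a simplified stand-in for the real `_is_offset` helper (the `_typ` duck-type marker and function name are illustrative, not the pandas source): if the operand already has a datetime64 or timedelta64 dtype, no element can possibly be a DateOffset, so the per-element scan is skipped entirely.

```python
import numpy as np


def is_offset_like(arr_or_obj):
    """Return True if obj, or every element of a list-like, looks like an offset."""
    if getattr(arr_or_obj, "_typ", None) == "dateoffset":
        return True
    dtype = getattr(arr_or_obj, "dtype", None)
    if dtype is not None and dtype.kind in ("M", "m"):
        # datetime64 ('M') or timedelta64 ('m') array:
        # nothing here can be an offset, so skip the element scan
        return False
    if np.iterable(arr_or_obj) and len(arr_or_obj):
        # fall back to the O(n) per-element check
        return all(is_offset_like(x) for x in arr_or_obj)
    return False


arr = np.arange(5).astype("datetime64[ms]")
print(is_offset_like(arr))  # False, without iterating the elements
```

The fast path turns a once-per-row check into a single dtype inspection, which is where the ~200x speedup in the `%timeit` numbers above comes from.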

```
@@ -622,6 +622,10 @@ def _is_offset(self, arr_or_obj):
     """ check if obj or all elements of list-like is DateOffset """
     if isinstance(arr_or_obj, ABCDateOffset):
         return True
     elif (is_datetime64_dtype(arr_or_obj) or
           is_timedelta64_dtype(arr_or_obj)):
```
Contributor:
catch period here as well?

TomAugspurger (Contributor, Author) commented Oct 25, 2017:

What binary ops is a PeriodIndex valid in (if any)?

Contributor:

is_period

Contributor:

Sorry, we have is_period_dtype, which is the correct one to use here.

```
     elif (is_datetime64_dtype(arr_or_obj) or
           is_timedelta64_dtype(arr_or_obj)):
         # Don't want to check elementwise for Series / array of datetime
         return False
     elif is_list_like(arr_or_obj) and len(arr_or_obj):
```
Contributor:

You should only do this check if is_object_dtype is True in the first place; otherwise there is no point in iterating at all (if it's not a scalar ABCDateOffset).
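The suggested restructuring can be sketched as gating the element scan on object dtype, so any concrete non-object dtype bails out immediately. Again a hedged illustration with stand-in names (`is_offset`, the `_typ` marker), not the actual pandas implementation:

```python
import numpy as np


def is_offset(arr_or_obj):
    if getattr(arr_or_obj, "_typ", None) == "dateoffset":
        return True  # scalar DateOffset-like
    dtype = getattr(arr_or_obj, "dtype", None)
    if dtype is not None and dtype != object:
        # Any concrete non-object dtype (datetime64, timedelta64,
        # int64, ...) cannot hold DateOffset objects: never iterate.
        return False
    if np.iterable(arr_or_obj) and len(arr_or_obj):
        # Only object-dtype arrays and plain list-likes reach the
        # per-element scan.
        return all(is_offset(x) for x in arr_or_obj)
    return False
```

Structuring it this way makes the fast path the default rather than special-casing datetime64 and timedelta64 one dtype at a time.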

TomAugspurger (Contributor, Author):

I think d0ea5dc has what you meant; much cleaner.

@TomAugspurger TomAugspurger merged commit 6779ac0 into pandas-dev:master Oct 26, 2017
@TomAugspurger TomAugspurger deleted the dtype-infer branch October 26, 2017 13:57
@jorisvandenbossche (Member):

Thanks for hunting those down!

peterpanmj pushed a commit to peterpanmj/pandas that referenced this pull request Oct 31, 2017
* PERF: Fix regression in datetime ops
* timedelta too
* Clean up the fix
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
* PERF: Fix regression in datetime ops
* timedelta too
* Clean up the fix
Labels: Performance (Memory or execution speed)
4 participants