Add Numba to rolling.apply #29

Merged

@mroeschke mroeschke commented Sep 18, 2019

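Here's the performance comparison so far. Unfortunately, Cython (master) is still faster for the ndarray case (raw=True): 32.3 ms versus 226 ms on this branch. The Series case (raw=False) improves from 1.19 s to 219 ms.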

# This branch
In [1]: s = pd.Series(range(10000))

In [2]: f = lambda x: np.sum(x) + 5

# raw is unused; row is always an ndarray
In [4]: %timeit s.rolling(10).apply(f, raw=False)
219 ms ± 29.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [5]: %timeit s.rolling(10).apply(f, raw=True)
226 ms ± 26.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Master
In [1]: s = pd.Series(range(10000))

In [2]: f = lambda x: np.sum(x) + 5

# row passed as a Series
In [4]: %timeit s.rolling(10).apply(f, raw=False)
1.19 s ± 5.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# row passed as a ndarray
In [5]: %timeit s.rolling(10).apply(f, raw=True)
32.3 ms ± 366 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

@mroeschke

Additionally, apply currently supports passing *args and **kwargs into the function per row. Numba does not currently support passing kwargs to an njit-compiled user function (numba/numba#2916), and binding those kwargs beforehand with functools.partial is also unsupported (numba/numba#4587).
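
To make the limitation concrete, here is a minimal sketch of one possible way to bind a parameter without kwargs or functools.partial: close over the value in a factory function. This is an illustration only, not something this PR implements. `bind_offset` is a hypothetical helper, and the no-op `njit` fallback is only there so the sketch runs without Numba installed.

```python
import numpy as np

try:
    from numba import njit  # optional dependency
except ImportError:
    # Fallback so the sketch still runs (uncompiled) without Numba.
    def njit(func):
        return func


def bind_offset(offset):
    """Hypothetical helper: bake `offset` into the function via a closure
    instead of passing it as a keyword argument."""

    @njit
    def func(window):
        # `offset` is a free variable captured from the enclosing scope.
        return np.sum(window) + offset

    return func


add_five = bind_offset(5)
value = add_five(np.array([1.0, 2.0, 3.0]))
```

The trade-off is that each distinct bound value produces a new function object, so each one is compiled separately.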

@mroeschke

I was able to solve the performance problem in aa9644c. The biggest issue was that the njit rolling_apply function was being compiled on every call. If we instead create the rolling apply function dynamically around the passed function, cache it, and reuse it on later calls, performance beats Cython:

In [1]: s = pd.Series(range(10000))

# r, a Rolling object, will cache the apply functions
In [2]: r = s.rolling(10)

In [3]: f = lambda x: np.sum(x) + 5

In [4]: %timeit r.apply(f, raw=False)
2.16 ms ± 204 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
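
The caching idea described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in aa9644c: `make_rolling_apply` and `_apply_cache` are hypothetical names, the cache here is a module-level dict rather than state on the Rolling object, and the no-op `njit` fallback only exists so the sketch runs without Numba installed.

```python
import numpy as np

try:
    from numba import njit  # optional dependency
except ImportError:
    # Fallback so the sketch still runs (uncompiled) without Numba.
    def njit(func):
        return func

# Hypothetical cache: user function -> rolling loop built around it.
_apply_cache = {}


def make_rolling_apply(func):
    """Build the rolling loop for `func` once, then reuse it."""
    if func in _apply_cache:
        return _apply_cache[func]

    jitted_func = njit(func)

    @njit
    def rolling_apply(values, window):
        # Positions before the first full window stay NaN.
        out = np.full(len(values), np.nan)
        for i in range(window - 1, len(values)):
            out[i] = jitted_func(values[i - window + 1 : i + 1])
        return out

    _apply_cache[func] = rolling_apply
    return rolling_apply


f = lambda x: np.sum(x) + 5
values = np.arange(10, dtype=np.float64)
result = make_rolling_apply(f)(values, 3)
```

Because the cache is keyed on the function object, a second call with the same `f` returns the already-built loop, so any compilation cost is paid only on the first call.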

@mroeschke

Here are the ASV benchmarks:

       before           after         ratio
     [68a663be]       [3c49d034]
     <master>         <feature/rolling_apply_numba>
+         289±2ms        350±0.9ms     1.21  rolling.Apply.time_rolling('Series', 1000, 'float', <function sum at 0x105754620>, True)
+         290±7ms          350±1ms     1.20  rolling.Apply.time_rolling('Series', 1000, 'int', <function sum at 0x105754620>, True)
+         297±8ms          353±2ms     1.19  rolling.Apply.time_rolling('DataFrame', 1000, 'int', <function sum at 0x105754620>, True)
-         250±3ms        215±0.6ms     0.86  rolling.Apply.time_rolling('Series', 10, 'float', <function sum at 0x105754620>, True)
-         255±1ms        215±0.8ms     0.84  rolling.Apply.time_rolling('Series', 10, 'int', <function sum at 0x105754620>, True)
-         308±5ms        246±0.9ms     0.80  rolling.Apply.time_rolling('Series', 10, 'float', <function Apply.<lambda> at 0x11b8c6730>, True)
-        310±20ms          241±4ms     0.78  rolling.Apply.time_rolling('DataFrame', 10, 'float', <function Apply.<lambda> at 0x11b8c6730>, True)
-        340±40ms        245±0.6ms     0.72  rolling.Apply.time_rolling('Series', 10, 'int', <function Apply.<lambda> at 0x11b8c6730>, True)
-      13.9±0.06s          380±1ms     0.03  rolling.Apply.time_rolling('Series', 1000, 'int', <function Apply.<lambda> at 0x11b8c6730>, False)
-       14.0±0.1s        380±0.8ms     0.03  rolling.Apply.time_rolling('Series', 1000, 'float', <function Apply.<lambda> at 0x11b8c6730>, False)
-         14.1±0s          380±1ms     0.03  rolling.Apply.time_rolling('DataFrame', 1000, 'float', <function Apply.<lambda> at 0x11b8c6730>, False)
-      13.7±0.01s        349±0.8ms     0.03  rolling.Apply.time_rolling('Series', 1000, 'int', <function sum at 0x105754620>, False)
-      13.9±0.06s        350±0.6ms     0.03  rolling.Apply.time_rolling('Series', 1000, 'float', <function sum at 0x105754620>, False)
-      13.7±0.07s        245±0.8ms     0.02  rolling.Apply.time_rolling('Series', 10, 'float', <function Apply.<lambda> at 0x11b8c6730>, False)
-       14.0±0.1s        246±0.5ms     0.02  rolling.Apply.time_rolling('Series', 10, 'int', <function Apply.<lambda> at 0x11b8c6730>, False)
-      13.5±0.01s        215±0.3ms     0.02  rolling.Apply.time_rolling('Series', 10, 'float', <function sum at 0x105754620>, False)

The raw=False benchmarks are somewhat misleading: with raw=False, Cython handles pandas objects, while Numba always handles NumPy arrays (Numba cannot operate on pandas objects in nopython mode).

@mroeschke

Additional notes:

  • I had to xfail a couple of tests. To run rolling.apply in nopython mode for maximum performance, apply cannot accept arbitrary functions.
  • The function passed to apply cannot accept *args and **kwargs because Numba does not support this.

@mroeschke mroeschke changed the title WIP: Add Numba to rolling.apply Add Numba to rolling.apply Sep 29, 2019
@mroeschke mroeschke merged commit d34b96c into feature/generalized_window_operations Sep 29, 2019
@mroeschke mroeschke deleted the feature/rolling_apply_numba branch September 29, 2019 02:51