Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add numba engine for rolling apply #30151

Merged
merged 56 commits into from
Dec 27, 2019
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
Show all changes
56 commits
Select commit Hold shift + click to select a range
3b9bff8
Add numba to import_optional_dependencies
Dec 8, 2019
9a302bf
Start adding keywords
Dec 8, 2019
0e9a600
Modify apply for numba and cython
Dec 9, 2019
36a77ed
Merge remote-tracking branch 'upstream/master' into numba_rolling_apply
Dec 9, 2019
dbb2a9b
Add numba as optional dependency
Dec 9, 2019
f0e9a4d
Add premil tests
Dec 9, 2019
1250aee
Merge remote-tracking branch 'upstream/master' into numba_rolling_apply
Dec 10, 2019
4e7fd1a
Merge remote-tracking branch 'upstream/master' into numba_rolling_apply
Dec 11, 2019
cb976cf
Add numba to requirements-dev, type and reorder signature in apply
Dec 11, 2019
45420bb
Move numba routines to its own file
Dec 11, 2019
17851cf
Adjust signature in top level function as well
Dec 11, 2019
20767ca
Merge remote-tracking branch 'upstream/master' into numba_rolling_apply
Dec 11, 2019
9619f8d
Generate requirements-dev.txt using script
Dec 11, 2019
66fa69c
Merge remote-tracking branch 'upstream/master' into numba_rolling_apply
Dec 13, 2019
b8908ea
Add skip test decorator, add numba to a few builds
Dec 13, 2019
135f2ad
black
Dec 13, 2019
34a5687
don't rejit a user's jitted function
Dec 13, 2019
6da8199
Add numba/cython comparison test
Dec 13, 2019
123f77e
Merge remote-tracking branch 'upstream/master' into numba_rolling_apply
Dec 17, 2019
54e74d1
Remove typing for now
Dec 17, 2019
04d3530
Remove sub description for doc failures?
Dec 17, 2019
4bbf587
Fix test function
Dec 17, 2019
f849bc7
test user predefined jit function, clarify docstring
Dec 17, 2019
0c30e48
Apply engine kwargs to function as well
Dec 17, 2019
c4c952e
Clairfy documentation
Dec 17, 2019
8645976
Clarify what engine_kwargs applies to
Dec 17, 2019
987c916
Start section for numba rolling apply
Dec 17, 2019
b775684
Lint
Dec 17, 2019
2e04e60
clarify note
Dec 17, 2019
9b20ff5
Merge remote-tracking branch 'upstream/master' into numba_rolling_apply
Dec 18, 2019
0c14033
Add apply function cache to save compiled numba functions
Dec 19, 2019
c7106dc
Add performance example
Dec 19, 2019
1640085
Merge remote-tracking branch 'upstream/master' into numba_rolling_apply
Dec 21, 2019
2846faf
Remove whitespace
Dec 21, 2019
5a645c0
Address lint errors and separate apply tests
Dec 22, 2019
6bac000
Add whatsnew note
Dec 22, 2019
6f1c73f
Skip apply tests for numba not installed, lint
Dec 22, 2019
a890337
Add typing
Dec 22, 2019
0a9071c
Add more typing
Dec 22, 2019
9d8d40b
Formatting cleanups
Dec 23, 2019
84c3491
Merge remote-tracking branch 'upstream/master' into numba_rolling_apply
Dec 24, 2019
a429206
Address Jeff's comments
Dec 24, 2019
5826ad9
Black
Dec 24, 2019
cf7571b
Add clarification
Dec 24, 2019
4bc9787
Merge remote-tracking branch 'upstream/master' into numba_rolling_apply
Dec 24, 2019
18eed60
Move function to module level
Dec 24, 2019
f715b55
move cache check higher up
Dec 24, 2019
6a765bf
Address Will's comments
Dec 24, 2019
af3fe50
Type Callable in generate_numba_apply_func
Dec 24, 2019
eb7b5e1
Merge remote-tracking branch 'upstream/master' into numba_rolling_apply
Dec 24, 2019
f7dfcf4
use ellipsis, cannot specify np.ndarray as well
Dec 24, 2019
a42a960
Remove trailing whitespace in apply docstring
Dec 24, 2019
d019830
Address Will's and Brock's comments
Dec 25, 2019
29d145f
Fix typing
Dec 25, 2019
248149c
Merge remote-tracking branch 'upstream/master' into numba_rolling_apply
Dec 26, 2019
a3da51e
Address followup comments
Dec 26, 2019
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/getting_started/install.rst
Original file line number Diff line number Diff line change
Expand Up @@ -255,6 +255,7 @@ gcsfs 0.2.2 Google Cloud Storage access
html5lib HTML parser for read_html (see :ref:`note <optional_html>`)
lxml 3.8.0 HTML parser for read_html (see :ref:`note <optional_html>`)
matplotlib 2.2.2 Visualization
numba 0.46.0 Alternative execution engine for rolling operations
openpyxl 2.5.7 Reading / writing for xlsx files
pandas-gbq 0.8.0 Google Big Query access
psycopg2 PostgreSQL engine for sqlalchemy
Expand Down
1 change: 1 addition & 0 deletions environment.yml
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,7 @@ dependencies:
- matplotlib>=2.2.2 # pandas.plotting, Series.plot, DataFrame.plot
- numexpr>=2.6.8
- scipy>=1.1
- numba>=0.46.0

# optional for io
- beautifulsoup4>=4.6.0 # pandas.read_html
Expand Down
1 change: 1 addition & 0 deletions pandas/compat/_optional.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@
"xlrd": "1.1.0",
"xlwt": "1.2.0",
"xlsxwriter": "0.9.8",
"numba": "0.46.0",
}


Expand Down
77 changes: 77 additions & 0 deletions pandas/core/window/numba_.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,77 @@
from typing import Callable, Dict, Optional, Tuple

import numpy as np

from pandas.compat._optional import import_optional_dependency


def _generate_numba_apply_func(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok with this being non-private (e.g. generate_numba_apply_func), unless its only used in this module

args: Tuple, kwargs: Dict, func: Callable, engine_kwargs: Optional[Dict]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some comments here

):
numba = import_optional_dependency("numba")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a doc-string that says what this function does (the parameters are already documented elsewhere, maybe just mention that)


if engine_kwargs is None:
engine_kwargs = {"nopython": True, "nogil": False, "parallel": False}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just

if engine_kwargs is None:
    engine_kwargs = {}

as you set the defaults below


nopython = engine_kwargs.get("nopython", True)
nogil = engine_kwargs.get("nogil", False)
parallel = engine_kwargs.get("parallel", False)

if kwargs and nopython:
raise ValueError(
"numba does not support kwargs with nopython=True: "
"https://github.com/numba/numba/issues/2916"
)

if parallel:
loop_range = numba.prange
else:
loop_range = range

def make_rolling_apply(func):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move make_rolling_apply to module scope (out of this function)?

also don't you need to actually assign to the cache? (on a miss)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache assignment happens here https://github.com/pandas-dev/pandas/pull/30151/files#diff-0de5c5d9abfcdd141e83701eaaec4358R541 (the function needs to run on some data first)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then i wouldn’t even check the cache here ; that must be at the higher level

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay sure thing. I'll move it up then.

"""
1. jit the user's function
2. Return a rolling apply function with the jitted function inline

Configurations specified in engine_kwargs apply to both the user's
function _AND_ the rolling apply function.
"""

@numba.generated_jit(nopython=nopython)
def numba_func(window, *_args):
if getattr(np, func.__name__, False) is func:

def impl(window, *_args):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move this func and the return outside of the if statements? AFAICT is duplicated

return func(window, *_args)

return impl
else:
jf = numba.jit(func, nopython=nopython)

def impl(window, *_args):
return jf(window, *_args)

return impl

@numba.jit(nopython=nopython, nogil=nogil, parallel=parallel)
def roll_apply(
values: np.ndarray,
begin: np.ndarray,
end: np.ndarray,
minimum_periods: int,
):
result = np.empty(len(begin))
for i in loop_range(len(result)):
start = begin[i]
stop = end[i]
window = values[start:stop]
count_nan = np.sum(np.isnan(window))
if len(window) - count_nan >= minimum_periods:
result[i] = numba_func(window, *args)
else:
result[i] = np.nan
return result

return roll_apply

return make_rolling_apply(func)
78 changes: 67 additions & 11 deletions pandas/core/window/rolling.py
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,7 @@
FixedWindowIndexer,
VariableWindowIndexer,
)
from pandas.core.window.numba_ import _generate_numba_apply_func


class _Window(PandasObject, ShallowMixin, SelectionMixin):
Expand Down Expand Up @@ -1239,9 +1240,21 @@ def count(self):
objects instead.
If you are just applying a NumPy reduction function this will
achieve much better performance.

*args, **kwargs
Arguments and keyword arguments to be passed into func.
engine : str, default 'cython'
jreback marked this conversation as resolved.
Show resolved Hide resolved
Execution engine for the applied function.
* ``'cython'`` : Runs rolling apply through C-extensions from cython.
* ``'numba'`` : Runs rolling apply through JIT compiled code from numba.
Only available when ``raw`` is set to ``True``.
engine_kwargs : dict, default None
Arguments to specify for the execution engine.
* For ``'cython'`` engine, there are no accepted ``engine_kwargs``
* For ``'numba'`` engine, the engine can accept ``nopython``, ``nogil``
and ``parallel``. The default ``engine_kwargs`` for the ``'numba'`` engine is
``{'nopython': True, 'nogil': False, 'parallel': False}``
args : tuple, default None
Positional arguments to be passed into func
kwargs : dict, default None
Keyword arguments to be passed into func

Returns
-------
Expand All @@ -1255,16 +1268,46 @@ def count(self):
"""
)

def apply(self, func, raw=False, args=(), kwargs={}):
from pandas import Series

def apply(
self,
func: Callable,
raw: bool = False,
engine: str = "cython",
engine_kwargs: Optional[Dict] = None,
args: Optional[Tuple] = None,
kwargs: Optional[Dict] = None,
):
if args is None:
jreback marked this conversation as resolved.
Show resolved Hide resolved
args = ()
if kwargs is None:
kwargs = {}
kwargs.pop("_level", None)
kwargs.pop("floor", None)
window = self._get_window()
offset = _offset(window, self.center)
if not is_bool(raw):
raise ValueError("raw parameter must be `True` or `False`")

if engine == "cython":
if engine_kwargs is not None:
raise ValueError("cython engine does not accept engine_kwargs")
jreback marked this conversation as resolved.
Show resolved Hide resolved
apply_func = self._generate_cython_apply_func(
args, kwargs, raw, offset, func
)
elif engine == "numba":
if raw is False:
raise ValueError("raw must be `True` when using the numba engine")
apply_func = _generate_numba_apply_func(args, kwargs, func, engine_kwargs)
else:
raise ValueError("engine must be either 'numba' or 'cython'")

# TODO: Why do we always pass center=False?
# name=func for WindowGroupByMixin._apply
jbrockmendel marked this conversation as resolved.
Show resolved Hide resolved
return self._apply(apply_func, center=False, floor=0, name=func)

def _generate_cython_apply_func(self, args, kwargs, raw, offset, func):
jreback marked this conversation as resolved.
Show resolved Hide resolved
from pandas import Series

window_func = partial(
self._get_cython_func_type("roll_generic"),
args=args,
Expand All @@ -1279,9 +1322,7 @@ def apply_func(values, begin, end, min_periods, raw=raw):
values = Series(values, index=self.obj.index)
return window_func(values, begin, end, min_periods)

# TODO: Why do we always pass center=False?
# name=func for WindowGroupByMixin._apply
return self._apply(apply_func, center=False, floor=0, name=func)
return apply_func

def sum(self, *args, **kwargs):
nv.validate_window_func("sum", args, kwargs)
Expand Down Expand Up @@ -1927,8 +1968,23 @@ def count(self):

@Substitution(name="rolling")
@Appender(_shared_docs["apply"])
def apply(self, func, raw=False, args=(), kwargs={}):
return super().apply(func, raw=raw, args=args, kwargs=kwargs)
def apply(
self,
func,
raw=False,
engine="cython",
engine_kwargs=None,
args=None,
kwargs=None,
):
return super().apply(
func,
raw=raw,
engine=engine,
engine_kwargs=engine_kwargs,
args=args,
kwargs=kwargs,
)

@Substitution(name="rolling")
@Appender(_shared_docs["sum"])
Expand Down
28 changes: 28 additions & 0 deletions pandas/tests/window/test_api.py
Original file line number Diff line number Diff line change
Expand Up @@ -342,3 +342,31 @@ def test_multiple_agg_funcs(self, func, window_size, expected_vals):
)

tm.assert_frame_equal(result, expected)


class TestEngine:
jreback marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this class would be more logically placed in test_apply

def test_invalid_engine(self):
with pytest.raises(
ValueError, match="engine must be either 'numba' or 'cython'"
):
Series(range(1)).rolling(1).apply(lambda x: x, engine="foo")

def test_invalid_engine_kwargs_cython(self):
with pytest.raises(
ValueError, match="cython engine does not accept engine_kwargs"
):
Series(range(1)).rolling(1).apply(
lambda x: x, engine="cython", engine_kwargs={"nopython": False}
)

def test_invalid_raw_numba(self):
with pytest.raises(
ValueError, match="raw must be `True` when using the numba engine"
):
Series(range(1)).rolling(1).apply(lambda x: x, raw=False, engine="numba")

def test_invalid_kwargs_nopython(self):
with pytest.raises(ValueError, match="numba does not support kwargs with"):
Series(range(1)).rolling(1).apply(
lambda x: x, kwargs={"a": 1}, engine="numba", raw=True
)
1 change: 1 addition & 0 deletions requirements-dev.txt
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ jinja2
matplotlib>=2.2.2
numexpr>=2.6.8
scipy>=1.1
numba>=0.46.0
beautifulsoup4>=4.6.0
fastparquet>=0.3.2
html5lib
Expand Down