PERF: Improve performance in rolling.mean(engine="numba") #43612

Merged
merged 28 commits into from
Sep 23, 2021

mroeschke
Member

  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them
  • whatsnew entry

This also starts to add a shared aggregation function (mean) that can be shared between rolling/groupby/DataFrame when using the numba engine.

df = pd.DataFrame(np.ones((10000, 1000)))
roll = df.rolling(10)
roll.mean(engine="numba", engine_kwargs={"nopython": True, "nogil": True, "parallel": True})
%timeit roll.mean(engine="numba", engine_kwargs={"nopython": True, "nogil": True, "parallel": True})

260 ms ± 13.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) <- PR
431 ms ± 9.21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) <- master
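
To make the "shared aggregation function" idea concrete, here is a minimal pure-Python sketch (no numba or NumPy, so it runs anywhere). The names `window_mean` and `apply_columnwise` are hypothetical and are not the functions added in this PR; the real kernel is jitted and works on NumPy arrays. The sketch only assumes that a kernel receives one column plus `start`/`end` window bounds and `min_periods`.

```python
import math


def window_mean(values, start, end, min_periods):
    """Mean of values[start[i]:end[i]] for each window i, ignoring NaNs."""
    result = []
    for s, e in zip(start, end):
        window = [v for v in values[s:e] if not math.isnan(v)]
        if window and len(window) >= min_periods:
            result.append(sum(window) / len(window))
        else:
            result.append(math.nan)
    return result


def apply_columnwise(kernel, columns, start, end, min_periods):
    """Run one 1-D kernel over every column (hypothetical shared executor)."""
    return [kernel(col, start, end, min_periods) for col in columns]


# Fixed window of size 2 over 4 rows: window i covers [max(0, i - 1), i + 1).
start = [0, 0, 1, 2]
end = [1, 2, 3, 4]
columns = [[1.0, 2.0, 3.0, 4.0], [10.0, math.nan, 30.0, 40.0]]
out = apply_columnwise(window_mean, columns, start, end, min_periods=2)
```

Because the kernel only sees bounds, a caller other than rolling could in principle reuse it by supplying different `start`/`end` arrays, for example group boundaries.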

@mroeschke mroeschke added the numba (numba-accelerated operations) and Performance (Memory or execution speed performance) labels Sep 16, 2021
pandas/core/numba_/executor.py Outdated
pandas/core/numba_/executor.py Outdated
@jreback
Contributor

jreback commented Sep 16, 2021

how does this compare to the cython mean?

@mroeschke
Member Author

how does this compare to the cython mean?

In [3]: %timeit roll.mean()  # cython
371 ms ± 15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@jbrockmendel
Member

is there an issue somewhere for discussing making numba required and just using this instead of the cython versions?

@mroeschke
Member Author

is there an issue somewhere for discussing making numba required and just using this instead of the cython versions?

Not an ongoing issue, but the last time it was discussed was in #28987. Looks like a lot of the discussion back then revolved around stability & maturity but those issues may not be as bad anymore.

@jreback
Contributor

jreback commented Sep 17, 2021

is there an issue somewhere for discussing making numba required and just using this instead of the cython versions?

we ought to have this discussion as that would greatly simplify code generally. this is a good start though.

pandas/core/_numba/executor.py


@numba.jit(nopython=True, nogil=True, parallel=False)
def is_monotonic_increasing(bounds):
Contributor

similar questions about typing (even if it doesn't actually help perf we should do it)

pandas/core/_numba/kernels.py Outdated
pandas/core/_numba/kernels.py Outdated
@mroeschke mroeschke added this to the 1.4 milestone Sep 18, 2021
Contributor

@jreback jreback left a comment

looks good

@@ -0,0 +1 @@
from pandas.core._numba.kernels.mean_ import sliding_mean # noqa:F401
Contributor

alt can use __all__

Member Author

Added

pandas/core/window/rolling.py
pandas/core/window/rolling.py
Contributor

@jreback jreback left a comment

cc @pandas-dev/pandas-core if any comments

@numba.jit(nopython=True, nogil=True, parallel=False)
def is_monotonic_increasing(bounds: np.ndarray) -> bool:
    n = len(bounds)
    if n == 1:
Contributor

I don't understand this block. n==1 and n < 2 -> n==0?

Member Author

Good point. I was able to simplify this block.
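
One way the simplified check could look is sketched below in plain Python. This is an assumption about the shape of the simplification, not a copy of the merged code, and it omits the `@numba.jit` decorator so it runs without numba. Empty and single-element inputs fall through a single `n < 2` guard.

```python
from typing import Sequence


def is_monotonic_increasing(bounds: Sequence[int]) -> bool:
    """Return True if the integer bounds never decrease."""
    n = len(bounds)
    if n < 2:
        # Zero or one element is trivially monotonic.
        return True
    prev = bounds[0]
    for i in range(1, n):
        cur = bounds[i]
        if cur < prev:
            return False
        prev = cur
    return True
```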

    min_periods: int,
) -> np.ndarray:
    N = len(start)
    nobs = 0.0
Contributor

Could nobs ever overflow int64 or uint64?

Member Author

I suppose with sufficient observations (nobs) this could overflow, but this value should be less than or equal to the window size so the user would also have to provide a window size that overflows u/int64

Contributor

There is an actual bound on the maximum size of a NumPy array, np.intp. This is int64 on Windows64 and Linux, and I suspect on OSX. It seems that nobs should be an integer, which should be slightly faster than a float.

Member Author

Agreed, changed nobs to an int.
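
The role of `nobs` is easier to see in a sketch of an add/remove sliding mean. The version below is plain Python with hypothetical names (`add_value`, `remove_value`, `sliding_mean_sketch`); it assumes monotonically increasing `start`/`end` bounds and leaves out any numerical-stability handling, so it should not be read as the kernel from this PR. `nobs` is an integer count of the non-NaN values currently in the window, which is why it can never exceed the window size.

```python
import math


def add_value(val, nobs, total):
    """Add one observation to the running window state."""
    if not math.isnan(val):
        nobs += 1
        total += val
    return nobs, total


def remove_value(val, nobs, total):
    """Remove one observation that has left the window."""
    if not math.isnan(val):
        nobs -= 1
        total -= val
    return nobs, total


def sliding_mean_sketch(values, start, end, min_periods):
    nobs = 0      # integer count of non-NaN values in the current window
    total = 0.0   # running sum of those values
    result = []
    for i in range(len(start)):
        s, e = start[i], end[i]
        if i == 0:
            for j in range(s, e):
                nobs, total = add_value(values[j], nobs, total)
        else:
            # Drop values that fell off the left edge, then add new ones.
            for j in range(start[i - 1], s):
                nobs, total = remove_value(values[j], nobs, total)
            for j in range(end[i - 1], e):
                nobs, total = add_value(values[j], nobs, total)
        if nobs > 0 and nobs >= min_periods:
            result.append(total / nobs)
        else:
            result.append(math.nan)
    return result


means = sliding_mean_sketch(
    [1.0, 2.0, math.nan, 4.0], start=[0, 0, 1, 2], end=[1, 2, 3, 4], min_periods=1
)
```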

    end: np.ndarray,
    min_periods: int,
):
    result = np.empty((len(start), values.shape[1]))
Member

can dtype be specified?

Member Author

Good point. Added dtype.

def is_monotonic_increasing(bounds: np.ndarray) -> bool:
    n = len(bounds)
    if n == 1:
        return bounds[0] == bounds[0]
Contributor

Is this to stop single element NaN sequences from being monotonic increasing?

Member Author

I believe so, yes. This snippet was taken from translating this function, but I was able to remove this condition since we know the inputs should be int64s with no NaNs.
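
The `bounds[0] == bounds[0]` idiom relies on NaN comparing unequal to itself under IEEE 754. A short sketch (the helper name `single_element_is_monotonic` is hypothetical) shows why the check matters for floats and is redundant for integer bounds:

```python
import math


def single_element_is_monotonic(bounds):
    """The removed check: a lone value counts only if it equals itself."""
    return bounds[0] == bounds[0]


nan_result = single_element_is_monotonic([math.nan])  # NaN != NaN
int_result = single_element_is_monotonic([7])         # integers always equal themselves
```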

if n == 1:

@jreback jreback merged commit ffbeda7 into pandas-dev:master Sep 23, 2021
@jreback
Contributor

jreback commented Sep 23, 2021

thanks @mroeschke

@mroeschke mroeschke deleted the kernels/mean_kernel branch September 23, 2021 16:11