Add Numba as an optional dependency for rolling.apply for pandas 1.0 #28987

mroeschke · 2019-10-15T05:53:20Z

As mentioned on the pandas dev call last week, I've been working with @jreback and @DiegoAlbertoTorres on a proof of concept (POC) implementing rolling.mean and rolling.apply using Numba instead of our current Cython implementation. As described in this proof of concept document, we worked on:

Refactoring window bound calculation and aggregation to use Numba
Developing a new API for users to implement their own window bounds calculations (df.rolling(MyWindowerClass()).mean())

The document details performance results, high level implementation details and integration plan into pandas. The fork is up to date with master and is running the same CI checks.

The proposal for pandas 1.0 is:

1) Introduce Numba as a required pandas dependency
2) Have rolling.mean dispatch to the Numba implementation

The proposal for post pandas 1.0 is:

Expose the new API for users to implement their own window bounds calculations
Implement all rolling aggregations (min, max, count, etc,) in Numba
Implement EWM and Expanding in Numba
Generalize data grouping (groupby, rolling, resample) and aggregations (mean, max, etc) using Numba jitclasses

As the issue title notes, I'm hoping to keep this discussion focused around any concerns, thoughts, suggestions regarding the POC and how it pertains to the pandas 1.0 proposal, but feel free to ask any questions regarding topics the POC doesn't cover.

cc @pandas-dev/pandas-core

The text was updated successfully, but these errors were encountered:

rgommers · 2019-10-15T07:09:11Z

Hi all. This proposal has a lot of implications that I don't see addressed in the PoC document. I would suggest reviewing this thread about introducing Numba as a SciPy dependency from last year.

Some concrete questions that may be showstoppers are:

what is the status on ARM and other more exotic hardware (e.g. AIX, the Debian-supported architectures)?
what's the impact on deployment size (e.g. this may hurt embedded use cases, and push NumPy+SciPy+Pandas over the size limit for AWS Lambda).

Pydata/Sparse, which is pure Python + Numba, is thinking about incrementally moving parts back to compiled code, due to stability and currently missing features in Numba.

And I think that Scikit-learn experimented with this fairly recently and also came away with "not ready yet" (second-hand info via @amueller, please correct me if I'm wrong @NicolasHug).

shoyer · 2019-10-15T07:14:29Z

I’m sympathetic to the goal here, but my sense is that Numba can still be a challenging project to redistribute, so I’m not sure this is a good idea. Not everyone uses conda or pip. Just look at the building from source instructions for patching LLVM, for example: https://llvmlite.readthedocs.io/en/latest/admin-guide/install.html.

A strength of pandas has been that it’s easy to install with a standard toolchain, and has a minimal set of required dependencies.

In your linked doc, you list three benefits from using Numba:

Performance parity or improvement over Cython

Eliminate shipping C-extensions

Ease of debugging and maintenance as pure Python code

With all due respect, I question benefits (2) and (3).

Removing all C extensions would be to remove the need for a C compiler when building pandas from source, but this seems quite farfetched for pandas. Until that happens, I’m not sure why this would be productive. C extensions are a mature and stable technology.

Calling Numba wrapped code “pure Python code” from a debugging perspective also seems misguided. It’s only pure Python until you compile it with Numba, at which point you can debug Numba’s compiler and LLVM IR. I’m pretty confident that the toolchain for debugging Cython’s generated C code is much more battle tested. For example, Cython is actually capable of embedding arbitrary Python, while Numba requires dropping back into “object” mode, which as far as I understand entails skipping most of the compiler.

toobaz · 2019-10-15T07:25:49Z

As mentioned in the last dev call, my (very limited) experience with numba suggests that it is very easy (= few lines of decorator) to use it without hard depending on it, and leaving the user with the pure Python implementation if numba is missing. While such pure Python implementation would be very slow, this would affect only rolling functions which are not necessarily used by all users (at that time, we would probably emit a warning). In any case, while I would not impose the burden of always writing a numba and a numpy implementation of rolling functions, this does not mean we have to kill existing numpy implementations (as a fallback) in 1.0.

So if we decide to go ahead introducing numba in 1.0 (and on this, other people have better judgement than me), I still would refrain, if possible, from introducing the hard dependency.

gfyoung · 2019-10-15T07:37:34Z

I still would refrain, if possible, from introducing the hard dependency.

If you're going to use numba, I think you should either go "all in" or not at all. The toolchain implications of using numba are not trivial, and the fallback is not ideal with pure Python code.

this would affect only rolling functions which are not necessarily used by all users

Sure, but that doesn't mean we should sacrifice performance. Ideally, there should be a non-negative benefit for all users if we made such a switch.

NicolasHug · 2019-10-15T12:23:55Z

And I think that Scikit-learn experimented with this fairly recently and also came away with "not ready yet" (second-hand info via @amueller, please correct me if I'm wrong @NicolasHug).

Yes, I don't think Numba is mature enough for scikit-learn to switch from Cython.

I've used numba extensively in pygbm and more recently in a toy project (hmmkay). Here are the main arguments I have against it:

error messages aren't helpful (at all)
compilation time is significant for a large code-base (caching makes it less of an issue but it can be surprising for users)
it's fairly unstable compared to cython. Bugs can be extremely hard to catch, and it takes time to realize that the bug comes from numba, and not from your code. I literally wasted days with pygbm on this.
API is not entirely stable yet (e.g. deprecation of reflected lists for typed lists)
A few features are missing, like being able to parallelize methods. It doesn't look like much, but it makes the code look like spaghetti after a while.
It's magic, so it's hard to know whether what you are writing is actually optimal or not. This is also true for Cython but I find numba even more surprising.

With all this I tend to question the "ease of use" argument of numba, I don't find it to be true, for now.

Regarding allowing a soft dependency on Numba: numba code is Python code, but it's not pythonic code. For numba to generate efficient code you need to unfold for loops, etc. Yes this is pure python code, but it will be incredibly slow if numba does not JIT compile it (slower than if you had been writing pythonic code).

WillAyd · 2019-10-15T16:08:12Z

With one of the major concerns in this thread being the installation dependency, is there a comprehensive listing of what platforms this would impact? I'm not sure we've been explicit ourselves as to what platforms are supported; if it is a complicating factor for niche platforms I don't think that should be a showstopper

FWIW I think it's definitely worth exploring. The performance improvements on lambda expressions are great, and while we already have a lot of Cython architecture in place I would argue that we may be biased in assessing Cython as easier to debug than Numba. I don't find that we get a lot of newcomers that jump into the Cython code, nor is our Cython code always clean or consistent. I could see Numba potentially being easier to digest

jbrockmendel · 2019-10-15T16:25:23Z

A lot of the conversation has been about numba vs cython, but we also have a soft dependency on bottleneck that is used in core.nanops that we might be able to remove if we were to use numba. IIRC @shoyer did a proof of concept for switching over some of those functions? bottleneck's last release was May 2017 and there are some bugs e.g. #22385 that don't seem to have any movement. I would view removing this dependency as an upside to using numba.

…

On Tue, Oct 15, 2019 at 9:08 AM William Ayd ***@***.***> wrote: With one of the major concerns in this thread being the installation dependency, is there a comprehensive listing of what platforms this would impact? I'm not sure we've been explicit ourselves as to what platforms are supported; if it is a complicating factor for niche platforms I don't think that should be a showstopper FWIW I think it's definitely worth exploring. The performance improvements on lambda expressions are great, and while we already have a lot of Cython architecture in place I would argue that we may be biased in assessing Cython as easier to debug than Numba. I don't find that we get a lot of newcomers that jump into the Cython code, nor is our Cython code always clean or consistent. I could see Numba potentially being easier to digest — You are receiving this because you are on a team that was mentioned. Reply to this email directly, view it on GitHub <#28987?email_source=notifications&email_token=AB5UM6FA6ASQEGLADCL4D2LQOXTJVA5CNFSM4JAXTJC2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBJKW7Y#issuecomment-542288767>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB5UM6EWB7QB6PMPSCVMAG3QOXTJVANCNFSM4JAXTJCQ> .

shoyer · 2019-10-15T17:35:46Z

A lot of the conversation has been about numba vs cython, but we also have
a soft dependency on bottleneck that is used in core.nanops that we might
be able to remove if we were to use numba. IIRC @shoyer did a proof of
concept for switching over some of those functions?

It is straightforward to rewrite most of the algorithms in bottleneck in Numba (e.g., like I did in http://github.com/shoyer/numbagg), but also in Cython or C++. If somebody cares about removing the bottleneck dependency, there are any number of fine ways to do so.

Actually C++ is really the elephant in the room here. It is definitely the most mature technology for high performance, high level programming.

bottleneck's last release was May 2017 and there are some bugs e.g. #22385 that don't seem to have any movement

Bottleneck has a new maintainer and recently moved over in the pydata GitHub organization.

I don't think it's fair to consider #22385 a bug. Bottleneck doesn't implement pairwise summation, but that doesn't make it wrong, it just a slightly less numerically stable algorithm for computing sums.

jorisvandenbossche · 2019-10-15T19:30:42Z

There are two goals mentioned in the top post: 1) use numba to implement the built-in rolling operations and 2) provide an API for the user to pass a numba function to be used as rolling operation.

I think the second goal is also very useful (and for the user give more added value, as the first goal is more an internal change). Could this second goal be implemented without having numba as a hard dependency?

stuartarchibald · 2019-10-17T17:02:19Z

Hi, Numba core developer here,

We agree that there are concerns in using Numba on a project like Pandas, and we would like to prioritise them better. It has been helpful in the past to itemise them and obtain clarity over the issues such that they can be fixed/improved (xref: numba/numba#2888 for the last round of this). I've opened a new issue to start a new discussion over on the Numba issue tracker (xref: numba/numba#4723).

Thanks!

mroeschke · 2019-10-18T17:37:22Z

Thanks for all the feedback @pandas-dev/pandas-core

I'd like to revise my pandas 1.0 proposal to the following:

Introduce Numba as an optional dependency
Have rolling.apply dispatch to Numba as opt-in behavior (e.g. rolling(...).apply(..., engine='Numba')) as a potential significant performance boost

The main allure of Numba in its immediate state is being able to write Python and achieve comparable performance with Cython (or C++) thus providing:

Lower barrier to entry for new contributors.
More debuggable workflow for the maintance team, e.g. @jit(forceobj=True) to drop back to Python.
2a) Debugging the LLVM IR has been more of a matter of debugging performance instead of result correctness in my experience; the later which is more important.

A con of adding Numba as a optional dependency for rolling is having to maintain duplicate implementations in Cython and Numba. The burden should go down in the long term as we re-visit Numba as a core dependency and circle back to a single implementation.

While a lot of the concerns are around Numba's immaturity, it is a consideration for limiting its application in pandas 1.0 (to only rolling.apply).
Moreover, I believe adopting Numba as an optional dependency can help drive significant feedback to the project (platform support, feature completeness, etc)

jorisvandenbossche · 2019-10-18T17:53:24Z

A con of adding Numba as a optional dependency for rolling is having to maintain duplicate implementations in Cython and Numba.

Why is this? Would you still include implementations of all built-in rolling functions in pandas? I would think the .apply(func, engine='numba') is mainly for user-defined-functions?

jreback · 2019-10-18T18:00:46Z

A con of adding Numba as a optional dependency for rolling is having to maintain duplicate implementations in Cython and Numba.

Why is this? Would you still include implementations of all built-in rolling functions in pandas? I would think the .apply(func, engine='numba') is mainly for user-defined-functions?

no, you wouldn't need to (at this point), duplicate anything for building in rolling function (e.g. mean); the duplication of code is in the computation of the Fixed and Variable window indexers themselves (not a huge duplication).

mroeschke · 2019-10-19T03:24:19Z

Right, I think we'll need 2 implementations for calculating the window boundaries, a Cython version (for rolling functions that are still in Cython) and a Numba version (for rolling functions that get converted to Numba)

TomAugspurger · 2019-11-12T15:12:19Z

@mroeschke are you hoping to do this for 1.0? Roughly, how much effort is left on #29428, and how much additional effort to close this? Trying to get a sense for where we're at.

Edit: and should the title of this post be updated based on #28987 (comment)? The optional part is still the plan, right?

mroeschke · 2019-11-12T16:27:04Z

@TomAugspurger yes still aiming for this for 1.0.

Hoping to finish #29428 by end of week. After there should be ~2/3 followup PRs (1 more cleanup for the "window indexers", 1 for Numba + rolling apply). These PR's should be a little more straightforward and hoping to finish those in December/end of year.

rgommers · 2019-11-12T22:25:22Z

@TomAugspurger yes still aiming for this for 1.0.

Just an observation: there's a lot of concerns and potential issues in this issue, with essentially no response, except for "let's make it optional then". If you had discussions in a dev call or something, it would be nice to summarize, because the decision making looks a little odd here.

TomAugspurger · 2019-11-12T22:37:25Z

No dev call (though this will likely come up during tomorrow's).

My (possibly wrong) understanding of the situation is that

We're not ready for a hard dependency on numba
Being able to do a .rolling(...).apply(numba_jitted_user_function) all in Numba is compelling enough to warrant some duplicate code (though I haven't followed the discussion on how much)

Though that doesn't quite square with @mroeschke comment

and a Numba version (for rolling functions that get converted to Numba)

Are we converting some functions to Numba? Or are we limiting the scope to taking a numba-path for user-defined functions that have been jitted (by the user).

mroeschke · 2019-11-12T22:54:35Z

The revised proposal #28987 (comment) was in response to the various concerns around Numba and making its use essential fully optional.

Reiterating the proposal points:

Introduce Numba as an optional dependency
Have rolling.apply dispatch to Numba as opt-in behavior (e.g. rolling(...).apply(user_python_func, engine='Numba')) as a potential significant performance boost

@TomAugspurger so the idea is that we would try to jit the users function (and raise if we fail) instead of the users jitting before passing to apply - though we could accomodate the latter as well.

Added the topic as a discussion point for tomorrow's dev call.

mroeschke added Dependencies Required and optional dependencies Needs Discussion Requires discussion from core team before further action Window rolling, ewma, expanding labels Oct 15, 2019

stuartarchibald mentioned this issue Oct 17, 2019

Tracker for Numba usability improvements Q4 2019. numba/numba#4723

Closed

jreback added this to the 1.0 milestone Oct 18, 2019

stuartarchibald mentioned this issue Oct 21, 2019

Bind to cython from Numba twosigma/pandas#43

Open

mroeschke mentioned this issue Nov 6, 2019

REF: Separate window bounds calculation from aggregation functions #29428

Merged

3 tasks

mroeschke changed the title ~~Add Numba as a required dependency for rolling.mean for pandas 1.0~~ Add Numba as an optional dependency for rolling.mean for pandas 1.0 Nov 12, 2019

mroeschke changed the title ~~Add Numba as an optional dependency for rolling.mean for pandas 1.0~~ Add Numba as an optional dependency for rolling.apply for pandas 1.0 Dec 9, 2019

mroeschke mentioned this issue Dec 9, 2019

ENH: Add numba engine for rolling apply #30151

Merged

5 tasks

jreback closed this as completed in #30151 Dec 27, 2019

mroeschke mentioned this issue Feb 10, 2020

ENH: Add engine keyword argument to groupby.apply to leverage Numba #31845

Open

rgommers mentioned this issue May 30, 2020

PyData/Sparse support scikit-learn/scikit-learn#17364

Open

mroeschke mentioned this issue Sep 17, 2021

PERF: Improve performance in rolling.mean(engine="numba") #43612

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Numba as an optional dependency for rolling.apply for pandas 1.0 #28987

Add Numba as an optional dependency for rolling.apply for pandas 1.0 #28987

mroeschke commented Oct 15, 2019

rgommers commented Oct 15, 2019

shoyer commented Oct 15, 2019

toobaz commented Oct 15, 2019

gfyoung commented Oct 15, 2019 •

edited

Loading

NicolasHug commented Oct 15, 2019

WillAyd commented Oct 15, 2019

jbrockmendel commented Oct 15, 2019 via email

shoyer commented Oct 15, 2019

jorisvandenbossche commented Oct 15, 2019

stuartarchibald commented Oct 17, 2019

mroeschke commented Oct 18, 2019 •

edited

Loading

jorisvandenbossche commented Oct 18, 2019

jreback commented Oct 18, 2019

mroeschke commented Oct 19, 2019

TomAugspurger commented Nov 12, 2019 •

edited

Loading

mroeschke commented Nov 12, 2019

rgommers commented Nov 12, 2019

TomAugspurger commented Nov 12, 2019

mroeschke commented Nov 12, 2019 •

edited

Loading

Add Numba as an optional dependency for rolling.apply for pandas 1.0 #28987

Add Numba as an optional dependency for rolling.apply for pandas 1.0 #28987

Comments

mroeschke commented Oct 15, 2019

rgommers commented Oct 15, 2019

shoyer commented Oct 15, 2019

toobaz commented Oct 15, 2019

gfyoung commented Oct 15, 2019 • edited Loading

NicolasHug commented Oct 15, 2019

WillAyd commented Oct 15, 2019

jbrockmendel commented Oct 15, 2019 via email

shoyer commented Oct 15, 2019

jorisvandenbossche commented Oct 15, 2019

stuartarchibald commented Oct 17, 2019

mroeschke commented Oct 18, 2019 • edited Loading

jorisvandenbossche commented Oct 18, 2019

jreback commented Oct 18, 2019

mroeschke commented Oct 19, 2019

TomAugspurger commented Nov 12, 2019 • edited Loading

mroeschke commented Nov 12, 2019

rgommers commented Nov 12, 2019

TomAugspurger commented Nov 12, 2019

mroeschke commented Nov 12, 2019 • edited Loading

gfyoung commented Oct 15, 2019 •

edited

Loading

mroeschke commented Oct 18, 2019 •

edited

Loading

TomAugspurger commented Nov 12, 2019 •

edited

Loading

mroeschke commented Nov 12, 2019 •

edited

Loading