Merge pull request #10614 from nickeubank/update_numba_docs
Extended docs on numba
jreback committed Jul 22, 2015
2 parents 818f0a7 + 640c5cb commit 3bf13ac
Showing 1 changed file with 41 additions and 2 deletions.
43 changes: 41 additions & 2 deletions doc/source/enhancingperf.rst
@@ -307,6 +307,10 @@ Numba works by generating optimized machine code using the LLVM compiler infrast

You will need to install ``numba``. This is easy with ``conda``, by using: ``conda install numba``, see :ref:`installing using miniconda<install.miniconda>`.

.. note::

    As of ``numba`` version 0.20, pandas objects cannot be passed directly to numba-compiled functions. Instead, one must pass the ``numpy`` array underlying the ``pandas`` object to the numba-compiled function as demonstrated below.

We simply take the plain python code from above and annotate with the ``@jit`` decorator.

.. code-block:: python
@@ -338,14 +342,49 @@ We simply take the plain python code from above and annotate with the ``@jit`` d
        result = apply_integrate_f_numba(df['a'].values, df['b'].values, df['N'].values)
        return pd.Series(result, index=df.index, name='result')

-Similar to above, we directly pass ``numpy`` arrays directly to the numba function. Further
-we are wrapping the results to provide a nice interface by passing/returning pandas objects.
+Note that we directly pass ``numpy`` arrays to the numba function. ``compute_numba`` is just a wrapper that provides a nicer interface by passing/returning pandas objects.

.. code-block:: python

    In [4]: %timeit compute_numba(df)
    1000 loops, best of 3: 798 us per loop

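
The wrapper pattern described above can be sketched in isolation. This is a minimal illustration rather than the code from this commit: ``row_sums_numba`` and ``compute_row_sums`` are hypothetical names, the small DataFrame is made up, and the fallback to a no-op decorator is only there so the sketch still runs where numba is not installed.

```python
import numpy as np
import pandas as pd

try:
    from numba import jit
except ImportError:
    # Assumption for this sketch: run as plain Python if numba is missing.
    def jit(func):
        return func


@jit
def row_sums_numba(col_a, col_b):
    # Works only on plain numpy arrays, never on pandas objects.
    n = len(col_a)
    out = np.empty(n, dtype=np.float64)
    for i in range(n):
        out[i] = col_a[i] + col_b[i]
    return out


def compute_row_sums(df):
    # Unwrap to numpy with .values, then re-wrap the result as a Series.
    result = row_sums_numba(df['a'].values, df['b'].values)
    return pd.Series(result, index=df.index, name='result')


# Hypothetical data, used only to exercise the wrapper.
df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [10.0, 20.0, 30.0]})
print(compute_row_sums(df))
```

Keeping the compiled function free of pandas types and doing the conversion in a thin wrapper means callers can keep working with DataFrames and Series.
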
``numba`` can also be used to write vectorized functions that do not require the user to explicitly
loop over the observations of a vector; a vectorized function will be applied to each row automatically.
Consider the following toy example of doubling each observation:

.. code-block:: python

    import numba

    def double_every_value_nonumba(x):
        return x*2

    @numba.vectorize
    def double_every_value_withnumba(x):
        return x*2

    # Custom function without numba
    In [5]: %timeit df['col1_doubled'] = df.a.apply(double_every_value_nonumba)
    1000 loops, best of 3: 797 us per loop

    # Standard implementation (faster than a custom function)
    In [6]: %timeit df['col1_doubled'] = df.a*2
    1000 loops, best of 3: 233 us per loop

    # Custom function with numba
    In [7]: %timeit df['col1_doubled'] = double_every_value_withnumba(df.a.values)
    1000 loops, best of 3: 145 us per loop

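
As a hedged aside, ``numba.vectorize`` can also be given an explicit list of type signatures, in which case the function is compiled when it is decorated rather than on first call. The sketch below assumes ``float64`` input. ``double_float64`` is a hypothetical name, and ``np.vectorize`` is used as a stand-in only when numba is not installed (it gives the same results but not the same speedup).

```python
import numpy as np
import pandas as pd

try:
    import numba
    # Explicit signature: one float64 in, one float64 out.
    vectorize = numba.vectorize(['float64(float64)'])
except ImportError:
    # Assumption for this sketch: plain numpy stand-in, no acceleration.
    vectorize = np.vectorize


@vectorize
def double_float64(x):
    # Written for a single scalar; applied elementwise to the whole array.
    return x * 2


# Hypothetical data; note that the numpy array, not the Series, is passed in.
df = pd.DataFrame({'a': np.arange(5, dtype=np.float64)})
df['a_doubled'] = double_float64(df['a'].values)
print(df)
```
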
.. note::

    ``numba`` will execute on any function, but can only accelerate certain classes of functions.

    ``numba`` is best at accelerating functions that apply numerical functions to numpy arrays. When passed a function that only uses operations it knows how to accelerate, it will execute in ``nopython`` mode.

    If ``numba`` is passed a function that includes something it doesn't know how to work with -- a category that currently includes sets, lists, dictionaries, or string functions -- it will revert to ``object mode``. In ``object mode``, numba will execute but your code will not speed up significantly. If you would prefer that ``numba`` throw an error if it cannot compile a function in a way that speeds up your code, pass numba the argument ``nopython=True`` (e.g. ``@numba.jit(nopython=True)``). For more on troubleshooting ``numba`` modes, see the `numba troubleshooting page <http://numba.pydata.org/numba-doc/0.20.0/user/troubleshoot.html#the-compiled-code-is-too-slow>`__.
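
The effect of ``nopython=True`` can be sketched as follows. This is an illustration under stated assumptions, not part of the commit: ``total`` and ``strict_total`` are hypothetical names, and an opaque Python ``object()`` stands in for "something numba cannot type". Which constructs fail to compile depends on the numba version, so the sketch catches a generic ``Exception`` and only reports the exception's type.

```python
import numpy as np

try:
    import numba
    HAVE_NUMBA = True
except ImportError:
    # Assumption for this sketch: still run, uncompiled, without numba.
    HAVE_NUMBA = False


def total(values):
    # Simple numerical loop of the kind numba accelerates well.
    acc = 0.0
    for v in values:
        acc += v
    return acc


# With nopython=True, numba must compile fully or raise an error.
strict_total = numba.jit(nopython=True)(total) if HAVE_NUMBA else total

# A numpy array of floats is something numba knows how to handle.
print(strict_total(np.array([1.0, 2.0, 3.0])))

compile_failed = False
if HAVE_NUMBA:
    try:
        # An arbitrary Python object has no numba type, so compilation fails.
        strict_total(object())
    except Exception as exc:
        compile_failed = True
        print(type(exc).__name__)
```

Failing loudly at the first call makes it clear when a function is not being accelerated, instead of leaving a silent slow path.
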

Read more in the `numba docs <http://numba.pydata.org/>`__.

.. _enhancingperf.eval: