Merge pull request #10614 from nickeubank/update_numba_docs

Extended docs on numba
pandas-dev · Jul 22, 2015 · 3bf13ac
2 parents 818f0a7 + 640c5cb
commit 3bf13ac
Showing 1 changed file with 41 additions and 2 deletions.
diff --git a/doc/source/enhancingperf.rst b/doc/source/enhancingperf.rst
@@ -307,6 +307,10 @@ Numba works by generating optimized machine code using the LLVM compiler infrast
 
     You will need to install ``numba``. This is easy with ``conda``, by using: ``conda install numba``, see :ref:`installing using miniconda<install.miniconda>`.
 
+.. note::
+
+    As of ``numba`` version 0.20, pandas objects cannot be passed directly to numba-compiled functions. Instead, pass the ``numpy`` array underlying the pandas object (for example, via ``.values``) to the numba-compiled function, as demonstrated below.
+
 We simply take the plain python code from above and annotate with the ``@jit`` decorator.
 
 .. code-block:: python
@@ -338,14 +342,49 @@ We simply take the plain python code from above and annotate with the ``@jit`` d
        result = apply_integrate_f_numba(df['a'].values, df['b'].values, df['N'].values)
        return pd.Series(result, index=df.index, name='result')
 
-Similar to above, we directly pass ``numpy`` arrays directly to the numba function. Further
-we are wrapping the results to provide a nice interface by passing/returning pandas objects.
+Note that we pass ``numpy`` arrays directly to the numba function. ``compute_numba`` is just a wrapper that provides a nicer interface by accepting and returning pandas objects.
 
 .. code-block:: python
 
     In [4]: %timeit compute_numba(df)
     1000 loops, best of 3: 798 us per loop
 
+``numba`` can also be used to write vectorized functions that do not require the user to explicitly
+loop over the observations of a vector; a vectorized function is applied to each element automatically.
+Consider the following toy example of doubling each observation:
+
+.. code-block:: python
+
+    import numba
+    
+    def double_every_value_nonumba(x):
+        return x*2
+    
+    @numba.vectorize
+    def double_every_value_withnumba(x):
+        return x*2
+    
+    
+    # Custom function without numba
+    In [5]: %timeit df['col1_doubled'] = df.a.apply(double_every_value_nonumba)
+    1000 loops, best of 3: 797 us per loop
+    
+    # Standard implementation (faster than a custom function)
+    In [6]: %timeit df['col1_doubled'] = df.a*2
+    1000 loops, best of 3: 233 us per loop
+    
+    # Custom function with numba
+    In [7]: %timeit df['col1_doubled'] = double_every_value_withnumba(df.a.values)
+    1000 loops, best of 3: 145 us per loop
+
+.. note::
+
+    ``numba`` will run any function, but it can only accelerate certain classes of functions.
+
+``numba`` is best at accelerating functions that apply numerical operations to ``numpy`` arrays. When passed a function that uses only operations it knows how to accelerate, it will execute in ``nopython`` mode.
+
+If ``numba`` is passed a function that includes something it doesn't know how to work with -- a category that currently includes sets, lists, dictionaries, and string functions -- it will revert to ``object mode``. In ``object mode``, numba will execute but your code will not speed up significantly. If you would prefer that ``numba`` raise an error when it cannot compile a function in a way that speeds up your code, pass numba the argument ``nopython=True`` (e.g. ``@numba.jit(nopython=True)``). For more on troubleshooting ``numba`` modes, see the `numba troubleshooting page <http://numba.pydata.org/numba-doc/0.20.0/user/troubleshoot.html#the-compiled-code-is-too-slow>`__.
+
 Read more in the `numba docs <http://numba.pydata.org/>`__.
 
 .. _enhancingperf.eval:
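
The two decorators discussed in the added documentation can be tried together in a minimal, self-contained sketch. This is not part of the commit: the names ``sum_of_squares``, ``data``, ``doubled`` and ``total`` are hypothetical, and the ``ImportError`` fallback is an assumption added only so the snippet still runs (without any speed-up) when ``numba`` is not installed. A plain ``numpy`` array stands in for what ``df['a'].values`` would supply.

```python
import numpy as np

try:
    import numba
    vectorize = numba.vectorize          # builds an element-wise ufunc
    njit = numba.jit(nopython=True)      # raise instead of using object mode
except ImportError:
    # Assumption for illustration only: fall back to pure numpy/Python so the
    # sketch stays runnable without numba (no acceleration in this case).
    vectorize = np.vectorize

    def njit(func):
        return func


@vectorize
def double_every_value(x):
    # Written for a single scalar; applied to every element of the array.
    return x * 2


@njit
def sum_of_squares(values):
    # Explicit loop over a numpy array, the kind of code nopython mode handles.
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total


# With a pandas Series ``s`` you would pass ``s.values`` here, not ``s`` itself.
data = np.array([1.0, 2.0, 3.0])
doubled = double_every_value(data)
total = sum_of_squares(data)
```

Because ``nopython=True`` is requested, a function that numba cannot compile should fail loudly rather than silently running slowly, which makes this form a reasonable default while experimenting.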