From 3f3b4e0bc15107153167ec605b45676113ffb9c1 Mon Sep 17 00:00:00 2001
From: Tommy <10076072+tommyod@users.noreply.github.com>
Date: Sat, 3 Feb 2018 21:32:56 +0100
Subject: [PATCH] DOC: Spellcheck of enhancingperf.rst (#19516)

* Spellchecked enhancingperf, sparse
* Uppercased 'cython' to 'Cython'
* Typeset variants of numba as 'Numba', as on their page
* Updated reference to Numba docs to latest version
---
 doc/source/enhancingperf.rst | 81 +++++++++++++++++++++++-------------
 doc/source/sparse.rst        |  8 ++--
 2 files changed, 55 insertions(+), 34 deletions(-)

diff --git a/doc/source/enhancingperf.rst b/doc/source/enhancingperf.rst
index 7afa852262a38..b786b1d0c134a 100644
--- a/doc/source/enhancingperf.rst
+++ b/doc/source/enhancingperf.rst
@@ -19,6 +19,13 @@ Enhancing Performance
 *********************
 
+In this part of the tutorial, we will investigate how to speed up certain
+functions operating on pandas ``DataFrames`` using three different techniques:
+Cython, Numba and :func:`pandas.eval`. We will see a speed improvement of ~200x
+when we use Cython and Numba on a test function operating row-wise on the
+``DataFrame``. Using :func:`pandas.eval` we will speed up a sum by a factor of
+~2.
+
 .. _enhancingperf.cython:
 
 Cython (Writing C extensions for pandas)
@@ -29,20 +36,20 @@ computationally heavy applications however, it can be possible to achieve
 sizeable speed-ups by offloading work to `cython <http://cython.org/>`__.
 
 This tutorial assumes you have refactored as much as possible in Python, for example
-trying to remove for loops and making use of NumPy vectorization, it's always worth
+by trying to remove for-loops and making use of NumPy vectorization. It's always worth
 optimising in Python first.
 
 This tutorial walks through a "typical" process of cythonizing a slow computation.
-We use an `example from the cython documentation <http://docs.cython.org/src/quickstart/cythonize.html>`__
+We use an `example from the Cython documentation <http://docs.cython.org/src/quickstart/cythonize.html>`__
 but in the context of pandas. Our final cythonized solution is around 100 times
-faster than the pure Python.
+faster than the pure Python solution.
 
 .. _enhancingperf.pure:
 
 Pure python
 ~~~~~~~~~~~
 
-We have a DataFrame to which we want to apply a function row-wise.
+We have a ``DataFrame`` to which we want to apply a function row-wise.
 
 .. ipython:: python
@@ -91,10 +98,10 @@ hence we'll concentrate our efforts cythonizing these two functions.
 
 .. _enhancingperf.plain:
 
-Plain cython
+Plain Cython
 ~~~~~~~~~~~~
 
-First we're going to need to import the cython magic function to ipython:
+First we're going to need to import the Cython magic function into IPython:
 
 .. ipython:: python
    :okwarning:
@@ -102,7 +109,7 @@ First we're going to need to import the cython magic function to ipython:
 
    %load_ext Cython
 
-Now, let's simply copy our functions over to cython as is (the suffix
+Now, let's simply copy our functions over to Cython as is (the suffix
 is here to distinguish between function versions):
 
 .. ipython::
@@ -177,8 +184,8 @@ in Python, so maybe we could minimize these by cythonizing the apply part.
 
 .. note::
 
-   We are now passing ndarrays into the cython function, fortunately cython plays
-   very nicely with numpy.
+   We are now passing ndarrays into the Cython function; fortunately, Cython plays
+   very nicely with NumPy.
 
 .. ipython::
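The ``%%cython`` magic used in this section compiles and loads the extension module behind the scenes. Outside of IPython, the same functions can be compiled with the standard Cython build recipe; the sketch below assumes the functions live in a hypothetical file named ``integrate_cy.pyx``.

.. code-block:: python

   # setup.py -- a minimal sketch of the standard Cython build recipe.
   # "integrate_cy.pyx" is a hypothetical file containing the cythonized
   # functions from this section; build the extension in place with:
   #     python setup.py build_ext --inplace
   from distutils.core import setup
   from Cython.Build import cythonize

   setup(ext_modules=cythonize("integrate_cy.pyx"))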
@@ -213,9 +220,9 @@ the rows, applying our ``integrate_f_typed``, and putting this in the zeros array.
 .. warning::
 
    You can **not pass** a ``Series`` directly as a ``ndarray`` typed parameter
-   to a cython function. Instead pass the actual ``ndarray`` using the
-   ``.values`` attribute of the Series. The reason is that the cython
-   definition is specific to an ndarray and not the passed Series.
+   to a Cython function. Instead, pass the actual ``ndarray`` using the
+   ``.values`` attribute of the ``Series``. The reason is that the Cython
+   definition is specific to an ndarray and not the passed ``Series``.
 
    So, do not do this:
 
   .. code-block:: python
@@ -223,7 +230,7 @@ the rows, applying our ``integrate_f_typed``, and putting this in the zeros array.
 
      apply_integrate_f(df['a'], df['b'], df['N'])
 
-   But rather, use ``.values`` to get the underlying ``ndarray``
+   But rather, use ``.values`` to get the underlying ``ndarray``:
 
    .. code-block:: python
 
      apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)
@@ -255,7 +262,7 @@
 More advanced techniques
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
 There is still hope for improvement. Here's an example of using some more
-advanced cython techniques:
+advanced Cython techniques:
 
 .. ipython::
@@ -289,16 +296,17 @@ advanced Cython techniques:
 
     In [4]: %timeit apply_integrate_f_wrap(df['a'].values, df['b'].values, df['N'].values)
     1000 loops, best of 3: 987 us per loop
 
-Even faster, with the caveat that a bug in our cython code (an off-by-one error,
+Even faster, with the caveat that a bug in our Cython code (an off-by-one error,
 for example) might cause a segfault because memory access isn't checked.
-
+For more about ``boundscheck`` and ``wraparound``, see the Cython docs on
+`compiler directives <http://cython.readthedocs.io/en/latest/src/reference/compilation.html#compiler-directives>`__.
 
 .. _enhancingperf.numba:
 
-Using numba
+Using Numba
 -----------
 
-A recent alternative to statically compiling cython code, is to use a *dynamic jit-compiler*, ``numba``.
+A recent alternative to statically compiling Cython code is to use a *dynamic jit-compiler*, Numba.
 
 Numba gives you the power to speed up your applications with high performance functions written directly in Python. With a few annotations, array-oriented and math-heavy Python code can be just-in-time compiled to native machine instructions, similar in performance to C, C++ and Fortran, without having to switch languages or Python interpreters.
@@ -306,16 +314,17 @@ Numba works by generating optimized machine code using the LLVM compiler infrastructure
 
 .. note::
 
-   You will need to install ``numba``. This is easy with ``conda``, by using: ``conda install numba``, see :ref:`installing using miniconda`.
+   You will need to install Numba. This is easy with ``conda``: run ``conda install numba``; see :ref:`installing using miniconda`.
 
 .. note::
 
-   As of ``numba`` version 0.20, pandas objects cannot be passed directly to numba-compiled functions. Instead, one must pass the ``numpy`` array underlying the ``pandas`` object to the numba-compiled function as demonstrated below.
+   As of Numba version 0.20, pandas objects cannot be passed directly to Numba-compiled functions. Instead, one must pass the NumPy array underlying the pandas object to the Numba-compiled function as demonstrated below.
 
 Jit
 ~~~
 
-Using ``numba`` to just-in-time compile your code. We simply take the plain Python code from above and annotate with the ``@jit`` decorator.
+We demonstrate how to use Numba to just-in-time compile our code. We simply
+take the plain Python code from above and annotate it with the ``@jit`` decorator.
 
 .. code-block:: python
@@ -346,17 +355,19 @@ Using ``numba`` to just-in-time compile your code. We simply take the plain Python code from above and annotate with the ``@jit`` decorator.
 
     result = apply_integrate_f_numba(df['a'].values, df['b'].values, df['N'].values)
     return pd.Series(result, index=df.index, name='result')
 
-Note that we directly pass ``numpy`` arrays to the numba function. ``compute_numba`` is just a wrapper that provides a nicer interface by passing/returning pandas objects.
+Note that we directly pass NumPy arrays to the Numba function. ``compute_numba`` is just a wrapper that provides a nicer interface by passing/returning pandas objects.
 
 .. code-block:: ipython
 
     In [4]: %timeit compute_numba(df)
     1000 loops, best of 3: 798 us per loop
 
+In this example, using Numba was faster than Cython.
+
 Vectorize
 ~~~~~~~~~
 
-``numba`` can also be used to write vectorized functions that do not require the user to explicitly
+Numba can also be used to write vectorized functions that do not require the user to explicitly
 loop over the observations of a vector; a vectorized function will be applied to each row automatically.
 Consider the following toy example of doubling each observation:
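The doubling example itself lies outside the hunks of this patch; a minimal sketch of such a function, using Numba's documented ``@vectorize`` decorator (the function name here is hypothetical):

.. code-block:: python

   import numba

   @numba.vectorize
   def double_every_value(x):
       # compiled into a NumPy ufunc: applied elementwise to an
       # array argument, with no explicit Python loop required
       return x * 2.0

Called as ``double_every_value(df['a'].values)``, this returns a new array with each observation doubled.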
@@ -389,13 +400,23 @@ Caveats
 
 .. note::
 
-   ``numba`` will execute on any function, but can only accelerate certain classes of functions.
+   Numba will execute on any function, but can only accelerate certain classes of functions.
 
-``numba`` is best at accelerating functions that apply numerical functions to NumPy arrays. When passed a function that only uses operations it knows how to accelerate, it will execute in ``nopython`` mode.
+Numba is best at accelerating functions that apply numerical functions to NumPy
+arrays. When passed a function that only uses operations it knows how to
+accelerate, it will execute in ``nopython`` mode.
 
-If ``numba`` is passed a function that includes something it doesn't know how to work with -- a category that currently includes sets, lists, dictionaries, or string functions -- it will revert to ``object mode``. In ``object mode``, numba will execute but your code will not speed up significantly. If you would prefer that ``numba`` throw an error if it cannot compile a function in a way that speeds up your code, pass numba the argument ``nopython=True`` (e.g. ``@numba.jit(nopython=True)``). For more on troubleshooting ``numba`` modes, see the `numba troubleshooting page `__.
+If Numba is passed a function that includes something it doesn't know how to
+work with -- a category that currently includes sets, lists, dictionaries, or
+string functions -- it will revert to ``object mode``. In ``object mode``,
+Numba will execute but your code will not speed up significantly. If you would
+prefer that Numba throw an error if it cannot compile a function in a way that
+speeds up your code, pass Numba the argument
+``nopython=True`` (e.g. ``@numba.jit(nopython=True)``). For more on
+troubleshooting Numba modes, see the `Numba troubleshooting page
+<http://numba.pydata.org/numba-doc/latest/user/troubleshoot.html>`__.
 
-Read more in the `numba docs `__.
+Read more in the `Numba docs <http://numba.pydata.org/>`__.
 
 .. _enhancingperf.eval:
@@ -448,7 +469,7 @@ These operations are supported by :func:`pandas.eval`:
 - Attribute access, e.g., ``df.a``
 - Subscript expressions, e.g., ``df[0]``
 - Simple variable evaluation, e.g., ``pd.eval('df')`` (this is not very useful)
-- Math functions, `sin`, `cos`, `exp`, `log`, `expm1`, `log1p`,
+- Math functions: `sin`, `cos`, `exp`, `log`, `expm1`, `log1p`,
   `sqrt`, `sinh`, `cosh`, `tanh`, `arcsin`, `arccos`, `arctan`, `arccosh`,
   `arcsinh`, `arctanh`, `abs` and `arctan2`.
@@ -581,7 +602,7 @@ on the original ``DataFrame`` or return a copy with the new column.
    For backwards compatibility, ``inplace`` defaults to ``True`` if not
    specified. This will change in a future version of pandas - if your
    code depends on an inplace assignment you should update to explicitly
-   set ``inplace=True``
+   set ``inplace=True``.
 
 .. ipython:: python
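The ipython example referenced above falls outside this hunk. To make the ``inplace`` semantics concrete, here is a minimal sketch using the documented ``inplace`` keyword of ``DataFrame.eval`` (the frame and column names are hypothetical):

.. code-block:: python

   import pandas as pd

   df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

   # inplace=False (the future default): df is untouched, a copy is returned
   new_df = df.eval('c = a + b', inplace=False)

   # inplace=True: the column is added to df itself and None is returned
   df.eval('c = a + b', inplace=True)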
@@ -780,7 +801,7 @@ Technical Minutia Regarding Expression Evaluation
 Expressions that would result in an object dtype or involve datetime operations
 (because of ``NaT``) must be evaluated in Python space. The main reason for
 this behavior is to maintain backwards compatibility with versions of NumPy <
-1.7. In those versions of ``numpy`` a call to ``ndarray.astype(str)`` will
+1.7. In those versions of NumPy a call to ``ndarray.astype(str)`` will
 truncate any strings that are more than 60 characters in length. Second, we
 can't pass ``object`` arrays to ``numexpr`` thus string comparisons must be
 evaluated in Python space.
diff --git a/doc/source/sparse.rst b/doc/source/sparse.rst
index 2e224f103a95e..260d8aa32ef52 100644
--- a/doc/source/sparse.rst
+++ b/doc/source/sparse.rst
@@ -17,11 +17,11 @@ Sparse data structures
 
 .. note:: The ``SparsePanel`` class has been removed in 0.19.0
 
-We have implemented "sparse" versions of Series and DataFrame. These are not sparse
+We have implemented "sparse" versions of ``Series`` and ``DataFrame``. These are not sparse
 in the typical "mostly 0". Rather, you can view these objects as being "compressed"
 where any data matching a specific value (``NaN`` / missing value, though any value
 can be chosen) is omitted. A special ``SparseIndex`` object tracks where data has been
-"sparsified". This will make much more sense in an example. All of the standard pandas
+"sparsified". This will make much more sense with an example. All of the standard pandas
 data structures have a ``to_sparse`` method:
 
 .. ipython:: python
@@ -32,7 +32,7 @@ data structures have a ``to_sparse`` method:
 
    sts
 
 The ``to_sparse`` method takes a ``kind`` argument (for the sparse index, see
-below) and a ``fill_value``. So if we had a mostly zero Series, we could
+below) and a ``fill_value``. So if we had a mostly zero ``Series``, we could
 convert it to sparse with ``fill_value=0``:
 
 .. ipython:: python
@@ -40,7 +40,7 @@ convert it to sparse with ``fill_value=0``:
 
   ts.fillna(0).to_sparse(fill_value=0)
 
 The sparse objects exist for memory efficiency reasons. Suppose you had a
-large, mostly NA DataFrame:
+large, mostly NA ``DataFrame``:
 
 .. ipython:: python
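The patch ends inside this last example, before the ``DataFrame`` is built. A minimal sketch of how such a comparison looks with the ``to_sparse`` API current at the time of this patch (the array sizes are illustrative):

.. code-block:: python

   import numpy as np
   import pandas as pd

   # a large, mostly-NA DataFrame
   df = pd.DataFrame(np.random.randn(10000, 4))
   df.iloc[:9998] = np.nan

   sdf = df.to_sparse()  # only the non-NaN values are stored
   sdf.density           # fraction of points actually stored, ~0.0002 here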