API/WIP: .sorted #10726

Merged
merged 1 commit into from
Aug 21, 2015
9 changes: 3 additions & 6 deletions doc/source/api.rst
Expand Up @@ -434,9 +434,8 @@ Reshaping, sorting
:toctree: generated/

Series.argsort
Series.order
Series.reorder_levels
Series.sort
Series.sort_values
Series.sort_index
Series.sortlevel
Series.swaplevel
Expand Down Expand Up @@ -908,7 +907,7 @@ Reshaping, sorting, transposing

DataFrame.pivot
DataFrame.reorder_levels
DataFrame.sort
DataFrame.sort_values
DataFrame.sort_index
DataFrame.sortlevel
DataFrame.nlargest
Expand Down Expand Up @@ -1293,7 +1292,6 @@ Modifying and Computations
Index.insert
Index.min
Index.max
Index.order
Index.reindex
Index.repeat
Index.take
Expand All @@ -1319,8 +1317,7 @@ Sorting
:toctree: generated/

Index.argsort
Index.order
Index.sort
Index.sort_values

Time-specific operations
~~~~~~~~~~~~~~~~~~~~~~~~
Expand Down
45 changes: 31 additions & 14 deletions doc/source/basics.rst
Expand Up @@ -1418,39 +1418,56 @@ description.

.. _basics.sorting:

Sorting by index and value
--------------------------
Sorting
-------

.. warning::

The sorting API is substantially changed in 0.17.0, see :ref:`here <whatsnew_0170.api_breaking.sorting>` for these changes.
In particular, all sorting methods now return a new object by default, and **DO NOT** operate in-place (except by passing ``inplace=True``).

There are two obvious kinds of sorting that you may be interested in: sorting
by label and sorting by actual values. The primary method for sorting axis
labels (indexes) across data structures is the :meth:`~DataFrame.sort_index` method.
by label and sorting by actual values.

By Index
~~~~~~~~

The primary methods for sorting axis
labels (indexes) are the ``Series.sort_index()`` and ``DataFrame.sort_index()`` methods.

.. ipython:: python

unsorted_df = df.reindex(index=['a', 'd', 'c', 'b'],
columns=['three', 'two', 'one'])

# DataFrame
unsorted_df.sort_index()
unsorted_df.sort_index(ascending=False)
unsorted_df.sort_index(axis=1)

:meth:`DataFrame.sort_index` can accept an optional ``by`` argument for ``axis=0``
# Series
unsorted_df['three'].sort_index()

By Values
~~~~~~~~~

:meth:`Series.sort_values` and :meth:`DataFrame.sort_values` are the entry points for **value** sorting (that is, sorting by the values in a column or row).
:meth:`DataFrame.sort_values` can accept an optional ``by`` argument for ``axis=0``
which will use an arbitrary vector or a column name of the DataFrame to
determine the sort order:

.. ipython:: python

df1 = pd.DataFrame({'one':[2,1,1,1],'two':[1,3,2,4],'three':[5,4,3,2]})
df1.sort_index(by='two')
df1.sort_values(by='two')

The ``by`` argument can take a list of column names, e.g.:

.. ipython:: python

df1[['one', 'two', 'three']].sort_index(by=['one','two'])

Series has the method :meth:`~Series.order` (analogous to `R's order function
<http://stat.ethz.ch/R-manual/R-patched/library/base/html/order.html>`__) which
sorts by value, with special treatment of NA values via the ``na_position``
These methods have special treatment of NA values via the ``na_position``
argument:

.. ipython:: python
Expand All @@ -1459,11 +1476,11 @@ argument:
s.order()
s.order(na_position='first')

.. note::

:meth:`Series.sort` sorts a Series by value in-place. This is to provide
compatibility with NumPy methods which expect the ``ndarray.sort``
behavior. :meth:`Series.order` returns a copy of the sorted data.
.. _basics.searchsorted:

searchsorted
~~~~~~~~~~~~

Series has the :meth:`~Series.searchsorted` method, which works similarly to
:meth:`numpy.ndarray.searchsorted`.
Expand Down Expand Up @@ -1493,7 +1510,7 @@ faster than sorting the entire Series and calling ``head(n)`` on the result.

s = pd.Series(np.random.permutation(10))
s
s.order()
s.sort_values()
s.nsmallest(3)
s.nlargest(3)

Expand Down
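As a rough illustration of the value-sorting behaviour described in the basics.rst changes above, the sketch below uses the renamed ``sort_values`` method with ``na_position`` and compares ``nsmallest`` against a full sort followed by ``head``. The sample data and variable names are made up for illustration, and it assumes a pandas version in which ``sort_values`` is available (0.17.0 or later).

```python
import pandas as pd

# Illustrative data only: one missing value to show NA placement.
s = pd.Series([3.0, None, 1.0, 2.0])

# By default missing values are placed at the end of the result.
nan_last = s.sort_values()

# na_position='first' moves them to the front instead.
nan_first = s.sort_values(na_position="first")

# nsmallest(n) selects the same rows as a full sort followed by head(n)
# when the values are distinct, without sorting the whole Series.
t = pd.Series([7, 2, 9, 4, 1], index=list("abcde"))
smallest = t.nsmallest(3)
via_sort = t.sort_values().head(3)
```

Neither call modifies ``s`` or ``t``; each returns a new object, which matches the warning added at the top of the sorting section.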
62 changes: 61 additions & 1 deletion doc/source/whatsnew/v0.17.0.txt
Expand Up @@ -14,6 +14,7 @@ users upgrade to this version.
Highlights include:

- Release the Global Interpreter Lock (GIL) on some cython operations, see :ref:`here <whatsnew_0170.gil>`
- The sorting API has been revamped to remove some long-time inconsistencies, see :ref:`here <whatsnew_0170.api_breaking.sorting>`
- The default for ``to_datetime`` will now be to ``raise`` when presented with unparseable formats,
previously this would return the original input, see :ref:`here <whatsnew_0170.api_breaking.to_datetime>`
- The default for ``dropna`` in ``HDFStore`` has changed to ``False``, to store by default all rows even
Expand Down Expand Up @@ -187,6 +188,65 @@ Other enhancements
Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. _whatsnew_0170.api_breaking.sorting:

Changes to sorting API
^^^^^^^^^^^^^^^^^^^^^^

The sorting API has had some longtime inconsistencies (:issue:`9816`, :issue:`8239`).

Here is a summary of the API **prior** to 0.17.0:

- ``Series.sort`` was **INPLACE** while ``DataFrame.sort`` returned a new object.
- ``Series.order`` returned a new object.
- It was possible to use ``Series/DataFrame.sort_index`` to sort by **values** by passing the ``by`` keyword.
- ``Series/DataFrame.sortlevel`` worked only on a ``MultiIndex`` for sorting by index.

To address these issues, we have revamped the API:

- We have introduced a new method, :meth:`DataFrame.sort_values`, which is the merger of ``DataFrame.sort()``, ``Series.sort()``,
and ``Series.order``, to handle sorting of **values**.
- The existing method ``Series.sort()`` has been deprecated and will be removed in a
future version of pandas.
- The ``by`` argument of ``DataFrame.sort_index()`` has been deprecated and will be removed in a future version of pandas.
- The methods ``DataFrame.sort()`` and ``Series.order()`` are no longer recommended and will carry a deprecation warning
in the docstring.
- The existing method ``.sort_index()`` will gain the ``level`` keyword to enable level sorting.

We now have two distinct and non-overlapping methods of sorting. A ``*`` marks items that
will show a ``FutureWarning``.

To sort by the **values**:

================================= ====================================
Previous                          Replacement
================================= ====================================
\*``Series.order()``              ``Series.sort_values()``
\*``Series.sort()``               ``Series.sort_values(inplace=True)``
\*``DataFrame.sort(columns=...)`` ``DataFrame.sort_values(by=...)``
================================= ====================================

To sort by the **index**:

================================== ====================================
Previous                           Equivalent
================================== ====================================
``Series.sort_index()``            ``Series.sort_index()``
``Series.sortlevel(level=...)``    ``Series.sort_index(level=...)``
``DataFrame.sort_index()``         ``DataFrame.sort_index()``
``DataFrame.sortlevel(level=...)`` ``DataFrame.sort_index(level=...)``
\*``DataFrame.sort()``             ``DataFrame.sort_index()``
================================== ====================================

We have also deprecated and changed similar methods in two Series-like classes, ``Index`` and ``Categorical``.

================================== ====================================
Previous                           Replacement
================================== ====================================
\*``Index.order()``                ``Index.sort_values()``
\*``Categorical.order()``          ``Categorical.sort_values()``
================================== ====================================

.. _whatsnew_0170.api_breaking.to_datetime:

Changes to to_datetime and to_timedelta
Expand Down Expand Up @@ -570,7 +630,7 @@ Removal of prior version deprecations/changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Remove use of some deprecated numpy comparison operations, mainly in tests. (:issue:`10569`)

- Removal of ``na_last`` parameters from ``Series.order()`` and ``Series.sort()``, in favor of ``na_position``, xref (:issue:`5231`)

.. _whatsnew_0170.performance:

Expand Down
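The replacement tables in the what's-new entry above can be read as a migration guide. The following is a minimal sketch of the new-style calls only (the deprecated spellings appear in comments), assuming a pandas version that provides ``sort_values`` and the ``level`` keyword on ``sort_index``. All data and variable names are illustrative.

```python
import pandas as pd

s = pd.Series([3, 1, 2], index=["a", "b", "c"])

# Previously Series.order(): returns a new sorted Series, `s` is untouched.
by_value = s.sort_values()

# Previously Series.sort(): the same sort, applied in place on request.
in_place = s.copy()
in_place.sort_values(inplace=True)

df = pd.DataFrame({"one": [2, 1, 1, 1], "two": [1, 3, 2, 4]})

# Previously DataFrame.sort(columns=...) or DataFrame.sort_index(by=...).
by_two = df.sort_values(by="two")

# Previously sortlevel(level=...): sort a MultiIndex by one of its levels.
mi = pd.MultiIndex.from_tuples(
    [("b", 2), ("a", 1), ("a", 2)], names=["outer", "inner"]
)
leveled = pd.Series([10, 20, 30], index=mi).sort_index(level="outer")
```

The split is the point of the change: ``sort_values`` always sorts by data, ``sort_index`` always sorts by labels, and in-place mutation happens only when ``inplace=True`` is passed.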
6 changes: 2 additions & 4 deletions pandas/core/algorithms.py
Expand Up @@ -262,9 +262,7 @@ def value_counts(values, sort=True, ascending=False, normalize=False,
result.index = bins[:-1]

if sort:
result.sort()
if not ascending:
result = result[::-1]
result = result.sort_values(ascending=ascending)

if normalize:
result = result / float(values.size)
Expand Down Expand Up @@ -497,7 +495,7 @@ def select_n_slow(dropped, n, take_last, method):
reverse_it = take_last or method == 'nlargest'
ascending = method == 'nsmallest'
slc = np.s_[::-1] if reverse_it else np.s_[:]
return dropped[slc].order(ascending=ascending).head(n)
return dropped[slc].sort_values(ascending=ascending).head(n)


_select_methods = {'nsmallest': nsmallest, 'nlargest': nlargest}
Expand Down
43 changes: 37 additions & 6 deletions pandas/core/categorical.py
Expand Up @@ -1083,7 +1083,7 @@ def argsort(self, ascending=True, **kwargs):
result = result[::-1]
return result

def order(self, inplace=False, ascending=True, na_position='last'):
def sort_values(self, inplace=False, ascending=True, na_position='last'):
""" Sorts the Category by category value returning a new Categorical by default.

Only ordered Categoricals can be sorted!
Expand All @@ -1092,10 +1092,10 @@ def order(self, inplace=False, ascending=True, na_position='last'):

Parameters
----------
ascending : boolean, default True
Sort ascending. Passing False sorts descending
inplace : boolean, default False
Do operation in place.
ascending : boolean, default True
Sort ascending. Passing False sorts descending
na_position : {'first', 'last'} (optional, default='last')
'first' puts NaNs at the beginning
'last' puts NaNs at the end
Expand Down Expand Up @@ -1139,6 +1139,37 @@ def order(self, inplace=False, ascending=True, na_position='last'):
return Categorical(values=codes,categories=self.categories, ordered=self.ordered,
fastpath=True)

def order(self, inplace=False, ascending=True, na_position='last'):
"""
DEPRECATED: use :meth:`Categorical.sort_values`

Sorts the Category by category value returning a new Categorical by default.

Only ordered Categoricals can be sorted!

Categorical.sort is the equivalent but sorts the Categorical inplace.

Parameters
----------
inplace : boolean, default False
Do operation in place.
ascending : boolean, default True
Sort ascending. Passing False sorts descending
na_position : {'first', 'last'} (optional, default='last')
'first' puts NaNs at the beginning
'last' puts NaNs at the end

Returns
-------
y : Category or None

See Also
--------
Category.sort
"""
warn("order is deprecated, use sort_values(...)",
FutureWarning, stacklevel=2)
return self.sort_values(inplace=inplace, ascending=ascending, na_position=na_position)

def sort(self, inplace=True, ascending=True, na_position='last'):
""" Sorts the Category inplace by category value.
Expand All @@ -1163,10 +1194,10 @@ def sort(self, inplace=True, ascending=True, na_position='last'):

See Also
--------
Category.order
Category.sort_values
"""
return self.order(inplace=inplace, ascending=ascending,
na_position=na_position)
return self.sort_values(inplace=inplace, ascending=ascending,
na_position=na_position)

def ravel(self, order='C'):
""" Return a flattened (numpy) array.
Expand Down
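The categorical.py change above keeps ``order`` as a thin wrapper that emits a ``FutureWarning`` and delegates to ``sort_values``. A standalone sketch of that pattern, using only the standard library, might look like the following. The ``Box`` class is hypothetical and exists only for illustration; it is not part of pandas.

```python
import warnings


class Box:
    """Hypothetical container, used only to illustrate the wrapper pattern."""

    def __init__(self, values):
        self.values = list(values)

    def sort_values(self, ascending=True):
        # The new canonical method: returns a new, sorted object.
        return Box(sorted(self.values, reverse=not ascending))

    def order(self, ascending=True):
        # Deprecated alias: warn, then delegate to the new method.
        # stacklevel=2 attributes the warning to the caller's line
        # rather than to this wrapper.
        warnings.warn("order is deprecated, use sort_values(...)",
                      FutureWarning, stacklevel=2)
        return self.sort_values(ascending=ascending)


# Record warnings so the deprecation can be inspected programmatically.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = Box([3, 1, 2]).order()
```

Keeping all of the logic in the new method and leaving the old name as a one-line delegate means the two spellings cannot drift apart while the deprecation period runs.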
3 changes: 3 additions & 0 deletions pandas/core/common.py
Expand Up @@ -2155,6 +2155,9 @@ def _mut_exclusive(**kwargs):
return val2


def _not_none(*args):
return (arg for arg in args if arg is not None)

def _any_none(*args):
for arg in args:
if arg is None:
Expand Down
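The ``_not_none`` helper added to common.py above returns a generator over the arguments that are not ``None``. A small standalone sketch of the same idea follows. The name ``not_none`` and the sample arguments are illustrative, and how the helper is used elsewhere in the PR is not shown in this excerpt.

```python
def not_none(*args):
    # Lazily yield every argument that is not None. The identity test
    # (`is not None`) keeps falsy values such as 0, "" and False.
    return (arg for arg in args if arg is not None)


kept = list(not_none(1, None, 0, "", None, False))

# A generator can be consumed only once, so materialize it (as above)
# if the result is needed more than once.
gen = not_none("a", None)
first_pass = list(gen)
second_pass = list(gen)
```

Because the filter tests identity with ``None`` rather than truthiness, it is safe for arguments where ``0`` or an empty string is a meaningful value.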