Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update pandas to 0.23.1 #190

Merged
merged 2 commits into from
Jun 14, 2018
Merged

Conversation

pyup-bot
Copy link
Collaborator

@pyup-bot pyup-bot commented Jun 13, 2018

This PR updates pandas from 0.22.0 to 0.23.1.

Changelog

0.23.0

----------------------

This is a major release from 0.22.0 and includes a number of API changes,
deprecations, new features, enhancements, and performance improvements along
with a large number of bug fixes. We recommend that all users upgrade to this
version.

Highlights include:

- :ref:`Round-trippable JSON format with 'table' orient <whatsnew_0230.enhancements.round-trippable_json>`.
- :ref:`Instantiation from dicts respects order for Python 3.6+ <whatsnew_0230.api_breaking.dict_insertion_order>`.
- :ref:`Dependent column arguments for assign <whatsnew_0230.enhancements.assign_dependent>`.
- :ref:`Merging / sorting on a combination of columns and index levels <whatsnew_0230.enhancements.merge_on_columns_and_levels>`.
- :ref:`Extending Pandas with custom types <whatsnew_023.enhancements.extension>`.
- :ref:`Excluding unobserved categories from groupby <whatsnew_0230.enhancements.categorical_grouping>`.
- :ref:`Changes to make output shape of DataFrame.apply consistent <whatsnew_0230.api_breaking.apply>`.

Check the :ref:`API Changes <whatsnew_0230.api_breaking>` and :ref:`deprecations <whatsnew_0230.deprecations>` before updating.

.. warning::

Starting January 1, 2019, pandas feature releases will support Python 3 only.
See :ref:`install.dropping-27` for more.

.. contents:: What's new in v0.23.0
 :local:
 :backlinks: none
 :depth: 2

.. _whatsnew_0230.enhancements:

New features
~~~~~~~~~~~~

.. _whatsnew_0230.enhancements.round-trippable_json:

JSON read/write round-trippable with ``orient='table'``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A ``DataFrame`` can now be written to and subsequently read back via JSON while preserving metadata through usage of the ``orient='table'`` argument (see :issue:`18912` and :issue:`9146`). Previously, none of the available ``orient`` values guaranteed the preservation of dtypes and index names, amongst other metadata.

.. ipython:: python

df = pd.DataFrame({'foo': [1, 2, 3, 4],
 	      'bar': ['a', 'b', 'c', 'd'],
 	      'baz': pd.date_range('2018-01-01', freq='d', periods=4),
 	      'qux': pd.Categorical(['a', 'b', 'c', 'c'])
 	      }, index=pd.Index(range(4), name='idx'))
df
df.dtypes
df.to_json('test.json', orient='table')
new_df = pd.read_json('test.json', orient='table')
new_df
new_df.dtypes

Please note that the string `index` is not supported with the round trip format, as it is used by default in ``write_json`` to indicate a missing index name.

.. ipython:: python
:okwarning:

df.index.name = 'index'

df.to_json('test.json', orient='table')
new_df = pd.read_json('test.json', orient='table')
new_df
new_df.dtypes

.. ipython:: python
:suppress:

import os
os.remove('test.json')


.. _whatsnew_0230.enhancements.assign_dependent:


``.assign()`` accepts dependent arguments
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :func:`DataFrame.assign` now accepts dependent keyword arguments for python version later than 3.6 (see also `PEP 468
<https://www.python.org/dev/peps/pep-0468/>`_). Later keyword arguments may now refer to earlier ones if the argument is a callable. See the
:ref:`documentation here <dsintro.chained_assignment>` (:issue:`14207`)

.. ipython:: python

 df = pd.DataFrame({'A': [1, 2, 3]})
 df
 df.assign(B=df.A, C=lambda x:x['A']+ x['B'])

.. warning::

This may subtly change the behavior of your code when you're
using ``.assign()`` to update an existing column. Previously, callables
referring to other variables being updated would get the "old" values

Previous Behavior:

.. code-block:: ipython

   In [2]: df = pd.DataFrame({"A": [1, 2, 3]})

   In [3]: df.assign(A=lambda df: df.A + 1, C=lambda df: df.A * -1)
   Out[3]:
      A  C
   0  2 -1
   1  3 -2
   2  4 -3

New Behavior:

.. ipython:: python

   df.assign(A=df.A+1, C= lambda df: df.A* -1)



.. _whatsnew_0230.enhancements.merge_on_columns_and_levels:

Merging on a combination of columns and index levels
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Strings passed to :meth:`DataFrame.merge` as the ``on``, ``left_on``, and ``right_on``
parameters may now refer to either column names or index level names.
This enables merging ``DataFrame`` instances on a combination of index levels
and columns without resetting indexes. See the :ref:`Merge on columns and
levels <merging.merge_on_columns_and_levels>` documentation section.
(:issue:`14355`)

.. ipython:: python

left_index = pd.Index(['K0', 'K0', 'K1', 'K2'], name='key1')

left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3'],
                     'key2': ['K0', 'K1', 'K0', 'K1']},
                    index=left_index)

right_index = pd.Index(['K0', 'K1', 'K2', 'K2'], name='key1')

right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3'],
                      'key2': ['K0', 'K0', 'K0', 'K1']},
                     index=right_index)

left.merge(right, on=['key1', 'key2'])

.. _whatsnew_0230.enhancements.sort_by_columns_and_levels:

Sorting by a combination of columns and index levels
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Strings passed to :meth:`DataFrame.sort_values` as the ``by`` parameter may
now refer to either column names or index level names.  This enables sorting
``DataFrame`` instances by a combination of index levels and columns without
resetting indexes. See the :ref:`Sorting by Indexes and Values
<basics.sort_indexes_and_values>` documentation section.
(:issue:`14353`)

.. ipython:: python

 Build MultiIndex
idx = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('a', 2),
                                 ('b', 2), ('b', 1), ('b', 1)])
idx.names = ['first', 'second']

 Build DataFrame
df_multi = pd.DataFrame({'A': np.arange(6, 0, -1)},
                        index=idx)
df_multi

 Sort by 'second' (index) and 'A' (column)
df_multi.sort_values(by=['second', 'A'])


.. _whatsnew_023.enhancements.extension:

Extending Pandas with Custom Types (Experimental)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Pandas now supports storing array-like objects that aren't necessarily 1-D NumPy
arrays as columns in a DataFrame or values in a Series. This allows third-party
libraries to implement extensions to NumPy's types, similar to how pandas
implemented categoricals, datetimes with timezones, periods, and intervals.

As a demonstration, we'll use cyberpandas_, which provides an ``IPArray`` type
for storing ip addresses.

.. code-block:: ipython

In [1]: from cyberpandas import IPArray

In [2]: values = IPArray([
   ...:     0,
   ...:     3232235777,
   ...:     42540766452641154071740215577757643572
   ...: ])
   ...:
   ...:

``IPArray`` isn't a normal 1-D NumPy array, but because it's a pandas
:class:`~pandas.api.extension.ExtensionArray`, it can be stored properly inside pandas' containers.

.. code-block:: ipython

In [3]: ser = pd.Series(values)

In [4]: ser
Out[4]:
0                         0.0.0.0
1                     192.168.1.1
2    2001:db8:85a3::8a2e:370:7334
dtype: ip

Notice that the dtype is ``ip``. The missing value semantics of the underlying
array are respected:

.. code-block:: ipython

In [5]: ser.isna()
Out[5]:
0     True
1    False
2    False
dtype: bool

For more, see the :ref:`extension types <extending.extension-types>`
documentation. If you build an extension array, publicize it on our
:ref:`ecosystem page <ecosystem.extensions>`.

.. _cyberpandas: https://cyberpandas.readthedocs.io/en/latest/


.. _whatsnew_0230.enhancements.categorical_grouping:

New ``observed`` keyword for excluding unobserved categories in ``groupby``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Grouping by a categorical includes the unobserved categories in the output.
When grouping by multiple categorical columns, this means you get the cartesian product of all the
categories, including combinations where there are no observations, which can result in a large
number of groups. We have added a keyword ``observed`` to control this behavior, it defaults to
``observed=False`` for backward-compatibility. (:issue:`14942`, :issue:`8138`, :issue:`15217`, :issue:`17594`, :issue:`8669`, :issue:`20583`, :issue:`20902`)

.. ipython:: python

cat1 = pd.Categorical(["a", "a", "b", "b"],
                      categories=["a", "b", "z"], ordered=True)
cat2 = pd.Categorical(["c", "d", "c", "d"],
                      categories=["c", "d", "y"], ordered=True)
df = pd.DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})
df['C'] = ['foo', 'bar'] * 2
df

To show all values, the previous behavior:

.. ipython:: python

df.groupby(['A', 'B', 'C'], observed=False).count()


To show only observed values:

.. ipython:: python

df.groupby(['A', 'B', 'C'], observed=True).count()

For pivotting operations, this behavior is *already* controlled by the ``dropna`` keyword:

.. ipython:: python

cat1 = pd.Categorical(["a", "a", "b", "b"],
                      categories=["a", "b", "z"], ordered=True)
cat2 = pd.Categorical(["c", "d", "c", "d"],
                      categories=["c", "d", "y"], ordered=True)
df = DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})
df

.. ipython:: python

pd.pivot_table(df, values='values', index=['A', 'B'],
               dropna=True)
pd.pivot_table(df, values='values', index=['A', 'B'],
               dropna=False)


.. _whatsnew_0230.enhancements.window_raw:

Rolling/Expanding.apply() accepts ``raw=False`` to pass a ``Series`` to the function
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:`Series.rolling().apply() <pandas.core.window.Rolling.apply>`, :func:`DataFrame.rolling().apply() <pandas.core.window.Rolling.apply>`,
:func:`Series.expanding().apply() <pandas.core.window.Expanding.apply>`, and :func:`DataFrame.expanding().apply() <pandas.core.window.Expanding.apply>` have gained a ``raw=None`` parameter.
This is similar to :func:`DataFame.apply`. This parameter, if ``True`` allows one to send a ``np.ndarray`` to the applied function. If ``False`` a ``Series`` will be passed. The
default is ``None``, which preserves backward compatibility, so this will default to ``True``, sending an ``np.ndarray``.
In a future version the default will be changed to ``False``, sending a ``Series``. (:issue:`5071`, :issue:`20584`)

.. ipython:: python

s = pd.Series(np.arange(5), np.arange(5) + 1)
s

Pass a ``Series``:

.. ipython:: python

s.rolling(2, min_periods=1).apply(lambda x: x.iloc[-1], raw=False)

Mimic the original behavior of passing a ndarray:

.. ipython:: python

s.rolling(2, min_periods=1).apply(lambda x: x[-1], raw=True)


.. _whatsnew_0210.enhancements.limit_area:

``DataFrame.interpolate`` has gained the ``limit_area`` kwarg
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:meth:`DataFrame.interpolate` has gained a ``limit_area`` parameter to allow further control of which ``NaN`` s are replaced.
Use ``limit_area='inside'`` to fill only NaNs surrounded by valid values or use ``limit_area='outside'`` to fill only ``NaN`` s
outside the existing valid values while preserving those inside.  (:issue:`16284`) See the :ref:`full documentation here <missing_data.interp_limits>`.


.. ipython:: python

ser = pd.Series([np.nan, np.nan, 5, np.nan, np.nan, np.nan, 13, np.nan, np.nan])
ser

Fill one consecutive inside value in both directions

.. ipython:: python

ser.interpolate(limit_direction='both', limit_area='inside', limit=1)

Fill all consecutive outside values backward

.. ipython:: python

ser.interpolate(limit_direction='backward', limit_area='outside')

Fill all consecutive outside values in both directions

.. ipython:: python

ser.interpolate(limit_direction='both', limit_area='outside')

.. _whatsnew_0210.enhancements.get_dummies_dtype:

``get_dummies`` now supports ``dtype`` argument
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The :func:`get_dummies` now accepts a ``dtype`` argument, which specifies a dtype for the new columns. The default remains uint8. (:issue:`18330`)

.. ipython:: python

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
pd.get_dummies(df, columns=['c']).dtypes
pd.get_dummies(df, columns=['c'], dtype=bool).dtypes


.. _whatsnew_0230.enhancements.timedelta_mod:

Timedelta mod method
^^^^^^^^^^^^^^^^^^^^

``mod`` (%) and ``divmod`` operations are now defined on ``Timedelta`` objects
when operating with either timedelta-like or with numeric arguments.
See the :ref:`documentation here <timedeltas.mod_divmod>`. (:issue:`19365`)

.. ipython:: python

 td = pd.Timedelta(hours=37)
 td % pd.Timedelta(minutes=45)

.. _whatsnew_0230.enhancements.ran_inf:

``.rank()`` handles ``inf`` values when ``NaN`` are present
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In previous versions, ``.rank()`` would assign ``inf`` elements ``NaN`` as their ranks. Now ranks are calculated properly. (:issue:`6945`)

.. ipython:: python

 s = pd.Series([-np.inf, 0, 1, np.nan, np.inf])
 s

Previous Behavior:

.. code-block:: ipython

 In [11]: s.rank()
 Out[11]:
 0    1.0
 1    2.0
 2    3.0
 3    NaN
 4    NaN
 dtype: float64

Current Behavior:

.. ipython:: python

 s.rank()

Furthermore, previously if you rank ``inf`` or ``-inf`` values together with ``NaN`` values, the calculation won't distinguish ``NaN`` from infinity when using 'top' or 'bottom' argument.

.. ipython:: python

 s = pd.Series([np.nan, np.nan, -np.inf, -np.inf])
 s

Previous Behavior:

.. code-block:: ipython

 In [15]: s.rank(na_option='top')
 Out[15]:
 0    2.5
 1    2.5
 2    2.5
 3    2.5
 dtype: float64

Current Behavior:

.. ipython:: python

 s.rank(na_option='top')

These bugs were squashed:

- Bug in :meth:`DataFrame.rank` and :meth:`Series.rank` when ``method='dense'`` and ``pct=True`` in which percentile ranks were not being used with the number of distinct observations (:issue:`15630`)
- Bug in :meth:`Series.rank` and :meth:`DataFrame.rank` when ``ascending='False'`` failed to return correct ranks for infinity if ``NaN`` were present (:issue:`19538`)
- Bug in :func:`DataFrameGroupBy.rank` where ranks were incorrect when both infinity and ``NaN`` were present (:issue:`20561`)


.. _whatsnew_0230.enhancements.str_cat_align:

``Series.str.cat`` has gained the ``join`` kwarg
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Previously, :meth:`Series.str.cat` did not -- in contrast to most of ``pandas`` -- align :class:`Series` on their index before concatenation (see :issue:`18657`).
The method has now gained a keyword ``join`` to control the manner of alignment, see examples below and :ref:`here <text.concatenate>`.

In v.0.23 `join` will default to None (meaning no alignment), but this default will change to ``'left'`` in a future version of pandas.

.. ipython:: python
:okwarning:

 s = pd.Series(['a', 'b', 'c', 'd'])
 t = pd.Series(['b', 'd', 'e', 'c'], index=[1, 3, 4, 2])
 s.str.cat(t)
 s.str.cat(t, join='left', na_rep='-')

Furthermore, :meth:`Series.str.cat` now works for ``CategoricalIndex`` as well (previously raised a ``ValueError``; see :issue:`20842`).

.. _whatsnew_0230.enhancements.astype_category:

``DataFrame.astype`` performs column-wise conversion to ``Categorical``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:meth:`DataFrame.astype` can now perform column-wise conversion to ``Categorical`` by supplying the string ``'category'`` or
a :class:`~pandas.api.types.CategoricalDtype`. Previously, attempting this would raise a ``NotImplementedError``. See the
:ref:`categorical.objectcreation` section of the documentation for more details and examples. (:issue:`12860`, :issue:`18099`)

Supplying the string ``'category'`` performs column-wise conversion, with only labels appearing in a given column set as categories:

.. ipython:: python

 df = pd.DataFrame({'A': list('abca'), 'B': list('bccd')})
 df = df.astype('category')
 df['A'].dtype
 df['B'].dtype


Supplying a ``CategoricalDtype`` will make the categories in each column consistent with the supplied dtype:

.. ipython:: python

 from pandas.api.types import CategoricalDtype
 df = pd.DataFrame({'A': list('abca'), 'B': list('bccd')})
 cdt = CategoricalDtype(categories=list('abcd'), ordered=True)
 df = df.astype(cdt)
 df['A'].dtype
 df['B'].dtype


.. _whatsnew_0230.enhancements.other:

Other Enhancements
^^^^^^^^^^^^^^^^^^

- Unary ``+`` now permitted for ``Series`` and ``DataFrame`` as  numeric operator (:issue:`16073`)
- Better support for :meth:`~pandas.io.formats.style.Styler.to_excel` output with the ``xlsxwriter`` engine. (:issue:`16149`)
- :func:`pandas.tseries.frequencies.to_offset` now accepts leading '+' signs e.g. '+1h'. (:issue:`18171`)
- :func:`MultiIndex.unique` now supports the ``level=`` argument, to get unique values from a specific index level (:issue:`17896`)
- :class:`pandas.io.formats.style.Styler` now has method ``hide_index()`` to determine whether the index will be rendered in output (:issue:`14194`)
- :class:`pandas.io.formats.style.Styler` now has method ``hide_columns()`` to determine whether columns will be hidden in output (:issue:`14194`)
- Improved wording of ``ValueError`` raised in :func:`to_datetime` when ``unit=`` is passed with a non-convertible value (:issue:`14350`)
- :func:`Series.fillna` now accepts a Series or a dict as a ``value`` for a categorical dtype (:issue:`17033`)
- :func:`pandas.read_clipboard` updated to use qtpy, falling back to PyQt5 and then PyQt4, adding compatibility with Python3 and multiple python-qt bindings (:issue:`17722`)
- Improved wording of ``ValueError`` raised in :func:`read_csv` when the ``usecols`` argument cannot match all columns. (:issue:`17301`)
- :func:`DataFrame.corrwith` now silently drops non-numeric columns when passed a Series. Before, an exception was raised (:issue:`18570`).
- :class:`IntervalIndex` now supports time zone aware ``Interval`` objects (:issue:`18537`, :issue:`18538`)
- :func:`Series` / :func:`DataFrame` tab completion also returns identifiers in the first level of a :func:`MultiIndex`. (:issue:`16326`)
- :func:`read_excel()` has gained the ``nrows`` parameter (:issue:`16645`)
- :meth:`DataFrame.append` can now in more cases preserve the type of the calling dataframe's columns (e.g. if both are ``CategoricalIndex``) (:issue:`18359`)
- :meth:`DataFrame.to_json` and :meth:`Series.to_json` now accept an ``index`` argument which allows the user to exclude the index from the JSON output (:issue:`17394`)
- ``IntervalIndex.to_tuples()`` has gained the ``na_tuple`` parameter to control whether NA is returned as a tuple of NA, or NA itself (:issue:`18756`)
- ``Categorical.rename_categories``, ``CategoricalIndex.rename_categories`` and :attr:`Series.cat.rename_categories`
can now take a callable as their argument (:issue:`18862`)
- :class:`Interval` and :class:`IntervalIndex` have gained a ``length`` attribute (:issue:`18789`)
- ``Resampler`` objects now have a functioning :attr:`~pandas.core.resample.Resampler.pipe` method.
Previously, calls to ``pipe`` were diverted to  the ``mean`` method (:issue:`17905`).
- :func:`~pandas.api.types.is_scalar` now returns ``True`` for ``DateOffset`` objects (:issue:`18943`).
- :func:`DataFrame.pivot` now accepts a list for the ``values=`` kwarg (:issue:`17160`).
- Added :func:`pandas.api.extensions.register_dataframe_accessor`,
:func:`pandas.api.extensions.register_series_accessor`, and
:func:`pandas.api.extensions.register_index_accessor`, accessor for libraries downstream of pandas
to register custom accessors like ``.cat`` on pandas objects. See
:ref:`Registering Custom Accessors <extending.register-accessors>` for more (:issue:`14781`).

- ``IntervalIndex.astype`` now supports conversions between subtypes when passed an ``IntervalDtype`` (:issue:`19197`)
- :class:`IntervalIndex` and its associated constructor methods (``from_arrays``, ``from_breaks``, ``from_tuples``) have gained a ``dtype`` parameter (:issue:`19262`)
- Added :func:`pandas.core.groupby.SeriesGroupBy.is_monotonic_increasing` and :func:`pandas.core.groupby.SeriesGroupBy.is_monotonic_decreasing` (:issue:`17015`)
- For subclassed ``DataFrames``, :func:`DataFrame.apply` will now preserve the ``Series`` subclass (if defined) when passing the data to the applied function (:issue:`19822`)
- :func:`DataFrame.from_dict` now accepts a ``columns`` argument that can be used to specify the column names when ``orient='index'`` is used (:issue:`18529`)
- Added option ``display.html.use_mathjax`` so `MathJax <https://www.mathjax.org/>`_ can be disabled when rendering tables in ``Jupyter`` notebooks (:issue:`19856`, :issue:`19824`)
- :func:`DataFrame.replace` now supports the ``method`` parameter, which can be used to specify the replacement method when ``to_replace`` is a scalar, list or tuple and ``value`` is ``None`` (:issue:`19632`)
- :meth:`Timestamp.month_name`, :meth:`DatetimeIndex.month_name`, and :meth:`Series.dt.month_name` are now available (:issue:`12805`)
- :meth:`Timestamp.day_name` and :meth:`DatetimeIndex.day_name` are now available to return day names with a specified locale (:issue:`12806`)
- :meth:`DataFrame.to_sql` now performs a multi-value insert if the underlying connection supports itk rather than inserting row by row.
``SQLAlchemy`` dialects supporting multi-value inserts include: ``mysql``, ``postgresql``, ``sqlite`` and any dialect with ``supports_multivalues_insert``. (:issue:`14315`, :issue:`8953`)
- :func:`read_html` now accepts a ``displayed_only`` keyword argument to controls whether or not hidden elements are parsed (``True`` by default) (:issue:`20027`)
- :func:`read_html` now reads all ``<tbody>`` elements in a ``<table>``, not just the first. (:issue:`20690`)
- :meth:`~pandas.core.window.Rolling.quantile` and :meth:`~pandas.core.window.Expanding.quantile` now accept the ``interpolation`` keyword, ``linear`` by default (:issue:`20497`)
- zip compression is supported via ``compression=zip`` in :func:`DataFrame.to_pickle`, :func:`Series.to_pickle`, :func:`DataFrame.to_csv`, :func:`Series.to_csv`, :func:`DataFrame.to_json`, :func:`Series.to_json`. (:issue:`17778`)
- :class:`~pandas.tseries.offsets.WeekOfMonth` constructor now supports ``n=0`` (:issue:`20517`).
- :class:`DataFrame` and :class:`Series` now support matrix multiplication (`) operator (:issue:`10259`) for Python>=3.5
- Updated :meth:`DataFrame.to_gbq` and :meth:`pandas.read_gbq` signature and documentation to reflect changes from
the Pandas-GBQ library version 0.4.0. Adds intersphinx mapping to Pandas-GBQ
library. (:issue:`20564`)
- Added new writer for exporting Stata dta files in version 117, ``StataWriter117``.  This format supports exporting strings with lengths up to 2,000,000 characters (:issue:`16450`)
- :func:`to_hdf` and :func:`read_hdf` now accept an ``errors`` keyword argument to control encoding error handling (:issue:`20835`)
- :func:`cut` has gained the ``duplicates='raise'|'drop'`` option to control whether to raise on duplicated edges (:issue:`20947`)
- :func:`date_range`, :func:`timedelta_range`, and :func:`interval_range` now return a linearly spaced index if ``start``, ``stop``, and ``periods`` are specified, but ``freq`` is not. (:issue:`20808`, :issue:`20983`, :issue:`20976`)

.. _whatsnew_0230.api_breaking:

Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. _whatsnew_0230.api_breaking.deps:

Dependencies have increased minimum versions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We have updated our minimum supported versions of dependencies (:issue:`15184`).
If installed, we now require:

+-----------------+-----------------+----------+---------------+
| Package         | Minimum Version | Required |     Issue     |
+=================+=================+==========+===============+
| python-dateutil | 2.5.0           |    X     | :issue:`15184`|
+-----------------+-----------------+----------+---------------+
| openpyxl        | 2.4.0           |          | :issue:`15184`|
+-----------------+-----------------+----------+---------------+
| beautifulsoup4  | 4.2.1           |          | :issue:`20082`|
+-----------------+-----------------+----------+---------------+
| setuptools      | 24.2.0          |          | :issue:`20698`|
+-----------------+-----------------+----------+---------------+

.. _whatsnew_0230.api_breaking.dict_insertion_order:

Instantiation from dicts preserves dict insertion order for python 3.6+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Until Python 3.6, dicts in Python had no formally defined ordering. For Python
version 3.6 and later, dicts are ordered by insertion order, see
`PEP 468 <https://www.python.org/dev/peps/pep-0468/>`_.
Pandas will use the dict's insertion order, when creating a ``Series`` or
``DataFrame`` from a dict and you're using Python version 3.6 or
higher. (:issue:`19884`)

Previous Behavior (and current behavior if on Python < 3.6):

.. code-block:: ipython

pd.Series({'Income': 2000,
           'Expenses': -1500,
           'Taxes': -200,
           'Net result': 300})
Expenses     -1500
Income        2000
Net result     300
Taxes         -200
dtype: int64

Note the Series above is ordered alphabetically by the index values.

New Behavior (for Python >= 3.6):

.. ipython:: python

 pd.Series({'Income': 2000,
            'Expenses': -1500,
            'Taxes': -200,
            'Net result': 300})

Notice that the Series is now ordered by insertion order. This new behavior is
used for all relevant pandas types (``Series``, ``DataFrame``, ``SparseSeries``
and ``SparseDataFrame``).

If you wish to retain the old behavior while using Python >= 3.6, you can use
``.sort_index()``:

.. ipython:: python

 pd.Series({'Income': 2000,
            'Expenses': -1500,
            'Taxes': -200,
            'Net result': 300}).sort_index()

.. _whatsnew_0230.api_breaking.deprecate_panel:

Deprecate Panel
^^^^^^^^^^^^^^^

``Panel`` was deprecated in the 0.20.x release, showing as a ``DeprecationWarning``. Using ``Panel`` will now show a ``FutureWarning``. The recommended way to represent 3-D data are
with a ``MultiIndex`` on a ``DataFrame`` via the :meth:`~Panel.to_frame` or with the `xarray package <http://xarray.pydata.org/en/stable/>`__. Pandas
provides a :meth:`~Panel.to_xarray` method to automate this conversion. For more details see :ref:`Deprecate Panel <dsintro.deprecate_panel>` documentation. (:issue:`13563`, :issue:`18324`).

.. ipython:: python
:suppress:

import pandas.util.testing as tm

.. ipython:: python
:okwarning:

p = tm.makePanel()
p

Convert to a MultiIndex DataFrame

.. ipython:: python

p.to_frame()

Convert to an xarray DataArray

.. ipython:: python
:okwarning:

p.to_xarray()


.. _whatsnew_0230.api_breaking.core_common:

pandas.core.common removals
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following error & warning messages are removed from ``pandas.core.common`` (:issue:`13634`, :issue:`19769`):

- ``PerformanceWarning``
- ``UnsupportedFunctionCall``
- ``UnsortedIndexError``
- ``AbstractMethodError``

These are available from import from ``pandas.errors`` (since 0.19.0).


.. _whatsnew_0230.api_breaking.apply:

Changes to make output of ``DataFrame.apply`` consistent
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:`DataFrame.apply` was inconsistent when applying an arbitrary user-defined-function that returned a list-like with ``axis=1``. Several bugs and inconsistencies
are resolved. If the applied function returns a Series, then pandas will return a DataFrame; otherwise a Series will be returned, this includes the case
where a list-like (e.g. ``tuple`` or ``list`` is returned) (:issue:`16353`, :issue:`17437`, :issue:`17970`, :issue:`17348`, :issue:`17892`, :issue:`18573`,
:issue:`17602`, :issue:`18775`, :issue:`18901`, :issue:`18919`).

.. ipython:: python

 df = pd.DataFrame(np.tile(np.arange(3), 6).reshape(6, -1) + 1, columns=['A', 'B', 'C'])
 df

Previous Behavior: if the returned shape happened to match the length of original columns, this would return a ``DataFrame``.
If the return shape did not match, a ``Series`` with lists was returned.

.. code-block:: python

In [3]: df.apply(lambda x: [1, 2, 3], axis=1)
Out[3]:
   A  B  C
0  1  2  3
1  1  2  3
2  1  2  3
3  1  2  3
4  1  2  3
5  1  2  3

In [4]: df.apply(lambda x: [1, 2], axis=1)
Out[4]:
0    [1, 2]
1    [1, 2]
2    [1, 2]
3    [1, 2]
4    [1, 2]
5    [1, 2]
dtype: object


New Behavior: When the applied function returns a list-like, this will now *always* return a ``Series``.

.. ipython:: python

 df.apply(lambda x: [1, 2, 3], axis=1)
 df.apply(lambda x: [1, 2], axis=1)

To have expanded columns, you can use ``result_type='expand'``

.. ipython:: python

 df.apply(lambda x: [1, 2, 3], axis=1, result_type='expand')

To broadcast the result across the original columns (the old behaviour for
list-likes of the correct length), you can use ``result_type='broadcast'``.
The shape must match the original columns.

.. ipython:: python

 df.apply(lambda x: [1, 2, 3], axis=1, result_type='broadcast')

Returning a ``Series`` allows one to control the exact return structure and column names:

.. ipython:: python

 df.apply(lambda x: Series([1, 2, 3], index=['D', 'E', 'F']), axis=1)

.. _whatsnew_0230.api_breaking.concat:

Concatenation will no longer sort
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In a future version of pandas :func:`pandas.concat` will no longer sort the non-concatenation axis when it is not already aligned.
The current behavior is the same as the previous (sorting), but now a warning is issued when ``sort`` is not specified and the non-concatenation axis is not aligned (:issue:`4588`).

.. ipython:: python
:okwarning:

df1 = pd.DataFrame({"a": [1, 2], "b": [1, 2]}, columns=['b', 'a'])
df2 = pd.DataFrame({"a": [4, 5]})

pd.concat([df1, df2])

To keep the previous behavior (sorting) and silence the warning, pass ``sort=True``

.. ipython:: python

pd.concat([df1, df2], sort=True)

To accept the future behavior (no sorting), pass ``sort=False``

.. ipython

pd.concat([df1, df2], sort=False)

Note that this change also applies to :meth:`DataFrame.append`, which has also received a ``sort`` keyword for controlling this behavior.


.. _whatsnew_0230.api_breaking.build_changes:

Build Changes
^^^^^^^^^^^^^

- Building pandas for development now requires ``cython >= 0.24`` (:issue:`18613`)
- Building from source now explicitly requires ``setuptools`` in ``setup.py`` (:issue:`18113`)
- Updated conda recipe to be in compliance with conda-build 3.0+ (:issue:`18002`)

.. _whatsnew_0230.api_breaking.index_division_by_zero:

Index Division By Zero Fills Correctly
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Division operations on ``Index`` and subclasses will now fill division of positive numbers by zero with ``np.inf``, division of negative numbers by zero with ``-np.inf`` and `0 / 0` with ``np.nan``.  This matches existing ``Series`` behavior. (:issue:`19322`, :issue:`19347`)

Previous Behavior:

.. code-block:: ipython

 In [6]: index = pd.Int64Index([-1, 0, 1])

 In [7]: index / 0
 Out[7]: Int64Index([0, 0, 0], dtype='int64')

  Previous behavior yielded different results depending on the type of zero in the divisor
 In [8]: index / 0.0
 Out[8]: Float64Index([-inf, nan, inf], dtype='float64')

 In [9]: index = pd.UInt64Index([0, 1])

 In [10]: index / np.array([0, 0], dtype=np.uint64)
 Out[10]: UInt64Index([0, 0], dtype='uint64')

 In [11]: pd.RangeIndex(1, 5) / 0
 ZeroDivisionError: integer division or modulo by zero

Current Behavior:

.. ipython:: python

 index = pd.Int64Index([-1, 0, 1])
  division by zero gives -infinity where negative, +infinity where positive, and NaN for 0 / 0
 index / 0

  The result of division by zero should not depend on whether the zero is int or float
 index / 0.0

 index = pd.UInt64Index([0, 1])
 index / np.array([0, 0], dtype=np.uint64)

 pd.RangeIndex(1, 5) / 0

.. _whatsnew_0230.api_breaking.extract:

Extraction of matching patterns from strings
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, extracting matching patterns from strings with :func:`str.extract` used to return a
``Series`` if a single group was being extracted (a ``DataFrame`` if more than one group was
extracted). As of Pandas 0.23.0 :func:`str.extract` always returns a ``DataFrame``, unless
``expand`` is set to ``False``. Finally, ``None`` was an accepted value for
the ``expand`` parameter (which was equivalent to ``False``), but now raises a ``ValueError``. (:issue:`11386`)

Previous Behavior:

.. code-block:: ipython

 In [1]: s = pd.Series(['number 10', '12 eggs'])

 In [2]: extracted = s.str.extract('.*(\d\d).*')

 In [3]: extracted
 Out [3]:
 0    10
 1    12
 dtype: object

 In [4]: type(extracted)
 Out [4]:
 pandas.core.series.Series

New Behavior:

.. ipython:: python

 s = pd.Series(['number 10', '12 eggs'])
 extracted = s.str.extract('.*(\d\d).*')
 extracted
 type(extracted)

To restore previous behavior, simply set ``expand`` to ``False``:

.. ipython:: python

 s = pd.Series(['number 10', '12 eggs'])
 extracted = s.str.extract('.*(\d\d).*', expand=False)
 extracted
 type(extracted)

.. _whatsnew_0230.api_breaking.cdt_ordered:

Default value for the ``ordered`` parameter of ``CategoricalDtype``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The default value of the ``ordered`` parameter for :class:`~pandas.api.types.CategoricalDtype` has changed from ``False`` to ``None`` to allow updating of ``categories`` without impacting ``ordered``.  Behavior should remain consistent for downstream objects, such as :class:`Categorical` (:issue:`18790`)

In previous versions, the default value for the ``ordered`` parameter was ``False``.  This could potentially lead to the ``ordered`` parameter unintentionally being changed from ``True`` to ``False`` when users attempt to update ``categories`` if ``ordered`` is not explicitly specified, as it would silently default to ``False``.  The new behavior for ``ordered=None`` is to retain the existing value of ``ordered``.

New Behavior:

.. ipython:: python

 from pandas.api.types import CategoricalDtype
 cat = pd.Categorical(list('abcaba'), ordered=True, categories=list('cba'))
 cat
 cdt = CategoricalDtype(categories=list('cbad'))
 cat.astype(cdt)

Notice in the example above that the converted ``Categorical`` has retained ``ordered=True``.  Had the default value for ``ordered`` remained as ``False``, the converted ``Categorical`` would have become unordered, despite ``ordered=False`` never being explicitly specified.  To change the value of ``ordered``, explicitly pass it to the new dtype, e.g. ``CategoricalDtype(categories=list('cbad'), ordered=False)``.

Note that the unintentional conversion of ``ordered`` discussed above did not arise in previous versions due to separate bugs that prevented ``astype`` from doing any type of category to category conversion (:issue:`10696`, :issue:`18593`).  These bugs have been fixed in this release, and motivated changing the default value of ``ordered``.

.. _whatsnew_0230.api_breaking.pretty_printing:

Better pretty-printing of DataFrames in a terminal
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Previously, the default value for the maximum number of columns was
``pd.options.display.max_columns=20``. This meant that relatively wide data
frames would not fit within the terminal width, and pandas would introduce line
breaks to display these 20 columns. This resulted in an output that was
relatively difficult to read:

.. image:: _static/print_df_old.png

If Python runs in a terminal, the maximum number of columns is now determined
automatically so that the printed data frame fits within the current terminal
width (``pd.options.display.max_columns=0``) (:issue:`17023`). If Python runs
as a Jupyter kernel (such as the Jupyter QtConsole or a Jupyter notebook, as
well as in many IDEs), this value cannot be inferred automatically and is thus
set to `20` as in previous versions. In a terminal, this results in a much
nicer output:

.. image:: _static/print_df_new.png

Note that if you don't like the new default, you can always set this option
yourself. To revert to the old setting, you can run this line:

.. code-block:: python

pd.options.display.max_columns = 20

.. _whatsnew_0230.api.datetimelike:

Datetimelike API Changes
^^^^^^^^^^^^^^^^^^^^^^^^

- The default ``Timedelta`` constructor now accepts an ``ISO 8601 Duration`` string as an argument (:issue:`19040`)
- Subtracting ``NaT`` from a :class:`Series` with ``dtype='datetime64[ns]'`` returns a ``Series`` with ``dtype='timedelta64[ns]'`` instead of ``dtype='datetime64[ns]'`` (:issue:`18808`)
- Addition or subtraction of ``NaT`` from :class:`TimedeltaIndex` will return ``TimedeltaIndex`` instead of ``DatetimeIndex`` (:issue:`19124`)
- :func:`DatetimeIndex.shift` and :func:`TimedeltaIndex.shift` will now raise ``NullFrequencyError`` (which subclasses ``ValueError``, which was raised in older versions) when the index object frequency is ``None`` (:issue:`19147`)
- Addition and subtraction of ``NaN`` from a :class:`Series` with ``dtype='timedelta64[ns]'`` will raise a ``TypeError`` instead of treating the ``NaN`` as ``NaT`` (:issue:`19274`)
- ``NaT`` division with :class:`datetime.timedelta` will now return ``NaN`` instead of raising (:issue:`17876`)
- Operations between a :class:`Series` with dtype ``dtype='datetime64[ns]'`` and a :class:`PeriodIndex` will correctly raises ``TypeError`` (:issue:`18850`)
- Subtraction of :class:`Series` with timezone-aware ``dtype='datetime64[ns]'`` with mis-matched timezones will raise ``TypeError`` instead of ``ValueError`` (:issue:`18817`)
- :class:`Timestamp` will no longer silently ignore unused or invalid ``tz`` or ``tzinfo`` keyword arguments (:issue:`17690`)
- :class:`Timestamp` will no longer silently ignore invalid ``freq`` arguments (:issue:`5168`)
- :class:`CacheableOffset` and :class:`WeekDay` are no longer available in the ``pandas.tseries.offsets`` module (:issue:`17830`)
- ``pandas.tseries.frequencies.get_freq_group()`` and ``pandas.tseries.frequencies.DAYS`` are removed from the public API (:issue:`18034`)
- :func:`Series.truncate` and :func:`DataFrame.truncate` will raise a ``ValueError`` if the index is not sorted instead of an unhelpful ``KeyError`` (:issue:`17935`)
- :attr:`Series.first` and :attr:`DataFrame.first` will now raise a ``TypeError``
rather than ``NotImplementedError`` when index is not a :class:`DatetimeIndex` (:issue:`20725`).
- :attr:`Series.last` and :attr:`DataFrame.last` will now raise a ``TypeError``
rather than ``NotImplementedError`` when index is not a :class:`DatetimeIndex` (:issue:`20725`).
- Restricted ``DateOffset`` keyword arguments. Previously, ``DateOffset`` subclasses allowed arbitrary keyword arguments which could lead to unexpected behavior. Now, only valid arguments will be accepted. (:issue:`17176`, :issue:`18226`).
- :func:`pandas.merge` provides a more informative error message when trying to merge on timezone-aware and timezone-naive columns (:issue:`15800`)
- For :class:`DatetimeIndex` and :class:`TimedeltaIndex` with ``freq=None``, addition or subtraction of integer-dtyped array or ``Index`` will raise ``NullFrequencyError`` instead of ``TypeError`` (:issue:`19895`)
- :class:`Timestamp` constructor now accepts a `nanosecond` keyword or positional argument (:issue:`18898`)
- :class:`DatetimeIndex` will now raise an ``AttributeError`` when the ``tz`` attribute is set after instantiation (:issue:`3746`)
- :class:`DatetimeIndex` with a ``pytz`` timezone will now return a consistent ``pytz`` timezone (:issue:`18595`)

.. _whatsnew_0230.api.other:

Other API Changes
^^^^^^^^^^^^^^^^^

- :func:`Series.astype` and :func:`Index.astype` with an incompatible dtype will now raise a ``TypeError`` rather than a ``ValueError`` (:issue:`18231`)
- ``Series`` construction with an ``object`` dtyped tz-aware datetime and ``dtype=object`` specified, will now return an ``object`` dtyped ``Series``, previously this would infer the datetime dtype (:issue:`18231`)
- A :class:`Series` of ``dtype=category`` constructed from an empty ``dict`` will now have categories of ``dtype=object`` rather than ``dtype=float64``, consistently with the case in which an empty list is passed (:issue:`18515`)
- All-NaN levels in a ``MultiIndex`` are now assigned ``float`` rather than ``object`` dtype, promoting consistency with ``Index`` (:issue:`17929`).
- Levels names of a ``MultiIndex`` (when not None) are now required to be unique: trying to create a ``MultiIndex`` with repeated names will raise a ``ValueError`` (:issue:`18872`)
- Both construction and renaming of ``Index``/``MultiIndex`` with non-hashable ``name``/``names`` will now raise ``TypeError`` (:issue:`20527`)
- :func:`Index.map` can now accept ``Series`` and dictionary input objects (:issue:`12756`, :issue:`18482`, :issue:`18509`).
- :func:`DataFrame.unstack` will now default to filling with ``np.nan`` for ``object`` columns. (:issue:`12815`)
- :class:`IntervalIndex` constructor will raise if the ``closed`` parameter conflicts with how the input data is inferred to be closed (:issue:`18421`)
- Inserting missing values into indexes will work for all types of indexes and automatically insert the correct type of missing value (``NaN``, ``NaT``, etc.) regardless of the type passed in (:issue:`18295`)
- When created with duplicate labels, ``MultiIndex`` now raises a ``ValueError``. (:issue:`17464`)
- :func:`Series.fillna` now raises a ``TypeError`` instead of a ``ValueError`` when passed a list, tuple or DataFrame as a ``value`` (:issue:`18293`)
- :func:`pandas.DataFrame.merge` no longer casts a ``float`` column to ``object`` when merging on ``int`` and ``float`` columns (:issue:`16572`)
- :func:`pandas.merge` now raises a ``ValueError`` when trying to merge on incompatible data types (:issue:`9780`)
- The default NA value for :class:`UInt64Index` has changed from 0 to ``NaN``, which impacts methods that mask with NA, such as ``UInt64Index.where()`` (:issue:`18398`)
- Refactored ``setup.py`` to use ``find_packages`` instead of explicitly listing out all subpackages (:issue:`18535`)
- Rearranged the order of keyword arguments in :func:`read_excel()` to align with :func:`read_csv()` (:issue:`16672`)
- :func:`wide_to_long` previously kept numeric-like suffixes as ``object`` dtype. Now they are cast to numeric if possible (:issue:`17627`)
- In :func:`read_excel`, the ``comment`` argument is now exposed as a named parameter (:issue:`18735`)
- Rearranged the order of keyword arguments in :func:`read_excel()` to align with :func:`read_csv()` (:issue:`16672`)
- The options ``html.border`` and ``mode.use_inf_as_null`` were deprecated in prior versions, these will now show ``FutureWarning`` rather than a ``DeprecationWarning`` (:issue:`19003`)
- :class:`IntervalIndex` and ``IntervalDtype`` no longer support categorical, object, and string subtypes (:issue:`19016`)
- ``IntervalDtype`` now returns ``True`` when compared against ``'interval'`` regardless of subtype, and ``IntervalDtype.name`` now returns ``'interval'`` regardless of subtype (:issue:`18980`)
- ``KeyError`` now raises instead of ``ValueError`` in :meth:`~DataFrame.drop`, :meth:`~Panel.drop`, :meth:`~Series.drop`, :meth:`~Index.drop` when dropping a non-existent element in an axis with duplicates (:issue:`19186`)
- :func:`Series.to_csv` now accepts a ``compression`` argument that works in the same way as the ``compression`` argument in :func:`DataFrame.to_csv` (:issue:`18958`)
- Set operations (union, difference...) on :class:`IntervalIndex` with incompatible index types will now raise a ``TypeError`` rather than a ``ValueError`` (:issue:`19329`)
- :class:`DateOffset` objects render more simply, e.g. ``<DateOffset: days=1>`` instead of ``<DateOffset: kwds={'days': 1}>`` (:issue:`19403`)
- ``Categorical.fillna`` now validates its ``value`` and ``method`` keyword arguments. It now raises when both or none are specified, matching the behavior of :meth:`Series.fillna` (:issue:`19682`)
- ``pd.to_datetime('today')`` now returns a datetime, consistent with ``pd.Timestamp('today')``; previously ``pd.to_datetime('today')`` returned a ``.normalized()`` datetime (:issue:`19935`)
- :func:`Series.str.replace` now takes an optional `regex` keyword which, when set to ``False``, uses literal string replacement rather than regex replacement (:issue:`16808`)
- :func:`DatetimeIndex.strftime` and :func:`PeriodIndex.strftime` now return an ``Index`` instead of a numpy array to be consistent with similar accessors (:issue:`20127`)
- Constructing a Series from a list of length 1 no longer broadcasts this list when a longer index is specified (:issue:`19714`, :issue:`20391`).
- :func:`DataFrame.to_dict` with ``orient='index'`` no longer casts int columns to float for a DataFrame with only int and float columns (:issue:`18580`)
- A user-defined-function that is passed to :func:`Series.rolling().aggregate() <pandas.core.window.Rolling.aggregate>`, :func:`DataFrame.rolling().aggregate() <pandas.core.window.Rolling.aggregate>`, or its expanding cousins, will now *always* be passed a ``Series``, rather than a ``np.array``; ``.apply()`` only has the ``raw`` keyword, see :ref:`here <whatsnew_0230.enhancements.window_raw>`. This is consistent with the signatures of ``.aggregate()`` across pandas (:issue:`20584`)
- Rolling and Expanding types raise ``NotImplementedError`` upon iteration (:issue:`11704`).

.. _whatsnew_0230.deprecations:

Deprecations
~~~~~~~~~~~~

- ``Series.from_array`` and ``SparseSeries.from_array`` are deprecated. Use the normal constructor ``Series(..)`` and ``SparseSeries(..)`` instead (:issue:`18213`).
- ``DataFrame.as_matrix`` is deprecated. Use ``DataFrame.values`` instead (:issue:`18458`).
- ``Series.asobject``, ``DatetimeIndex.asobject``, ``PeriodIndex.asobject`` and ``TimeDeltaIndex.asobject`` have been deprecated. Use ``.astype(object)`` instead (:issue:`18572`)
- Grouping by a tuple of keys now emits a ``FutureWarning`` and is deprecated.
In the future, a tuple passed to ``'by'`` will always refer to a single key
that is the actual tuple, instead of treating the tuple as multiple keys. To
retain the previous behavior, use a list instead of a tuple (:issue:`18314`)
- ``Series.valid`` is deprecated. Use :meth:`Series.dropna` instead (:issue:`18800`).
- :func:`read_excel` has deprecated the ``skip_footer`` parameter. Use ``skipfooter`` instead (:issue:`18836`)
- :meth:`ExcelFile.parse` has deprecated ``sheetname`` in favor of ``sheet_name`` for consistency with :func:`read_excel` (:issue:`20920`).
- The ``is_copy`` attribute is deprecated and will be removed in a future version (:issue:`18801`).
- ``IntervalIndex.from_intervals`` is deprecated in favor of the :class:`IntervalIndex` constructor (:issue:`19263`)
- ``DataFrame.from_items`` is deprecated. Use :func:`DataFrame.from_dict` instead, or ``DataFrame.from_dict(OrderedDict())`` if you wish to preserve the key order (:issue:`17320`, :issue:`17312`)
- Indexing a :class:`MultiIndex` or a :class:`FloatIndex` with a list containing some missing keys will now show a :class:`FutureWarning`, which is consistent with other types of indexes (:issue:`17758`).

- The ``broadcast`` parameter of ``.apply()`` is deprecated in favor of ``result_type='broadcast'`` (:issue:`18577`)
- The ``reduce`` parameter of ``.apply()`` is deprecated in favor of ``result_type='reduce'`` (:issue:`18577`)
- The ``order`` parameter of :func:`factorize` is deprecated and will be removed in a future release (:issue:`19727`)
- :attr:`Timestamp.weekday_name`, :attr:`DatetimeIndex.weekday_name`, and :attr:`Series.dt.weekday_name` are deprecated in favor of :meth:`Timestamp.day_name`, :meth:`DatetimeIndex.day_name`, and :meth:`Series.dt.day_name` (:issue:`12806`)

- ``pandas.tseries.plotting.tsplot`` is deprecated. Use :func:`Series.plot` instead (:issue:`18627`)
- ``Index.summary()`` is deprecated and will be removed in a future version (:issue:`18217`)
- ``NDFrame.get_ftype_counts()`` is deprecated and will be removed in a future version (:issue:`18243`)
- The ``convert_datetime64`` parameter in :func:`DataFrame.to_records` has been deprecated and will be removed in a future version. The NumPy bug motivating this parameter has been resolved. The default value for this parameter has also changed from ``True`` to ``None`` (:issue:`18160`).
- :func:`Series.rolling().apply() <pandas.core.window.Rolling.apply>`, :func:`DataFrame.rolling().apply() <pandas.core.window.Rolling.apply>`,
:func:`Series.expanding().apply() <pandas.core.window.Expanding.apply>`, and :func:`DataFrame.expanding().apply() <pandas.core.window.Expanding.apply>` have deprecated passing an ``np.array`` by default. One will need to pass the new ``raw`` parameter to be explicit about what is passed (:issue:`20584`)
- The ``data``, ``base``, ``strides``, ``flags`` and ``itemsize`` properties
of the ``Series`` and ``Index`` classes have been deprecated and will be
removed in a future version (:issue:`20419`).
- ``DatetimeIndex.offset`` is deprecated. Use ``DatetimeIndex.freq`` instead (:issue:`20716`)
- Floor division between an integer ndarray and a :class:`Timedelta` is deprecated. Divide by :attr:`Timedelta.value` instead (:issue:`19761`)
- Setting ``PeriodIndex.freq`` (which was not guaranteed to work correctly) is deprecated. Use :meth:`PeriodIndex.asfreq` instead (:issue:`20678`)
- ``Index.get_duplicates()`` is deprecated and will be removed in a future version (:issue:`20239`)
- The previous default behavior of negative indices in ``Categorical.take`` is deprecated. In a future version it will change from meaning missing values to meaning positional indices from the right. The future behavior is consistent with :meth:`Series.take` (:issue:`20664`).
- Passing multiple axes to the ``axis`` parameter in :func:`DataFrame.dropna` has been deprecated and will be removed in a future version (:issue:`20987`)


.. _whatsnew_0230.prior_deprecations:

Removal of prior version deprecations/changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Warnings against the obsolete usage ``Categorical(codes, categories)``, which were emitted for instance when the first two arguments to ``Categorical()`` had different dtypes, and recommended the use of ``Categorical.from_codes``, have now been removed (:issue:`8074`)
- The ``levels`` and ``labels`` attributes of a ``MultiIndex`` can no longer be set directly (:issue:`4039`).
- ``pd.tseries.util.pivot_annual`` has been removed (deprecated since v0.19). Use ``pivot_table`` instead (:issue:`18370`)
- ``pd.tseries.util.isleapyear`` has been removed (deprecated since v0.19). Use ``.is_leap_year`` property in Datetime-likes instead (:issue:`18370`)
- ``pd.ordered_merge`` has been removed (deprecated since v0.19). Use ``pd.merge_ordered`` instead (:issue:`18459`)
- The ``SparseList`` class has been removed (:issue:`14007`)
- The ``pandas.io.wb`` and ``pandas.io.data`` stub modules have been removed (:issue:`13735`)
- ``Categorical.from_array`` has been removed (:issue:`13854`)
- The ``freq`` and ``how`` parameters have been removed from the ``rolling``/``expanding``/``ewm`` methods of DataFrame
and Series (deprecated since v0.18). Instead, resample before calling the methods. (:issue:`18601` & :issue:`18668`)
- ``DatetimeIndex.to_datetime``, ``Timestamp.to_datetime``, ``PeriodIndex.to_datetime``, and ``Index.to_datetime`` have been removed (:issue:`8254`, :issue:`14096`, :issue:`14113`)
- :func:`read_csv` has dropped the ``skip_footer`` parameter (:issue:`13386`)
- :func:`read_csv` has dropped the ``as_recarray`` parameter (:issue:`13373`)
- :func:`read_csv` has dropped the ``buffer_lines`` parameter (:issue:`13360`)
- :func:`read_csv` has dropped the ``compact_ints`` and ``use_unsigned`` parameters (:issue:`13323`)
- The ``Timestamp`` class has dropped the ``offset`` attribute in favor of ``freq`` (:issue:`13593`)
- The ``Series``, ``Categorical``, and ``Index`` classes have dropped the ``reshape`` method (:issue:`13012`)
- ``pandas.tseries.frequencies.get_standard_freq`` has been removed in favor of ``pandas.tseries.frequencies.to_offset(freq).rule_code`` (:issue:`13874`)
- The ``freqstr`` keyword has been removed from ``pandas.tseries.frequencies.to_offset`` in favor of ``freq`` (:issue:`13874`)
- The ``Panel4D`` and ``PanelND`` classes have been removed (:issue:`13776`)
- The ``Panel`` class has dropped the ``to_long`` and ``toLong`` methods (:issue:`19077`)
- The options ``display.line_with`` and ``display.height`` are removed in favor of ``display.width`` and ``display.max_rows`` respectively (:issue:`4391`, :issue:`19107`)
- The ``labels`` attribute of the ``Categorical`` class has been removed in favor of :attr:`Categorical.codes` (:issue:`7768`)
- The ``flavor`` parameter have been removed from func:`to_sql` method (:issue:`13611`)
- The modules ``pandas.tools.hashing`` and ``pandas.util.hashing`` have been removed (:issue:`16223`)
- The top-level functions ``pd.rolling_*``, ``pd.expanding_*`` and ``pd.ewm*`` have been removed (Deprecated since v0.18).
Instead, use the DataFrame/Series methods :attr:`~DataFrame.rolling`, :attr:`~DataFrame.expanding` and :attr:`~DataFrame.ewm` (:issue:`18723`)
- Imports from ``pandas.core.common`` for functions such as ``is_datetime64_dtype`` are now removed. These are located in ``pandas.api.types``. (:issue:`13634`, :issue:`19769`)
- The ``infer_dst`` keyword in :meth:`Series.tz_localize`, :meth:`DatetimeIndex.tz_localize`
and :class:`DatetimeIndex` have been removed. ``infer_dst=True`` is equivalent to
``ambiguous='infer'``, and ``infer_dst=False`` to ``ambiguous='raise'`` (:issue:`7963`).
- When ``.resample()`` was changed from an eager to a lazy operation, like ``.groupby()`` in v0.18.0, we put in place compatibility (with a ``FutureWarning``),
so operations would continue to work. This is now fully removed, so a ``Resampler`` will no longer forward compat operations (:issue:`20554`)
- Remove long deprecated ``axis=None`` parameter from ``.replace()`` (:issue:`20271`)

.. _whatsnew_0230.performance:

Performance Improvements
~~~~~~~~~~~~~~~~~~~~~~~~

- Indexers on ``Series`` or ``DataFrame`` no longer create a reference cycle (:issue:`17956`)
- Added a keyword argument, ``cache``, to :func:`to_datetime` that improved the performance of converting duplicate datetime arguments (:issue:`11665`)
- :class:`DateOffset` arithmetic performance is improved (:issue:`18218`)
- Converting a ``Series`` of ``Timedelta`` objects to days, seconds, etc... sped up through vectorization of underlying methods (:issue:`18092`)
- Improved performance of ``.map()`` with a ``Series/dict`` input (:issue:`15081`)
- The overridden ``Timedelta`` properties of days, seconds and microseconds have been removed, leveraging their built-in Python versions instead (:issue:`18242`)
- ``Series`` construction will reduce the number of copies made of the input data in certain cases (:issue:`17449`)
- Improved performance of :func:`Series.dt.date` and :func:`DatetimeIndex.date` (:issue:`18058`)
- Improved performance of :func:`Series.dt.time` and :func:`DatetimeIndex.time` (:issue:`18461`)
- Improved performance of :func:`IntervalIndex.symmetric_difference()` (:issue:`18475`)
- Improved performance of ``DatetimeIndex`` and ``Series`` arithmetic operations with Business-Month and Business-Quarter frequencies (:issue:`18489`)
- :func:`Series` / :func:`DataFrame` tab completion limits to 100 values, for better performance. (:issue:`18587`)
- Improved performance of :func:`DataFrame.median` with ``axis=1`` when bottleneck is not installed (:issue:`16468`)
- Improved performance of :func:`MultiIndex.get_loc` for large indexes, at the cost of a reduction in performance for small ones (:issue:`18519`)
- Improved performance of :func:`MultiIndex.remove_unused_levels` when there are no unused levels, at the cost of a reduction in performance when there are (:issue:`19289`)
- Improved performance of :func:`Index.get_loc` for non-unique indexes (:issue:`19478`)
- Improved performance of pairwise ``.rolling()`` and ``.expanding()`` with ``.cov()`` and ``.corr()`` operations (:issue:`17917`)
- Improved performance of :func:`pandas.core.groupby.GroupBy.rank` (:issue:`15779`)
- Improved performance of variable ``.rolling()`` on ``.min()`` and ``.max()`` (:issue:`19521`)
- Improved performance of :func:`pandas.core.groupby.GroupBy.ffill` and :func:`pandas.core.groupby.GroupBy.bfill` (:issue:`11296`)
- Improved performance of :func:`pandas.core.groupby.GroupBy.any` and :func:`pandas.core.groupby.GroupBy.all` (:issue:`15435`)
- Improved performance of :func:`pandas.core.groupby.GroupBy.pct_change` (:issue:`19165`)
- Improved performance of :func:`Series.isin` in the case of categorical dtypes (:issue:`20003`)
- Improved performance of ``getattr(Series, attr)`` when the Series has certain index types. This manifested in slow printing of large Series with a ``DatetimeIndex`` (:issue:`19764`)
- Fixed a performance regression for :func:`GroupBy.nth` and :func:`GroupBy.last` with some object columns (:issue:`19283`)
- Improved performance of :func:`pandas.core.arrays.Categorical.from_codes` (:issue:`18501`)

.. _whatsnew_0230.docs:

Documentation Changes
~~~~~~~~~~~~~~~~~~~~~

Thanks to all of the contributors who participated in the Pandas Documentation
Sprint, which took place on March 10th. We had about 500 participants from over
30 locations across the world. You should notice that many of the
:ref:`API docstrings <api>` have greatly improved.

There were too many simultaneous contributions to include a release note for each
improvement, but this `GitHub search`_ should give you an idea of how many docstrings
were improved.

Special thanks to `Marc Garcia`_ for organizing the sprint. For more information,
read the `NumFOCUS blogpost`_ recapping the sprint.

.. _GitHub search: https://github.com/pandas-dev/pandas/pulls?utf8=%E2%9C%93&q=is%3Apr+label%3ADocs+created%3A2018-03-10..2018-03-15+
.. _NumFOCUS blogpost: https://www.numfocus.org/blog/worldwide-pandas-sprint/
.. _Marc Garcia: https://github.com/datapythonista

- Changed spelling of "numpy" to "NumPy", and "python" to "Python". (:issue:`19017`)
- Consistency when introducing code samples, using either colon or period.
Rewrote some sentences for greater clarity, added more dynamic references
to functions, methods and classes.
(:issue:`18941`, :issue:`18948`, :issue:`18973`, :issue:`19017`)
- Added a reference to :func:`DataFrame.assign` in the concatenate section of the merging documentation (:issue:`18665`)

.. _whatsnew_0230.bug_fixes:

Bug Fixes
~~~~~~~~~

Categorical
^^^^^^^^^^^

.. warning::

A class of bugs were introduced in pandas 0.21 with ``CategoricalDtype`` that
affects the correctness of operations like ``merge``, ``concat``, and
indexing when comparing multiple unordered ``Categorical`` arrays that have
the same categories, but in a different order. We highly recommend upgrading
or manually aligning your categories before doing these operations.

- Bug in ``Categorical.equals`` returning the wrong result when comparing two
unordered ``Categorical`` arrays with the same categories, but in a different
order (:issue:`16603`)
- Bug in :func:`pandas.api.types.union_categoricals` returning the wrong result
when for unordered categoricals with the categories in a different order.
This affected :func:`pandas.concat` with Categorical data (:issue:`19096`).
- Bug in :func:`pandas.merge` returning the wrong result when joining on an
unordered ``Categorical`` that had the same categories but in a different
order (:issue:`19551`)
- Bug in :meth:`CategoricalIndex.get_indexer` returning the wrong result when
``target`` was an unordered ``Categorical`` that had the same categories as
``self`` but in a different order (:issue:`19551`)
- Bug in :meth:`Index.astype` with a categorical dtype where the resultant index is not converted to a :class:`CategoricalIndex` for all types of index (:issue:`18630`)
- Bug in :meth:`Series.astype` and ``Categorical.astype()`` where an existing categorical data does not get updated (:issue:`10696`, :issue:`18593`)
- Bug in :meth:`Series.str.split` with ``expand=True`` incorrectly raising an IndexError on empty strings (:issue:`20002`).
- Bug in :class:`Index` constructor with ``dtype=CategoricalDtype(...)`` where ``categories`` and ``ordered`` are not maintained (:issue:`19032`)
- Bug in :class:`Series` constructor with scalar and ``dtype=CategoricalDtype(...)`` where ``categories`` and ``ordered`` are not maintained (:issue:`19565`)
- Bug in ``Categorical.__iter__`` not converting to Python types (:issue:`19909`)
- Bug in :func:`pandas.factorize` returning the unique codes for the ``uniques``. This now returns a ``Categorical`` with the same dtype as the input (:issue:`19721`)
- Bug in :func:`pandas.factorize` including an item for missing values in the ``uniques`` return value (:issue:`19721`)
- Bug in :meth:`Series.take` with categorical data interpreting ``-1`` in `indices` as missing value markers, rather than the last element of the Series (:issue:`20664`)

Datetimelike
^^^^^^^^^^^^

- Bug in :func:`Series.__sub__` subtracting a non-nanosecond ``np.datetime64`` object from a ``Series`` gave incorrect results (:issue:`7996`)
- Bug in :class:`DatetimeIndex`, :class:`TimedeltaIndex` addition and subtraction of zero-dimensional integer arrays gave incorrect results (:issue:`19012`)
- Bug in :class:`DatetimeIndex` and :class:`TimedeltaIndex` where adding or subtracting an array-like of ``DateOffset`` objects either raised (``np.array``, ``pd.Index``) or broadcast incorrectly (``pd.Series``) (:issue:`18849`)
- Bug in :func:`Series.__add__` adding Series with dtype ``timedelta64[ns]`` to a timezone-aware ``DatetimeIndex`` incorrectly dropped timezone information (:issue:`13905`)
- Adding a ``Period`` object to a ``datetime`` or ``Timestamp`` object will now correctly raise a ``TypeError`` (:issue:`17983`)
- Bug in :class:`Timestamp` where comparison with an array of ``Timestamp`` objects would result in a ``RecursionError`` (:issue:`15183`)
- Bug in :class:`Series` floor
...

Resolves #142
Resolves #192

@Harmon758 Harmon758 merged commit bbf9395 into rewrite Jun 14, 2018
@Harmon758 Harmon758 deleted the pyup-update-pandas-0.22.0-to-0.23.1 branch June 14, 2018 06:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pandas versions available: 0.23.1 pandas versions available: 0.23.0
2 participants