Refactored Resample API breaking change #11841

Closed
wants to merge 8 commits into from
59 changes: 59 additions & 0 deletions doc/source/api.rst
@@ -1729,6 +1729,65 @@ The following methods are available only for ``DataFrameGroupBy`` objects.
DataFrameGroupBy.corrwith
DataFrameGroupBy.boxplot

Resampling
----------
.. currentmodule:: pandas.tseries.resample

Resampler objects are returned by resample calls: :func:`pandas.DataFrame.resample`, :func:`pandas.Series.resample`.

Indexing, iteration
~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: generated/

Resampler.__iter__
Resampler.groups
Resampler.indices
Resampler.get_group

Function application
~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: generated/

Resampler.apply
Resampler.aggregate
Resampler.transform

Upsampling
~~~~~~~~~~

.. autosummary::
:toctree: generated/

Resampler.ffill
Resampler.backfill
Resampler.bfill
Resampler.pad
Resampler.fillna
Resampler.asfreq

Computations / Descriptive Stats
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autosummary::
:toctree: generated/

Resampler.count
Resampler.nunique
Resampler.first
Resampler.last
Resampler.max
Resampler.mean
Resampler.median
Resampler.min
Resampler.ohlc
Resampler.prod
Resampler.size
Member: maybe also add count and nunique

Contributor Author: count there, added nunique

Resampler.sem
Resampler.std
Resampler.sum
Resampler.var
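A minimal sketch of a few of the ``Resampler`` methods listed above (iteration, ``get_group``, ``aggregate`` and ``transform``). It assumes a recent pandas release (it uses the ``"min"`` offset alias) and made-up per-minute data; it is an illustration, not part of this PR.

```python
import numpy as np
import pandas as pd

# Illustrative data (not from the docs): ten per-minute points with values 0..9.
idx = pd.date_range("2000-01-01", periods=10, freq="min")
s = pd.Series(np.arange(10), index=idx)

# .resample() returns a groupby-like Resampler; nothing is computed yet.
r = s.resample("5min")

# Indexing / iteration: iterating yields (bin label, sub-Series) pairs.
pieces = [(label, chunk.tolist()) for label, chunk in r]

# get_group pulls out the rows that fell into one bin.
first_bin = r.get_group(pd.Timestamp("2000-01-01 00:00"))

# Function application: aggregate reduces each bin to one value,
# transform broadcasts a per-bin result back onto the original index.
sums = r.aggregate("sum")
bin_means = r.transform("mean")
```

Because ``transform`` keeps the original index, its result can be aligned directly with ``s`` (for example to compute deviations from each bin's mean).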

Style
-----
.. currentmodule:: pandas.core.style
2 changes: 1 addition & 1 deletion doc/source/cookbook.rst
@@ -567,7 +567,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
return pd.NaT

mhc = {'Mean' : np.mean, 'Max' : np.max, 'Custom' : MyCust}
ts.resample("5min",how = mhc)
ts.resample("5min").apply(mhc)
ts
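A hedged sketch of the same "several reducers at once" idea for newer pandas versions, where passing a dict of renaming functions to a Series aggregation may be rejected. It passes a list instead and renames the columns afterwards. ``my_custom`` is a hypothetical stand-in for the cookbook's ``MyCust`` (it uses ``.iloc`` for positional access and returns ``NaN`` rather than ``NaT``, since the column is numeric); the data is made up.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2000-01-01", periods=10, freq="min")
ts = pd.Series(np.arange(10, dtype=float), index=idx)

def my_custom(x):
    """Hypothetical reducer modelled on the cookbook's MyCust."""
    # Positional access via .iloc, because each bin is indexed by timestamps.
    if len(x) > 2:
        return x.iloc[1] * 1.1234
    return np.nan

# Aggregate with several reducers at once, then give the columns readable names.
result = ts.resample("5min").agg(["mean", "max", my_custom])
result.columns = ["Mean", "Max", "Custom"]
```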

`Create a value counts column and reassign back to the DataFrame
7 changes: 6 additions & 1 deletion doc/source/release.rst
@@ -48,7 +48,12 @@ users upgrade to this version.

Highlights include:

See the :ref:`v0.17.0 Whatsnew <whatsnew_0180>` overview for an extensive list
Highlights include:

- Window functions are now methods on ``.groupby`` like objects, see :ref:`here <whatsnew_0180.enhancements.moments>`.
- API breaking ``.resample`` changes to make it more ``.groupby`` like, see :ref:`here <whatsnew_0180.resample>`.

See the :ref:`v0.18.0 Whatsnew <whatsnew_0180>` overview for an extensive list
of all enhancements and bugs that have been fixed in 0.18.0.

Thanks
2 changes: 1 addition & 1 deletion doc/source/timedeltas.rst
@@ -401,4 +401,4 @@ Similar to :ref:`timeseries resampling <timeseries.resampling>`, we can resample

.. ipython:: python

s.resample('D')
s.resample('D').mean()
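A small self-contained sketch of resampling a ``TimedeltaIndex`` with the method-chaining syntax. The series below is illustrative (six readings 12 hours apart), not the one defined earlier in the timedeltas docs, and it assumes a pandas version that accepts the ``"12h"`` alias.

```python
import numpy as np
import pandas as pd

# Illustrative data: six readings 12 hours apart, indexed by elapsed time.
tdi = pd.timedelta_range("0 days", periods=6, freq="12h")
s = pd.Series(np.arange(6), index=tdi)

# Each one-day bin averages the two readings it contains.
daily = s.resample("D").mean()
```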
104 changes: 88 additions & 16 deletions doc/source/timeseries.rst
@@ -68,7 +68,7 @@ Resample:
.. ipython:: python

# Daily means
ts.resample('D', how='mean')
ts.resample('D').mean()


.. _timeseries.overview:
@@ -1211,6 +1211,11 @@ Converting to Python datetimes
Resampling
----------

.. warning::

The interface to ``.resample`` has changed in 0.18.0 to be more groupby-like and hence more flexible.
See the :ref:`whatsnew docs <whatsnew_0180.breaking.resample>` for a comparison with prior versions.

Pandas has simple, powerful, and efficient functionality for
performing resampling operations during frequency conversion (e.g., converting
secondly data into 5-minutely data). This is extremely common in, but not
@@ -1226,7 +1231,7 @@ See some :ref:`cookbook examples <cookbook.resample>` for some advanced strategi

ts = Series(randint(0, 500, len(rng)), index=rng)

ts.resample('5Min', how='sum')
ts.resample('5Min').sum()
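A deterministic variant of the example above, as a sketch: instead of random integers it uses the values 0..599 so the 5-minute sums can be checked by hand, and it cross-checks ``resample`` against an explicit ``groupby`` on floored timestamps. The lowercase ``"s"``/``"min"`` aliases assume a reasonably recent pandas.

```python
import numpy as np
import pandas as pd

# Ten minutes of secondly data with deterministic values 0..599 (illustrative).
rng = pd.date_range("2012-01-01", periods=600, freq="s")
ts = pd.Series(np.arange(600), index=rng)

# Downsample: each 5-minute bin collapses to one summed value.
five_min = ts.resample("5min").sum()

# Cross-check against an explicit groupby on the floored timestamps.
manual = ts.groupby(ts.index.floor("5min")).sum()
```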

The ``resample`` function is very flexible and allows you to specify many
different parameters to control the frequency conversion and resampling
@@ -1237,11 +1242,11 @@ an array and produces aggregated values:

.. ipython:: python

ts.resample('5Min') # default is mean
ts.resample('5Min').mean()

ts.resample('5Min', how='ohlc')
ts.resample('5Min').ohlc()

ts.resample('5Min', how=np.max)
ts.resample('5Min').max()
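A small sketch of the method-based reducers shown above (``mean``, ``max``, ``ohlc``) on hand-picked values, assuming a recent pandas. ``ohlc`` returns one row per bin with ``open``, ``high``, ``low`` and ``close`` columns.

```python
import pandas as pd

# Two 5-minute bins of illustrative per-minute values.
idx = pd.date_range("2012-01-01", periods=10, freq="min")
ts = pd.Series([3, 7, 1, 5, 4, 9, 2, 8, 6, 0], index=idx)

r = ts.resample("5min")
means = r.mean()   # average of each bin
highs = r.max()    # replaces the old how=np.max
bars = r.ohlc()    # first, max, min and last value of each bin
```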

Any function available via :ref:`dispatching <groupby.dispatch>` can be given to
the ``how`` parameter by name, including ``sum``, ``mean``, ``std``, ``sem``,
@@ -1252,9 +1257,9 @@ end of the interval is closed:

.. ipython:: python

ts.resample('5Min', closed='right')
ts.resample('5Min', closed='right').mean()

ts.resample('5Min', closed='left')
ts.resample('5Min', closed='left').mean()
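To make the effect of ``closed`` visible, the sketch below places observations exactly on the 5-minute boundaries (illustrative data, recent pandas assumed) and uses ``sum`` so each bin's membership is easy to read off.

```python
import numpy as np
import pandas as pd

# Eleven per-minute points, 00:00 through 00:10, with values 0..10 (illustrative).
idx = pd.date_range("2000-01-01", periods=11, freq="min")
ts = pd.Series(np.arange(11), index=idx)

# closed="left": bins are [00:00, 00:05), [00:05, 00:10), [00:10, 00:15)
left = ts.resample("5min", closed="left").sum()

# closed="right": bins are (23:55, 00:00], (00:00, 00:05], (00:05, 00:10];
# with the default label="left" each bin is named after its left edge.
right = ts.resample("5min", closed="right").sum()
```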

Parameters like ``label`` and ``loffset`` are used to manipulate the resulting
labels. ``label`` specifies whether the result is labeled with the beginning or
@@ -1263,11 +1268,11 @@ labels.

.. ipython:: python

ts.resample('5Min') # by default label='right'
ts.resample('5Min').mean() # by default label='left'

ts.resample('5Min', label='left')
ts.resample('5Min', label='left').mean()

ts.resample('5Min', label='left', loffset='1s')
ts.resample('5Min', label='left', loffset='1s').mean()
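A sketch of ``label`` on illustrative data. Later pandas releases no longer accept ``loffset``, so the last step assumes that shifting the result's index by a ``Timedelta`` is an acceptable substitute for ``loffset='1s'``; it only moves the labels, not the bins.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2000-01-01", periods=10, freq="min")
ts = pd.Series(np.arange(10), index=idx)

# Same bins, different labels: start of each interval vs. end of each interval.
left_labels = ts.resample("5min", label="left").sum()
right_labels = ts.resample("5min", label="right").sum()

# loffset-style shift done by hand: move every label forward by one second.
shifted = left_labels.copy()
shifted.index = shifted.index + pd.Timedelta("1s")
```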

The ``axis`` parameter can be set to 0 or 1 and allows you to resample the
specified axis for a DataFrame.
@@ -1284,18 +1289,17 @@ frequency periods.
Up Sampling
~~~~~~~~~~~

For upsampling, the ``fill_method`` and ``limit`` parameters can be specified
to interpolate over the gaps that are created:
For upsampling, you can choose how the new slots are populated (for example ``asfreq`` or ``ffill``) and pass the ``limit`` parameter to bound how many of the newly created gaps are filled:

.. ipython:: python

# from secondly to every 250 milliseconds

ts[:2].resample('250L')
ts[:2].resample('250L').asfreq()

ts[:2].resample('250L', fill_method='pad')
ts[:2].resample('250L').ffill()

ts[:2].resample('250L', fill_method='pad', limit=2)
ts[:2].resample('250L').ffill(limit=2)
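A self-contained version of the upsampling example with two fixed values, assuming a recent pandas (``"ms"`` is used for milliseconds). Upsampling one second into 250 ms steps yields five timestamps; the three in between are new.

```python
import pandas as pd

# Two secondly observations (illustrative values).
idx = pd.date_range("2012-01-01", periods=2, freq="s")
ts = pd.Series([1.0, 2.0], index=idx)

# Upsample from secondly to every 250 milliseconds.
up = ts.resample("250ms")
as_freq = up.asfreq()          # new slots are left as NaN
filled = up.ffill()            # carry the last observation forward
limited = up.ffill(limit=2)    # fill at most two consecutive new slots
```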

Sparse Resampling
~~~~~~~~~~~~~~~~~
@@ -1317,7 +1321,7 @@ If we want to resample to the full range of the series

.. ipython:: python

ts.resample('3T',how='sum')
ts.resample('3T').sum()

We can instead only resample those groups where we have points as follows:

@@ -1333,6 +1337,74 @@ We can instead only resample those groups where we have points as follows:

ts.groupby(partial(round, freq='3T')).sum()
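A sketch contrasting the two approaches on a small sparse series. It uses ``DatetimeIndex.floor`` in place of the ``partial(round, freq='3T')`` helper above, on the assumption that flooring to the bin width matches that helper's behaviour; the data is illustrative.

```python
import pandas as pd

# A sparse series (illustrative): two points near midnight and one six hours later.
idx = pd.to_datetime(["2014-01-01 00:00", "2014-01-01 00:01", "2014-01-01 06:00"])
ts = pd.Series([1, 2, 3], index=idx)

# resample() materialises every 3-minute bin in the range, most of them empty.
dense = ts.resample("3min").sum()

# Grouping on floored timestamps only creates bins that contain data.
sparse = ts.groupby(ts.index.floor("3min")).sum()
```

Here ``dense`` has 121 rows (six hours of 3-minute bins plus the final one), while ``sparse`` has only the two bins that actually hold observations.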

Aggregation
~~~~~~~~~~~

Similar to :ref:`groupby aggregates <groupby.aggregate>` and the :ref:`window functions <stats.aggregate>`, a ``Resampler`` can be selectively
resampled.

When resampling a ``DataFrame``, the same function is applied to all columns by default.

.. ipython:: python

df = pd.DataFrame(np.random.randn(1000, 3),
index=pd.date_range('1/1/2012', freq='S', periods=1000),
columns=['A', 'B', 'C'])
r = df.resample('3T')
r.mean()
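A small deterministic sketch (recent pandas assumed, illustrative data) showing that a ``DataFrame`` resample applies the reducer column by column, i.e. it gives the same numbers as resampling each column on its own.

```python
import pandas as pd

idx = pd.date_range("2012-01-01", periods=6, freq="min")
df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
     "B": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]},
    index=idx,
)

r = df.resample("3min")
means = r.mean()

# Same numbers as resampling each column separately.
per_column = pd.DataFrame({c: df[c].resample("3min").mean() for c in df.columns})
```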

We can select a specific column or columns using standard getitem.

.. ipython:: python

r['A'].mean()

r[['A','B']].mean()
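A sketch of column selection on a ``Resampler`` with illustrative data: selecting one key gives a Series result, a list of keys gives a DataFrame limited to those columns (here ``C`` is left out).

```python
import pandas as pd

idx = pd.date_range("2012-01-01", periods=6, freq="min")
df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
     "B": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
     "C": [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]},
    index=idx,
)
r = df.resample("3min")

a_sum = r["A"].sum()          # single column -> Series
ab_sum = r[["A", "B"]].sum()  # list of columns -> DataFrame without "C"
```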

You can pass a list or dict of functions to aggregate with, producing a DataFrame:

.. ipython:: python

r['A'].agg([np.sum, np.mean, np.std])

If a dict is passed, the keys will be used to name the columns. Otherwise the
function's name (stored in the function object) will be used.

.. ipython:: python

r['A'].agg({'result1' : np.sum,
'result2' : np.mean})

On a resampled DataFrame, you can pass a list of functions to apply to each
column, which produces an aggregated result with a hierarchical index:

.. ipython:: python

r.agg([np.sum, np.mean])
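A sketch of the hierarchical result with string function names on illustrative data: the outer column level is the original column, the inner level is the function name.

```python
import pandas as pd

idx = pd.date_range("2012-01-01", periods=6, freq="min")
df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
     "B": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]},
    index=idx,
)
r = df.resample("3min")

# Every column gets every function; the columns become a two-level MultiIndex.
out = r.agg(["sum", "mean"])
```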

By passing a dict to ``aggregate`` you can apply a different aggregation to the
columns of a DataFrame:

.. ipython:: python
:okexcept:

r.agg({'A' : np.sum,
'B' : lambda x: np.std(x, ddof=1)})

The function names can also be strings. For a string to be valid, it must name a method
implemented on the ``Resampler`` object.

.. ipython:: python

r.agg({'A' : 'sum', 'B' : 'std'})

You can also specify multiple aggregation functions for each column separately.

.. ipython:: python

r.agg({'A' : ['sum','std'], 'B' : ['mean','std'] })
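A sketch of per-column lists of functions on illustrative data. Each dict key keeps only the functions listed for it, and ``std`` uses the sample standard deviation (``ddof=1``), which is 1.0 for ``[1, 2, 3]`` and 10.0 for ``[10, 20, 30]``.

```python
import pandas as pd

idx = pd.date_range("2012-01-01", periods=6, freq="min")
df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
     "B": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]},
    index=idx,
)
r = df.resample("3min")

# Different functions per column; result columns are (column, function) pairs.
out = r.agg({"A": ["sum", "std"], "B": ["mean", "std"]})
```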


.. _timeseries.periods:

Time Span Representation
64 changes: 53 additions & 11 deletions doc/source/whatsnew/v0.10.0.txt
@@ -70,16 +70,59 @@ nfrequencies are unaffected. The prior defaults were causing a great deal of
confusion for users, especially resampling data to daily frequency (which
labeled the aggregated group with the end of the interval: the next day).

Note:

.. ipython:: python

dates = pd.date_range('1/1/2000', '1/5/2000', freq='4h')
series = Series(np.arange(len(dates)), index=dates)
series
series.resample('D', how='sum')
# old behavior
series.resample('D', how='sum', closed='right', label='right')
.. code-block:: python

In [1]: dates = pd.date_range('1/1/2000', '1/5/2000', freq='4h')
Member: You don't have to change it here (as you do it in other places as well), but just a general comment as this is something that has been bothering me for a time :-)
Can we use ISO formatted dates in our docs? It's just that this way a European person always has to think a second longer about what date it actually represents ..

Contributor Author: sure, np. I think I just copy pasted from somewhere else :)


In [2]: series = Series(np.arange(len(dates)), index=dates)

In [3]: series
Out[3]:
2000-01-01 00:00:00 0
2000-01-01 04:00:00 1
2000-01-01 08:00:00 2
2000-01-01 12:00:00 3
2000-01-01 16:00:00 4
2000-01-01 20:00:00 5
2000-01-02 00:00:00 6
2000-01-02 04:00:00 7
2000-01-02 08:00:00 8
2000-01-02 12:00:00 9
2000-01-02 16:00:00 10
2000-01-02 20:00:00 11
2000-01-03 00:00:00 12
2000-01-03 04:00:00 13
2000-01-03 08:00:00 14
2000-01-03 12:00:00 15
2000-01-03 16:00:00 16
2000-01-03 20:00:00 17
2000-01-04 00:00:00 18
2000-01-04 04:00:00 19
2000-01-04 08:00:00 20
2000-01-04 12:00:00 21
2000-01-04 16:00:00 22
2000-01-04 20:00:00 23
2000-01-05 00:00:00 24
Freq: 4H, dtype: int64

In [4]: series.resample('D', how='sum')
Out[4]:
2000-01-01 15
2000-01-02 51
2000-01-03 87
2000-01-04 123
2000-01-05 24
Freq: D, dtype: int64

In [5]: # old behavior
In [6]: series.resample('D', how='sum', closed='right', label='right')
Out[6]:
2000-01-01 0
2000-01-02 21
2000-01-03 57
2000-01-04 93
2000-01-05 129
Freq: D, dtype: int64

- Infinity and negative infinity are no longer treated as NA by ``isnull`` and
``notnull``. That they ever were was a relic of early pandas. This behavior
@@ -354,4 +397,3 @@ Adding experimental support for Panel4D and factory functions to create n-dimens
See the :ref:`full release notes
<release>` or issue tracker
on GitHub for a complete list.
