
Refactored Resample API breaking change #11841

Closed
wants to merge 8 commits from the resample branch

Conversation

jreback
Contributor

@jreback jreback commented Dec 14, 2015

on top of #11603

closes #11732
closes #12072
closes #9052
closes #12140

ToDo:

  • rewrite/expand main docs
  • add aggregate section

New API

In [4]: np.random.seed(1234)

In [5]: df = pd.DataFrame(np.random.rand(10,4),
                     columns=list('ABCD'),
                     index=pd.date_range('2010-01-01 09:00:00', periods=10, freq='s'))

In [6]: df
Out[6]: 
                            A         B         C         D
2010-01-01 09:00:00  0.191519  0.622109  0.437728  0.785359
2010-01-01 09:00:01  0.779976  0.272593  0.276464  0.801872
2010-01-01 09:00:02  0.958139  0.875933  0.357817  0.500995
2010-01-01 09:00:03  0.683463  0.712702  0.370251  0.561196
2010-01-01 09:00:04  0.503083  0.013768  0.772827  0.882641
2010-01-01 09:00:05  0.364886  0.615396  0.075381  0.368824
2010-01-01 09:00:06  0.933140  0.651378  0.397203  0.788730
2010-01-01 09:00:07  0.316836  0.568099  0.869127  0.436173
2010-01-01 09:00:08  0.802148  0.143767  0.704261  0.704581
2010-01-01 09:00:09  0.218792  0.924868  0.442141  0.909316

In [7]: df.resample('2s')
Out[7]: DatetimeIndexResampler [freq=<2 * Seconds>,axis=0,closed=left,label=left,convention=start,base=0]

In [8]: r = df.resample('2s')

In [9]: r.sum()
Out[9]: 
                            A         B         C         D
2010-01-01 09:00:00  0.971495  0.894701  0.714192  1.587231
2010-01-01 09:00:02  1.641602  1.588635  0.728068  1.062191
2010-01-01 09:00:04  0.867969  0.629165  0.848208  1.251465
2010-01-01 09:00:06  1.249976  1.219477  1.266330  1.224904
2010-01-01 09:00:08  1.020940  1.068634  1.146402  1.613897

In [10]: r[['A','B']].agg(['mean','sum'])
Out[10]: 
                            A                   B          
                         mean       sum      mean       sum
2010-01-01 09:00:00  0.485748  0.971495  0.447351  0.894701
2010-01-01 09:00:02  0.820801  1.641602  0.794317  1.588635
2010-01-01 09:00:04  0.433985  0.867969  0.314582  0.629165
2010-01-01 09:00:06  0.624988  1.249976  0.609738  1.219477
2010-01-01 09:00:08  0.510470  1.020940  0.534317  1.068634
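
A short usage sketch of the same deferred object with a custom function (reusing r from above; the lambda is only illustrative and output is omitted, assuming apply on the Resampler dispatches like a groupby aggregation):

r['A'].apply(lambda x: x.max() - x.min())   # per-bin range of column A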

Upsampling

In [11]: s = Series(np.arange(5,dtype='int64'),
   ....:               index=date_range('2010-01-01', periods=5, freq='Q'))

In [12]: s
Out[12]: 
2010-03-31    0
2010-06-30    1
2010-09-30    2
2010-12-31    3
2011-03-31    4
Freq: Q-DEC, dtype: int64

In [13]: s.resample('M').ffill()
Out[13]: 
2010-03-31    0
2010-04-30    0
2010-05-31    0
2010-06-30    1
2010-07-31    1
2010-08-31    1
2010-09-30    2
2010-10-31    2
2010-11-30    2
2010-12-31    3
2011-01-31    3
2011-02-28    3
2011-03-31    4
Freq: M, dtype: int64

In [14]: s.resample('M').asfreq()
Out[14]: 
2010-03-31     0
2010-04-30   NaN
2010-05-31   NaN
2010-06-30     1
2010-07-31   NaN
2010-08-31   NaN
2010-09-30     2
2010-10-31   NaN
2010-11-30   NaN
2010-12-31     3
2011-01-31   NaN
2011-02-28   NaN
2011-03-31     4
Freq: M, dtype: float64
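
The other fill methods hang off the deferred object in the same way; a small sketch, assuming .bfill() is exposed alongside .ffill() (output omitted):

s.resample('M').bfill()   # backward-fill the upsampled gaps instead of forward-fill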

@jreback jreback added API Design Resample resample method labels Dec 14, 2015
@jreback jreback added this to the 0.18.0 milestone Dec 14, 2015
@jorisvandenbossche
Member

Although I really like the fact of a more consistent API, I think this is so backwards incompatible that it really is problematic to just put it in a release like this.

I also kind of like the simplicity of a basic resample to explain in tutorials (although once it is not a basic resample anymore, this interface is nicer). A two-step method is more complicated than a one-step one.

Regarding the back-compat issue, there are maybe other ways to solve this? E.g. using another name, or some keyword indicating the behaviour (but this will get ugly), ...

@jreback
Contributor Author

jreback commented Dec 15, 2015

Although I really like the fact of a more consistent API, I think this is so backwards incompatible that it really is problematic to just put it in a release like this.

pls explain. The point of breaking compat is to change future behavior. Breaking changes always cause short-term pain, but shying away from fixing long-needed inconsistencies is MUCH MUCH worse.

@jreback
Contributor Author

jreback commented Dec 15, 2015

yes we could add a .resample2 method which is the new impl, and preserve the original with a deprecation warning. Then of course we'd have to maintain .resample2 for a bit.

@shoyer
Member

shoyer commented Dec 15, 2015

One thing we could do to help preserve backwards compat is to keep around how for now as a deprecated argument. If how is set, raise a deprecation warning and preserve the original behavior. This would at least keep cases like s.resample('24H', how='max') working, though it would indeed break s.resample('24H').
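
A rough sketch of what such a shim could look like (hypothetical names, not the code in this PR):

import warnings

def resample(self, rule, how=None, **kwargs):
    r = Resampler(self, rule, **kwargs)   # hypothetical constructor for the deferred object
    if how is not None:
        # preserve the old eager behavior, but warn about the new syntax
        warnings.warn("how in .resample() is deprecated, use "
                      ".resample(...).{0}() instead".format(how),
                      FutureWarning, stacklevel=2)
        return getattr(r, how)()
    return r   # new deferred object; breaks code that expected an evaluated result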

@jreback
Contributor Author

jreback commented Dec 15, 2015

I already do the deprecation warning (and just return the result) if how is actually specified.

The only case this is actually breaking is if NO how is specified at all (which is prob very common)

In [1]: s = Series(range(5))

In [2]: s = Series(range(5),index=date_range('20130101',periods=5,freq='s'))

In [3]: s.resample('D',how='min')
/Users/jreback/miniconda/bin/ipython:1: FutureWarning: how in .resample() is deprecated
the new syntax is .resample(...).min()
  #!/bin/bash /Users/jreback/miniconda/bin/python.app
Out[3]: 
2013-01-01    0
Freq: D, dtype: int64

In [4]: s.resample('D')
Out[4]: DatetimeIndexResampler [freq-><Day>,axis->0,closed->left,label->left,convention->start,base->0]

@shoyer
Member

shoyer commented Dec 15, 2015

This might be too magical and/or tricky to pull off, but we might add fallback methods to Resampler corresponding to the Series/DataFrame API that issue a deprecation warning, and then call .mean() followed by the desired operation.

@jreback
Contributor Author

jreback commented Dec 16, 2015

hacking Resampler.__getattr__ gets almost all the way there, see here

In [1]: s = Series(range(5),index=date_range('20130101',periods=5,freq='s'))

In [2]: s.resample('D').ix[0]
pandas/tseries/resample.py:73: FutureWarning: .resample() is now a deferred operation
use .resample(...).mean() instead of .resample(...)
  return getattr(self._deprecate_api(),attr)
Out[2]: 2

In [2]: s.resample('D').sort_values()
pandas/tseries/resample.py:73: FutureWarning: .resample() is now a deferred operation
use .resample(...).mean() instead of .resample(...)
  return getattr(self._deprecate_api(),attr)
Out[2]: 
2013-01-01    2
dtype: int64
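
Roughly how such a fallback could look (a simplified sketch of a fragment of the Resampler class; the real implementation would also need to whitelist the Resampler's own attributes):

import warnings

class Resampler(object):

    def _deprecated(self):
        warnings.warn(".resample() is now a deferred operation\n"
                      "use .resample(...).mean() instead of .resample(...)",
                      FutureWarning, stacklevel=3)
        return self.mean()   # implicitly evaluate, as the old API did

    def __getattr__(self, attr):
        # anything not defined on the Resampler itself falls through to the
        # implicitly evaluated (and deprecated) Series/DataFrame result
        return getattr(self._deprecated(), attr)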

@jreback jreback force-pushed the resample branch 2 times, most recently from 86b47f2 to 008c7b4 on December 16, 2015 01:43
@shoyer
Member

shoyer commented Dec 16, 2015

@jreback nice. One thing we'll have to catch explicitly is __setitem__ and other assignment or in-place operations. We need to catch those operations so they don't silently fail.

@jreback
Contributor Author

jreback commented Dec 16, 2015

so I defined all of the arithmetic ops / comparison ops and instance checking;
it seems that we can masquerade as a Series (or DataFrame) pretty much, or at least
enough to work and show the deprecation warning.

In [1]: s = Series(range(5),index=date_range('20130101',periods=5,freq='s'))

In [2]: r = s.resample('H')

In [3]: r>2
pandas/tseries/resample.py:80: FutureWarning: .resample() is now a deferred operation
use .resample(...).mean() instead of .resample(...)
  result = self._deprecated()
Out[3]: 
2013-01-01    False
Freq: H, dtype: bool

In [3]: r*2
pandas/tseries/resample.py:80: FutureWarning: .resample() is now a deferred operation
use .resample(...).mean() instead of .resample(...)
  result = self._deprecated()
Out[3]: 
2013-01-01    4
Freq: H, dtype: int64
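
A sketch of how those ops could be wired up generically (illustrative only; the op list is truncated and Resampler refers to the class discussed above):

import operator

def _make_deprecated_op(name):
    op = getattr(operator, name)   # e.g. operator.__mul__
    def _op(self, other):
        # evaluate the deferred object (emitting the FutureWarning),
        # then dispatch the op to the resulting Series/DataFrame
        return op(self._deprecated(), other)
    _op.__name__ = name
    return _op

for _name in ['__add__', '__sub__', '__mul__', '__gt__', '__lt__', '__eq__']:
    setattr(Resampler, _name, _make_deprecated_op(_name))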

@jreback
Contributor Author

jreback commented Dec 16, 2015

so it's actually quite a tricky problem to catch this. Imagine you had this code originally:

r = series.resample('H')
r.iloc[0] = 5

The warning will show up, but the setting is actually on a copy. I prob should just raise.

@shoyer
Member

shoyer commented Dec 16, 2015

The warning will show up, but the setting is actually on a copy. I prob should just raise.

Indeed, this needs to raise. I would say this raising is more important than adding the Series/DataFrame facade. If we don't have a facade, things will fail loudly, which is unfortunate, but if we have a broken facade, things will fail silently!
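
A minimal way to make item assignment fail loudly (a sketch; indexer-based assignment such as .iloc[...] = would need similar treatment on the indexer objects):

def __setitem__(self, key, value):
    # setting on the deferred object would only ever modify a temporary,
    # implicitly evaluated copy, so raise instead of warning
    raise TypeError("cannot assign to a deferred .resample() object; "
                    "evaluate it first, e.g. .resample(...).mean()")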

@jreback
Contributor Author

jreback commented Dec 17, 2015

@jorisvandenbossche how are we removing deprecation warnings on older whatsnew files?
I can just change the code I guess

@jorisvandenbossche
Member

@jreback See #6856 (comment) for general discussion on this.
I am personally in favor of converting the older whatsnew files to static code-blocks (and acknowledging that these are 'historic' pages that do not necessarily reflect the current state of the package).

For now, adding :okwarning: to the ipython blocks will suppress the warnings during doc building.
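
For reference, in a whatsnew .rst file that looks roughly like this (a sketch; ts is assumed to be defined earlier in the same ipython block):

.. ipython:: python
   :okwarning:

   ts.resample('D', how='mean')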

@jorisvandenbossche
Member

Although I really like the fact of a more consistent API, I think this is so backwards incompatible that it really is problematic to just put it in a release like this.

pls explain. The point of breaking compat is to change future behavior. Breaking changes always cause short-term pain, but shying away from fixing long-needed inconsistencies is MUCH MUCH worse.

To answer this: of course we have to keep the right balance between avoiding breaking changes and improving pandas. But it is not always very clear what the right balance is, and I could also imagine other parts of pandas that could use some breaking changes, but that we are reluctant to do because of their impact (not to mention the indexing API ... :-))
And most API changes we do are not breaking changes (but deprecations), and most of the time concern corner cases (but OK, I work with time series and use resample a lot, so maybe I am a bit biased).

But, as you have now updated this to include a deprecation warning or fail with an informative error, it already sounds a lot better to me! (at least there will be no silent failure, which is probably the most important thing)

Maybe I will ping the mailing list about this, as it is a rather large change?

@jreback
Contributor Author

jreback commented Dec 18, 2015

@jorisvandenbossche your e-mail to the list is good. thanks.

I agree about most changes, but some things just need to be more consistent; this and the window functions follow the groupby pattern (which is quite ingrained in pandas). So to me this is a no-brainer, EVEN if we had to break back-compat (in this case I was able to provide a nice easy upgrade path, so that makes this pretty easy to swallow IMHO).

if you want to review, this is also ready to go.

@jreback jreback force-pushed the resample branch 4 times, most recently from cae173b to b83dae1 on December 23, 2015 18:02
@jorisvandenbossche
Member

Some more usage feedback:

  • Custom agg/apply is broken for PeriodIndex:

    s2 = pd.Series(np.random.randint(0,5,50),
               index=pd.period_range('2012-01-01', freq='H', periods=50))
    
    In [29]: s2.resample('D').agg(lambda x: x.mean())
    AttributeError: 'PeriodIndexResampler' object has no attribute 'grouper'
    

    This worked before, so was probably not tested.

  • reindex is doing something strange:

    In [37]: rng = pd.date_range('1/1/2012', periods=100, freq='S')
    
    In [38]: ts = pd.Series(np.arange(len(rng)), index=rng)
    
    In [40]: ts.resample('15s').reindex()
    Out[40]:
    2012-01-01 00:00:00     7.0
    2012-01-01 00:00:15    22.0
    2012-01-01 00:00:30    37.0
    2012-01-01 00:00:45    52.0
    2012-01-01 00:01:00    67.0
    2012-01-01 00:01:15    82.0
    2012-01-01 00:01:30    94.5
    Freq: 15S, dtype: float64
    
    In [42]: ts.resample('15s').mean()
    Out[42]:
    2012-01-01 00:00:00     7.0
    2012-01-01 00:00:15    22.0
    2012-01-01 00:00:30    37.0
    2012-01-01 00:00:45    52.0
    2012-01-01 00:01:00    67.0
    2012-01-01 00:01:15    82.0
    2012-01-01 00:01:30    94.5
    Freq: 15S, dtype: float64
    
    In [43]: ts.resample('15s').asfreq()
    Out[43]:
    2012-01-01 00:00:00     0
    2012-01-01 00:00:15    15
    2012-01-01 00:00:30    30
    2012-01-01 00:00:45    45
    2012-01-01 00:01:00    60
    2012-01-01 00:01:15    75
    2012-01-01 00:01:30    90
    Freq: 15S, dtype: int32
    

    So it returns the same as mean, although I would rather expect the same as asfreq ?

  • Combined up and downsampling (due to an irregular time series) has changed behaviour. Using:

    rng = pd.date_range('1/1/2012', periods=100, freq='S')
    ts = pd.Series(np.arange(len(rng)), index=rng)
    ts2 = ts.iloc[[0,1,2,3,5,7,11,15,16,25,30]]
    

    With 0.17.1:

    In [32]: pd.__version__
    Out[32]: u'0.17.1'
    
    In [33]: ts2.resample('2s', how='mean', fill_method='ffill')
    Out[33]:
    2012-01-01 00:00:00     0.5
    2012-01-01 00:00:02     2.5
    2012-01-01 00:00:04     5.0
    2012-01-01 00:00:06     7.0
    2012-01-01 00:00:08     7.0
    2012-01-01 00:00:10    11.0
    2012-01-01 00:00:12    11.0
    2012-01-01 00:00:14    15.0
    2012-01-01 00:00:16    16.0
    2012-01-01 00:00:18    16.0
    2012-01-01 00:00:20    16.0
    2012-01-01 00:00:22    16.0
    2012-01-01 00:00:24    25.0
    2012-01-01 00:00:26    25.0
    2012-01-01 00:00:28    25.0
    2012-01-01 00:00:30    30.0
    Freq: 2S, dtype: float64
    
    In [34]: ts2.resample('2s', how='mean').ffill()
    Out[34]:
    2012-01-01 00:00:00     0.5
    2012-01-01 00:00:02     2.5
    2012-01-01 00:00:04     5.0
    2012-01-01 00:00:06     7.0
    2012-01-01 00:00:08     7.0
    2012-01-01 00:00:10    11.0
    2012-01-01 00:00:12    11.0
    2012-01-01 00:00:14    15.0
    2012-01-01 00:00:16    16.0
    2012-01-01 00:00:18    16.0
    2012-01-01 00:00:20    16.0
    2012-01-01 00:00:22    16.0
    2012-01-01 00:00:24    25.0
    2012-01-01 00:00:26    25.0
    2012-01-01 00:00:28    25.0
    2012-01-01 00:00:30    30.0
    Freq: 2S, dtype: float64
    

    With this branch:

    In [64]: pd.__version__
    Out[64]: '0.17.1+283.gd7c3efb'
    
    In [65]: ts2.resample('2s', how='mean', fill_method='ffill')
    C:\Anaconda\envs\devel\Scripts\ipython-script.py:1: FutureWarning: fill_method is deprecated to .resample()
    the new syntax is .resample(...).ffill()
    if __name__ == '__main__':
    Out[65]:
    2012-01-01 00:00:00     0
    2012-01-01 00:00:02     2
    2012-01-01 00:00:04     3
    2012-01-01 00:00:06     5
    2012-01-01 00:00:08     7
    2012-01-01 00:00:10     7
    2012-01-01 00:00:12    11
    2012-01-01 00:00:14    11
    2012-01-01 00:00:16    16
    2012-01-01 00:00:18    16
    2012-01-01 00:00:20    16
    2012-01-01 00:00:22    16
    2012-01-01 00:00:24    16
    2012-01-01 00:00:26    25
    2012-01-01 00:00:28    25
    2012-01-01 00:00:30    30
    Freq: 2S, dtype: int32
    
    In [66]: ts2.resample('2s').mean().ffill()
    Out[66]:
    2012-01-01 00:00:00     0.5
    2012-01-01 00:00:02     2.5
    2012-01-01 00:00:04     5.0
    2012-01-01 00:00:06     7.0
    2012-01-01 00:00:08     7.0
    2012-01-01 00:00:10    11.0
    2012-01-01 00:00:12    11.0
    2012-01-01 00:00:14    15.0
    2012-01-01 00:00:16    16.0
    2012-01-01 00:00:18    16.0
    2012-01-01 00:00:20    16.0
    2012-01-01 00:00:22    16.0
    2012-01-01 00:00:24    25.0
    2012-01-01 00:00:26    25.0
    2012-01-01 00:00:28    25.0
    2012-01-01 00:00:30    30.0
    Freq: 2S, dtype: float64
    
    In [68]: ts2.resample('2s').ffill().resample('2s').mean()  # this gives the same result as  ts2.resample('2s', how='mean', fill_method='ffill'), but is thus not the same as in 0.17.1
    Out[68]:
    2012-01-01 00:00:00     0
    2012-01-01 00:00:02     2
    2012-01-01 00:00:04     3
    2012-01-01 00:00:06     5
    2012-01-01 00:00:08     7
    2012-01-01 00:00:10     7
    2012-01-01 00:00:12    11
    2012-01-01 00:00:14    11
    2012-01-01 00:00:16    16
    2012-01-01 00:00:18    16
    2012-01-01 00:00:20    16
    2012-01-01 00:00:22    16
    2012-01-01 00:00:24    16
    2012-01-01 00:00:26    25
    2012-01-01 00:00:28    25
    2012-01-01 00:00:30    30
    Freq: 2S, dtype: int32
    
  • groups and indices properties give an AttributeError: 'DatetimeIndexResampler' object has no attribute 'grouper' if you did not yet use the Resampler object (so the grouper is not yet initialized; a possible lazy-initialization sketch follows this list):

    In [104]: rng = pd.date_range('1/1/2012', periods=100, freq='S')
    
    In [105]: ts = pd.Series(np.arange(len(rng)), index=rng)
    
    In [110]: rs = ts.resample('30s')
    
    In [111]: rs.groups
    AttributeError: 'DatetimeIndexResampler' object has no attribute 'grouper'
    
    In [112]: rs.mean()
    Out[112]:
    2012-01-01 00:00:00    14.5
    2012-01-01 00:00:30    44.5
    2012-01-01 00:01:00    74.5
    2012-01-01 00:01:30    94.5
    Freq: 30S, dtype: float64
    
    In [113]: rs.groups
    Out[113]:
    {Timestamp('2012-01-01 00:00:00', offset='30S'): 30,
    Timestamp('2012-01-01 00:00:30', offset='30S'): 60,
    Timestamp('2012-01-01 00:01:00', offset='30S'): 90,
    Timestamp('2012-01-01 00:01:30', offset='30S'): 100}
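
A possible fix is to initialize the grouper lazily, e.g. (a sketch; _set_binner is a hypothetical lazy-initialization hook):

@property
def groups(self):
    # make sure the binner/grouper exist before delegating, so .groups
    # works even when no aggregation has been run yet
    self._set_binner()
    return self.grouper.groups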
    

@jorisvandenbossche
Member

Ah, the reindex comment from above is just because this was passed through to the underlying deprecated ts.resample('15s', how='mean')-evaluated series, and s.reindex() without passing args just returns s.
But, you did use it somewhere in the docs, so I suppose this was a mistake there?

@jreback
Contributor Author

jreback commented Jan 26, 2016

yes, the .reindex as I noted above was a typo in the docs.
Addressing all of the others.

@jreback
Contributor Author

jreback commented Jan 26, 2016

@jorisvandenbossche ok, latest push should fix everything you mentioned here (except for a couple of issues I have marked in #12140)

@jreback
Contributor Author

jreback commented Jan 29, 2016

@jorisvandenbossche if you'd have a final look.

@jreback
Contributor Author

jreback commented Feb 1, 2016

any more comments @jorisvandenbossche ?

@jorisvandenbossche
Member

@jreback Thanks for all the edits based on my comments. I quickly looked at some of them, and all looks good! Only responded to two of them (both about the whatsnew).

I don't have the time at the moment to take a final look, but I trust my previous rounds of comments and your edits that this is good to go!

My only issue is still the behavior with dicts in agg, as we discussed above but did not really reach consensus on. But maybe this shouldn't hold up merging, as it would be good to have this in master for some time.
I will try to recapitulate my concerns later today in a somewhat clearer way (as your last comment was 'not sure what you are saying' :-)).
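
For context, the dict form in question looks roughly like this, reusing the r from the examples at the top (output omitted; the exact semantics are what is being discussed):

r.agg({'A': 'sum', 'B': 'mean'})   # a different aggregation per column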

@jreback
Contributor Author

jreback commented Feb 2, 2016

@jorisvandenbossche ok, I took out the aggregation clarification docs, they are the same as existing (my mistake), except for a slightly better error message. The behavior has not changed AFAICT from 0.17.1.

@jreback jreback closed this in 1dc49f5 Feb 2, 2016
@wesm
Member

wesm commented Feb 2, 2016

yay! let's get flake8 pandas completely clean now and flip the switch on Travis?

@jreback
Contributor Author

jreback commented Feb 2, 2016

@wesm almost done: #12208
