DOC: Fix pandas.Series.resample docstring #23197

Merged
merged 17 commits into from
Nov 10, 2018
Changes from 10 commits
206 changes: 132 additions & 74 deletions pandas/core/generic.py
@@ -7366,46 +7366,67 @@ def resample(self, rule, how=None, axis=0, fill_method=None, closed=None,
label=None, convention='start', kind=None, loffset=None,
limit=None, base=0, on=None, level=None):
"""
Resample time-series data.

Convenience method for frequency conversion and resampling of time
series. Object must have a datetime-like index (DatetimeIndex,
PeriodIndex, or TimedeltaIndex), or pass datetime-like values
to the on or level keyword.
series. Object must have a datetime-like index (``DatetimeIndex``,
``PeriodIndex``, or ``TimedeltaIndex``), or pass datetime-like values
to the ``on`` or ``level`` keyword.
Member

Single backticks.


Parameters
----------
rule : string
the offset string or object representing target conversion
axis : int, optional, default 0
closed : {'right', 'left'}
rule : str
The offset string or object representing target conversion.
how : str
Method for down-/re-sampling, default to ‘mean’ for downsampling.
Member

What are the possible values? If they are a fixed set, can we have {'mean',...}? If it's a problem, leave it like it is, as this will go away soon.

Contributor Author

Possible values are any valid string function that does some form of aggregation. Interestingly, this would also work with the example dataframe I shared above:

>>> df
   price  volume       time
0     10      50 2018-01-07
1     11      60 2018-01-14
2      9      40 2018-01-21
3     13     100 2018-01-28
4     14      50 2018-02-04
5     18     100 2018-02-11
6     17      40 2018-02-18
7     19      50 2018-02-25

>>> df.resample('M', on='time', how={'price': min, 'volume': sum})
            price  volume
time                     
2018-01-31      9     250
2018-02-28     14     240
>>> df.resample('M', on='time', how=['min', 'sum'])
           price     volume     
             min sum    min  sum
time                            
2018-01-31     9  43     40  250
2018-02-28    14  68     40  240

Happy to change the description of the parameter to something like:

how : str, dict, list
       Method for down-/re-sampling, default to `mean` for downsampling. Accepted combinations are:
       * string function name
       * list of string function names
       * dict of column names -> string function names (or list of string function names) 

I got inspiration from pandas.DataFrame.aggregate.

Member

If the name needs to be a string with the function name, I assume it can't be any arbitrary value. But as it's deprecated, let's not spend time on this.
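
For readers following this thread, here is a minimal sketch of how the two `how=` calls shown above could be written with the non-deprecated syntax that the docstring points to. It reuses the frame from the comment; the names `month_end`, `per_column` and `all_columns` are illustrative, and a `MonthEnd` offset object is passed as the rule instead of the 'M' string (an assumption made here because the spelling of that alias differs between pandas releases).

```python
import pandas as pd

# Same frame as in the comment above: eight weekly observations.
df = pd.DataFrame({
    'price': [10, 11, 9, 13, 14, 18, 17, 19],
    'volume': [50, 60, 40, 100, 50, 100, 40, 50],
})
df['time'] = pd.date_range('01/01/2018', periods=8, freq='W')

# Assumption: an offset object stands in for the 'M' alias used in the thread.
month_end = pd.offsets.MonthEnd()

# Dict form: one aggregation per column
# (counterpart of how={'price': min, 'volume': sum}).
per_column = df.resample(month_end, on='time').agg(
    {'price': 'min', 'volume': 'sum'})

# List form: every aggregation applied to every selected column
# (counterpart of how=['min', 'sum']).
all_columns = df.resample(month_end, on='time')[['price', 'volume']].agg(
    ['min', 'sum'])

print(per_column)
print(all_columns)
```

The values should match the tables pasted in the comment above; only the call style changes.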


.. deprecated:: 0.18.0
The new syntax is ``.resample(...).mean()``, or
``.resample(...).apply(<func>)``
axis : {0 or 'index'}, default 0
Member

Methods in generic.py are reused for both Series and DataFrame. I guess for DataFrame we can also use axis='columns'?

Contributor Author

That's right, I wasn't sure, I'll add it. Thinking about it now perhaps we could've added more examples with DataFrame.

Which axis to use for up- or down-sampling. For ``Series`` this
will default to 0, i.e. `along the rows`. Must be
``DatetimeIndex``, ``TimedeltaIndex`` or ``PeriodIndex``.
Member

I don't understand the must be. I don't think axis must be DatetimeIndex...

Also, don't use backticks around the "along the rows". Backticks are to tell that it's a reference (to a variable, a function, a class...). We don't use them for text.

Contributor Author

> I don't understand the must be. I don't think axis must be DatetimeIndex...

I took it from the main function description. Since rule must be an offset string, which describes a time-related quantity, I decided to be explicit here. I also ran some tests and got TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex.

In fact, following this logic, on should also have that sentence: Must be ``DatetimeIndex``, ``TimedeltaIndex`` or ``PeriodIndex``.

For instance:

>>> df = pd.DataFrame({'price': [10, 11, 9, 13, 14, 18, 17, 19], 'volume': [50, 60, 40, 100, 50, 100, 40, 50]})
>>> df['time'] = pd.date_range('01/01/2018', periods=8, freq='W')
>>> df
   price  volume       time
0     10      50 2018-01-07
1     11      60 2018-01-14
2      9      40 2018-01-21
3     13     100 2018-01-28
4     14      50 2018-02-04
5     18     100 2018-02-11
6     17      40 2018-02-18
7     19      50 2018-02-25
>>> df.resample('M', on='time').sum()
            price  volume
time                     
2018-01-31     43     250
2018-02-28     68     240

I hope I'm not missing anything here... resample is not a function I've used heavily personally.

> Also, don't use backticks around the "along the rows". Backticks are to tell that it's a reference (to a variable, a function, a class...). We don't use them for text.

Slip of keyboard, I meant normal textual quotes ("). Changing.
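
A minimal sketch of the behaviour discussed in this thread, under the assumption that the requirement applies to the labels being resampled rather than to the `axis` argument itself: resampling a frame whose index is a plain integer range raises the `TypeError` quoted above, while passing `on=` or setting a datetime-like index works. The variable names and the use of a `MonthEnd` offset object in place of the 'M' string are choices made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'price': [10, 11, 9, 13, 14, 18, 17, 19],
    'volume': [50, 60, 40, 100, 50, 100, 40, 50],
})
df['time'] = pd.date_range('01/01/2018', periods=8, freq='W')

# The default integer index is not datetime-like, so this is expected to fail.
try:
    df.resample('D').sum()
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Assumption: an offset object stands in for the 'M' alias used in the thread.
month_end = pd.offsets.MonthEnd()

# Option 1: point resample at a datetime-like column with `on`.
monthly = df.resample(month_end, on='time').sum()

# Option 2: make that column the index first.
by_index = df.set_index('time').resample(month_end).sum()

print(monthly)
```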

fill_method : str, default None
Filling method for upsampling.

.. deprecated:: 0.18.0
The new syntax is ``.resample(...).<func>()``,
e.g. ``.resample(...).pad()``
closed : {'right', 'left'}, default None
Which side of bin interval is closed. The default is 'left'
for all frequency offsets except for 'M', 'A', 'Q', 'BM',
'BA', 'BQ', and 'W' which all have a default of 'right'.
label : {'right', 'left'}
label : {'right', 'left'}, default None
Which bin edge label to label bucket with. The default is 'left'
for all frequency offsets except for 'M', 'A', 'Q', 'BM',
'BA', 'BQ', and 'W' which all have a default of 'right'.
convention : {'start', 'end', 's', 'e'}
For PeriodIndex only, controls whether to use the start or end of
`rule`
kind: {'timestamp', 'period'}, optional
convention : {'start', 'end', 's', 'e'}, default 'start'
For ``PeriodIndex`` only, controls whether to use the start or
end of ``rule``.
Member

Single backticks in both cases. Double backticks are for code (including possible values of variables like NaN), but for classes, variables, parameters... we use single backticks.

Contributor Author

Got it, wasn't clear from the docstring docs.

Contributor Author

Would None be with double backtick in examples like "default None"?

Member

That's a good question. I think it probably should, but we don't have it anywhere, so let's leave it without backticks in the default for now

kind : {'timestamp', 'period'}, optional, default None
Pass 'timestamp' to convert the resulting index to a
``DatetimeIndex`` or 'period' to convert it to a ``PeriodIndex``.
By default the input representation is retained.
loffset : timedelta
Adjust the resampled time labels
loffset : timedelta, default None
Adjust the resampled time labels.
limit : int, default None
Maximum size gap when reindexing with ``fill_method``.
Member

same


.. deprecated:: 0.18.0
base : int, default 0
For frequencies that evenly subdivide 1 day, the "origin" of the
aggregated intervals. For example, for '5min' frequency, base could
range from 0 through 4. Defaults to 0
on : string, optional
range from 0 through 4. Defaults to 0.
on : str, optional
For a DataFrame, column to use instead of index for resampling.
Column must be datetime-like.

.. versionadded:: 0.19.0

level : string or int, optional
level : str or int, optional
For a MultiIndex, level (name or number) to use for
resampling. Level must be datetime-like.
resampling. ``level`` must be datetime-like.
Member

same


.. versionadded:: 0.19.0
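
As a reading aid for the `closed` and `label` entries in the parameter list above, the following sketch shows how the two keywords change the bins and their labels for a minute-level series resampled into three-minute buckets. It is an illustration only: the series and variable names are made up here, and the 'min' alias is used for minutes.

```python
import pandas as pd

# Nine one-minute observations with values 0..8.
index = pd.date_range('1/1/2000', periods=9, freq='min')
series = pd.Series(range(9), index=index)

# Defaults for a '3min' rule: bins closed on the left, labelled by left edge.
default_bins = series.resample('3min').sum()

# Same bins, but each bucket is labelled with its right edge.
right_label = series.resample('3min', label='right').sum()

# Bins closed on the right: the 00:03 observation now belongs to the
# bucket that ends at 00:03 instead of the one that starts there.
right_closed = series.resample('3min', label='right', closed='right').sum()

print(default_bins)
print(right_label)
print(right_closed)
```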

@@ -7422,6 +7443,10 @@ def resample(self, rule, how=None, axis=0, fill_method=None, closed=None,
To learn more about the offset strings, please see `this link
<http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases>`__.

See Also
--------
groupby : Group by mapping, function, label, or list of labels.
Member

As this is for Series and DataFrame, I'd add both here too. One of the links will be self-referencing, but the other will point to the equivalent of the other class, which is useful.

Contributor Author

Got it.
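
To illustrate why `groupby` is a natural See Also entry, here is a small sketch comparing `resample` with a time-based `groupby` built from `pd.Grouper`. It is not part of the PR; the frame is the one used earlier in this review, and the names `resampled` and `grouped` plus the `MonthEnd` offset object are assumptions of the illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'price': [10, 11, 9, 13, 14, 18, 17, 19],
    'volume': [50, 60, 40, 100, 50, 100, 40, 50],
})
df['time'] = pd.date_range('01/01/2018', periods=8, freq='W')

# Assumption: an offset object is used instead of a month-end alias string.
month_end = pd.offsets.MonthEnd()

# Time-based resampling on a column...
resampled = df.resample(month_end, on='time').sum()

# ...and a groupby over the same monthly bins.
grouped = df.groupby(pd.Grouper(key='time', freq=month_end)).sum()

print(resampled)
print(grouped)
```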


Examples
--------

@@ -7511,83 +7536,116 @@ def resample(self, rule, how=None, axis=0, fill_method=None, closed=None,
Pass a custom function via ``apply``

>>> def custom_resampler(array_like):
... return np.sum(array_like)+5
... return np.sum(array_like) + 5

>>> series.resample('3T').apply(custom_resampler)
2000-01-01 00:00:00 8
2000-01-01 00:03:00 17
2000-01-01 00:06:00 26
Freq: 3T, dtype: int64

For a Series with a PeriodIndex, the keyword `convention` can be
For a Series with a PeriodIndex, the keyword ``convention`` can be
used to control whether to use the start or end of `rule`.

Resample a year by quarter using 'start' ``convention``. Values are
assigned to the first quarter of the period.

>>> s = pd.Series([1, 2], index=pd.period_range('2012-01-01',
freq='A',
periods=2))
... freq='A',
... periods=2))
>>> s
2012 1
2013 2
Member

In the following examples, when using resample, one uses .head() and the other shows 12 months. I don't like either of them. I'd use something like resampling a year to quarters, or a quarter to months... So we can show all the data, and it's just a few rows that readers can quickly check.

Contributor Author (@jmrr, Nov 7, 2018)

I've created two examples, one sampling a year into quarters using convention='start' and another sampling quarters into months using the convention='end'. If it's too much we can remove one of them.

Freq: A-DEC, dtype: int64

Resample by month using 'start' `convention`. Values are assigned to
the first month of the period.

>>> s.resample('M', convention='start').asfreq().head()
2012-01 1.0
2012-02 NaN
2012-03 NaN
2012-04 NaN
2012-05 NaN
Freq: M, dtype: float64

Resample by month using 'end' `convention`. Values are assigned to
the last month of the period.

>>> s.resample('M', convention='end').asfreq()
2012-12 1.0
2013-01 NaN
2013-02 NaN
2013-03 NaN
2013-04 NaN
2013-05 NaN
2013-06 NaN
2013-07 NaN
2013-08 NaN
2013-09 NaN
2013-10 NaN
2013-11 NaN
2013-12 2.0
>>> s.resample('Q', convention='start').asfreq()
2012Q1 1.0
2012Q2 NaN
2012Q3 NaN
2012Q4 NaN
2013Q1 2.0
2013Q2 NaN
2013Q3 NaN
2013Q4 NaN
Freq: Q-DEC, dtype: float64

Resample quarters by month using 'end' ``convention``. Values are
assigned to the last month of the period.

>>> q = pd.Series([1, 2, 3, 4], index=pd.period_range('2018-01-01',
... freq='Q',
... periods=4))
>>> q
2018Q1 1
2018Q2 2
2018Q3 3
2018Q4 4
Freq: Q-DEC, dtype: int64
>>> q.resample('M', convention='end').asfreq()
2018-03 1.0
2018-04 NaN
2018-05 NaN
2018-06 2.0
2018-07 NaN
2018-08 NaN
2018-09 3.0
2018-10 NaN
2018-11 NaN
2018-12 4.0
Freq: M, dtype: float64

For DataFrame objects, the keyword ``on`` can be used to specify the
column instead of the index for resampling.

>>> df = pd.DataFrame(data=9*[range(4)], columns=['a', 'b', 'c', 'd'])
>>> df['time'] = pd.date_range('1/1/2000', periods=9, freq='T')
>>> df.resample('3T', on='time').sum()
a b c d
time
2000-01-01 00:00:00 0 3 6 9
2000-01-01 00:03:00 0 3 6 9
2000-01-01 00:06:00 0 3 6 9
>>> d = dict({'price': [10, 11, 9, 13, 14, 18, 17, 19],
... 'volume': [50, 60, 40, 100, 50, 100, 40, 50]})
>>> df = pd.DataFrame(d)
>>> df['week_starting'] = pd.date_range('01/01/2018',
... periods=8,
... freq='W')
>>> df
price volume week_starting
0 10 50 2018-01-07
1 11 60 2018-01-14
2 9 40 2018-01-21
3 13 100 2018-01-28
4 14 50 2018-02-04
5 18 100 2018-02-11
6 17 40 2018-02-18
7 19 50 2018-02-25
>>> df.resample('M', on='week_starting').mean()
price volume
week_starting
2018-01-31 10.75 62.5
2018-02-28 17.00 60.0

For a DataFrame with MultiIndex, the keyword ``level`` can be used to
specify on level the resampling needs to take place.

>>> time = pd.date_range('1/1/2000', periods=5, freq='T')
>>> df2 = pd.DataFrame(data=10*[range(4)],
columns=['a', 'b', 'c', 'd'],
index=pd.MultiIndex.from_product([time, [1, 2]])
)
>>> df2.resample('3T', level=0).sum()
a b c d
2000-01-01 00:00:00 0 6 12 18
2000-01-01 00:03:00 0 4 8 12

See also
--------
groupby : Group by mapping, function, label, or list of labels.
specify on which level the resampling needs to take place.

>>> days = pd.date_range('1/1/2000', periods=4, freq='D')
>>> d2 = dict({'price': [10, 11, 9, 13, 14, 18, 17, 19],
... 'volume': [50, 60, 40, 100, 50, 100, 40, 50]})
Member

one more space?

Member

Suggested change
... 'volume': [50, 60, 40, 100, 50, 100, 40, 50]})
... 'volume': [50, 60, 40, 100, 50, 100, 40, 50]})

>>> df2 = pd.DataFrame(
... d2,
... index=pd.MultiIndex.from_product(
... [days, ['morning', 'afternoon']]
... )
... )
>>> df2
price volume
2000-01-01 morning 10 50
afternoon 11 60
2000-01-02 morning 9 40
afternoon 13 100
2000-01-03 morning 14 50
afternoon 18 100
2000-01-04 morning 17 40
afternoon 19 50
>>> df2.resample('D', level=0).sum()
price volume
2000-01-01 21 110
2000-01-02 22 140
2000-01-03 32 150
2000-01-04 36 90
"""
from pandas.core.resample import (resample,
_maybe_process_deprecations)
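
The `level` example added to the docstring in this diff can be run as a standalone script. The sketch below reproduces it with an extra print step; the data and the expected sums are the ones shown in the diff, while the variable name `daily` is introduced here for illustration.

```python
import pandas as pd

days = pd.date_range('1/1/2000', periods=4, freq='D')
d2 = {'price': [10, 11, 9, 13, 14, 18, 17, 19],
      'volume': [50, 60, 40, 100, 50, 100, 40, 50]}

# Two observations per day: the first index level is datetime-like,
# the second is a plain label.
df2 = pd.DataFrame(
    d2,
    index=pd.MultiIndex.from_product([days, ['morning', 'afternoon']]),
)

# Resample on the datetime-like level only; the 'morning'/'afternoon'
# rows of each day are summed together.
daily = df2.resample('D', level=0).sum()
print(daily)
```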