-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: update pandas.core.groupby.DataFrameGroupBy.resample docstring. #20374
DOC: update pandas.core.groupby.DataFrameGroupBy.resample docstring. #20374
Conversation
pandas/core/groupby.py
Outdated
Return a new grouper with our resampler appended | ||
Provide resampling when using a TimeGrouper. | ||
|
||
Given a grouper the function resamples it according to a string and an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"string" -> "frequency" and link to the frequency aliases:
:ref:`frequency aliases <timeseries.offset_aliases>`
pandas/core/groupby.py
Outdated
Provide resampling when using a TimeGrouper. | ||
|
||
Given a grouper the function resamples it according to a string and an | ||
optional list and dictionary of parameters. Returns a new grouper with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"optional list and dictionary of parameters" doesn't mean much to me as a user. I'd say remove that bit.
pandas/core/groupby.py
Outdated
|
||
Parameters | ||
---------- | ||
rule : str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rule : str or Offset
I think. @jorisvandenbossche does that look OK for Offset
, instead of pandas.core.offsets.Offset
? We used just Offset
in a couple places.
pandas/core/groupby.py
Outdated
---------- | ||
rule : str | ||
The offset string or object representing target grouper conversion. | ||
*args |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels a bit too specific to the implementation to me. Do you have any specific args / kwargs that the user would care about? Otherwise, I'd just write it as
*args, **kwargs
For compatibility with other groupby methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some keywords to think about (explicitly document and show examples for)?
closed : {'right', 'left'}
label : {'right', 'left'}
loffset : timedelta
See pandas.core.generic.NDFrame.resample for some others that may or may not have an effect. I'm not sure.
pandas/core/groupby.py
Outdated
------- | ||
Grouper | ||
Return a new grouper with our resampler appended. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See Also
--------
pandas.Grouper : specify a frequency to resample with when
grouping by a key.
DatetimeIndex.resample : Frequency conversion and resampling of
time series.
pandas/core/groupby.py
Outdated
|
||
Examples | ||
-------- | ||
Start by creating a DataFrame with 9 one minute timestamps. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"9 one" is a bit awkward. Maybe "a length-9 DataFrame with minute frequency."
pandas/core/groupby.py
Outdated
Start by creating a DataFrame with 9 one minute timestamps. | ||
|
||
>>> idx = pd.date_range('1/1/2000', periods=9, freq='T') | ||
>>> df = pd.DataFrame(data=9*[range(4)], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP8: space around *
.
pandas/core/groupby.py
Outdated
>>> df = pd.DataFrame(data=9*[range(4)], | ||
... index=idx, | ||
... columns=['a', 'b', 'c', 'd']) | ||
>>> df.iloc[[6], [0]] = 5 # change a value for grouping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need the inner lists, just df.iloc[6, 0] = 5
. Don't need the comment.
Codecov Report
@@ Coverage Diff @@
## master #20374 +/- ##
==========================================
- Coverage 92.24% 92.24% -0.01%
==========================================
Files 161 161
Lines 51318 51317 -1
==========================================
- Hits 47339 47338 -1
Misses 3979 3979
Continue to review full report at Codecov.
|
822c90c
to
92825c6
Compare
|
Better summary and parameters description.
92825c6
to
deea3c7
Compare
I'm confused about the 'See also' section. It auto-generates the message:
With the errors: I've replaced it with the commentary noted above in the responses. |
pandas/core/groupby/groupby.py
Outdated
label : {‘right’, ‘left’} | ||
Which bin edge label to label bucket with. | ||
loffset : timedelta | ||
Adjust the resampled time labels. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These parameters are not in the signature, are they the possible kwargs
? If that's the case, we can add them as a list in the kwargs
description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, the are just some samples of the kwargs, as they were requested.
pandas/core/groupby/groupby.py
Outdated
2000-01-01 00:05:00 0 1 2 3 | ||
2000-01-01 00:06:00 5 1 2 3 | ||
2000-01-01 00:07:00 0 1 2 3 | ||
2000-01-01 00:08:00 0 1 2 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it'd be possible to use a more compact example. May be 4 rows of 1 minute intervals, that can be downsampled to 2 rows of 30 seconds? Also, I think 2 columns should be enough.
…Smaller sample. Kwargs review. See also review.
…core.groupby.DataFrameGroupBy.resample
Hello @pandres! Thanks for updating the PR.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed some minor fixes.
@jbrockmendel can you take a quick look and merge on green if you're happy?
pandas/core/groupby/groupby.py
Outdated
|
||
Parameters | ||
---------- | ||
rule : str or Offset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Offset --> DateOffset
pandas/core/groupby/groupby.py
Outdated
Which side of bin interval is closed. | ||
* label : {'right', 'left'} | ||
Which bin edge label to label bucket with. | ||
* loffset : timedelta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is "loffset" right? I don't know this section of the code all that well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not an expert myself, I guess the type should be DateOffset, tseries.offsets, timedelta, or str
?
For what I can see, valid keywords should be how
, fill_method
, limit
, kind
and on
(in get_resampler_for_grouping
), closed
, label
, how
, axis
, fill_method
, limit
, loffset
, kind
, convention
, base
(in TimeGrouper
), and key
, level
, freq
, axis
, sort
(in Grouper
).
What I'd do is to add the ones from get_resampler_for_grouping
as explicit arguments, and then document that **kwargs
will be passed to TimeGrouper
.
@jreback is it ok to change the signature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
level
, axis
, freq
, key
, sort
are all part of the grouper and not args to .resample()
or any aggregation function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pandres can you replace the description by something like:
Possible arguments are `how`, `fill_method`, `limit`, `kind` and `on`, and other arguments of `TimeGrouper`.
We can improve that later in a separate PR, but I think we can merge all the rest of the changes for now.
Thanks!
@datapythonista I gave this a read and made some comments, but don't know this section of the code well enough to form an informed opinion. |
…' of github.com:pandres/pandas into docstring_pandas.core.groupby.DataFrameGroupBy.resample
…core.groupby.DataFrameGroupBy.resample
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just rebased master. I think should be good to merge on green.
…core.groupby.DataFrameGroupBy.resample
Thanks @pandres! |
Thank you guys! |
* upstream/master: (25 commits) DOC: Delete trailing blank lines in docstrings. (pandas-dev#23651) DOC: Change release and whatsnew (pandas-dev#21599) DOC: Fix format of the See Also descriptions (pandas-dev#23654) DOC: update pandas.core.groupby.DataFrameGroupBy.resample docstring. (pandas-dev#20374) ENH: Allow export of mixed columns to Stata strl (pandas-dev#23692) CLN: Remove unnecessary code (pandas-dev#23696) Pin flake8-rst version (pandas-dev#23699) Implement _most_ of the EA interface for DTA/TDA (pandas-dev#23643) CI: raise clone depth limit on CI BUG: Fix Series/DataFrame.rank(pct=True) with more than 2**24 rows (pandas-dev#23688) REF: Move Excel names parameter handling to CSV (pandas-dev#23690) DOC: Accessing files from a S3 bucket. (pandas-dev#23639) Fix errorbar visualization (pandas-dev#23674) DOC: Surface / doc mangle_dupe_cols in read_excel (pandas-dev#23678) DOC: Update is_sparse docstring (pandas-dev#19983) BUG: Fix read_excel w/parse_cols & empty dataset (pandas-dev#23661) Add to_flat_index method to MultiIndex (pandas-dev#22866) CLN: Move to_excel to generic.py (pandas-dev#23656) TST: IntervalTree.get_loc_interval should return platform int (pandas-dev#23660) CI: Allow to compile docs with ipython 7.11 pandas-dev#22990 (pandas-dev#23655) ...
…fixed * upstream/master: DOC: Delete trailing blank lines in docstrings. (pandas-dev#23651) DOC: Change release and whatsnew (pandas-dev#21599) DOC: Fix format of the See Also descriptions (pandas-dev#23654) DOC: update pandas.core.groupby.DataFrameGroupBy.resample docstring. (pandas-dev#20374) ENH: Allow export of mixed columns to Stata strl (pandas-dev#23692) CLN: Remove unnecessary code (pandas-dev#23696) Pin flake8-rst version (pandas-dev#23699) Implement _most_ of the EA interface for DTA/TDA (pandas-dev#23643) CI: raise clone depth limit on CI BUG: Fix Series/DataFrame.rank(pct=True) with more than 2**24 rows (pandas-dev#23688) REF: Move Excel names parameter handling to CSV (pandas-dev#23690) DOC: Accessing files from a S3 bucket. (pandas-dev#23639) Fix errorbar visualization (pandas-dev#23674) DOC: Surface / doc mangle_dupe_cols in read_excel (pandas-dev#23678) DOC: Update is_sparse docstring (pandas-dev#19983) BUG: Fix read_excel w/parse_cols & empty dataset (pandas-dev#23661) Add to_flat_index method to MultiIndex (pandas-dev#22866) CLN: Move to_excel to generic.py (pandas-dev#23656) TST: IntervalTree.get_loc_interval should return platform int (pandas-dev#23660)
Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):
scripts/validate_docstrings.py <your-function-or-method>
git diff upstream/master -u -- "*.py" | flake8 --diff
python doc/make.py --single <your-function-or-method>
Please include the output of the validation script below between the "```" ticks:
args and kwargs are described with asterisks as requested, even if the validator does not recognizes it.
See also
complains of duplication if left as in the commit in the docstring. Not showed above.