DOC: update the pandas.DataFrame.cummax docstring #20336

arminv · 2018-03-13T20:29:35Z

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

PR title is "DOC: update the docstring"
The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
The html version looks good: python doc/make.py --single <your-function-or-method>
It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
##################### Docstring (pandas.DataFrame.cummax)  #####################
################################################################################

Return cumulative maximum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative
maximum.

Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
    The index or the name of the axis. 0 is equivalent to None or 'index'.
skipna : boolean, default True
    Exclude NA/null values. If an entire row/column is NA, the result
    will be NA.
*args, **kwargs :
    Additional keywords have no effect but might be accepted for
    compatibility with NumPy.

Returns
-------
cummax : Series or DataFrame

Examples
--------
**Series**

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cummax()
0    2.0
1    NaN
2    5.0
3    5.0
4    5.0
dtype: float64

To include NA values in the operation, use ``skipna=False``

>>> s.cummax(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

**DataFrame**

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                    columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the maximum
in each column. This is equivalent to ``axis=None`` or ``axis='index'``.

>>> df.cummax()
     A    B
0  2.0  1.0
1  3.0  NaN
2  3.0  1.0

To iterate over columns and find the maximum in each row,
use ``axis=1``

>>> df.cummax(axis=1)
     A    B
0  2.0  2.0
1  3.0  NaN
2  1.0  1.0

See also
--------
pandas.core.window.Expanding.max : Similar functionality
    but ignores ``NaN`` values.
DataFrame.max : Return the maximum over
    DataFrame axis.
DataFrame.cummax : Return cumulative maximum over DataFrame axis.
DataFrame.cummin : Return cumulative minimum over DataFrame axis.
DataFrame.cumsum : Return cumulative sum over DataFrame axis.
DataFrame.cumprod : Return cumulative product over DataFrame axis.

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Errors in parameters section
		Parameters {'kwargs', 'args'} not documented
		Unknown parameters {'*args, **kwargs :'}
		Parameter "*args, **kwargs :" has no type


################################################################################
##################### Docstring (pandas.DataFrame.cummin)  #####################
################################################################################

Return cumulative minimum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative
minimum.

Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
    The index or the name of the axis. 0 is equivalent to None or 'index'.
skipna : boolean, default True
    Exclude NA/null values. If an entire row/column is NA, the result
    will be NA.
*args, **kwargs :
    Additional keywords have no effect but might be accepted for
    compatibility with NumPy.

Returns
-------
cummin : Series or DataFrame

Examples
--------
**Series**

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cummin()
0    2.0
1    NaN
2    2.0
3   -1.0
4   -1.0
dtype: float64

To include NA values in the operation, use ``skipna=False``

>>> s.cummin(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

**DataFrame**

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                    columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the minimum
in each column. This is equivalent to ``axis=None`` or ``axis='index'``.

>>> df.cummin()
     A    B
0  2.0  1.0
1  2.0  NaN
2  1.0  0.0

To iterate over columns and find the minimum in each row,
use ``axis=1``

>>> df.cummin(axis=1)
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

See also
--------
pandas.core.window.Expanding.min : Similar functionality
    but ignores ``NaN`` values.
DataFrame.min : Return the minimum over
    DataFrame axis.
DataFrame.cummax : Return cumulative maximum over DataFrame axis.
DataFrame.cummin : Return cumulative minimum over DataFrame axis.
DataFrame.cumsum : Return cumulative sum over DataFrame axis.
DataFrame.cumprod : Return cumulative product over DataFrame axis.

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Errors in parameters section
		Parameters {'args', 'kwargs'} not documented
		Unknown parameters {'*args, **kwargs :'}
		Parameter "*args, **kwargs :" has no type


################################################################################
##################### Docstring (pandas.DataFrame.cumsum)  #####################
################################################################################

Return cumulative sum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative
sum.

Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
    The index or the name of the axis. 0 is equivalent to None or 'index'.
skipna : boolean, default True
    Exclude NA/null values. If an entire row/column is NA, the result
    will be NA.
*args, **kwargs :
    Additional keywords have no effect but might be accepted for
    compatibility with NumPy.

Returns
-------
cumsum : Series or DataFrame

Examples
--------
**Series**

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cumsum()
0    2.0
1    NaN
2    7.0
3    6.0
4    6.0
dtype: float64

To include NA values in the operation, use ``skipna=False``

>>> s.cumsum(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

**DataFrame**

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                    columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the sum
in each column. This is equivalent to ``axis=None`` or ``axis='index'``.

>>> df.cumsum()
     A    B
0  2.0  1.0
1  5.0  NaN
2  6.0  1.0

To iterate over columns and find the sum in each row,
use ``axis=1``

>>> df.cumsum(axis=1)
     A    B
0  2.0  3.0
1  3.0  NaN
2  1.0  1.0

See also
--------
pandas.core.window.Expanding.sum : Similar functionality
    but ignores ``NaN`` values.
DataFrame.sum : Return the sum over
    DataFrame axis.
DataFrame.cummax : Return cumulative maximum over DataFrame axis.
DataFrame.cummin : Return cumulative minimum over DataFrame axis.
DataFrame.cumsum : Return cumulative sum over DataFrame axis.
DataFrame.cumprod : Return cumulative product over DataFrame axis.

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Errors in parameters section
		Parameters {'args', 'kwargs'} not documented
		Unknown parameters {'*args, **kwargs :'}
		Parameter "*args, **kwargs :" has no type

################################################################################
##################### Docstring (pandas.DataFrame.cumprod) #####################
################################################################################

Return cumulative product over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative
product.

Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
    The index or the name of the axis. 0 is equivalent to None or 'index'.
skipna : boolean, default True
    Exclude NA/null values. If an entire row/column is NA, the result
    will be NA.
*args, **kwargs :
    Additional keywords have no effect but might be accepted for
    compatibility with NumPy.

Returns
-------
cumprod : Series or DataFrame

Examples
--------
**Series**

>>> s = pd.Series([2, np.nan, 5, -1, 0])
>>> s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

By default, NA values are ignored.

>>> s.cumprod()
0     2.0
1     NaN
2    10.0
3   -10.0
4    -0.0
dtype: float64

To include NA values in the operation, use ``skipna=False``

>>> s.cumprod(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64

**DataFrame**

>>> df = pd.DataFrame([[2.0, 1.0],
...                    [3.0, np.nan],
...                    [1.0, 0.0]],
...                    columns=list('AB'))
>>> df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

By default, iterates over rows and finds the product
in each column. This is equivalent to ``axis=None`` or ``axis='index'``.

>>> df.cumprod()
     A    B
0  2.0  1.0
1  6.0  NaN
2  6.0  0.0

To iterate over columns and find the product in each row,
use ``axis=1``

>>> df.cumprod(axis=1)
     A    B
0  2.0  2.0
1  3.0  NaN
2  1.0  0.0

See also
--------
pandas.core.window.Expanding.prod : Similar functionality
    but ignores ``NaN`` values.
DataFrame.prod : Return the product over
    DataFrame axis.
DataFrame.cummax : Return cumulative maximum over DataFrame axis.
DataFrame.cummin : Return cumulative minimum over DataFrame axis.
DataFrame.cumsum : Return cumulative sum over DataFrame axis.
DataFrame.cumprod : Return cumulative product over DataFrame axis.

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Errors in parameters section
		Parameters {'args', 'kwargs'} not documented
		Unknown parameters {'*args, **kwargs :'}
		Parameter "*args, **kwargs :" has no type

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

The *args/**kwargs errors above are caused by a bug in the validation script's parameter evaluation.

datapythonista

I think we can do better with the examples, but good work.

datapythonista · 2018-03-13T21:45:27Z

pandas/core/generic.py

-    will be NA
+    will be NA.
+*args : any, default None
+**kwargs : any, default None


Can you have *args, **kwargs without the type?

datapythonista · 2018-03-13T21:45:51Z

pandas/core/generic.py

+pandas.DataFrame.cummax : Return cumulative maximum over DataFrame axis.
+pandas.DataFrame.cummin : Return cumulative minimum over DataFrame axis.
+pandas.DataFrame.cumsum : Return cumulative sum over DataFrame axis.
+pandas.DataFrame.cumprod : Return cumulative product over DataFrame axis.


Can you get rid of the pandas. prefix?

datapythonista · 2018-03-13T21:48:52Z

pandas/core/generic.py

@@ -8327,24 +8327,95 @@ def _doc_parms(cls):
 """

 _cnum_doc = """
+Return %(desc)s over a DataFrame or Series axis.
+
+Returns a DataFrame or Series of the same size containing the %(desc)s.

 Parameters
 ----------
 axis : %(axis_descr)s


Not sure where axis_descr is defined, but the format is {0 or 'index', 1 or 'columns'}, if I'm not wrong. You can check recently merged PRs to be sure.

datapythonista · 2018-03-13T21:49:53Z

pandas/core/generic.py

+--------
+**DataFrame**
+
+Create a DataFrame:


We probably can get rid of this, I'm sure users will find out :)

datapythonista · 2018-03-13T21:52:31Z

pandas/core/generic.py

+0  9  7  9  7
+1  9  7  9  7
+2  9  7  9  7
+3  9  7  9  7


If this docstring is reused, and we want to keep it this way, I think we should add examples for all methods.

datapythonista · 2018-03-13T21:54:24Z

pandas/core/generic.py

+...                    [7, 5, 2, 7],
+...                    [3, 5, 2, 2],
+...                    [8, 0, 9, 0]],
+...                    columns=list('ABCD'))


I'd use a much smaller dataset, probably 2 columns and 3 or 4 rows. Smaller numbers would make it easier for users to see what's being added, especially if we reuse the same DataFrame for cumsum...

Also, you can add a NaN, so you can show an example with skipna.

codecov · 2018-03-14T10:19:28Z

Codecov Report

Merging #20336 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #20336      +/-   ##
==========================================
+ Coverage    91.8%    91.8%   +<.01%     
==========================================
  Files         152      152              
  Lines       49201    49205       +4     
==========================================
+ Hits        45167    45171       +4     
  Misses       4034     4034

Flag	Coverage Δ
#multiple	`90.18% <100%> (ø)`	⬆️
#single	`41.85% <100%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/generic.py	`95.85% <100%> (ø)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ad50b1d...1147a0d. Read the comment docs.

arminv · 2018-03-14T10:22:52Z

@datapythonista Thank you for the helpful comments :) Please let me know if I need to change anything else.

datapythonista

Looking better. Just one comment I think can be useful: when you make more changes, generate the HTML version of one of the pages, and imagine that you're a user who has never used these methods and wants to know what they are about. Is the documentation useful? Maybe too long? Maybe a bit confusing, having all the methods shown together? Or does it help?

If as a user you think this documentation is really helpful and presents things in the most efficient and clear way, the PR should be mostly all right. The point is that it's usually bad practice to follow review comments blindly. Feel free to disagree, and make sure you're happy with your changes and the final result.

datapythonista · 2018-03-14T11:39:20Z

pandas/core/generic.py


 Parameters
 ----------
-axis : %(axis_descr)s
+axis : {0 or 'index', 1 or 'columns'}, default 0


I think it's not technically right that the default is 0. I think it's None, which I guess is equivalent to 0.

Can you double check, and change it if that's right? Something like {0 or 'index', 1 or 'columns'} or None, default None would probably be the most standard way if that's right. And a description of the axis would be useful (pointing out that None means index, if that's the case).

If you check recent PRs, there are some with an axis parameter that you can use for reference.

This is right, cum_func (i.e. function corresponding to all cumulative methods) is defined with axis=None as default argument.

I also found this regarding the correct format of axis parameter.

Although it is technically None, in practice it is 0 for Series/DataFrame, so I would keep the documentation like this.
The technical reason is because for Panel it is 1, but Panel is deprecated and I think we should not care about them in the documentation.
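
As a quick illustration of the point discussed in this thread, here is a minimal sketch (not part of the PR diff) that checks that omitting `axis` gives the same result as `axis=0` and `axis='index'` for a DataFrame. It reuses the small frame from the docstring examples; the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[2.0, 1.0],
                   [3.0, np.nan],
                   [1.0, 0.0]],
                  columns=list("AB"))

# Omitting axis accumulates down each column, i.e. along the index.
default = df.cummax()
by_number = df.cummax(axis=0)
by_name = df.cummax(axis="index")

# The three calls are expected to agree (NaNs in the same place count as equal).
assert default.equals(by_number)
assert default.equals(by_name)

# axis=1 accumulates across the columns of each row instead.
across = df.cummax(axis=1)
```

This only covers Series/DataFrame behaviour, which is what the documented default of 0 describes.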

datapythonista · 2018-03-14T11:40:38Z

pandas/core/generic.py

-    will be NA
+    will be NA.
+*args : default None
+**kwargs : default None


As *args and **kwargs are used in the standard Python way, no type or default value is needed, the user will understand. They can share a single line, simply:
*args, **kwargs

datapythonista · 2018-03-14T11:43:39Z

pandas/core/generic.py

+**axis**
+
+axis=None : Iterates over rows and finds the cumulative value in each column.
+If value is different from the previous one, updates it:


I'd personally simply say something like "By default, cumulative functions work on the index axis, meaning that for each row, they accumulate the values from the previous rows".

The second comment is only right for cummax. Now that you've got the other examples, you probably want to get rid of it.

datapythonista · 2018-03-14T11:44:17Z

pandas/core/generic.py

axis=None is the default, so I'd simply use df.cummax()... and add to the explanation before it that this is equivalent to axis=0 and axis='index'.

datapythonista · 2018-03-14T11:47:27Z

pandas/core/generic.py

+
+>>> df = pd.DataFrame([[7, 1],
+...                    [3, 4],
+...                    [8, 0]],


I'd use smaller values. When illustrating cumprod it should be fast and easy for the user to understand that 2 * 3 = 6 and 6 * 1 = 6, while to know 7 * 3 * 8 they'll probably need a calculator to check if they understood it right.
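
To make the readability argument concrete, here is a small sketch comparing the two datasets mentioned in this comment. The Series names are made up for illustration; the values are the ones quoted above.

```python
import pandas as pd

small = pd.Series([2, 3, 1])
large = pd.Series([7, 3, 8])

# Running products: each entry multiplies all values seen so far.
small_prod = small.cumprod().tolist()  # 2, 2*3, 2*3*1
large_prod = large.cumprod().tolist()  # 7, 7*3, 7*3*8

print(small_prod)
print(large_prod)
```

The first result can be checked at a glance, while the second needs some mental arithmetic, which is the point being made.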

Also, if you use a nan in one of the columns, you can reuse this example for the next section.

datapythonista · 2018-03-14T11:50:50Z

pandas/core/generic.py

+2  NaN NaN
+
+**Series**
+


As this is getting very long with all the functions, I'd probably avoid having examples for Series. Or maybe just one.

If you keep them, personally I'd have Series first, and order the examples from the simplest to the more complex.

If we decide to have separate string examples for each method, we can keep the examples for Series.

+ 1

Another suggestion would be to start with Series to just illustrate the concept of "cumulative max", as this will make the examples a little bit easier, and show the effect of NaNs. And then show DataFrame, saying that by default the same happens for each column of the DataFrame, and optionally use axis=1 to take cumulative max for each row.

Thanks for the suggestion. I agree, it is definitely easier to see what is going on with NaNs if we use a Series example instead of DataFrame. I will change it this way.
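
For reference, a minimal sketch of the Series-first idea, using the same values as the final docstring. It shows how a single NaN behaves with the default `skipna=True` and with `skipna=False`; the `expected_*` names are only for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([2, np.nan, 5, -1, 0])

skipped = s.cummax()                 # skipna=True is the default
propagated = s.cummax(skipna=False)  # the NaN poisons everything after it

expected_skipped = pd.Series([2.0, np.nan, 5.0, 5.0, 5.0])
expected_propagated = pd.Series([2.0, np.nan, np.nan, np.nan, np.nan])

pd.testing.assert_series_equal(skipped, expected_skipped)
pd.testing.assert_series_equal(propagated, expected_propagated)
```

With the default, the NaN position stays NaN but the running maximum carries on; with `skipna=False`, every later entry becomes NaN.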

datapythonista · 2018-03-14T11:52:42Z

pandas/core/generic.py

+      A    B
+0   7.0  NaN
+1   NaN  4.0
+2  56.0  0.0


skipna=True is the default. So, if you add the NaN to the initial example, you can use df.cummax()... to illustrate skipna=True and go directly to show how it changes with skipna=False.

datapythonista · 2018-03-14T11:55:53Z

pandas/core/generic.py

-              _cnum_doc)
+                  axis_descr=axis_descr, accum_func_name=accum_func_name,
+                  examples=examples)
+    @Appender(_cnum_doc)


It's all right like this, but maybe it'd be simpler to leave this as it was, and have the examples in _cnum_doc instead of in a separate variable. As they're the same for all methods, there is not much value in having them separate.

Another option would be to have a different string for each method example, in that case, something similar to this would make more sense.

I think having separate string examples for each method makes everything clearer, especially when showing examples for use of skipna & axis. It also helps with keeping the docstring concise. For instance, now we can have a Series example for each method.

The disadvantage is user will only see examples for the method they’re checking, but I think this is ok because we are referencing all methods in the ‘See also’ section, which comes before 'Examples'.

In these PRs #20216 and #20217 examples for DataFrame.all and DataFrame.any are separate even though they are similar methods.

Yes, I am also in favor of splitting up the examples.

jorisvandenbossche

You are currently only adding the example to the cummax method, but your example section has examples for both cummin/cumsum/cumprod/cummax.
So we need to make a choice here.

I would personally only include in the docstring of cummax examples of cummax. But no need to throw them away, what you could do is split your existing _cummax_examples in multiple strings, with the examples separated for each method.

jorisvandenbossche · 2018-03-15T10:06:37Z

pandas/core/generic.py


 Parameters
 ----------
-axis : %(axis_descr)s
+axis : {0 or 'index', 1 or 'columns'}, default 0


Although it is technically None, in practice it is 0 for Series/DataFrame, so I would keep the documentation like this.
The technical reason is because for Panel it is 1, but Panel is deprecated and I think we should not care about them in the documentation.

jorisvandenbossche · 2018-03-15T10:14:19Z

pandas/core/generic.py

 See also
 --------
-pandas.core.window.Expanding.%(accum_func_name)s : Similar functionality
+core.window.Expanding.%(accum_func_name)s : Similar functionality


Sorry to go back and forth, but for this one we want to keep the pandas. prefix (for the others below the change is perfect!)

jorisvandenbossche · 2018-03-15T10:18:24Z

pandas/core/generic.py

+2  NaN NaN
+
+**Series**
+


If we decide to have separate string examples for each method, we can keep the examples for Series.

+ 1

Another suggestion would be to start with Series to just illustrate the concept of "cumulative max", as this will make the examples a little bit easier, and show the effect of NaNs. And then show DataFrame, saying that by default the same happens for each column of the DataFrame, and optionally use axis=1 to take cumulative max for each row.

jorisvandenbossche · 2018-03-15T10:19:13Z

pandas/core/generic.py

-              _cnum_doc)
+                  axis_descr=axis_descr, accum_func_name=accum_func_name,
+                  examples=examples)
+    @Appender(_cnum_doc)


Yes, I am also in favor of splitting up the examples.

arminv · 2018-03-15T12:44:53Z

pandas/core/generic.py

    but ignores ``NaN`` values.
+Series.%(outname)s : Return %(desc)s over Series axis.
+DataFrame.cummax : Return cumulative maximum over DataFrame axis.


@datapythonista @jorisvandenbossche Is it a good idea to also add: DataFrame.sum as a relevant method in the 'See also' section?

Yes, that might be a good idea, if you can automatically make it link in DataFrame.cumsum to DataFrame.sum, in DataFrame.cummin to DataFrame.min, etc. (using the templating).

pep8speaks · 2018-03-16T19:19:04Z

Hello @arminv! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on March 17, 2018 at 16:19 Hours UTC

TomAugspurger · 2018-03-16T19:55:03Z

pandas/core/generic.py

+--------
+**Series**
+
+>>> s = pd.Series([2,np.nan,5,-1,0])


PEP8, spaces after ,

TomAugspurger · 2018-03-16T19:57:09Z

pandas/core/generic.py

+4    0.0
+dtype: float64
+
+skipna=True : Default value, ignores NaN values during operation:


I think we mostly use prose for the intermittent text in examples. Perhaps, say

By default, NA values are ignored.

>>> s.cummin()
...

To include NA values in the operation, use ``skipna=False``

>>> s.cummin(skipna=False)

To be clear, what you have is great and if the others are OK with it I'm +1 as well. Maybe wait to hear feedback from @datapythonista and @jorisvandenbossche before updating.

I think indeed your prose alternative reads a bit nicer.

TomAugspurger · 2018-03-16T19:57:28Z

pandas/core/generic.py

+1  3.0  NaN
+2  1.0  0.0
+
+skipna : Works in the same way as for Series.


Can probably be removed.

TomAugspurger · 2018-03-16T19:58:08Z

pandas/core/generic.py

+
+skipna : Works in the same way as for Series.
+
+axis=0 : Default value, equivalent to axis=None or axis='index'.


Same comment about the prose.

arminv · 2018-03-16T20:02:58Z

pandas/core/generic.py

 See also
 --------
 pandas.core.window.Expanding.%(accum_func_name)s : Similar functionality
    but ignores ``NaN`` values.
+Series.%(outname)s : Return %(desc)s over Series axis.
+DataFrame.%(accum_func_name)s : Return the %(accum_func_name)s over
+    DataFrame axis.


@jorisvandenbossche I used templating and %(accum_func_name)s to add DataFrame.prod to 'See also' section of DataFrame.cumprod, etc. All methods look good except DataFrame.prod where it reads: 'Return the prod over DataFrame axis' instead of 'Return the product over...'. Any suggestions on this? Should we even worry about this?

One relatively easy solution would be to change the desc param passed to _make_cum_function from "cumulative product" to "product", because we can put the "cumulative" (which is the same for each of the 4 functions) in the template itself. Then you can use that name in the see also description.

jorisvandenbossche

BTW, you are doing really great work here!

jorisvandenbossche · 2018-03-17T09:19:27Z

pandas/core/generic.py

 skipna : boolean, default True
    Exclude NA/null values. If an entire row/column is NA, the result
-    will be NA
+    will be NA.
+*args, **kwargs


Can you add here an explanation for args/kwargs: "Additional arguments have no effect but might be accepted for compatibility with NumPy."
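
As background for the "compatibility with NumPy" wording, here is a hedged sketch. To my understanding, `np.cumsum` called on a Series hands the work to the Series' own `cumsum` method and passes NumPy-style keywords along, which is why the pandas methods accept extra arguments they do not use. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([2.0, 5.0, -1.0])

# Calling the NumPy function on a pandas object; the extra NumPy
# keywords (such as dtype and out) are accepted but have no effect.
via_numpy = np.cumsum(s)
via_pandas = s.cumsum()

print(list(via_numpy))
print(list(via_pandas))
```

Both calls are expected to produce the same running sums.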

jorisvandenbossche · 2018-03-17T09:24:24Z

pandas/core/generic.py

 See also
 --------
 pandas.core.window.Expanding.%(accum_func_name)s : Similar functionality
    but ignores ``NaN`` values.
+Series.%(outname)s : Return %(desc)s over Series axis.
+DataFrame.%(accum_func_name)s : Return the %(accum_func_name)s over
+    DataFrame axis.


One relatively easy solution would be to change the desc param passed to _make_cum_function from "cumulative product" to "product", because we can put the "cumulative" (which is the same for each of the 4 functions) in the template itself. Then you can use that name in the see also description.
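
A rough sketch of this suggestion, assuming a simplified template: `TEMPLATE`, `METHODS` and `render` are hypothetical stand-ins, not the actual `_cnum_doc` or `_make_cum_function` in pandas/core/generic.py. The word "cumulative" lives in the template, so `desc` can be reused unchanged in the See also entry.

```python
# Hypothetical, simplified stand-in for the shared docstring template.
TEMPLATE = (
    "Return cumulative %(desc)s over a DataFrame or Series axis.\n"
    "\n"
    "See also\n"
    "--------\n"
    "DataFrame.%(accum_func_name)s : Return the %(desc)s over\n"
    "    DataFrame axis.\n"
)

# One entry per cumulative method; desc no longer contains "cumulative".
METHODS = {
    "cummax": {"accum_func_name": "max", "desc": "maximum"},
    "cummin": {"accum_func_name": "min", "desc": "minimum"},
    "cumsum": {"accum_func_name": "sum", "desc": "sum"},
    "cumprod": {"accum_func_name": "prod", "desc": "product"},
}


def render(method):
    """Fill the template for one method using %-style formatting."""
    return TEMPLATE % METHODS[method]


print(render("cumprod"))
```

With this split, the cumprod page reads "Return the product over DataFrame axis." rather than "Return the prod over...".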

jorisvandenbossche · 2018-03-17T09:25:45Z

pandas/core/generic.py

+4    0.0
+dtype: float64
+
+skipna=True : Default value, ignores NaN values during operation:


I think indeed your prose alternative reads a bit nicer.

arminv · 2018-03-17T12:30:00Z

I also checked the HTML rendering of the corresponding Series cumulative methods (pandas.Series.cummax, etc.) and they appear to be OK. There is a duplication in their 'See also' section, but again I couldn't think of a simple way to fix this through templating.

jorisvandenbossche

Looks good!
Small comment about the "see also"

jorisvandenbossche · 2018-03-17T14:25:02Z

pandas/core/generic.py

 See also
 --------
 pandas.core.window.Expanding.%(accum_func_name)s : Similar functionality
    but ignores ``NaN`` values.
+Series.%(outname)s : Return cumulative %(desc)s over Series axis.


I think this one can be left out, as it will already be included in one of the 4 last ones.

jorisvandenbossche · 2018-03-17T14:25:27Z

pandas/core/generic.py

+Series.%(outname)s : Return cumulative %(desc)s over Series axis.
+%(name2)s.%(accum_func_name)s : Return the %(desc)s over
+    %(name2)s axis.
+DataFrame.cummax : Return cumulative maximum over DataFrame axis.


Should this also be %(name2)s, like below?

If I implement these changes, we won't get a reference to any corresponding Series in 'See also' of DataFrame methods and vice versa. That's why I left them like this (at the expense of repeating one of them).

I agree that we don't really need them because we have examples of both Series & DataFrame in all docstrings and this should be informative enough.

Just out of curiosity, is there a simple way to determine whether a Series or DataFrame method is being accessed at any one time? For example, if we were generating the doc for Series.cummax, we would determine it is a Series method. Based on this, we know that we need to show cummax but for a DataFrame.

The thing is that the docstring for both Series and DataFrame (of the same method) will be exactly the same (apart from some of the links here in see also). So linking to the same method but on the other object, is not that important I think, as the other one does not give you more information

Just out of curiosity, is there a simple way to determine whether a Series or DataFrame method is being accessed at any one time? For example, if we were generating the doc for Series.cummax, we would determine it is a Series method. Based on this, we know that we need to show cummax but for a DataFrame.

Yes, the cls that is passed to _make_cum_function is either the Series or DataFrame class. So you can pass cls.__name__ as a variable to the template, but I think this is already done as name2 ? Based on that you can know the other.
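
A tiny sketch of that idea, under the assumption that only Series and DataFrame are involved. `other_class_name` is a hypothetical helper for illustration, not something that exists in pandas.

```python
import pandas as pd


def other_class_name(cls):
    """Given Series or DataFrame, return the name of the other class.

    Hypothetical helper: it only looks at cls.__name__.
    """
    return "DataFrame" if cls.__name__ == "Series" else "Series"


print(other_class_name(pd.Series))
print(other_class_name(pd.DataFrame))
```

The returned name could then be passed to the template as one more variable, next to name2.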

jorisvandenbossche · 2018-03-17T18:04:48Z

@arminv Thanks a lot for this PR!

arminv · 2018-03-17T18:16:47Z

Thank you guys for being so helpful. I look forward to contributing again.

arminv added 5 commits March 13, 2018 15:19

DOC: Improve the docstring of DataFrame.cummax

5ccedc2

Merge remote-tracking branch 'upstream/master' into docstring_cummax

04f70dd

Merge remote-tracking branch 'upstream/master' into docstring_cummax

aec6084

DOC: Improve the docstring of DataFrame.cummax()

4acf753

DOC: Improve the docstring of pandas.DataFrame.cummax

1214c93

datapythonista reviewed Mar 13, 2018

arminv added 2 commits March 14, 2018 01:27

Merge remote-tracking branch 'upstream/master' into docstring_cummax

a88e95a

DOC: Improve the docstring of DataFrame.cummax

fe94dad

jreback added the Docs and Numeric Operations (Arithmetic, Comparison, and Logical operations) labels Mar 14, 2018

datapythonista reviewed Mar 14, 2018

arminv added 2 commits March 14, 2018 17:46

Merge remote-tracking branch 'upstream/master' into docstring_cummax

f73b52f

Merge remote-tracking branch 'upstream/master' into docstring_cummax

33e5337

jorisvandenbossche reviewed Mar 15, 2018

Merge remote-tracking branch 'upstream/master' into docstring_cummax

15b38dd

arminv commented Mar 15, 2018

arminv added 2 commits March 16, 2018 15:18

Improved examples

3c30d18

Merge remote-tracking branch 'upstream/master' into docstring_cummax

0cb3168

arminv added 2 commits March 16, 2018 15:33

Addressed PEP8 issues

9d46623

Addressed PEP 8 issues

5d502cb

TomAugspurger reviewed Mar 16, 2018

arminv commented Mar 16, 2018

arminv added 2 commits March 16, 2018 22:48

Merge remote-tracking branch 'upstream/master' into docstring_cummax

e1e190f

Made See also of Series consistent

aa34ea0

jorisvandenbossche reviewed Mar 17, 2018

arminv added 2 commits March 17, 2018 06:59

Merge remote-tracking branch 'upstream/master' into docstring_cummax

94fc1b3

Merge remote-tracking branch 'upstream/master' into docstring_cummax

657feac

arminv added 3 commits March 17, 2018 07:53

Improved example wording. Addressed PEP8

77789a8

Merge remote-tracking branch 'upstream/master' into docstring_cummax

463eef7

More templating in See also.Fixed typos

b03c32a

Merge remote-tracking branch 'upstream/master' into docstring_cummax

9b05313

jorisvandenbossche reviewed Mar 17, 2018

Improved templating of See also section

1147a0d

jorisvandenbossche approved these changes Mar 17, 2018

jorisvandenbossche merged commit 699a48b into pandas-dev:master Mar 17, 2018

arminv deleted the docstring_cummax branch March 17, 2018 19:17

+                   A  B
+7  1
+21  4
+168  0

+9  7  9  7
+9  7  9  7
+9  7  9  7
+9  7  9  7


skipna : Works in the same way as for Series.

axis=0 : Default value, equivalent to axis=None or axis='index'.
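
For reference, the two notes above can be checked with a short sketch. It assumes the small frame from the docstring's DataFrame example, and the variable names (`by_index`, `keep_na`, `by_columns`) are illustrative only, not part of the PR.

```python
import numpy as np
import pandas as pd

# Same small frame as the docstring's DataFrame example.
df = pd.DataFrame([[2.0, 1.0],
                   [3.0, np.nan],
                   [1.0, 0.0]],
                  columns=list('AB'))

# axis=0 is the default (also spelled axis='index'):
# running maximum down each column, with NaN positions left as NaN.
by_index = df.cummax()

# skipna=False behaves as for Series: once a NaN is met in a column,
# every later entry in that column is NaN too.
keep_na = df.cummax(skipna=False)

# axis=1 (or axis='columns'): running maximum across each row.
by_columns = df.cummax(axis=1)

print(by_index)
print(keep_na)
print(by_columns)
```

With the default `skipna=True`, column `B` recovers after the missing value; with `skipna=False` it stays NaN from that row onward.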

arminv commented Mar 13, 2018

datapythonista left a comment

codecov bot commented Mar 14, 2018

Codecov Report

arminv commented Mar 14, 2018

datapythonista left a comment

arminv Mar 15, 2018

jorisvandenbossche left a comment

arminv Mar 15, 2018

pep8speaks commented Mar 16, 2018

Comment last updated on March 17, 2018 at 16:19 Hours UTC

arminv Mar 16, 2018

jorisvandenbossche left a comment

arminv commented Mar 17, 2018

jorisvandenbossche left a comment

jorisvandenbossche commented Mar 17, 2018

arminv commented Mar 17, 2018
