-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: update the pandas.DataFrame.plot.hist docstring #20155
DOC: update the pandas.DataFrame.plot.hist docstring #20155
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, remove the blank line after
Examples
--------
For the comment on **kwds
, remove the backticks around **kwds and replace the description with "Keyword arguments to pass on to :meth:pandas.DataFrame.plot
.".
For the "Returns" section, indicate the type the method returns and provide a clear and brief explanation. Avoid the use of "them" if not referring to the subject of the sentence.
pandas/plotting/_core.py
Outdated
|
||
Parameters | ||
---------- | ||
by : string or sequence | ||
by : str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why removing sequence? A sequence of string can be passed to select the columns to be plotted.
Codecov Report
@@ Coverage Diff @@
## master #20155 +/- ##
==========================================
+ Coverage 91.77% 91.79% +0.02%
==========================================
Files 152 152
Lines 49205 49205
==========================================
+ Hits 45159 45169 +10
+ Misses 4046 4036 -10
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Added some comments
pandas/plotting/_core.py
Outdated
@@ -2951,21 +2951,46 @@ def box(self, by=None, **kwds): | |||
|
|||
def hist(self, by=None, bins=10, **kwds): | |||
""" | |||
Histogram | |||
Draw one histogram of the DataFrame's Series using matplotlib. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"the DataFrame's Series" -> "the DataFrame's columns" ?
The "using matplotlib" is not needed in this summary line, as all our plotting machinery is based on matplotlb
pandas/plotting/_core.py
Outdated
|
||
Parameters | ||
---------- | ||
by : string or sequence | ||
by : str or sequence |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add , optional
pandas/plotting/_core.py
Outdated
|
||
Parameters | ||
---------- | ||
by : string or sequence | ||
by : str or sequence | ||
Column in the DataFrame to group by. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to indicate how this affects the resulting plot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this parameter is passed on to DataFrame.plot(), which is passed on to matplotlib's plot() later... but apparently it's not used in any place at all. I tried with different examples, but in all cases this parameter is ignored. So I'm keeping the old documentation just for the sake of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, yes, you seem to be correct. Will open an issue to discuss in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dukebody ah, yes, thanks for linking that
|
||
Returns | ||
------- | ||
axes : :class:`matplotlib.axes.Axes` or numpy.ndarray of them | ||
axes : matplotlib.AxesSubplot histogram. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "or ndarray of them" was not correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it was not correct. This function always returns one matplotlib.AxesSubplot. It's different than DataFrame.hist(), which returns an ndarray.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just looked, and the key to get this is passing subplots=True
, then the different columns are each plotted in a subplot in instead of in overlay, en then you get an ndarray of axes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. However this is actually a parameter that is passed on to the plot() function, but it's not part of the default plot.hist() parameters... so IMHO it's ok to say that the return will be just one. Don't you think so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should still document it here, since this is what the return value of this function can be.
(btw, you are welcome to add subplots
to the list of Parameters, we have a general issue about that we should document more of those common parameters in each of the methods)
pandas/plotting/_core.py
Outdated
See Also | ||
-------- | ||
:meth:`pandas.DataFrame.hist` : Draw histograms per DataFrame's Series. | ||
:meth:`pandas.Series.hist` : Draw a histogram with Series' data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need the :meth`..`
part here, as sphinx will make those links automatically
pandas/plotting/_core.py
Outdated
... 'length': [ 1.5, 0.5, 1.2, 0.9, 3], | ||
... 'width': [ 0.7, 0.2, 0.15, 0.2, 1.1] | ||
... }, index= ['pig', 'rabbit', 'duck', 'chicken', 'horse']) | ||
>>> hist = df.plot.hist(bins = 3, xticks = range(4), alpha = 0.5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inside the function call, no spaces around =
for PEP8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates! Added few more comments, almost there!
pandas/plotting/_core.py
Outdated
bins : int, default 10 | ||
Number of histogram bins to be used. | ||
**kwds : optional | ||
Keyword arguments to pass on to :meth:`pandas.DataFrame.plot`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you keep the original sentence here? (it was originally how you have it now, but we changed it in one of the previous PRs to say that ""Additional keywords are documented in DataFrame.plot"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche We will have to revise this after all the sprint PRs have been merged, since people have been changing this and it might not be consistent anymore. But it will be fast and easy.
See #20348
pandas/plotting/_core.py
Outdated
... 'length': [ 1.5, 0.5, 1.2, 0.9, 3], | ||
... 'width': [ 0.7, 0.2, 0.15, 0.2, 1.1] | ||
... }, index = ['pig', 'rabbit', 'duck', 'chicken', 'horse']) | ||
>>> hist = df.plot.hist(bins = 3, xticks = range(4), alpha = 0.5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I like the fact you used some more "real world" data, the problem is that with only 5 values, the histrogram will look rather strange. So I would maybe still generate some more random data in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During the sprint some of us actually discussed about it: if you generate random data, each time the histogram will be different, and it will be kind of difficult to understand and to explain. So I tried to create some "real" data that can be small enough to generate a clear histogram that shows how it works. But if you think this data is not enough, I could try to improve it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want reproducible data, you can set a random seed. Also, if you use quite some data eg from a normal distrubution, the plot will look very similar each time.
I plotted the data above, and indeed I think it is too small, IMO it gives not a very good "sense" of what a good/useful histogram looks like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for using random data, as it imparts when histograms are useful (summarizing the distribution of a larger dataset).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been thinking about it but I can't imagine an easy way to generate random data that, at the same time, it's easy to understand in a compact form (this is just an Example section) and also shows the direct relation with the resulting histogram. Could you tell me any suggestion?
On the other hand, in pandas.DataFrame.hist() docstring they used another "toy" example (which actually was my work too) and it's already merged...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In [64]: df = pd.DataFrame({'a': np.random.randn(1000), 'b': np.random.randn(1000)*1.5 + 2})
In [65]: df.plot.hist(alpha=0.5)
Out[65]: <matplotlib.axes._subplots.AxesSubplot at 0x7fbaadcea6d8>
gives me
which I think is a reasonable plot to show functionality of hist ?
On the other hand, in pandas.DataFrame.hist() docstring they used another "toy" example (which actually was my work too) and it's already merged...
That's true, but IMO we should change it there as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I found a random example that sounds quite natural and understandable. I hope you'll like it.
Hello @liopic! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on March 19, 2018 at 20:56 Hours UTC |
pandas/plotting/_core.py
Outdated
|
||
A histogram is a representation of the distribution of data. | ||
This function groups the values of all given Series in the DataFrame | ||
into bins, and draws all bins in only one matplotlib.AxesSubplot. This |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
matplotlib.AxesSubplot
isn't a think. I'd say either "matplotlib axes" or :ref:`matplotlib.axes.Axes`.
pandas/plotting/_core.py
Outdated
`**kwds` : optional | ||
bins : int, default 10 | ||
Number of histogram bins to be used. | ||
**kwds : optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove : optional
.
pandas/plotting/_core.py
Outdated
|
||
See Also | ||
-------- | ||
pandas.DataFrame.hist : Draw histograms per DataFrame's Series. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove pandas.
pandas/plotting/_core.py
Outdated
See Also | ||
-------- | ||
pandas.DataFrame.hist : Draw histograms per DataFrame's Series. | ||
pandas.Series.hist : Draw a histogram with Series' data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove pandas.
pandas/plotting/_core.py
Outdated
-------- | ||
pandas.DataFrame.hist : Draw histograms per DataFrame's Series. | ||
pandas.Series.hist : Draw a histogram with Series' data. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add DataFrame.plot
pandas/plotting/_core.py
Outdated
... 'length': [ 1.5, 0.5, 1.2, 0.9, 3], | ||
... 'width': [ 0.7, 0.2, 0.15, 0.2, 1.1] | ||
... }, index = ['pig', 'rabbit', 'duck', 'chicken', 'horse']) | ||
>>> hist = df.plot.hist(bins = 3, xticks = range(4), alpha = 0.5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for using random data, as it imparts when histograms are useful (summarizing the distribution of a larger dataset).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, nice example!
(made a small amendment mainly for PEP8 fixup)
@liopic Thanks for the PR 👍 |
…ame_describe * upstream/master: (158 commits) Add link to "Craft Minimal Bug Report" blogpost (pandas-dev#20431) BUG: fixed json_normalize for subrecords with NoneTypes (pandas-dev#20030) (pandas-dev#20399) BUG: ExtensionArray.fillna for scalar values (pandas-dev#20412) DOC" update the Pandas core window rolling count docstring" (pandas-dev#20264) DOC: update the pandas.DataFrame.plot.hist docstring (pandas-dev#20155) DOC: Only use ~ in class links to hide prefixes. (pandas-dev#20402) Bug: Allow np.timedelta64 objects to index TimedeltaIndex (pandas-dev#20408) DOC: add disallowing of Series construction of len-1 list with index to whatsnew (pandas-dev#20392) MAINT: Remove weird pd file DOC: update the Index.isin docstring (pandas-dev#20249) BUG: Handle all-NA blocks in concat (pandas-dev#20382) DOC: update the pandas.core.resample.Resampler.fillna docstring (pandas-dev#20379) BUG: Don't raise exceptions splitting a blank string (pandas-dev#20067) DOC: update the pandas.DataFrame.cummax docstring (pandas-dev#20336) DOC: update the pandas.core.window.x.mean docstring (pandas-dev#20265) DOC: update the api.types.is_number docstring (pandas-dev#20196) Fix linter (pandas-dev#20389) DOC: Improved the docstring of pandas.Series.dt.to_pytimedelta (pandas-dev#20142) DOC: update the pandas.Series.dt.is_month_end docstring (pandas-dev#20181) DOC: update the window.Rolling.min docstring (pandas-dev#20263) ...
Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):
scripts/validate_docstrings.py <your-function-or-method>
git diff upstream/master -u -- "*.py" | flake8 --diff
python doc/make.py --single <your-function-or-method>
Please include the output of the validation script below between the "```" ticks: