-
-
Notifications
You must be signed in to change notification settings - Fork 18.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: update the pandas.DataFrame.plot.hist docstring #20155
Changes from 7 commits
d77de8a
28f1043
a494111
ddd6ebf
a395983
bc6dbac
ccadb34
df05060
dbed27e
37adb70
53b43aa
627170e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2951,21 +2951,46 @@ def box(self, by=None, **kwds): | |
|
||
def hist(self, by=None, bins=10, **kwds): | ||
""" | ||
Histogram | ||
Draw one histogram of the DataFrame's columns. | ||
|
||
A histogram is a representation of the distribution of data. | ||
This function groups the values of all given Series in the DataFrame | ||
into bins, and draws all bins in only one matplotlib.AxesSubplot. This | ||
is useful when the DataFrame's Series are in a similar scale. | ||
|
||
Parameters | ||
---------- | ||
by : string or sequence | ||
by : str or sequence, optional | ||
Column in the DataFrame to group by. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would be good to indicate how this affects the resulting plot There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually this parameter is passed on to DataFrame.plot(), which is passed on to matplotlib's plot() later... but apparently it's not used in any place at all. I tried with different examples, but in all cases this parameter is ignored. So I'm keeping the old documentation just for the sake of it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, yes, you seem to be correct. Will open an issue to discuss in general. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @dukebody ah, yes, thanks for linking that |
||
bins: integer, default 10 | ||
Number of histogram bins to be used | ||
`**kwds` : optional | ||
Additional keyword arguments are documented in | ||
:meth:`pandas.DataFrame.plot`. | ||
bins : int, default 10 | ||
Number of histogram bins to be used. | ||
**kwds : optional | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Remove |
||
Keyword arguments to pass on to :meth:`pandas.DataFrame.plot`. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you keep the original sentence here? (it was originally how you have it now, but we changed it in one of the previous PRs to say that ""Additional keywords are documented in DataFrame.plot" There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jorisvandenbossche We will have to revise this after all the sprint PRs have been merged, since people have been changing this and it might not be consistent anymore. But it will be fast and easy. See #20348 |
||
|
||
Returns | ||
------- | ||
axes : :class:`matplotlib.axes.Axes` or numpy.ndarray of them | ||
axes : matplotlib.AxesSubplot histogram. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The "or ndarray of them" was not correct? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, it was not correct. This function always returns one matplotlib.AxesSubplot. It's different than DataFrame.hist(), which returns an ndarray. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just looked, and the key to get this is passing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see. However this is actually a parameter that is passed on to the plot() function, but it's not part of the default plot.hist() parameters... so IMHO it's ok to say that the return will be just one. Don't you think so? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, we should still document it here, since this is what the return value of this function can be. (btw, you are welcome to add |
||
|
||
See Also | ||
-------- | ||
pandas.DataFrame.hist : Draw histograms per DataFrame's Series. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. remove |
||
pandas.Series.hist : Draw a histogram with Series' data. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Remove |
||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add |
||
Examples | ||
-------- | ||
When using values between 0 and 3, calling hist() with bins = 3 will | ||
create three bins: one that groups values between 0 and 1, another | ||
for values between 1 and 2, and another for values between 2 and 3. | ||
We use alpha parameter to be able to see overlapping columns. | ||
|
||
.. plot:: | ||
:context: close-figs | ||
|
||
>>> df = pd.DataFrame({ | ||
... 'length': [ 1.5, 0.5, 1.2, 0.9, 3], | ||
... 'width': [ 0.7, 0.2, 0.15, 0.2, 1.1] | ||
... }, index = ['pig', 'rabbit', 'duck', 'chicken', 'horse']) | ||
>>> hist = df.plot.hist(bins = 3, xticks = range(4), alpha = 0.5) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. inside the function call, no spaces around There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Although I like the fact you used some more "real world" data, the problem is that with only 5 values, the histrogram will look rather strange. So I would maybe still generate some more random data in this case. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. During the sprint some of us actually discussed about it: if you generate random data, each time the histogram will be different, and it will be kind of difficult to understand and to explain. So I tried to create some "real" data that can be small enough to generate a clear histogram that shows how it works. But if you think this data is not enough, I could try to improve it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you want reproducible data, you can set a random seed. Also, if you use quite some data eg from a normal distrubution, the plot will look very similar each time. I plotted the data above, and indeed I think it is too small, IMO it gives not a very good "sense" of what a good/useful histogram looks like. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 for using random data, as it imparts when histograms are useful (summarizing the distribution of a larger dataset). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have been thinking about it but I can't imagine an easy way to generate random data that, at the same time, it's easy to understand in a compact form (this is just an Example section) and also shows the direct relation with the resulting histogram. Could you tell me any suggestion? On the other hand, in pandas.DataFrame.hist() docstring they used another "toy" example (which actually was my work too) and it's already merged... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
gives me which I think is a reasonable plot to show functionality of hist ?
That's true, but IMO we should change it there as well. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, I found a random example that sounds quite natural and understandable. I hope you'll like it. |
||
""" | ||
return self(kind='hist', by=by, bins=bins, **kwds) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
matplotlib.AxesSubplot
isn't a think. I'd say either "matplotlib axes" or :ref:`matplotlib.axes.Axes`.