DOC: update the pandas.DataFrame.plot.hist docstring #20155

Merged 12 commits on Mar 19, 2018
38 changes: 32 additions & 6 deletions pandas/plotting/_core.py
@@ -2951,21 +2951,47 @@ def box(self, by=None, **kwds):

def hist(self, by=None, bins=10, **kwds):
"""
Histogram
Draw one histogram of the DataFrame's columns.

A histogram is a representation of the distribution of data.
This function groups the values of all given Series in the DataFrame
into bins, and draws all bins in only one :ref:`matplotlib.axes.Axes`.
This is useful when the DataFrame's Series are in a similar scale.

Parameters
----------
by : string or sequence
by : str or sequence, optional
Column in the DataFrame to group by.
bins: integer, default 10
Number of histogram bins to be used
`**kwds` : optional
bins : int, default 10
Number of histogram bins to be used.
**kwds
Additional keyword arguments are documented in
:meth:`pandas.DataFrame.plot`.

Returns
-------
axes : :class:`matplotlib.axes.Axes` or numpy.ndarray of them
axes : matplotlib.AxesSubplot histogram.
Member

The "or ndarray of them" was not correct?

Contributor Author

No, it was not correct. This function always returns a single matplotlib.AxesSubplot. That differs from DataFrame.hist(), which returns an ndarray.

Member

Just looked, and the key to getting this is passing subplots=True. The different columns are then each plotted in their own subplot instead of being overlaid, and you get an ndarray of axes.

Contributor Author

I see. However, this is actually a parameter that is passed on to the plot() function; it's not one of the default plot.hist() parameters. So IMHO it's OK to say that the return value will be just one Axes. Don't you think so?

Member

Yes, we should still document it here, since this is what the return value of this function can be.

(btw, you are welcome to add subplots to the list of Parameters; we have a general issue noting that we should document more of those common parameters in each of the methods)
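The return types discussed in this thread can be checked with a small sketch like the one below. It is illustrative only and not part of the PR. It assumes a recent pandas with the matplotlib backend, and the names `ax_single`, `axes_multi` and `axes_grid` are made up for the example. It reuses the dice data from the docstring and compares the default call, the call with `subplots=True` (forwarded to `DataFrame.plot` through `**kwds`), and `DataFrame.hist`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.axes
import numpy as np
import pandas as pd

# Same idea as the docstring example: one die, and the sum of two dice.
rng = np.random.RandomState(0)
df = pd.DataFrame({"one": rng.randint(1, 7, 6000)})
df["two"] = df["one"] + rng.randint(1, 7, 6000)

# Default call: all columns are overlaid in a single Axes.
ax_single = df.plot.hist(bins=12, alpha=0.5)

# subplots=True is passed through **kwds to DataFrame.plot;
# each column then gets its own Axes.
axes_multi = df.plot.hist(bins=12, subplots=True)

# DataFrame.hist draws one histogram per column.
axes_grid = df.hist(bins=12)

# Print the container types so the difference is visible.
print(type(ax_single).__name__)
print(type(axes_multi).__name__, len(axes_multi))
print(type(axes_grid).__name__)
```

If the behaviour matches what the reviewers describe, the first call yields a single Axes object and the other two yield NumPy arrays of Axes, which is why the Returns section mentions both.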


See Also
--------
DataFrame.hist : Draw histograms per DataFrame's Series.
Series.hist : Draw a histogram with Series' data.

Examples
--------
When we draw a dice 6000 times, we expect to get each value around 1000
times. But when we draw two dices and sum the result, the distribution
is going to be quite different. A histogram illustrates those
distributions.

.. plot::
:context: close-figs

>>> df = pd.DataFrame(
... np.random.randint(1, 7, 6000),
... columns = ['one'])
>>> df['two'] = df['one'] + np.random.randint(1, 7, 6000)
>>> ax = df.plot.hist(bins=12, alpha=0.5)
"""
return self(kind='hist', by=by, bins=bins, **kwds)
