Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Fix by in DataFrame.plot.hist and DataFrame.plot.box #28373

Merged
merged 178 commits into from
Jul 12, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
178 commits
Select commit Hold shift + click to select a range
7e461a1
remove \n from docstring
charlesdong1991 Dec 3, 2018
1314059
fix conflicts
charlesdong1991 Jan 19, 2019
8bcb313
Merge remote-tracking branch 'upstream/master'
charlesdong1991 Jul 30, 2019
e36592c
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 Sep 10, 2019
b2f45a6
fix by in hist
charlesdong1991 Sep 10, 2019
8b6e00a
make plot work
charlesdong1991 Sep 10, 2019
dc0c2ec
add _group_plot function
charlesdong1991 Sep 10, 2019
d803938
check function
charlesdong1991 Sep 10, 2019
33dd762
reformat
charlesdong1991 Sep 10, 2019
d59d642
put import up
charlesdong1991 Sep 10, 2019
66eb06c
add comments
charlesdong1991 Sep 10, 2019
ea267ad
Mimic group plot
charlesdong1991 Sep 10, 2019
8095224
fix import failure
charlesdong1991 Sep 10, 2019
31decc1
reformat
charlesdong1991 Sep 10, 2019
e4bdbd0
fix test
charlesdong1991 Sep 10, 2019
4033159
hacky fix
charlesdong1991 Sep 10, 2019
57a3bdf
fix isrot
charlesdong1991 Sep 10, 2019
8060223
fix tests
charlesdong1991 Sep 11, 2019
d666334
fix import failure
charlesdong1991 Sep 11, 2019
3216d59
fix import error
charlesdong1991 Sep 11, 2019
45f4b7f
Update imports
charlesdong1991 Sep 11, 2019
2b0785b
test imports
charlesdong1991 Sep 11, 2019
d79dba3
new change
charlesdong1991 Sep 12, 2019
eca597b
fix conflict
charlesdong1991 Sep 12, 2019
321fbd2
restore removed line
charlesdong1991 Sep 12, 2019
a7b9ae5
Remove unused line
charlesdong1991 Sep 12, 2019
d2d13fd
Disruptive change
charlesdong1991 Sep 12, 2019
5abedb6
should work this time
charlesdong1991 Sep 13, 2019
d73115a
Add in-code comments
charlesdong1991 Sep 13, 2019
d7998bb
remove print
charlesdong1991 Sep 13, 2019
1bbf7ea
reformat
charlesdong1991 Sep 13, 2019
a279f45
Dropna
charlesdong1991 Sep 13, 2019
2b793ea
Add isna for multi column
charlesdong1991 Sep 14, 2019
04de066
try to remove warning
charlesdong1991 Sep 15, 2019
4adc324
test if removing pd works
charlesdong1991 Sep 16, 2019
d0103a4
revert changes
charlesdong1991 Sep 16, 2019
f94dbb4
try if warning gone
charlesdong1991 Sep 16, 2019
0415cb0
try again
charlesdong1991 Sep 17, 2019
525200b
merge master and fix conflict
charlesdong1991 Dec 23, 2019
1ab4310
merge master and fix conflict
charlesdong1991 Jan 4, 2020
c005880
fix conflict and merge master
charlesdong1991 Jan 4, 2020
a1fabc5
Fix linting error
charlesdong1991 Jan 4, 2020
70453f1
Add test
charlesdong1991 Jan 4, 2020
b6579a5
remove unused code
charlesdong1991 Jan 4, 2020
e99f3dc
add test and make code more robust
charlesdong1991 Jan 4, 2020
99d6d67
remove comment
charlesdong1991 Jan 4, 2020
8e2fcf6
clean the code
charlesdong1991 Jan 4, 2020
d02f4ac
simplify code
charlesdong1991 Jan 4, 2020
947189c
simplify code
charlesdong1991 Jan 4, 2020
6b5203d
fix linting
charlesdong1991 Jan 4, 2020
27d0d21
Add doc for hist
charlesdong1991 Jan 4, 2020
48ff521
revert change
charlesdong1991 Jan 4, 2020
f39d948
fix warning
charlesdong1991 Jan 4, 2020
5d1705c
isort
charlesdong1991 Jan 4, 2020
90471aa
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 Jan 11, 2020
46a8031
simplify code
charlesdong1991 Jan 11, 2020
57a96e6
simpler python
charlesdong1991 Jan 11, 2020
29127f0
remove unused
charlesdong1991 Jan 11, 2020
61bb97f
restore blank lines
charlesdong1991 Jan 11, 2020
62fb9e6
Add extensive tests
charlesdong1991 Jan 11, 2020
638174b
fix seed
charlesdong1991 Jan 11, 2020
02de005
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 Jan 15, 2020
5adb25d
code change based on reviews
charlesdong1991 Jan 15, 2020
7051432
fix linting
charlesdong1991 Jan 15, 2020
adbde9f
update doc
charlesdong1991 Jan 15, 2020
5dfad18
merge master and fix conflict
charlesdong1991 Feb 10, 2020
abd10f3
code change based on reviews
charlesdong1991 Feb 10, 2020
c20d81a
fixup
charlesdong1991 Feb 10, 2020
07112c0
fixup
charlesdong1991 Feb 10, 2020
fb0b87c
code change on reviews
charlesdong1991 Feb 11, 2020
a6a8e57
fix isort
charlesdong1991 Feb 11, 2020
7f77f48
short code
charlesdong1991 Feb 12, 2020
c09bb19
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 Feb 19, 2020
a120d27
simpler python
charlesdong1991 Feb 23, 2020
f87afee
add inline comment
charlesdong1991 Feb 23, 2020
82711ee
simplier pandas
charlesdong1991 Feb 23, 2020
60f7298
code change on JR review
charlesdong1991 Feb 23, 2020
071488b
fix linting
charlesdong1991 Feb 23, 2020
f2a0210
rebase and fix conflict
charlesdong1991 Feb 27, 2020
bb07e15
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 Mar 8, 2020
867094a
code change on reviews
charlesdong1991 Mar 8, 2020
b0f06b2
Add docstring
charlesdong1991 Mar 8, 2020
111e89c
fix typo
charlesdong1991 Mar 15, 2020
6472053
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 Mar 15, 2020
83ec868
remove blank
charlesdong1991 Mar 15, 2020
d6c8566
use more meaningful example
charlesdong1991 Mar 15, 2020
6a0ac8d
keep as is
charlesdong1991 Mar 15, 2020
49d0791
remove less useful comment
charlesdong1991 Mar 15, 2020
2bfbe78
change figsize
charlesdong1991 Mar 15, 2020
c5d7518
clean iter_data
charlesdong1991 Apr 4, 2020
03356ce
remove unused docs
charlesdong1991 Apr 4, 2020
7abc47d
cleaner pandas
charlesdong1991 Apr 4, 2020
db832b4
cleaner
charlesdong1991 Apr 4, 2020
be99a97
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 Apr 4, 2020
9ae5987
fixup
charlesdong1991 Apr 4, 2020
10c2ad1
rename
charlesdong1991 Apr 4, 2020
ce8cfd4
code change on reviews
charlesdong1991 Apr 7, 2020
627cc02
fixup
charlesdong1991 Apr 7, 2020
4bfbf03
rebase and resolve conflicts
charlesdong1991 Apr 10, 2020
ee8972d
linting
charlesdong1991 Apr 10, 2020
12ff785
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 Apr 10, 2020
163f920
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 Apr 15, 2020
0839be2
annotation
charlesdong1991 May 1, 2020
142ee53
annotation
charlesdong1991 May 1, 2020
ef65137
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 May 1, 2020
d793703
reverse change on annotation
charlesdong1991 May 5, 2020
2710cf2
fixup
charlesdong1991 May 5, 2020
f76d2cb
remove
charlesdong1991 May 5, 2020
a5ecbd7
add missing annoatation
charlesdong1991 May 11, 2020
439be51
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 May 11, 2020
5fd420e
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 May 11, 2020
4b4832f
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 May 18, 2020
7425dff
code change on WA review
charlesdong1991 May 21, 2020
8ab4b90
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 Jun 17, 2020
b06e454
solve mypy
charlesdong1991 Jun 19, 2020
79294ed
fix typo
charlesdong1991 Jun 19, 2020
9523bb9
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 Jun 19, 2020
cd59370
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 Jun 20, 2020
050ba95
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 Jun 20, 2020
add406f
code change on reviews
charlesdong1991 Jun 23, 2020
bb22c53
fix linting
charlesdong1991 Jun 23, 2020
25214e6
rename
charlesdong1991 Jun 27, 2020
aaa5c95
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 Jun 27, 2020
77e46f4
modulize reformat_y for hist
charlesdong1991 Jun 27, 2020
9de9c61
better annotation
charlesdong1991 Jun 27, 2020
af68d2e
improve annotation
charlesdong1991 Jun 27, 2020
b75015a
fix linting
charlesdong1991 Jun 27, 2020
b90303d
improve docstring
charlesdong1991 Jun 27, 2020
898fa9b
rebase
charlesdong1991 May 21, 2021
2ac32f5
remove added test file
charlesdong1991 May 21, 2021
f7bcdb7
revert change
charlesdong1991 May 21, 2021
aeb32e5
rebase
charlesdong1991 May 21, 2021
dc17959
fixup
charlesdong1991 May 21, 2021
4aee3e0
black
charlesdong1991 May 21, 2021
4eb466f
fixup
charlesdong1991 May 21, 2021
5160224
fix mypy
charlesdong1991 May 22, 2021
e2de0d3
fix mypy
charlesdong1991 May 22, 2021
b2b33ac
fix mypy
charlesdong1991 May 22, 2021
1199a93
fix mypy
charlesdong1991 May 22, 2021
c4a5842
fix mypy
charlesdong1991 May 22, 2021
6556414
fix mypy
charlesdong1991 May 22, 2021
826f277
fix flake8
charlesdong1991 May 22, 2021
891dc55
add by support for boxplot
charlesdong1991 May 22, 2021
4c4a158
doc
charlesdong1991 May 22, 2021
ea7e5b1
Add tests
charlesdong1991 May 22, 2021
006588e
flake8
charlesdong1991 May 22, 2021
4f0a1dc
move file
charlesdong1991 May 22, 2021
e1579e2
pprint label
charlesdong1991 May 24, 2021
f2c141f
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 May 27, 2021
5f96abd
rebase and fix conflict
charlesdong1991 May 31, 2021
06483af
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 Jun 1, 2021
08f534d
rebase fix conflicts
charlesdong1991 Jun 3, 2021
f1c3a1f
Resolve conflicts and adopt all feedbacks from Marc
charlesdong1991 Jun 29, 2021
e6e96d3
parametrize tests
charlesdong1991 Jun 29, 2021
52e47f1
Fix test
charlesdong1991 Jun 29, 2021
bc2f282
Code changes based on Marc reviews
charlesdong1991 Jun 30, 2021
a43d3bb
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 Jun 30, 2021
ceeb3c5
update doc
charlesdong1991 Jun 30, 2021
4fea841
version change
charlesdong1991 Jun 30, 2021
3ea2603
Use self.by
charlesdong1991 Jun 30, 2021
9f48139
better code
charlesdong1991 Jun 30, 2021
b1094e3
better inline comment
charlesdong1991 Jun 30, 2021
97bde59
code changes based on Marc reviews
charlesdong1991 Jun 30, 2021
444a964
minor fix
charlesdong1991 Jun 30, 2021
b66dad0
mypy
charlesdong1991 Jun 30, 2021
982f562
add future annotation
charlesdong1991 Jun 30, 2021
c76ad67
fix pre commit
charlesdong1991 Jun 30, 2021
2c1aa33
minor experimental fix
charlesdong1991 Jun 30, 2021
6896546
better doc string
charlesdong1991 Jun 30, 2021
3c54302
fixup doc fail
charlesdong1991 Jul 1, 2021
2d20178
code change on Macro reviews
charlesdong1991 Jul 1, 2021
a169dfd
Add more tests
charlesdong1991 Jul 1, 2021
d0b56ff
fixup
charlesdong1991 Jul 1, 2021
dec313c
code change on reviews
charlesdong1991 Jul 10, 2021
143f286
changes based on Jeff review
charlesdong1991 Jul 12, 2021
283286f
doc
charlesdong1991 Jul 12, 2021
f2a0736
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 Jul 12, 2021
f1aeee0
fix flake8
charlesdong1991 Jul 12, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.4.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ enhancement2

Other enhancements
^^^^^^^^^^^^^^^^^^
- Add support for assigning values to ``by`` argument in :meth:`DataFrame.plot.hist` and :meth:`DataFrame.plot.box` (:issue:`15079`)
- :meth:`Series.sample`, :meth:`DataFrame.sample`, and :meth:`.GroupBy.sample` now accept a ``np.random.Generator`` as input to ``random_state``. A generator will be more performant, especially with ``replace=False`` (:issue:`38100`)
- Additional options added to :meth:`.Styler.bar` to control alignment and display (:issue:`26070`)
- :meth:`Series.ewm`, :meth:`DataFrame.ewm`, now support a ``method`` argument with a ``'table'`` option that performs the windowing operation over an entire :class:`DataFrame`. See :ref:`Window Overview <window.overview>` for performance and functional benefits (:issue:`42273`)
Expand Down
20 changes: 20 additions & 0 deletions pandas/plotting/_core.py
Original file line number Diff line number Diff line change
Expand Up @@ -1237,6 +1237,11 @@ def box(self, by=None, **kwargs):
----------
by : str or sequence
Column in the DataFrame to group by.

.. versionchanged:: 1.4.0

Previously, `by` is silently ignore and makes no groupings

**kwargs
Additional keywords are documented in
:meth:`DataFrame.plot`.
Expand Down Expand Up @@ -1278,6 +1283,11 @@ def hist(self, by=None, bins=10, **kwargs):
----------
by : str or sequence, optional
Column in the DataFrame to group by.

.. versionchanged:: 1.4.0

Previously, `by` is silently ignore and makes no groupings

bins : int, default 10
Number of histogram bins to be used.
**kwargs
Expand Down Expand Up @@ -1309,6 +1319,16 @@ def hist(self, by=None, bins=10, **kwargs):
... columns = ['one'])
>>> df['two'] = df['one'] + np.random.randint(1, 7, 6000)
>>> ax = df.plot.hist(bins=12, alpha=0.5)

A grouped histogram can be generated by providing the parameter `by` (which
can be a column name, or a list of column names):

.. plot::
:context: close-figs

>>> age_list = [8, 10, 12, 14, 72, 74, 76, 78, 20, 25, 30, 35, 60, 85]
>>> df = pd.DataFrame({"gender": list("MMMMMMMMFFFFFF"), "age": age_list})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was by allowed before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do allow by, but don't do anything on that. i will also change to versionchanged

>>> ax = df.plot.hist(column=["age"], by="gender", figsize=(10, 8))
"""
return self(kind="hist", by=by, bins=bins, **kwargs)

Expand Down
28 changes: 24 additions & 4 deletions pandas/plotting/_matplotlib/boxplot.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
LinePlot,
MPLPlot,
)
from pandas.plotting._matplotlib.groupby import create_iter_data_given_by
from pandas.plotting._matplotlib.style import get_standard_colors
from pandas.plotting._matplotlib.tools import (
create_subplots,
Expand Down Expand Up @@ -135,18 +136,37 @@ def _make_plot(self):
if self.subplots:
self._return_obj = pd.Series(dtype=object)

for i, (label, y) in enumerate(self._iter_data()):
# Re-create iterated data if `by` is assigned by users
data = (
create_iter_data_given_by(self.data, self._kind)
if self.by is not None
else self.data
)

for i, (label, y) in enumerate(self._iter_data(data=data)):
ax = self._get_ax(i)
kwds = self.kwds.copy()

# When by is applied, show title for subplots to know which group it is
# just like df.boxplot, and need to apply T on y to provide right input
if self.by is not None:
y = y.T
ax.set_title(pprint_thing(label))

# When `by` is assigned, the ticklabels will become unique grouped
# values, instead of label which is used as subtitle in this case.
ticklabels = [
pprint_thing(col) for col in self.data.columns.levels[0]
]
else:
ticklabels = [pprint_thing(label)]

ret, bp = self._plot(
ax, y, column_num=i, return_type=self.return_type, **kwds
)
self.maybe_color_bp(bp)
self._return_obj[label] = ret

label = [pprint_thing(label)]
self._set_ticklabels(ax, label)
self._set_ticklabels(ax, ticklabels)
else:
y = self.data.values.T
ax = self._get_ax(0)
Expand Down
55 changes: 52 additions & 3 deletions pandas/plotting/_matplotlib/core.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
from matplotlib.artist import Artist
import numpy as np

from pandas._typing import IndexLabel
from pandas.errors import AbstractMethodError
from pandas.util._decorators import cache_readonly

Expand Down Expand Up @@ -38,10 +39,12 @@
)

import pandas.core.common as com
from pandas.core.frame import DataFrame

from pandas.io.formats.printing import pprint_thing
from pandas.plotting._matplotlib.compat import mpl_ge_3_0_0
from pandas.plotting._matplotlib.converter import register_pandas_matplotlib_converters
from pandas.plotting._matplotlib.groupby import reconstruct_data_with_by
from pandas.plotting._matplotlib.style import get_standard_colors
from pandas.plotting._matplotlib.timeseries import (
decorate_axes,
Expand Down Expand Up @@ -99,7 +102,7 @@ def __init__(
self,
data,
kind=None,
by=None,
by: IndexLabel | None = None,
subplots=False,
sharex=None,
sharey=False,
Expand All @@ -124,13 +127,42 @@ def __init__(
table=False,
layout=None,
include_bool=False,
column: IndexLabel | None = None,
**kwds,
):

import matplotlib.pyplot as plt

self.data = data
self.by = by

# if users assign an empty list or tuple, raise `ValueError`
# similar to current `df.box` and `df.hist` APIs.
if by in ([], ()):
raise ValueError("No group keys passed!")
self.by = com.maybe_make_list(by)

# Assign the rest of columns into self.columns if by is explicitly defined
# while column is not, only need `columns` in hist/box plot when it's DF
# TODO: Might deprecate `column` argument in future PR (#28373)
if isinstance(data, DataFrame):
if column:
self.columns = com.maybe_make_list(column)
else:
if self.by is None:
self.columns = [
col for col in data.columns if is_numeric_dtype(data[col])
]
else:
self.columns = [
col
for col in data.columns
if col not in self.by and is_numeric_dtype(data[col])
]

# For `hist` plot, need to get grouped original data before `self.data` is
# updated later
if self.by is not None and self._kind == "hist":
self._grouped = data.groupby(self.by)

self.kind = kind

Expand All @@ -139,7 +171,9 @@ def __init__(
self.subplots = subplots

if sharex is None:
if ax is None:

# if by is defined, subplots are used and sharex should be False
if ax is None and by is None:
self.sharex = True
else:
# if we get an axis, the users should do the visibility
Expand Down Expand Up @@ -273,8 +307,15 @@ def _iter_data(self, data=None, keep_index=False, fillna=None):

@property
def nseries(self) -> int:

# When `by` is explicitly assigned, grouped data size will be defined, and
# this will determine number of subplots to have, aka `self.nseries`
if self.data.ndim == 1:
return 1
elif self.by is not None and self._kind == "hist":
return len(self._grouped)
elif self.by is not None and self._kind == "box":
return len(self.columns)
else:
return self.data.shape[1]

Expand Down Expand Up @@ -420,6 +461,14 @@ def _compute_plot_data(self):
if label is None and data.name is None:
label = "None"
data = data.to_frame(name=label)
elif self._kind in ("hist", "box"):
cols = self.columns if self.by is None else self.columns + self.by
data = data.loc[:, cols]

# GH15079 reconstruct data if by is defined
if self.by is not None:
self.subplots = True
data = reconstruct_data_with_by(self.data, by=self.by, cols=self.columns)

# GH16953, _convert is needed as fallback, for ``Series``
# with ``dtype == object``
Expand Down
127 changes: 127 additions & 0 deletions pandas/plotting/_matplotlib/groupby.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,127 @@
from __future__ import annotations

import numpy as np

from pandas._typing import (
Dict,
IndexLabel,
)

from pandas.core.dtypes.missing import remove_na_arraylike

from pandas import (
DataFrame,
MultiIndex,
Series,
concat,
)


def create_iter_data_given_by(
data: DataFrame, kind: str = "hist"
) -> Dict[str, DataFrame | Series]:
"""
Create data for iteration given `by` is assigned or not, and it is only
used in both hist and boxplot.

If `by` is assigned, return a dictionary of DataFrames in which the key of
dictionary is the values in groups.
If `by` is not assigned, return input as is, and this preserves current
status of iter_data.

Parameters
----------
data : reformatted grouped data from `_compute_plot_data` method.
kind : str, plot kind. This function is only used for `hist` and `box` plots.

Returns
-------
iter_data : DataFrame or Dictionary of DataFrames

Examples
--------
If `by` is assigned:

>>> import numpy as np
>>> tuples = [('h1', 'a'), ('h1', 'b'), ('h2', 'a'), ('h2', 'b')]
>>> mi = MultiIndex.from_tuples(tuples)
>>> value = [[1, 3, np.nan, np.nan],
... [3, 4, np.nan, np.nan], [np.nan, np.nan, 5, 6]]
>>> data = DataFrame(value, columns=mi)
>>> create_iter_data_given_by(data)
{'h1': DataFrame({'a': [1, 3, np.nan], 'b': [3, 4, np.nan]}),
'h2': DataFrame({'a': [np.nan, np.nan, 5], 'b': [np.nan, np.nan, 6]})}
"""

# For `hist` plot, before transformation, the values in level 0 are values
# in groups and subplot titles, and later used for column subselection and
# iteration; For `box` plot, values in level 1 are column names to show,
# and are used for iteration and as subplots titles.
if kind == "hist":
level = 0
else:
level = 1

# Select sub-columns based on the value of level of MI, and if `by` is
# assigned, data must be a MI DataFrame
assert isinstance(data.columns, MultiIndex)
return {
col: data.loc[:, data.columns.get_level_values(level) == col]
for col in data.columns.levels[level]
}


def reconstruct_data_with_by(
data: DataFrame, by: IndexLabel, cols: IndexLabel
) -> DataFrame:
"""
Internal function to group data, and reassign multiindex column names onto the
result in order to let grouped data be used in _compute_plot_data method.

Parameters
----------
data : Original DataFrame to plot
by : grouped `by` parameter selected by users
cols : columns of data set (excluding columns used in `by`)

Returns
-------
Output is the reconstructed DataFrame with MultiIndex columns. The first level
of MI is unique values of groups, and second level of MI is the columns
selected by users.

Examples
--------
>>> d = {'h': ['h1', 'h1', 'h2'], 'a': [1, 3, 5], 'b': [3, 4, 6]}
>>> df = DataFrame(d)
>>> reconstruct_data_with_by(df, by='h', cols=['a', 'b'])
h1 h2
a b a b
0 1 3 NaN NaN
1 3 4 NaN NaN
2 NaN NaN 5 6
"""
grouped = data.groupby(by)

data_list = []
for key, group in grouped:
columns = MultiIndex.from_product([[key], cols])
sub_group = group[cols]
sub_group.columns = columns
data_list.append(sub_group)

data = concat(data_list, axis=1)
return data


def reformat_hist_y_given_by(
y: Series | np.ndarray, by: IndexLabel | None
) -> Series | np.ndarray:
"""Internal function to reformat y given `by` is applied or not for hist plot.

If by is None, input y is 1-d with NaN removed; and if by is not None, groupby
will take place and input y is multi-dimensional array.
"""
if by is not None and len(y.shape) > 1:
return np.array([remove_na_arraylike(col) for col in y.T]).T
return remove_na_arraylike(y)
Loading