ENH: Fix `by` in DataFrame.plot.hist and DataFrame.plot.box #28373
Merged
178 commits
7e461a1
remove \n from docstring
charlesdong1991 1314059
fix conflicts
charlesdong1991 8bcb313
Merge remote-tracking branch 'upstream/master'
charlesdong1991 e36592c
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 b2f45a6
fix by in hist
charlesdong1991 8b6e00a
make plot work
charlesdong1991 dc0c2ec
add _group_plot function
charlesdong1991 d803938
check function
charlesdong1991 33dd762
reformat
charlesdong1991 d59d642
put import up
charlesdong1991 66eb06c
add comments
charlesdong1991 ea267ad
Mimic group plot
charlesdong1991 8095224
fix import failure
charlesdong1991 31decc1
reformat
charlesdong1991 e4bdbd0
fix test
charlesdong1991 4033159
hacky fix
charlesdong1991 57a3bdf
fix isrot
charlesdong1991 8060223
fix tests
charlesdong1991 d666334
fix import failure
charlesdong1991 3216d59
fix import error
charlesdong1991 45f4b7f
Update imports
charlesdong1991 2b0785b
test imports
charlesdong1991 d79dba3
new change
charlesdong1991 eca597b
fix conflict
charlesdong1991 321fbd2
restore removed line
charlesdong1991 a7b9ae5
Remove unused line
charlesdong1991 d2d13fd
Disruptive change
charlesdong1991 5abedb6
should work this time
charlesdong1991 d73115a
Add in-code comments
charlesdong1991 d7998bb
remove print
charlesdong1991 1bbf7ea
reformat
charlesdong1991 a279f45
Dropna
charlesdong1991 2b793ea
Add isna for multi column
charlesdong1991 04de066
try to remove warning
charlesdong1991 4adc324
test if removing pd works
charlesdong1991 d0103a4
revert changes
charlesdong1991 f94dbb4
try if warning gone
charlesdong1991 0415cb0
try again
charlesdong1991 525200b
merge master and fix conflict
charlesdong1991 1ab4310
merge master and fix conflict
charlesdong1991 c005880
fix conflict and merge master
charlesdong1991 a1fabc5
Fix linting error
charlesdong1991 70453f1
Add test
charlesdong1991 b6579a5
remove unused code
charlesdong1991 e99f3dc
add test and make code more robust
charlesdong1991 99d6d67
remove comment
charlesdong1991 8e2fcf6
clean the code
charlesdong1991 d02f4ac
simplify code
charlesdong1991 947189c
simplify code
charlesdong1991 6b5203d
fix linting
charlesdong1991 27d0d21
Add doc for hist
charlesdong1991 48ff521
revert change
charlesdong1991 f39d948
fix warning
charlesdong1991 5d1705c
isort
charlesdong1991 90471aa
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 46a8031
simplify code
charlesdong1991 57a96e6
simpler python
charlesdong1991 29127f0
remove unused
charlesdong1991 61bb97f
restore blank lines
charlesdong1991 62fb9e6
Add extensive tests
charlesdong1991 638174b
fix seed
charlesdong1991 02de005
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 5adb25d
code change based on reviews
charlesdong1991 7051432
fix linting
charlesdong1991 adbde9f
update doc
charlesdong1991 5dfad18
merge master and fix conflict
charlesdong1991 abd10f3
code change based on reviews
charlesdong1991 c20d81a
fixup
charlesdong1991 07112c0
fixup
charlesdong1991 fb0b87c
code change on reviews
charlesdong1991 a6a8e57
fix isort
charlesdong1991 7f77f48
short code
charlesdong1991 c09bb19
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 a120d27
simpler python
charlesdong1991 f87afee
add inline comment
charlesdong1991 82711ee
simplier pandas
charlesdong1991 60f7298
code change on JR review
charlesdong1991 071488b
fix linting
charlesdong1991 f2a0210
rebase and fix conflict
charlesdong1991 bb07e15
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 867094a
code change on reviews
charlesdong1991 b0f06b2
Add docstring
charlesdong1991 111e89c
fix typo
charlesdong1991 6472053
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 83ec868
remove blank
charlesdong1991 d6c8566
use more meaningful example
charlesdong1991 6a0ac8d
keep as is
charlesdong1991 49d0791
remove less useful comment
charlesdong1991 2bfbe78
change figsize
charlesdong1991 c5d7518
clean iter_data
charlesdong1991 03356ce
remove unused docs
charlesdong1991 7abc47d
cleaner pandas
charlesdong1991 db832b4
cleaner
charlesdong1991 be99a97
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 9ae5987
fixup
charlesdong1991 10c2ad1
rename
charlesdong1991 ce8cfd4
code change on reviews
charlesdong1991 627cc02
fixup
charlesdong1991 4bfbf03
rebase and resolve conflicts
charlesdong1991 ee8972d
linting
charlesdong1991 12ff785
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 163f920
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 0839be2
annotation
charlesdong1991 142ee53
annotation
charlesdong1991 ef65137
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 d793703
reverse change on annotation
charlesdong1991 2710cf2
fixup
charlesdong1991 f76d2cb
remove
charlesdong1991 a5ecbd7
add missing annoatation
charlesdong1991 439be51
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 5fd420e
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 4b4832f
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 7425dff
code change on WA review
charlesdong1991 8ab4b90
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 b06e454
solve mypy
charlesdong1991 79294ed
fix typo
charlesdong1991 9523bb9
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 cd59370
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 050ba95
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 add406f
code change on reviews
charlesdong1991 bb22c53
fix linting
charlesdong1991 25214e6
rename
charlesdong1991 aaa5c95
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 77e46f4
modulize reformat_y for hist
charlesdong1991 9de9c61
better annotation
charlesdong1991 af68d2e
improve annotation
charlesdong1991 b75015a
fix linting
charlesdong1991 b90303d
improve docstring
charlesdong1991 898fa9b
rebase
charlesdong1991 2ac32f5
remove added test file
charlesdong1991 f7bcdb7
revert change
charlesdong1991 aeb32e5
rebase
charlesdong1991 dc17959
fixup
charlesdong1991 4aee3e0
black
charlesdong1991 4eb466f
fixup
charlesdong1991 5160224
fix mypy
charlesdong1991 e2de0d3
fix mypy
charlesdong1991 b2b33ac
fix mypy
charlesdong1991 1199a93
fix mypy
charlesdong1991 c4a5842
fix mypy
charlesdong1991 6556414
fix mypy
charlesdong1991 826f277
fix flake8
charlesdong1991 891dc55
add by support for boxplot
charlesdong1991 4c4a158
doc
charlesdong1991 ea7e5b1
Add tests
charlesdong1991 006588e
flake8
charlesdong1991 4f0a1dc
move file
charlesdong1991 e1579e2
pprint label
charlesdong1991 f2c141f
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 5f96abd
rebase and fix conflict
charlesdong1991 06483af
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 08f534d
rebase fix conflicts
charlesdong1991 f1c3a1f
Resolve conflicts and adopt all feedbacks from Marc
charlesdong1991 e6e96d3
parametrize tests
charlesdong1991 52e47f1
Fix test
charlesdong1991 bc2f282
Code changes based on Marc reviews
charlesdong1991 a43d3bb
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 ceeb3c5
update doc
charlesdong1991 4fea841
version change
charlesdong1991 3ea2603
Use self.by
charlesdong1991 9f48139
better code
charlesdong1991 b1094e3
better inline comment
charlesdong1991 97bde59
code changes based on Marc reviews
charlesdong1991 444a964
minor fix
charlesdong1991 b66dad0
mypy
charlesdong1991 982f562
add future annotation
charlesdong1991 c76ad67
fix pre commit
charlesdong1991 2c1aa33
minor experimental fix
charlesdong1991 6896546
better doc string
charlesdong1991 3c54302
fixup doc fail
charlesdong1991 2d20178
code change on Macro reviews
charlesdong1991 a169dfd
Add more tests
charlesdong1991 d0b56ff
fixup
charlesdong1991 dec313c
code change on reviews
charlesdong1991 143f286
changes based on Jeff review
charlesdong1991 283286f
doc
charlesdong1991 f2a0736
Merge remote-tracking branch 'upstream/master' into fix_by_plot
charlesdong1991 f1aeee0
fix flake8
charlesdong1991
@@ -0,0 +1,127 @@
from __future__ import annotations

import numpy as np

from pandas._typing import (
    Dict,
    IndexLabel,
)

from pandas.core.dtypes.missing import remove_na_arraylike

from pandas import (
    DataFrame,
    MultiIndex,
    Series,
    concat,
)


def create_iter_data_given_by(
    data: DataFrame, kind: str = "hist"
) -> Dict[str, DataFrame | Series]:
    """
    Create data for iteration given `by` is assigned or not, and it is only
    used in both hist and boxplot.

    If `by` is assigned, return a dictionary of DataFrames in which the key of
    dictionary is the values in groups.
    If `by` is not assigned, return input as is, and this preserves current
    status of iter_data.

    Parameters
    ----------
    data : reformatted grouped data from `_compute_plot_data` method.
    kind : str, plot kind. This function is only used for `hist` and `box` plots.

    Returns
    -------
    iter_data : DataFrame or Dictionary of DataFrames

    Examples
    --------
    If `by` is assigned:

    >>> import numpy as np
    >>> tuples = [('h1', 'a'), ('h1', 'b'), ('h2', 'a'), ('h2', 'b')]
    >>> mi = MultiIndex.from_tuples(tuples)
    >>> value = [[1, 3, np.nan, np.nan],
    ...          [3, 4, np.nan, np.nan], [np.nan, np.nan, 5, 6]]
    >>> data = DataFrame(value, columns=mi)
    >>> create_iter_data_given_by(data)
    {'h1': DataFrame({'a': [1, 3, np.nan], 'b': [3, 4, np.nan]}),
    'h2': DataFrame({'a': [np.nan, np.nan, 5], 'b': [np.nan, np.nan, 6]})}
    """

    # For `hist` plot, before transformation, the values in level 0 are values
    # in groups and subplot titles, and later used for column subselection and
    # iteration; For `box` plot, values in level 1 are column names to show,
    # and are used for iteration and as subplots titles.
    if kind == "hist":
        level = 0
    else:
        level = 1

    # Select sub-columns based on the value of level of MI, and if `by` is
    # assigned, data must be a MI DataFrame
    assert isinstance(data.columns, MultiIndex)
    return {
        col: data.loc[:, data.columns.get_level_values(level) == col]
        for col in data.columns.levels[level]
    }


def reconstruct_data_with_by(
    data: DataFrame, by: IndexLabel, cols: IndexLabel
) -> DataFrame:
    """
    Internal function to group data, and reassign multiindex column names onto the
    result in order to let grouped data be used in _compute_plot_data method.

    Parameters
    ----------
    data : Original DataFrame to plot
    by : grouped `by` parameter selected by users
    cols : columns of data set (excluding columns used in `by`)

    Returns
    -------
    Output is the reconstructed DataFrame with MultiIndex columns. The first level
    of MI is unique values of groups, and second level of MI is the columns
    selected by users.

    Examples
    --------
    >>> d = {'h': ['h1', 'h1', 'h2'], 'a': [1, 3, 5], 'b': [3, 4, 6]}
    >>> df = DataFrame(d)
    >>> reconstruct_data_with_by(df, by='h', cols=['a', 'b'])
       h1      h2
       a     b     a     b
    0  1     3   NaN   NaN
    1  3     4   NaN   NaN
    2  NaN  NaN    5     6
    """
    grouped = data.groupby(by)

    data_list = []
    for key, group in grouped:
        columns = MultiIndex.from_product([[key], cols])
        sub_group = group[cols]
        sub_group.columns = columns
        data_list.append(sub_group)

    data = concat(data_list, axis=1)
    return data


def reformat_hist_y_given_by(
    y: Series | np.ndarray, by: IndexLabel | None
) -> Series | np.ndarray:
    """Internal function to reformat y given `by` is applied or not for hist plot.

    If by is None, input y is 1-d with NaN removed; and if by is not None, groupby
    will take place and input y is multi-dimensional array.
    """
    if by is not None and len(y.shape) > 1:
        return np.array([remove_na_arraylike(col) for col in y.T]).T
    return remove_na_arraylike(y)
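To see how the three helpers in this file fit together, here is a minimal standalone sketch that reproduces their steps using public pandas calls only. It does not import the PR's internal functions. The toy frame and the variable names (`wide`, `iter_data`, `h1_a`) are illustrative assumptions, and step 3 uses `Series.dropna` as a stand-in for the internal `remove_na_arraylike`.

```python
import numpy as np
import pandas as pd

# Toy frame: "h" is the grouping column, "a" and "b" are the columns to plot.
df = pd.DataFrame({"h": ["h1", "h1", "h2"], "a": [1, 3, 5], "b": [3, 4, 6]})
cols = ["a", "b"]

# Step 1 (mirrors reconstruct_data_with_by): build one block of columns per
# group, label it (group, column), and concatenate the blocks side by side.
pieces = []
for key, group in df.groupby("h"):
    sub = group[cols].copy()
    sub.columns = pd.MultiIndex.from_product([[key], cols])
    pieces.append(sub)
wide = pd.concat(pieces, axis=1)  # rows outside a group are padded with NaN

# Step 2 (mirrors create_iter_data_given_by with kind="hist"): split the wide
# frame back into one sub-frame per value of column level 0.
level = 0
iter_data = {
    name: wide.loc[:, wide.columns.get_level_values(level) == name]
    for name in wide.columns.levels[level]
}

# Step 3 (stand-in for reformat_hist_y_given_by on a single column): drop the
# NaN padding introduced by the concat before the values are binned.
h1_a = iter_data["h1"][("h1", "a")].dropna().to_numpy()

print(wide)
print(sorted(iter_data))
print(h1_a)
```

The NaN padding in step 1 is why the last helper is needed: each group's sub-frame keeps the full original index, so the values belonging to other groups have to be stripped before a histogram is computed.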
was `by` allowed before?
We do allow `by`, but don't do anything with it. I will also change this to `versionchanged`.
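For context on the directive mentioned in this thread: Sphinx's `versionchanged` marks something that already existed but now behaves differently, while `versionadded` marks something new. Below is a minimal sketch of how such a note might sit in a numpydoc-style docstring. The function `plot_hist_sketch` and the version placeholder `X.Y.Z` are hypothetical and not taken from the PR.

```python
def plot_hist_sketch(data, by=None, bins=10):
    """
    Hypothetical stand-in for a plotting method; it does not draw anything.

    Parameters
    ----------
    data : object
        Data to plot (unused in this sketch).
    by : str or sequence, optional
        Column(s) to group by.

        .. versionchanged:: X.Y.Z

            Previously accepted but ignored; now used to create subplots.
    bins : int, default 10
        Number of histogram bins.
    """
    # Return the arguments so the sketch has observable behaviour.
    return {"by": by, "bins": bins}


print(plot_hist_sketch.__doc__)
```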