ENH/BUG: boxplot now supports layout #7035

Closed
wants to merge 1 commit into from


sinhrks
Member

@sinhrks sinhrks commented May 4, 2014

Closes #6769. Added the layout kw to boxplot. I think the tests could be improved if boxplot returned an ndarray with the same shape as layout, in the same manner as hist (maybe after #4472).

Bug fix

It also includes a fix that hides axes beyond the number required in boxplot and hist, for example when layout=(2, 2) is used for 3 subplots. The affected cases are listed below.

import pandas as pd
import numpy as np

df = pd.DataFrame({'3g': 'A A A B B B C C C'.split(),
                   '2g': [1, 2, 1, 2, 1, 2, 1, 2, 2],
                   'values': np.random.rand(9), 
                   'values2': np.random.rand(9),
                   'values3': np.random.rand(9)})
# BoxPlot

# specify 3 columns with by keyword -> should be 3 axes in (2, 2) layout
df.boxplot(column=['values', 'values2', 'values3'], by='2g')

# groupby results in 3 groups -> should be 3 axes 
df.groupby('3g').boxplot()

# groupby results in 3 groups, with column kw -> should be 3 axes in (2, 2) layout
df.groupby('3g').boxplot(column=['values', 'values2', 'values3'])

# Histogram

# specify 3 columns without by kw -> should be 3 axes in (2, 2) layout
df.hist(column=['values', 'values2', 'values3'])

# by kw results in 3 groups -> should be 3 axes in (2, 2) layout
df.hist(column=['values'], by='3g')
# This raises KeyError: '3g' in current master; this PR fixes the error.

# groupby results in 3 groups -> should be 3 axes in (2, 2) layout
df.hist(by='3g')

# layout has more cells than groups -> should be 2 axes in specified layout
df.hist(by='2g', layout=(2, 2))
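As a rough illustration of the cases above, the sketch below works out how many grid cells a layout leaves unused for a given number of plots. The names `resolve_layout` and `n_unused` are hypothetical, and the near-square default is an assumption made for the example, not the logic pandas uses internally.

```python
def resolve_layout(nplots, layout=None):
    """Return (nrows, ncols) for nplots subplots.

    Hypothetical helper for illustration only; not pandas' implementation.
    """
    if layout is not None:
        nrows, ncols = layout
        if nrows * ncols < nplots:
            raise ValueError(
                f"layout {nrows}x{ncols} is too small for {nplots} plots"
            )
        return nrows, ncols
    # Assumed default: the smallest near-square grid that fits nplots.
    k = 1
    while k * k < nplots:
        k += 1
    if (k - 1) * k >= nplots:
        return k, k - 1
    return k, k


def n_unused(nplots, layout=None):
    """Number of grid cells that hold no plot and should be hidden."""
    nrows, ncols = resolve_layout(nplots, layout)
    return nrows * ncols - nplots


# 3 plots with no explicit layout -> (2, 2) grid, 1 cell to hide
default_layout = resolve_layout(3)
unused_default = n_unused(3)
# 2 groups with layout=(2, 2) -> 2 cells to hide
unused_explicit = n_unused(2, layout=(2, 2))
```

With 3 plots the grid is (2, 2) and one cell is left over, which is the axes this PR hides.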

@jreback jreback added this to the 0.14.1 milestone May 4, 2014
@sinhrks
Member Author

sinhrks commented May 5, 2014

I hadn't attached the sample data. Added it now.

@jreback
Contributor

jreback commented May 7, 2014

@sinhrks is this different than #6991 ?

@TomAugspurger

@sinhrks
Member Author

sinhrks commented May 10, 2014

@jreback Different. #6991 is to specify the target ax / figure for _subplots. This one is to specify the layout to be created by _subplots.
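To make the distinction concrete, here is a small sketch using the public API rather than the private `_subplots` helper. It assumes a reasonably recent pandas where `DataFrame.boxplot` accepts both `ax` and, together with `by`, `layout`. The data is made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"2g": [1, 2, 1, 2, 1, 2, 1, 2, 2],
                   "values": rng.random(9),
                   "values2": rng.random(9),
                   "values3": rng.random(9)})

# Case 1: draw into an axes that already exists (the "target ax" idea).
fig, ax = plt.subplots()
returned_ax = df.boxplot(column="values", ax=ax)

# Case 2: let pandas create the subplot grid, but choose its shape.
box_axes = df.boxplot(column=["values", "values2", "values3"],
                      by="2g", layout=(3, 1))
n_box_axes = len(np.ravel(box_axes))
```

In the first call the caller owns the figure. In the second, pandas builds a 3-row, 1-column grid itself.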

@jreback
Contributor

jreback commented May 10, 2014

rebase

@TomAugspurger ok by u?

@TomAugspurger
Contributor

For the first example df.boxplot(column=['values', 'values2', 'values3'], by='2g'), this is what I see:
[image: boxplot]

(The box plots are drawn on master.)

@sinhrks
Member Author

sinhrks commented May 10, 2014

Strange. I can find no logic to hide unnecessary axes in _grouped_plot_by_column in current master, and I get an additional axes in the bottom-left when I try. (Unrelated to the issue, but I've increased the data so the boxes plot clearly.)
[image: figure_1]
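For readers unfamiliar with what "hiding unnecessary axes" involves, a minimal matplotlib-only sketch follows. The helper name `subplots_with_hidden_extras` is hypothetical and this is not the code from the PR. It only shows the general technique of creating the full grid and turning off the cells that hold no plot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def subplots_with_hidden_extras(naxes, layout):
    """Create a layout grid and hide every axes beyond the first naxes.

    Hypothetical helper for illustration; not pandas' _subplots.
    """
    nrows, ncols = layout
    if nrows * ncols < naxes:
        raise ValueError("layout is too small for the requested axes")
    fig, axarr = plt.subplots(nrows, ncols, squeeze=False)
    # ravel() walks the grid row by row, so the extras are the last cells.
    for extra in axarr.ravel()[naxes:]:
        extra.set_visible(False)
    return fig, axarr


rng = np.random.default_rng(0)
fig, axarr = subplots_with_hidden_extras(3, (2, 2))
for ax in axarr.ravel()[:3]:
    ax.boxplot(rng.random(20))

visible = [ax.get_visible() for ax in axarr.ravel()]
```

Because the grid is walked row by row, the hidden cell in a (2, 2) layout with 3 plots is the last one in the second row.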

@TomAugspurger
Contributor

Sorry if I was unclear. That picture was from your PR.

This is what I see on master:
[image: master]

So your PR is correctly hiding the lower right box. I just wasn't sure why the boxplots weren't drawn in my first picture.

@TomAugspurger
Contributor

But it looks like it was just a scaling issue. Setting an axes ylim manually after plotting gets it correct.

You may want to look into how the ylim is set and see if you can get a better default.
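For reference, setting the limits by hand looks roughly like the sketch below. It uses plain matplotlib with made-up data, and the 5% margin is an arbitrary choice for the example, not a default taken from pandas or matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.random(50)

fig, ax = plt.subplots()
ax.boxplot(data)

# Pad the data range slightly so whiskers and outliers are not clipped.
margin = 0.05 * (data.max() - data.min())
ax.set_ylim(data.min() - margin, data.max() + margin)
ylim = ax.get_ylim()
```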

@TomAugspurger
Contributor

This is fine by me, pending that setting of the plot limits.

@sinhrks
Member Author

sinhrks commented May 10, 2014

Thanks for confirming. I misunderstood and thought you were referring to the axes issue. Maybe the limit issue is the same as #5517?

Is there anything else that should be confirmed in this PR?

@TomAugspurger
Contributor

My mistake. I have matplotlib 1.4 on that other environment. This PR is good!

@TomAugspurger
Contributor

Merged via ebc4fed

@TomAugspurger
Contributor

I kinda messed up the merge commit. I added my fix on the release conflict instead of continuing. So now we'll have

ebc4fed release conflict
49f3616 Merge pull request #7041 from sinhrks/unstack

in the history. @jreback is that OK?

@jreback
Contributor

jreback commented May 10, 2014

yeh it didn't preserve the original one. no biggie though.

@cpcloud
Member

cpcloud commented May 16, 2014

@sinhrks FWIW seaborn handles these kinds of things almost trivially and allows for much prettier plots than pandas does. FacetGrid is especially neat, and so far it hasn't let me down. I guess I'm just kind of down on pandas plotting lately since there seem to be a fair number of quality plotting libraries out there that handle all the annoying details. Layout, color cycling, and plotting by multiple groups come to mind. In addition, most of the seaborn plots that can be called on individual columns or column names can be called with FacetGrid. Finally, you can call most of the matplotlib pyplot functions like plot and hist on FacetGrid. This has effectively stopped me from using any of the pandas plotting methods (I used to use them for "quick" analyses, but seaborn is much faster at getting to a useful result). Ok, I'm done.

Successfully merging this pull request may close these issues.

groupby boxplot creates more subplots than are needed