Can't use catplot with melt for different categories #1702

Closed
amueller opened this issue Mar 21, 2019 · 7 comments · Fixed by #2196

Comments

@amueller

I would have expected that I can visualize different categorical variables in a count plot using catplot and melt.
However, it looks like the data is converted to categorical before it is grouped by column, so the categories are shared among all the count plots. That doesn't really make sense when the different columns correspond to different categories.
Am I overlooking something or is there a different way of doing this?

import pandas as pd
import seaborn as sns
ames = pd.read_excel("http://www.amstat.org/publications/jse/v19n3/decock/AmesHousing.xls")

cat_cols = ['MS Zoning', 'Street', 'Alley', 'Lot Shape', 'Land Contour',
       'Utilities', 'Lot Config', 'Land Slope', 'Neighborhood', 'Condition 1',
       'Condition 2', 'Bldg Type', 'House Style', 'Roof Style', 'Roof Matl',
       'Exterior 1st', 'Exterior 2nd', 'Mas Vnr Type', 'Exter Qual',
       'Exter Cond', 'Foundation', 'Bsmt Qual', 'Bsmt Cond', 'Bsmt Exposure',
       'BsmtFin Type 1', 'BsmtFin Type 2', 'Heating', 'Heating QC',
       'Central Air', 'Electrical', 'Kitchen Qual', 'Functional',
       'Fireplace Qu', 'Garage Type', 'Garage Finish', 'Garage Qual',
       'Garage Cond', 'Paved Drive', 'Pool QC', 'Fence', 'Misc Feature',
       'Sale Type', 'Sale Condition']
ames_cat = ames[cat_cols]
sns.catplot(x='value', col='variable', data=ames_cat.melt(), sharex=False, sharey=False, col_wrap=5, kind='count')
@mwaskom
Owner

mwaskom commented Mar 21, 2019

Indeed, catplot has some logic that unifies categories across the facets, because sharex (which just gets passed down to subplots) won't do it on its own. But it currently still does that even if the user specifies sharex=False.

(Notes to myself: this is a bit tricky because it will have to figure out from the _CategoricalPlotter which way the plot is oriented, i.e. it's not as simple as depending on sharex).
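
To make the unification concrete, here is a minimal pandas-only sketch. The toy frame is a hypothetical stand-in for the Ames columns, and the code only mimics the idea of "one shared set of levels" versus "levels per facet"; it is not seaborn's actual implementation.

```python
import pandas as pd

# Hypothetical toy data: two categorical columns with unrelated levels.
df = pd.DataFrame({
    "Street": ["Pave", "Grvl", "Pave", "Pave"],
    "Central Air": ["Y", "N", "Y", "Y"],
})

# melt() stacks both columns into a single "value" column,
# with the source column name stored in "variable".
long = df.melt()

# Levels taken over the whole melted column: roughly what a
# unified categorical axis would show in every facet.
shared_levels = sorted(long["value"].unique())

# Levels taken within each variable: what independent facets should show.
per_facet_levels = {
    name: sorted(group["value"].unique())
    for name, group in long.groupby("variable")
}

print(shared_levels)
print(per_facet_levels)
```

With unified levels, every facet reserves a slot for every level, including the ones that never occur in that column.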

Anyway you can use FacetGrid directly, which usually is risky because the categorical axis won't be shared, but that's what you want here:

g = sns.FacetGrid(ames_cat.melt(), col="variable", sharex=False, sharey=False, col_wrap=5)
g.map(sns.countplot, "value", order=None)

(If I understand what you're aiming for, this should do it).

@amueller
Author

It does! Someone should collect all your issue replies and make a seaborn cookbook.
Thank you for your quick and insightful reply!

I'm not sure if I understand your explanation as I'm using sharex=False and I still observe the behavior. But I have the solution I was looking for!

@amueller
Author

Ok possibly silly follow-up: Is this the idiomatic way to make the bars horizontal?

g = sns.FacetGrid(ames_cat.melt(), col="variable", sharex=False, sharey=False, col_wrap=5)
g.map(lambda x, **kwargs: sns.countplot(y=x, **kwargs), "value", order=None)

mwaskom reopened this Mar 22, 2019
@mwaskom
Owner

mwaskom commented Mar 22, 2019

I'm going to keep this open because, while I think that FacetGrid plots should generally share x/y axes, share{x,y} is right there in the function signature and should do the right thing.

@mwaskom
Owner

mwaskom commented Mar 22, 2019

> Ok possibly silly follow-up: Is this the idiomatic way to make the bars horizontal?

I've been meaning to let you pass None to .map() which would make this easier but it's not currently implemented. In the meantime that would work, or you could use .map_dataframe:

g = sns.FacetGrid(ames_cat.melt(), col="variable", sharex=False, sharey=False, col_wrap=5)
g.map_dataframe(sns.countplot, y="value", order=None)

The one disadvantage is that you'd need to set the axis labels yourself (g.set_axis_labels), though here that doesn't strike me as necessary.
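
For readers without the Ames file at hand, the map_dataframe approach can be sketched on a small frame. The data and the label strings below are hypothetical, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption for running without a display

import pandas as pd
import seaborn as sns

# Hypothetical toy stand-in for ames_cat.
toy = pd.DataFrame({
    "Street": ["Pave", "Grvl", "Pave", "Pave", "Pave", "Grvl"],
    "Lot Shape": ["Reg", "IR1", "IR2", "Reg", "IR3", "Reg"],
})

g = sns.FacetGrid(toy.melt(), col="variable", sharex=False, sharey=False, col_wrap=2)

# map_dataframe passes each facet's sub-DataFrame to countplot as `data`,
# so y="value" is resolved per facet and the bars are drawn horizontally.
g.map_dataframe(sns.countplot, y="value")

# Set the axis labels explicitly, as suggested in the thread.
g.set_axis_labels("count", "")

# Count the bars drawn in each facet: one per category present in that column.
bars_per_facet = {
    name: len(ax.patches) for name, ax in zip(g.col_names, g.axes.flat)
}
print(bars_per_facet)
```

Because each facet only sees its own rows, the category axis of one panel does not pick up levels from the other.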

@amueller
Author

amueller commented Mar 24, 2019

Thank you, I'll look more into map_dataframe!

@esmd

esmd commented Feb 4, 2020

I'm having the same issue for countplot with dodge position. I'm using this dataset from Kaggle.

dictionary=dictionary.loc[~dictionary['Variable Name'].isin(['icu_admit_type', 'pred'])]
cols_categorical=dictionary.loc[dictionary['Data Type'] == "string"]['Variable Name'].tolist()
cols_categorical3=cols_categorical.copy()
cols_categorical3.append('hospital_death')

sharex=False works with the following code, but the resulting bars are stacked rather than dodged:

out=['bmi', 'apache_2_diagnosis','apache_3j_diagnosis']
a=[i for i in cols_categorical3 if i not in out]
dt=training_v2[a].melt(id_vars='hospital_death')
order=sorted(dt.variable.unique())
g = sns.FacetGrid(dt, col="variable", col_wrap=3,hue='hospital_death',height=5,sharex=False,sharey=False)
g.map(sns.countplot, "value")
g.add_legend()
for ax in g.axes.flat:
    plt.setp(ax.get_xticklabels(), rotation=90)  # rotate tick labels on every facet
g.fig.subplots_adjust(hspace=.9)
plt.show()

The following code gives dodged bars, but sharex=False does not work:

out=['bmi', 'apache_2_diagnosis','apache_3j_diagnosis']
a=[i for i in cols_categorical3 if i not in out]
dt=training_v2[a].melt(id_vars='hospital_death')
sns.catplot(x='value',col='variable',col_wrap=3,hue='hospital_death',kind='count', data=dt,sharex=False,sharey=False)
plt.show()

Any idea?
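
One possible workaround, sketched here on hypothetical toy data rather than the Kaggle set: when hue is given to FacetGrid, the plotting function is called once per hue level on the same axes, so the bars overlap. Passing hue to countplot itself through map_dataframe lets countplot do the dodging while each facet keeps its own categories. The column names "gender" and "icu_type" and the palette are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption for running without a display

import pandas as pd
import seaborn as sns

# Hypothetical toy stand-in for training_v2[a].
toy = pd.DataFrame({
    "hospital_death": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    "gender": ["F", "F", "M", "M", "F", "M", "F", "M", "M", "F", "M", "F"],
    "icu_type": ["MICU", "MICU", "SICU", "SICU", "CCU", "CCU",
                 "MICU", "SICU", "CCU", "MICU", "SICU", "CCU"],
})
dt = toy.melt(id_vars="hospital_death")

# Fix the hue order so colors mean the same thing in every facet.
hue_levels = sorted(dt["hospital_death"].unique())

# No hue on the grid itself: countplot receives hue and dodges the bars.
g = sns.FacetGrid(dt, col="variable", col_wrap=2, sharex=False, sharey=False)
g.map_dataframe(
    sns.countplot,
    x="value",
    hue="hospital_death",
    hue_order=hue_levels,
    palette="deep",
)
g.add_legend()

# Number of category positions on each facet's x axis.
ticks_per_facet = {
    name: len(ax.get_xticks()) for name, ax in zip(g.col_names, g.axes.flat)
}
print(ticks_per_facet)
```

Legend handling may differ between seaborn versions, so the add_legend call is best treated as a starting point.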
