FacetGrid data no longer contains all columns from original input dataframe #2622

Closed
SWu opened this issue Jul 26, 2021 · 3 comments · Fixed by #2623

SWu commented Jul 26, 2021

After recently upgrading from seaborn 0.9.1 to 0.11.1, I noticed that after calling relplot on a dataframe, the returned FacetGrid object only contains a transformed dataframe. This makes the output of FacetGrid.facet_data() a lot less useful for additional plot manipulation that relies on information from the original dataframe. A minimal repro:

>>> import seaborn as sns
>>> import pandas as pd
>>> import numpy as np
>>> data = pd.DataFrame(np.random.randint(0,3,size=(15, 7)), columns=list('ABCDEFG'))
>>> data.columns
Index(['A', 'B', 'C', 'D', 'E', 'F', 'G'], dtype='object')
>>> g = sns.relplot(x='A', y='B', row='C', col='D', hue='E', data=data)
>>> g.data.columns
Index(['x', 'y', 'hue', 'size', 'style', 'units', 'C', 'D'], dtype='object')
>>> for (i, j, k), data_ijk in g.facet_data():
...     data_ijk.F  # this no longer works

Previously, g.data would contain all of the original dataframe columns, and I was relying on that so that I could use g.facet_data() to augment plots with additional information from the original dataframe.

Is there a way to get back that functionality?

@SWu SWu changed the title FacetGrid data no longer the same as original input dataframe FacetGrid data no longer contains all columns from original input dataframe Jul 26, 2021

mwaskom (Owner) commented Jul 26, 2021

This is a bug that was partially fixed in #2581. I say partially because the current solution preserves only the columns that are actually used in the relplot (but with their original names, if present). The problem was caused (and the solution made more complicated) when relplot was enhanced to plot wide-form data or vectors that are not in the input dataframe. There may be a better solution for the normal case of long input data with named vectors.

In the meantime, I think if you just reassign your input dataframe to the g.data attribute, the rest of your operations should work as expected.
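
A minimal sketch of that workaround, reusing the dataframe shape from the repro above. It assumes the dataframe keeps the same index it had when it was passed to relplot; the seeded generator and the axis-label annotation are only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless

import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.integers(0, 3, size=(15, 7)), columns=list("ABCDEFG"))

g = sns.relplot(x="A", y="B", row="C", col="D", hue="E", data=data)

# Workaround: put the original dataframe back on the grid.
# Assumption: `data` has the same index it had when passed to relplot.
g.data = data

rows_seen = 0
for (i, j, k), data_ijk in g.facet_data():
    rows_seen += len(data_ijk)
    if data_ijk.empty:
        continue
    # Column F was never passed to relplot, but it is available again here.
    g.axes[i, j].set_xlabel(f"A (mean F = {data_ijk['F'].mean():.2f})")
```

After the reassignment, each subset yielded by g.facet_data() carries all seven original columns, so per-facet annotations can draw on columns that were not part of the plot.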

SWu (Author) commented Jul 26, 2021

Thanks. I did figure out that workaround to reassign back to g.data, which I can use for now.

I'm not sure of all the implications of the changes to support wide-form data, but I wonder if there's a better fix here to store separate attributes for the manipulated dataframes used internally by plotting functionality, while exposing the original data in the .data FacetGrid attribute.
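
To make the idea concrete, here is a hypothetical sketch of that split. `GridSketch` and `_plot_data` are names made up for illustration; this is not seaborn's FacetGrid and not necessarily how any fix is implemented.

```python
import pandas as pd


class GridSketch:
    """Hypothetical stand-in for a grid object; this is not seaborn's FacetGrid."""

    def __init__(self, data: pd.DataFrame, x: str, y: str) -> None:
        # Public attribute: the caller's dataframe, with every column intact.
        self.data = data
        # Private attribute: the renamed subset that plotting code would consume.
        self._plot_data = pd.DataFrame({"x": data[x], "y": data[y]})


df = pd.DataFrame({"A": [0, 1, 2], "B": [2, 1, 0], "F": [5, 6, 7]})
grid = GridSketch(df, x="A", y="B")
```

With this layout, user code reading `grid.data` still sees column F, while the plotting internals work only with the transformed frame.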

mwaskom (Owner) commented Jul 27, 2021

Thanks again for the reproducible report. See #2623 for a fix. Would be great if you have a chance to test it out with your actual use case.
