Fix FacetGrid.data on object returned from relplot and displot #2623

Merged
merged 5 commits into from
Aug 2, 2021

Conversation

mwaskom
Owner

@mwaskom mwaskom commented Jul 27, 2021

Fixes #2622

Context: relplot and displot no longer follow the original approach of setting up a FacetGrid and mapping an axes-level function onto it, under which the FacetGrid owned the data. Instead, they parse their inputs with the relevant module-level plotting class, which lets them handle wide data and data passed directly as vectors. As a consequence, the .data attribute on the FacetGrid that they return does not have all the columns that were passed into the function.

This PR attempts to improve that situation by merging the input dataframe with the processed dataframe used internally (which handles, e.g., the transform from wide data, or vectors passed directly as plot kwargs, often without names).
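As a rough illustration of the merge described above, here is a minimal pandas sketch. The helper name `merge_with_input` and the toy frames are hypothetical and are not taken from the seaborn source; the sketch assumes the two frames share a row index and that only internal columns missing from the input need to be added.

```python
import pandas as pd


def merge_with_input(input_data, internal_data):
    """Hypothetical helper: combine the user's input with internally processed data."""
    # Dict inputs (e.g. from DataFrame.to_dict(orient="list")) become a dataframe first
    if not isinstance(input_data, pd.DataFrame):
        input_data = pd.DataFrame(input_data)
    # Keep only the internal columns that the input does not already provide
    extra_cols = internal_data.columns.difference(input_data.columns)
    # Align on the row index so every original column survives
    return pd.merge(
        input_data,
        internal_data[extra_cols],
        left_index=True,
        right_index=True,
    )


# Toy stand-ins for the input data and the internally processed frame
input_data = {"total_bill": [16.99, 10.34], "tip": [1.01, 1.66]}
internal_data = pd.DataFrame({
    "total_bill": [16.99, 10.34],
    "y_var": ["No", "No"],
    "_col_": ["Dinner", "Dinner"],
})

merged = merge_with_input(input_data, internal_data)
print(merged.columns.tolist())
```

The merged frame keeps every input column and appends the internally generated ones, which is the shape of the "This branch" outputs shown in the test cases.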

Because input data specification is so flexible, this is a little tricky, and there may be some corner cases. For instance, if you pass a series object to one of the plot variables and it shares a name with one of the columns in the input dataframe, its data won't be represented in the output data. It may be possible to handle weird cases like that, but for now I am going to aim for simplicity / consistency with what was originally possible.
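The name-collision corner case can be sketched with plain pandas. This is only an illustration under the assumption that the merge keeps the input dataframe's columns and adds just those internal columns whose names are not already present; the variable names and values are made up.

```python
import pandas as pd

input_df = pd.DataFrame({
    "total_bill": [16.99, 10.34],
    "smoker": ["No", "Yes"],
})

# A vector passed directly to a plot variable, named like an existing column
hue_vector = pd.Series(["a", "b"], name="smoker")

# Toy stand-in for the internally processed frame
internal_df = pd.DataFrame({
    "total_bill": input_df["total_bill"],
    "smoker": hue_vector,
})

# No internal column has a name that is new relative to the input
new_cols = internal_df.columns.difference(input_df.columns)
merged = input_df.join(internal_df[new_cols])

# The "smoker" column still holds the input dataframe's values,
# so the values of hue_vector are not represented in the result
print(merged)
```

Renaming the Series before passing it (as the test cases do with `.rename(...)`) avoids the collision.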

Test case with displot

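import seaborn as sns

# Example dataset bundled with seaborn
tips = sns.load_dataset("tips")

g = sns.displot(
    data=tips.to_dict(orient="list"),  # dict input rather than a dataframe
    x="total_bill",
    hue=tips["smoker"].rename("y_var"),  # named vector
    col=tips["time"].to_numpy(),  # unnamed vector
)
print(g.data.head())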

v0.11.1:

   total_bill y_var   _col_
0       16.99    No  Dinner
1       10.34    No  Dinner
2       21.01    No  Dinner
3       23.68    No  Dinner
4       24.59    No  Dinner

This branch:

   total_bill   tip     sex smoker  day    time  size   _col_ y_var
0       16.99  1.01  Female     No  Sun  Dinner     2  Dinner    No
1       10.34  1.66    Male     No  Sun  Dinner     3  Dinner    No
2       21.01  3.50    Male     No  Sun  Dinner     3  Dinner    No
3       23.68  3.31    Male     No  Sun  Dinner     2  Dinner    No
4       24.59  3.61  Female     No  Sun  Dinner     4  Dinner    No

Test case with relplot

g = sns.relplot(
    data=tips,
    x="total_bill",
    y=tips["tip"].to_numpy(),
    col=tips["time"].rename("col_var"),
)
print(g.data.head())

v0.11.1:

       x     y   hue  size style units   NaN col_var
0  16.99  1.01  None  None  None  None  None  Dinner
1  10.34  1.66  None  None  None  None  None  Dinner
2  21.01  3.50  None  None  None  None  None  Dinner
3  23.68  3.31  None  None  None  None  None  Dinner
4  24.59  3.61  None  None  None  None  None  Dinner

This branch:

   total_bill   tip     sex smoker  day    time  size   _y_ col_var
0       16.99  1.01  Female     No  Sun  Dinner     2  1.01  Dinner
1       10.34  1.66    Male     No  Sun  Dinner     3  1.66  Dinner
2       21.01  3.50    Male     No  Sun  Dinner     3  3.50  Dinner
3       23.68  3.31    Male     No  Sun  Dinner     2  3.31  Dinner
4       24.59  3.61  Female     No  Sun  Dinner     4  3.61  Dinner

TODO

  • Mention in release notes

@codecov

codecov bot commented Jul 27, 2021

Codecov Report

Merging #2623 (9e66a00) into master (9c3dba6) will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master    #2623   +/-   ##
=======================================
  Coverage   97.45%   97.46%           
=======================================
  Files          17       17           
  Lines        6374     6386   +12     
=======================================
+ Hits         6212     6224   +12     
  Misses        162      162           

Impacted Files             Coverage Δ
seaborn/distributions.py   96.40% <100.00%> (+0.02%) ⬆️
seaborn/relational.py      99.69% <100.00%> (+<0.01%) ⬆️


@mwaskom mwaskom merged commit cef0a2d into master Aug 2, 2021
@mwaskom mwaskom deleted the fix_figlevel_data branch August 2, 2021 00:02
@mwaskom mwaskom modified the milestones: v0.12.0, v0.11.2 Aug 6, 2021
mwaskom added a commit that referenced this pull request Aug 6, 2021
* Merge the input data with the internal data in relplot

* Merge input data with internal data in catplot

* Handle input dicts in figure level data merge

* Fix tests of relplot/displot with dict data

* Mention in release notes

(cherry picked from commit cef0a2d)
Successfully merging this pull request may close these issues.

FacetGrid data no longer contains all columns from original input dataframe