Overhaul of categorical distribution plots #410
This was only really a staticmethod for early testing convenience.
This still needs a lot of cleaning up and testing. Most, but not all, of it is here.
Trying to handle cases with 0 or 1 observations in a bin, but not every option works currently.
This commit also sucked in the new comments in the violinplot KDE estimation method.
This was causing issues with the old version of matplotlib so I am just killing it for now.
This is mostly done from my perspective (modulo #423), but I'm looking for testers to try it out on some real data and find any weird corner cases before I merge.
Due to the repeated values, I wouldn't be surprised if this was supposed to fail. If that's the case, should it fail more gracefully? It's difficult to debug at present.

```python
from io import StringIO

import numpy as np
import pandas
import matplotlib.pyplot as plt
import seaborn

# Long-form data: one row per observation
strfile = """\
category,epazone,parameter,station,qual,res
Wetland Basin,4,"Lead, Dissolved",inflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",outflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",inflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",outflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",inflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",outflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",inflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",outflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",inflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",outflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",inflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",outflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",inflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",outflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",inflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",outflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",inflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",outflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",inflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",inflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",inflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",inflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",inflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",outflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",outflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",outflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",inflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",inflow,ND,2.000000094994903
Wetland Basin,4,"Lead, Dissolved",inflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",outflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",inflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",outflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",inflow,ND,2.000000094994903
Wetland Basin,4,"Lead, Dissolved",inflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",inflow,ND,2.0
Wetland Basin,4,"Lead, Dissolved",outflow,ND,2.0
Wetland Basin,6,"Lead, Dissolved",inflow,ND,0.5
Wetland Basin,6,"Lead, Dissolved",outflow,ND,0.5
Wetland Basin,6,"Lead, Dissolved",inflow,=,0.8199999928474426
Wetland Basin,6,"Lead, Dissolved",outflow,=,2.5999999046325684
Wetland Basin,6,"Lead, Dissolved",inflow,=,1.5199999809265137
Wetland Basin,6,"Lead, Dissolved",outflow,=,7.449999809265137
Wetland Basin,6,"Lead, Dissolved",inflow,ND,0.5
Wetland Basin,6,"Lead, Dissolved",outflow,ND,0.5
Wetland Basin,6,"Lead, Dissolved",inflow,=,15.899999618530273
Wetland Basin,6,"Lead, Dissolved",outflow,=,2.190000057220459
Wetland Basin,6,"Lead, Dissolved",inflow,=,3.5899999141693115
Wetland Basin,6,"Lead, Dissolved",outflow,=,6.840000152587891
Wetland Basin,6,"Lead, Dissolved",inflow,=,1.5199999809265137
Wetland Basin,6,"Lead, Dissolved",outflow,=,3.9600000381469727
Wetland Basin,6,"Lead, Dissolved",inflow,=,1.5399999618530273
Wetland Basin,6,"Lead, Dissolved",outflow,=,3.2200000286102295
Wetland Basin,7,"Lead, Dissolved",inflow,=,0.5600000023841858
Wetland Basin,7,"Lead, Dissolved",outflow,=,2.0899999141693115
Wetland Basin,7,"Lead, Dissolved",inflow,=,0.46000000834465027
Wetland Basin,7,"Lead, Dissolved",outflow,=,0.550000011920929
Wetland Basin,7,"Lead, Dissolved",inflow,=,0.7799999713897705
Wetland Basin,7,"Lead, Dissolved",outflow,=,0.5899999737739563
Wetland Basin,7,"Lead, Dissolved",inflow,=,0.3700000047683716
Wetland Basin,7,"Lead, Dissolved",outflow,=,0.27000001072883606
Wetland Basin,7,"Lead, Dissolved",inflow,=,0.23999999463558197
Wetland Basin,7,"Lead, Dissolved",outflow,=,0.3700000047683716
Wetland Basin,7,"Lead, Dissolved",inflow,=,0.6399999856948853
Wetland Basin,7,"Lead, Dissolved",outflow,=,0.8399999737739563
Wetland Basin,7,"Lead, Dissolved",inflow,=,0.5699999928474426
Wetland Basin,7,"Lead, Dissolved",outflow,=,1.2899999618530273
"""
df = pandas.read_csv(StringIO(strfile))
# Log-transform the measured results before plotting
df['logres'] = np.log(df['res'])
fig, ax = plt.subplots()
seaborn.violinplot(x='epazone', y='logres', hue='station', data=df, ax=ax, split=True)
```

Again, this is all very awesome.
Ding ding, we have a winner. Fixed with bb8af70.
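One plausible source of trouble in the dataset above: every zone 4 `outflow` value is exactly 2.0, and a Gaussian KDE that uses Scott's factor (as `scipy.stats.gaussian_kde` does, giving a 1-D bandwidth of σ·n^(-1/5)) ends up with zero bandwidth when the sample standard deviation is zero. The sketch below shows one way empty and single-valued groups could be special-cased before estimating a density. It is a minimal illustration under those assumptions, not the change made in bb8af70; the helper `safe_kde` and its point-mass fallback are hypothetical.

```python
import numpy as np


def safe_kde(values, gridsize=100, cut=2.0):
    """Return (support, density) for a 1-D sample, special-casing degenerate input.

    Hypothetical helper for illustration only; not seaborn's implementation.
    """
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]

    # No observations: nothing to draw.
    if values.size == 0:
        return np.array([]), np.array([])

    # One unique value (this includes n == 1): the standard deviation is zero,
    # so Scott's factor would give a zero bandwidth. Fall back to a point mass.
    if np.unique(values).size == 1:
        return np.unique(values), np.array([1.0])

    # 1-D Scott's factor applied to the sample standard deviation.
    bw = values.std(ddof=1) * values.size ** (-1 / 5)
    support = np.linspace(values.min() - cut * bw, values.max() + cut * bw, gridsize)

    # Sum of Gaussian kernels centred on each observation, normalized to unit area.
    z = (support[:, None] - values[None, :]) / bw
    density = np.exp(-0.5 * z ** 2).sum(axis=1) / (values.size * bw * np.sqrt(2 * np.pi))
    return support, density
```

Returning a point mass lets a plotting routine draw a single line at that value instead of a violin; whether that is the right visual for a constant group is a design choice.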
Your dataset is also doing some weird things with the area scaling when I remove the hue nesting, but I'm not entirely sure it's a "bug" or what the right way to fix it would be.
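For context on the area scaling mentioned above, the violinplot `scale` options differ only in how each group's density curve is normalized before being multiplied by the violin width. The numpy sketch below is a simplified reading of the three options described in this PR, not seaborn's code; `scale_densities` is a hypothetical helper, and each input curve is assumed to be a KDE that integrates to 1 on its own grid. Under `"area"` a single shared divisor is used, so a nearly constant group (with a very tall, narrow KDE peak) would set that divisor and could shrink every other violin; that may or may not be what is happening with this dataset.

```python
import numpy as np


def scale_densities(densities, counts, scale="area"):
    """Rescale density curves so the widest point of the widest violin maps to 1.

    Hypothetical helper sketching the `scale` options described in the PR.
    densities: list of 1-D arrays, each a KDE evaluated on its own grid.
    counts: number of observations behind each curve.
    """
    densities = [np.asarray(d, dtype=float) for d in densities]
    if scale == "area":
        # One shared divisor: since every KDE integrates to 1, all violins keep
        # equal area, so a peaked group is drawn wider than a spread-out one.
        peak = max(d.max() for d in densities)
        return [d / peak for d in densities]
    if scale == "width":
        # Every curve is stretched to the same maximum width.
        return [d / d.max() for d in densities]
    if scale == "count":
        # Same maximum width, then shrunk in proportion to the group's size.
        biggest = max(counts)
        return [d / d.max() * (n / biggest) for d, n in zip(densities, counts)]
    raise ValueError(f"unknown scale: {scale!r}")
```

The `scale_hue` parameter, not sketched here, decides whether that shared divisor is computed across all violins on the plot or separately within each level of the main grouping variable when `hue` is used.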
Ohhhhh man, @mwaskom, you are my hero. Can't wait to play around with these. Categorical, horizontal violinplots from long dataframes, compatible with FacetGrid... yum.
Not sure if it's the right place for the comment, just want to say that combining e.g. P.S.
TL;DR: The `boxplot` and `violinplot` APIs are changing, for the better, but in a way that will be mildly disruptive. There is also a new function, `stripplot`. There are some examples below, but to really see these functions in action, check out the new API docs, which take advantage of automated figure collection for docstring examples: `boxplot` | `violinplot` | `stripplot`
Changes/enhancements to `boxplot` and `violinplot`
This PR updates and unifies the API for `boxplot` and `violinplot`. Both functions remain backwards-compatible in terms of the kind of data they accept, but the syntax has changed. These functions are now invoked with `x` and `y` parameters that are either vectors of data or names of variables in a long-form DataFrame passed to the new `data` parameter. You can still pass wide-form DataFrames or arrays to `data`, but it is no longer the first positional argument.

In other words, instead of doing
You would now do
Existing code that uses these functions will probably break, but it can be easily updated. I don't like these kinds of disruptive API changes, but in this case the new API has a lot of virtues, and creating a smoother upgrade path would have been too complicated to handle reasonably.
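A minimal sketch of the new calling convention, using a small synthetic long-form DataFrame (the column names `day`, `smoker`, and `total_bill` are hypothetical, chosen only for illustration). Both functions take column names for `x`, `y`, and `hue` plus the DataFrame via `data`; with exactly two `hue` levels, violinplot can also split each violin between them.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the example runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical long-form data: one row per observation.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat"], 40),
    "smoker": np.tile(["Yes", "No"], 60),
    "total_bill": rng.gamma(4.0, 5.0, size=120),
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# x/y name columns in `data`; hue nests a second categorical grouping.
sns.boxplot(x="day", y="total_bill", hue="smoker", data=df, ax=ax1)

# With two hue levels, each level can occupy one half of a violin.
sns.violinplot(x="day", y="total_bill", hue="smoker", data=df, split=True, ax=ax2)
```

Because the variables are named columns, a transformed variable can be added to the DataFrame (or the DataFrame filtered) without changing the shape of the plotting call.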
The upshot of this is that both functions now work seamlessly in the context of a FacetGrid. Additionally, by using named variables and a `data` object, it's much easier to apply transformations to the data in the body of the seaborn call. It also generally reduces cognitive overhead, since you no longer have to remember that the API for boxplot/violinplot differs from that for regplot and friends.

To sweeten this change, there are a variety of other enhancements (and a few other API breaks):
- `hue` argument to boxplot and violinplot, which allows for nested grouping of the plot elements by a third categorical variable. For violinplot, this nesting can also be accomplished by splitting the violins when there are two levels of the `hue` variable. To make this functionality feasible, the ability to specify where the plots will be drawn in data coordinates has been removed. These plots are now drawn at set positions, like (and identical to) barplot and pointplot.
- `palette` parameter to boxplot/violinplot. The `color` parameter still exists, but no longer does double duty in accepting the name of a seaborn palette. `palette` supersedes `color` so that it can be used with a FacetGrid.
- `scale` and `scale_hue` parameters to violinplot. These control how the widths of the violins are scaled. The default is `area`, which is different from how the violins used to be drawn. Use `scale='width'` to get the old behavior. You can also use `scale="count"` to scale by the number of observations in each bin.
- `box` kind of interior plot in violinplot, which shows the whisker range in addition to the quartiles. Use `inner='quartile'` to get the old style.

New `stripplot` function

This PR also introduces the `stripplot` function, which draws a scatterplot where one of the variables is categorical. This plot has the same API as boxplot and violinplot. It is useful both on its own and when composed with one of these other plot kinds to show both the observations and the underlying distribution.
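As a sketch of composing `stripplot` with `violinplot` (the data and the column names `group` and `value` are hypothetical), both functions place categories at the same fixed positions, so they can share one Axes: the violins show the distribution and the strip shows each observation on top.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the example runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical long-form data: three groups of 30 observations each.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 30),
    "value": rng.normal(loc=np.repeat([0.0, 1.0, 2.0], 30), scale=0.5),
})

fig, ax = plt.subplots()
# Draw the distribution first, with no interior plot and a light fill...
sns.violinplot(x="group", y="value", data=df, inner=None, color="0.85", ax=ax)
# ...then overlay every observation, jittered along the categorical axis.
sns.stripplot(x="group", y="value", data=df, jitter=True, ax=ax)
```

Drawing the violins first keeps the points visible; turning off the interior plot avoids clutter once the raw observations are shown.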
Backend details
For the aficionados, this PR involves a complete rewrite of the code for these functions. It's much better organized, abstracted, and tested. That means it will be easier to keep these functions on a common API going forward, and to add enhancements with more confidence that they won't lead to regressions.
Next up will probably be to bring the `barplot`/`pointplot` code into this framework, which in some ways is better and more robust than what those run on. Also coming soon... `swarmplot`.