[discussion] Adding ECDF to seaborn? #1536
Comments
I'd like to see this, and would be happy to help implement it. |
I like this idea as well. Perhaps it would be best to plot it as a simple step function as in R's |
@dsaxton great idea! I have updated the code sample above so that the step function is the default, while the scatter is the alternative. This is mostly a convenience thing, just to keep the implementation simple, but yes, if @mwaskom agrees to this PR, then I'm happy to work together with you on the fancier version you're describing. |
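To make the step-function idea concrete, here is a minimal sketch of an ECDF evaluated as a right-continuous step function: it jumps at each observed value and stays flat in between. The helper name `ecdf_at` is hypothetical; it is not part of seaborn or of the code posted in this thread.

```python
from bisect import bisect_right


def ecdf_at(sample, t):
    """Fraction of observations in `sample` that are <= t (hypothetical helper)."""
    ordered = sorted(sample)
    # bisect_right counts how many sorted values are less than or equal to t.
    return bisect_right(ordered, t) / len(ordered)


data = [3.0, 1.0, 2.0, 2.0]
print(ecdf_at(data, 0.5))  # below every observation
print(ecdf_at(data, 2.0))  # exactly at a repeated observation
print(ecdf_at(data, 2.5))  # between observations, same height as at 2.0
```

A scatter version would draw only the jump points, while the step version also draws the flat segments between them.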
Sorry folks, but I'm basically unavailable to review new features for at least the next few months. ECDF plots are a nice idea in general, but the API would need a lot of thinking: the proposed example (which I understand is just a proof of concept) would not fit in with existing seaborn tools. Given that the only manipulation of the data itself can be done in one line, I'd strongly encourage you to try to get this into pandas or matplotlib itself. BTW, you may be aware, but you can use an arbitrary unidimensional plotter (including self-defined ones) on the diagonal of a |
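For reference, the one-line data manipulation mentioned above could look like the following sketch. It uses the same expression that appears in the function posted later in this thread; the sample values are made up for illustration.

```python
import numpy as np

# Made-up sample values, for illustration only.
values = np.array([4.0, 1.0, 3.0, 2.0])

# Sort the observations and pair each one with its cumulative proportion.
x_sorted, proportion = np.sort(values), np.arange(1, len(values) + 1) / len(values)

print(x_sorted)
print(proportion)
```

Everything else in an ECDF plot is drawing: the two arrays can be handed directly to a step or scatter plotting call.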
@mwaskom totally understand, no worries! Thanks for taking the time to reply nonetheless 😄. Also, thanks for the pointers! |
Gonna leave this open because it would be good to revisit in the future (though if you consider that "add qqplots" is one of the oldest open issues, the timescale of "the future" may be unbounded!) |
For what it's worth, I modified @ericmjl's original code since I found it useful. Happy to help on the development.

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import seaborn as sns


    def ecdf(x=None, data=None, ax=None, step=True, palette=None, **kwargs):
        """ Empirical Cumulative Distribution Function

        Arguments
        -----------
        x : str or array-like, optional
            Inputs for plotting long-form data.
        data : DataFrame, array, or list of arrays, optional
            Dataset for plotting. If `x` is absent, this is interpreted as wide-form.
            Otherwise, data is expected to be long-form.
        ax : matplotlib.axes, optional
            Axes object to draw the plot onto, otherwise uses the current axes.
        step : bool, optional
            Whether or not to plot ECDF as horizontal steps
        palette : palette name, list, or dict, optional
            Colors to use for the different levels of the `hue` variable. Should be something that
            can be interpreted by `color_palette()` or a dictionary mapping hue levels to
            matplotlib colors.
        **kwargs : Other keyword arguments are passed through to `plt.step` or `plt.scatter` at draw time
        """
        # if no axes object create one
        if ax is None:
            fig, ax = plt.subplots()

        # set palette if passed
        if palette is not None:
            sns.set_palette(palette)

        # safety check on data
        if x is None and data is None:
            raise ValueError('No data passed')
        if isinstance(x, str) and data is None:
            raise ValueError('Unable to understand how to interpret data')

        # pick the x-axis label from the column name, the Series name, or a default;
        # any other input type (e.g. a plain list) leaves xlabel undefined
        if isinstance(x, str) and data is not None:
            xlabel = x
            x = data[x]
        elif isinstance(x, pd.Series):
            xlabel = x.name
        elif isinstance(x, np.ndarray):
            xlabel = 'X'
        ylabel = 'ECDF'

        # sort values and get cumulative proportion
        x_val, cdf = np.sort(x), np.arange(1, len(x) + 1) / len(x)

        if step:
            ax.step(x_val, cdf, **kwargs)
            plt.xlabel(xlabel)
            plt.ylabel(ylabel)
        else:
            ax.scatter(x_val, cdf, **kwargs)
            plt.xlabel(xlabel)
            plt.ylabel(ylabel)
        return ax

|
This doesn't seem to work when `x` is a list. Error message:
|
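One possible way around the list problem, sketched under the assumption that the failure comes from the axis label never being set for plain lists, is to coerce the input with `np.asarray` and fall back to a default label. The helper name `prepare_ecdf_input` and its `default_label` parameter are hypothetical and not taken from the function above.

```python
import numpy as np


def prepare_ecdf_input(x, data=None, default_label="X"):
    """Return (sorted values, cumulative proportions, axis label); hypothetical helper."""
    if x is None:
        raise ValueError("No data passed")
    if isinstance(x, str):
        if data is None:
            raise ValueError("Unable to understand how to interpret data")
        label, values = x, data[x]
    else:
        # A pandas Series carries a name; lists, tuples and arrays use the default.
        label = getattr(x, "name", None) or default_label
        values = x
    values = np.sort(np.asarray(values, dtype=float))
    proportion = np.arange(1, len(values) + 1) / len(values)
    return values, proportion, label


x_val, cdf, xlabel = prepare_ecdf_input([3, 1, 2])
print(x_val, cdf, xlabel)
```

Keeping the input handling separate from the drawing code also makes it easy to check without opening a figure.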
Here is an alternative implementation I use, which is based on `lineplot` and compatible with all `lineplot` features (except for confidence intervals, which don't make sense here). |
This is almost done. Feel free to weigh in on #2141 if you have thoughts about features or implementation. |
Closed by #2141 |
@mwaskom referencing this tweet re: ECDFs; I have a simple implementation ready to go which I have stored in textexpander, but I think it might be a useful contribution to `seaborn` users.
The simplest unit of visualization is a scatterplot, for which an API might be:
With this plotting unit, it can be easily inserted into the pairplot as a replacement for the histogram that occurs on the diagonal (as an option for end-users, of course, not mandatory). I can also see extensions to other kinds of plots, for example, plotting multiple ECDFs on the same axes object.
As I understand it, `distplot` exists, and yes, granted, visualizing histograms is quite idiomatic for many users. That said, I do see some advantages of using ECDFs over histograms, the biggest one being that all data points are plotted, meaning it is impossible to bias the data using bins. I have more details in a blog post, but at a high level, the other biggest advantage I can see is reading off quantiles from the data easily. Also, compared to estimating a KDE, we make no assumptions regarding how the data are distributed (though yes, we can debate whether this is a good or bad thing).
If you're open to having ECDFs inside seaborn, I'm happy to work on a PR for this. I might need some guidance to see if there are special things I need to look out for in the codebase (admittedly, it'll be my first PR to seaborn). Please let me know; I'm also happy to discuss more here before taking any action.
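As an illustration of reading quantiles off an ECDF, the sketch below finds the smallest observed value at which the curve reaches a given height. The helper name `ecdf_quantile` and the sample data are hypothetical; this is one common convention for empirical quantiles, not necessarily the one any particular library uses.

```python
import math


def ecdf_quantile(sample, q):
    """Smallest observed value whose ECDF height is at least q (hypothetical helper)."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(sample)
    # The ECDF reaches k/n at the k-th smallest value, so we need k >= q * n.
    k = math.ceil(q * len(ordered))
    return ordered[k - 1]


# Made-up sample, for illustration only.
data = [12, 7, 3, 9, 15, 5, 10, 8]
print(ecdf_quantile(data, 0.5))
print(ecdf_quantile(data, 0.9))
```

On the plot, this corresponds to drawing a horizontal line at height `q` and reading the x-value where it first meets the curve; no bin width has to be chosen at any point.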