ENH: Scatter plots of one variable vs another #2277
xarray/plot/facetgrid.py
Outdated
@@ -280,7 +280,8 @@ def map_dataarray_line(self, x=None, y=None, hue=None, **kwargs):
self : FacetGrid object

"""
from .plot import line, _infer_line_data
from .plot import (_infer_line_data, _infer_scatter_data,
line, dataset_scatter)
E128 continuation line under-indented for visual indent
xarray/plot/plot.py
Outdated
def dataset_scatter(ds, x=None, y=None, hue=None, col=None, row=None,
col_wrap=None, sharex=True, sharey=True, aspect=None,
size=None, subplot_kws=None, add_legend=True, **kwargs):
E241 multiple spaces after ','
xarray/plot/plot.py
Outdated
if size is None:
size = 3
elif figsize is not None:
raise ValueError('cannot provide both `figsize` and `size` arguments')
E501 line too long (82 > 79 characters)
xarray/plot/plot.py
Outdated
g = FacetGrid(data=ds, col=col, row=row, col_wrap=col_wrap,
sharex=sharex, sharey=sharey, figsize=figsize,
aspect=aspect, size=size, subplot_kws=subplot_kws)
return g.map_dataarray_line(x=x, y=y, hue=hue, plotfunc=dataset_scatter, **kwargs)
E501 line too long (90 > 79 characters)
xarray/plot/plot.py
Outdated
aspect=aspect, size=size, subplot_kws=subplot_kws)
return g.map_dataarray_line(x=x, y=y, hue=hue, plotfunc=dataset_scatter, **kwargs)

xplt, yplt, hueplt, xlabel, ylabel, huelabel = _infer_scatter_data(ds, x, y, hue)
E501 line too long (85 > 79 characters)
@yohai Thanks! This is fantastic.
I've done an initial review with some API feedback but things mostly look good and it's pretty functional. Of course, it needs docs & tests but this is a nice start.
xarray/plot/facetgrid.py
Outdated
_, _, hueplt, xlabel, ylabel, huelabel = _infer_line_data(
darray=self.data.loc[self.name_dicts.flat[0]],
x=x, y=y, hue=hue)
elif plotfunc == dataset_scatter:
I think this bit should be in a separate `map_dataset` function that can be reused as the Dataset plotting API becomes more complete.
That's easy to do. Thanks.
xarray/plot/facetgrid.py
Outdated
_, _, hueplt, xlabel, ylabel, huelabel = _infer_line_data(
darray=self.data.loc[self.name_dicts.flat[0]],
x=x, y=y, hue=hue)
darray=self.data.loc[self.name_dicts.flat[0]],
E126 continuation line over-indented for hanging indent
xarray/plot/facetgrid.py
Outdated
self._mappables.append(mappable)

data = _infer_scatter_data(
ds=self.data.loc[self.name_dicts.flat[0]],
E126 continuation line over-indented for hanging indent
This seems like a very cool and useful feature! A few comments:
Happy to provide a more detailed review once the tests are implemented.
xarray/tests/test_plot.py
Outdated
class TestScatterPlots(PlotTestCase):
def setUp(self):
das = [DataArray(np.random.randn(3, 3, 4, 4),
dims=['x', 'row', 'col', 'hue'],
E127 continuation line over-indented for visual indent
I disagree here. In many cases, we already follow the naming conventions from Seaborn instead, which uses meaningful names. `c` and `s` are pretty meaningless.
It is possibly worth taking a look at the recent (not yet released)
I guess I can see the virtue in sticking with matplotlib's old
I don't have an opinion about naming variables and would be happy with whatever decision y'all make.

For the code -- I added tests and changed the logic a bit. Following @dcherian's suggestion, the default behavior is no longer to color hues with discrete values (legend) but rather with a continuous scale (colorbar). It actually makes more sense, and I think it should also be the default behavior for regular line plots. This is the API now:

    import numpy as np
    import xarray as xr

    A = xr.DataArray(np.zeros([3, 20, 4, 4]), dims=['x', 'y', 'z', 'w'],
                     coords=[np.sort(np.random.randn(k)) for k in [3, 20, 4, 4]])
    ds = xr.Dataset({'A': A.x + A.y + A.z + A.w,
                     'B': -0.2 / A.x - 2.3 * A.y - np.abs(A.z)**0.123 + A.w**2})
    ds.A.attrs['units'] = 'Aunits'
    ds.B.attrs['units'] = 'Bunits'
    ds.z.attrs['units'] = 'Zunits'
    ds.plot.scatter(x='A', y='B')

Specifying

    ds.plot.scatter(x='A', y='B', hue='z')

    ds['z'] = ['who', 'let', 'dog', 'out']
    ds.plot.scatter(x='A', y='B', hue='z')

If you want a discrete legend even for numeric

    ds.plot.scatter(x='A', y='B', hue='w', discrete_legend=True)

I am a bit bothered by the fact that this is not only a different coloring method, it's a very different style altogether (under the hood using

Of course, faceting works as you think it should:

    ds.plot.scatter(x='A', y='B', hue='z', col='x')
    ds.plot.scatter(x='A', y='B', hue='w', col='x', col_wrap=2)
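
The choice between a colorbar and a legend described in this comment can be sketched roughly as below. This is only an illustration, not the code in the PR: `infer_hue_style` is a hypothetical helper, and it assumes that non-numeric hue values fall back to a legend, which the string example above suggests but does not state outright.

```python
import numpy as np


def infer_hue_style(hue_values, discrete_legend=False):
    """Hypothetical helper (not xarray's implementation).

    Returns "continuous" (colorbar) for numeric hue values unless a
    discrete legend is requested, and "discrete" (legend) otherwise.
    """
    values = np.asarray(hue_values)
    is_numeric = np.issubdtype(values.dtype, np.number)
    if is_numeric and not discrete_legend:
        return "continuous"
    return "discrete"


# Numeric coordinate: colorbar by default.
numeric_style = infer_hue_style(np.linspace(0.0, 1.0, 4))
# String coordinate: assumed to fall back to a legend.
string_style = infer_hue_style(["who", "let", "dog", "out"])
# Numeric coordinate with an explicit request for a legend.
forced_style = infer_hue_style(np.arange(4), discrete_legend=True)
print(numeric_style, string_style, forced_style)
```

The flag name `discrete_legend` is taken from the call shown above; everything else in the sketch is an assumption.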
This is looking really nice. Coincidentally, the new version of Seaborn was released today, and has a whole new doc section on "relational plots": http://seaborn.pydata.org/tutorial/relational.html#relational-tutorial

It's probably worth a look over to see if it has good ideas worth stealing, or if we want to make intentional deviations from its behavior in xarray.
* master: (68 commits)
  enable sphinx.ext.napoleon (pydata#3180)
  remove type annotations from autodoc method signatures (pydata#3179)
  Fix regression: IndexVariable.copy(deep=True) casts dtype=U to object (pydata#3095)
  Fix distributed.Client.compute applied to DataArray (pydata#3173)
  More annotations in Dataset (pydata#3112)
  Hotfix for case of combining identical non-monotonic coords (pydata#3151)
  changed url for rasterio network test (pydata#3162)
  to_zarr(append_dim='dim0') doesn't need mode='a' (pydata#3123)
  BUG: fix+test groupby on empty DataArray raises StopIteration (pydata#3156)
  Temporarily remove pynio from py36 CI build (pydata#3157)
  missing 'about' field (pydata#3146)
  Fix h5py version printing (pydata#3145)
  Remove the matplotlib=3.0 constraint from py36.yml (pydata#3143)
  disable codecov comments (pydata#3140)
  Merge broadcast_like docstrings, analyze implementation problem (pydata#3130)
  Update whats-new for pydata#3125 and pydata#2334 (pydata#3135)
  Fix tests on big-endian systems (pydata#3125)
  XFAIL tests failing on ARM (pydata#2334)
  Add broadcast_like. (pydata#3086)
  Better docs and errors about expand_dims() view (pydata#3114)
  ...
I don't know what to do about this test failure:
Yay, tests pass. I'll merge in a few days (cc @pydata/xarray, @yohai)
Thanks for this
--
Yohai Bar Sinai
Post Doctoral Fellow
John A. Paulson School of Engineering and Applied Sciences
Harvard University
* master:
  pyupgrade one-off run (pydata#3190)
  mfdataset, concat now support the 'join' kwarg. (pydata#3102)
  reduce the size of example dataset in dask docs (pydata#3187)
  add climpred to related-projects (pydata#3188)
  bump rasterio to 1.0.24 in doc building environment (pydata#3186)
  More annotations (pydata#3177)
  Support for __array_function__ implementers (sparse arrays) [WIP] (pydata#3117)
  Internal clean-up of isnull() to avoid relying on pandas (pydata#3132)
  Call darray.compute() in plot() (pydata#3183)
  BUG: fix + test open_mfdataset fails on variable attributes with list… (pydata#3181)
Merging.
Thanks @dcherian!
Yeah! It owes a lot to your hard work too. Hopefully people find this useful.
* commit 'f172c673':
  ENH: Scatter plots of one variable vs another (pydata#2277)
  Escape code markup (pydata#3189)
`whats-new.rst` for all changes and `api.rst` for new API
`size`?

Say you have two variables in a `Dataset` and you want to make a scatter plot of one vs the other, possibly using different hues and/or faceting. This is useful if you want to inspect the data to see whether two variables have some underlying relationships between them that you might have missed. It's something that I found myself manually writing the code for quite a few times, so I thought it would be better to have it as a feature. I'm not sure if this is actually useful for other people, but I have the feeling that it probably is.

First, set up dataset with two variables:
Now, we can plot all values of `A` vs all values of `B`:

What a mess. Wouldn't it be nice if you could color each point according to the value of some coordinate, say `w`?

Huh! There seems to be some underlying structure there. Can we also facet over a different coordinate?

or two coordinates?
The logic is that dimensions that are not faceted/hue are just stacked using `xr.stack` and plotted. Only variables that have exactly the same dimensions are allowed.

Regarding implementation -- I am certainly not sure about the API and I probably haven't thought about edge cases with missing data or nans or whatnot, so any input would be welcome. Also, there might be a simpler implementation by first using `to_array` and then using existing line plot functions, but I couldn't find it.
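
To make the stacking logic concrete, here is a minimal sketch of the idea, not the implementation in this PR. The helper `stack_for_scatter`, the stacked dimension name `points`, and the example dataset are all assumptions made for illustration; only `Dataset.stack` is the real xarray call the description refers to. Faceting is left out: it would simply exclude the `row`/`col` dimensions from the stack in the same way as `hue`.

```python
import numpy as np
import xarray as xr

# Illustrative dataset: two variables sharing the dims (x, y, z, w).
rng = np.random.default_rng(0)
sizes = {"x": 3, "y": 20, "z": 4, "w": 4}
coords = {dim: np.sort(rng.standard_normal(n)) for dim, n in sizes.items()}
A = xr.DataArray(
    rng.standard_normal(tuple(sizes.values())), dims=list(sizes), coords=coords
)
ds = xr.Dataset({"A": A, "B": A**2 + A.w})


def stack_for_scatter(ds, x, y, hue=None):
    """Hypothetical helper: collapse every non-hue dimension into 'points'."""
    if set(ds[x].dims) != set(ds[y].dims):
        raise ValueError(f"{x!r} and {y!r} must have exactly the same dimensions")
    stack_dims = [dim for dim in ds[x].dims if dim != hue]
    return ds[[x, y]].stack(points=stack_dims)


# No hue: every value of A is paired with the matching value of B.
flat = stack_for_scatter(ds, "A", "B")

# hue="w": one group of (A, B) pairs per value of w.
by_w = stack_for_scatter(ds, "A", "B", hue="w")
groups = [
    (by_w["A"].isel(w=i).values, by_w["B"].isel(w=i).values)
    for i in range(by_w.sizes["w"])
]
```

Each `(A, B)` pair in `groups` is a pair of 1-D arrays that could be passed to matplotlib's `Axes.scatter`, one call per hue value.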