-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support RGB[A] arrays in plot.imshow() #1796
Conversation
DataArray( | ||
easy_array((10, 15, 3), start=0), | ||
dims=['y', 'x', 'band'], | ||
).plot.imshow() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to check that the colors have actually rendered correctly by introspecting deeper into the figures / axes that are generated by imshow
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it's almost certainly possible - but I have absolutely no idea where to start!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay relying on matplotlib to plot colors properly. One sane option would be to mock plt.imshow
to ensure it gets called properly, but we don't bother with that for our current tests. But that shouldn't get in the way of integration tests that verify that we can actually call matplotlib without any errors being raised.
I like this. My main concern is that I'd like a fully explicit way to opt into this behavior. Perhaps |
I have a pretty strong preference for RGB images 'just working' - requiring a new argument would mean imshow has a different signature to all other plot methods without adding any information. If it's an optional argument, it barely does anything - in the rare case where the heuristic fails, you can supply In many ways showing an RGB image is a special case, but this is the least-special-case I could make work - and it's consistent with the obvious meaning and upstream behavior of |
I should clarify: I don't think we should require the explicit name for the RGB dimension, but I think we should have an argument around so that's an option if desired. This is nice because it gives better error messages when things go wrong and makes it easier to write self-documented code. This would be consistent with the rest of our visualization functions, which provide an option for being fully explicit but don't require it. |
Optional argument coming up then 😄 Anything else I need to do? |
b378a48
to
540300d
Compare
Done - now with optional argument and nicer commit history, Anything else? |
xarray/plot/plot.py
Outdated
robust = False | ||
del flat | ||
# Clip range to [0, 1] to avoid visual artefacts | ||
darray.values[:] = np.clip(darray.values, 0, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you verify that this doesn't modify the DataArray object on which the plot method is being called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I've tossed in a .copy(deep=True)
to guarantee that there's no mutation.
xarray/plot/plot.py
Outdated
# All data will be masked, so skip percentile calculation | ||
vmin, vmax = 0, 1 | ||
if vmin is None: | ||
vmin = np.percentile(flat, ROBUST_PERCENTILE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure that this handling of the robust argument makes sense. Does this resemble what vmin/vmax do by default on imshow?
Parts of this that seem strange to me:
robust=True
does more than merely clipping the data to the robust range, unlike the behavior with colormaps. The data gets extra normalization, even if it all already fell within the central 95%.- all channels are treated together, rather than independently, including the alpha channel!
I'm sure this behavior is useful sometimes, but maybe it's better to leave it out for now. We could make robust=True an error for now, and potentially save color normalization for another keyword argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic makes robust=True
work exactly the same way on single-band and RGB greyscale images. The handling is only distinct because the 'normalise to [0, 1]' step is implicit for single bands but must be explicit for RGB. Scaling each band independently would completely change the colour of the image, which is almost never what you'd want to do.
It does make sense to normalise RGB without considering or changing the alpha channel though; I'll play with this a little more. I can't think of a way to describe this that isn't "here's a weird edge case...", and the implementation would be even worse.
As to usefulness, it's basically essential to sensibly visualise surface reflectance data (ie high-quality satellite images) - they're represented for analysis as albedo, ie fraction of light reflected, and if you display that directly it's a very dark and muddy image (example below; same scene as above). So I'd strongly prefer to keep this in!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still feels strange to me. With colormaps, all robust=True
does is explicitly set vmin and vmax based on the data. But vmin and vmax are entirely ignored here.
What if we interpreted vmin/vmax with rgb plots as setting the bounds for normalization? The defaults would be (0, 1) for floats and (smallest int, largest int) for int dtypes. Then, robust=True could again simply set vmin/vmax based on the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually agree that the current behavior is strange and that your suggestion is an improvement (though I'd use [0, 255] for all ints) - but it's consistent with matplotlib! My proposed plan has some extra steps though:
- Merge this pull. We will need the stretching machinery regardless, unless Xarray drops support for matplotlib<2.2
- Patch imshow doesn't normalize the color range in RGB images matplotlib/matplotlib#9391 upstream. (I'm already working on a patch for this)
- Implement the new matplotlib interface in Xarray. I could do that in this pull, but then if the matplotlib implementation doesn't match we'd have to break API compatibility - I'd really prefer to have a smaller API now and get it right the first time, later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could do that in this pull, but then if the matplotlib implementation doesn't match we'd have to break API compatibility - I'd really prefer to have a smaller API now and get it right the first time, later.
I agree. But I'm also concerned about the behavior for robust=True
potentially needing to change to match upstream.
My preference would be to leave out support for robust
with RGB plots until the matplotlib behavior lands. You could write a small helper function for RGB normalization (maybe even it include it in the docs as an example), e.g.,
import xarray.ufuncs as xu
def robust_stretch(data):
vmin, vmax = data.quantile(q=[0.02, 0.98])
bounded = xu.minimum(xu.maximum(data, vmin), vmax)
returned (bounded - vmin) / (vmax - vmin)
Then you could plot things like arr.pipe(robust_stretch).plot.imshow(col='time', col_wrap=5)
.
It's a few more characters but not a terrible work around.
@@ -446,12 +468,13 @@ def newplotfunc(darray, x=None, y=None, figsize=None, size=None, | |||
"Use colors keyword instead.", | |||
DeprecationWarning, stacklevel=3) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add rgb
as an actual argument inline, after x and y?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could, but this would be a breaking change for anyone using positional arguments. I'd therefore prefer not to, as it's just as easy to do the breaking bit later.
Please confirm if you'd like the breaking change; otherwise I'll leave it as-is.
xarray/plot/utils.py
Outdated
""" | ||
if imshow and darray.ndim == 3: | ||
return _infer_xy_labels_3d(darray, x, y, rgb) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure this raises an informative error message if RGB is provided and the input has the wrong number of dimensions or isn't being plotted with imshow.
""" | ||
Determine x and y labels for showing RGB images. | ||
|
||
Attempts to infer which dimension is RGB/RGBA by size and order of dims. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than all this complex logic with warnings in ambiguous cases, why not always treat the last and/or remaining (after explicit x/y labels) dimension as RGB? I think that solves the convenience use cases, without hard to understand/predict inference logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
# but doesn't warn if dimensions specified | ||
arr.plot.imshow(rgb='band') | ||
arr.plot.imshow(x='x', y='y') | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add tests for errors related to rgb
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
DataArray( | ||
easy_array((10, 15, 3), start=0), | ||
dims=['y', 'x', 'band'], | ||
).plot.imshow() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay relying on matplotlib to plot colors properly. One sane option would be to mock plt.imshow
to ensure it gets called properly, but we don't bother with that for our current tests. But that shouldn't get in the way of integration tests that verify that we can actually call matplotlib without any errors being raised.
xarray/plot/plot.py
Outdated
In this case, ``robust=True`` will saturate the image in the | ||
usual way, consistenly between all bands and facets. | ||
|
||
This method will clip oversaturated pixels to the valid range, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this actually strictly better? If so, why didn't it get changed during matplotlib's recent style overhaup? Was it simply overlooked or a more careful decision?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it is strictly better, because it avoids the ambiguity a colour could actually represent any value at all modulo the range.
I believe it was overlooked because relatively few people start with crudely-scaled RGB images, though I wasn't involved closely enough to know for sure (or fix it then!). Unfortunately such "scaling" is pretty common to go from albedo to useful dynamic range in imagery, as I mentioned about robust
above. Reference: matplotlib/matplotlib#9391 (and links from that issue).
Thanks for the review @shoyer! I'll do what I can in the next few days, but that might not be much at all before I get back from a no-internet camping trip around Jan 8th. Items:
|
I'll add another todo (can be done in a separate PR if you don't have time ;-): modify the gallery example to plot the RGB image instead of the BW with our rasterio example here. |
OK, @shoyer & @fmaussion - I think I'm done again! The only thing I haven't done is change the handling of Hopefully this covers the substantial changes; but let me know if there's anything else as I'd really like this merged before the 0.10.1 release - and ideally released in time for demos at my summer school in January 😄 |
xarray/plot/plot.py
Outdated
@@ -445,10 +445,38 @@ def newplotfunc(darray, x=None, y=None, figsize=None, size=None, | |||
# Decide on a default for the colorbar before facetgrids | |||
if add_colorbar is None: | |||
add_colorbar = plotfunc.__name__ != 'contour' | |||
imshow_rgb = plotfunc.__name__ == 'imshow' and \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: per PEP8, please use parentheses to group this expression instead of an explicit line continuation.
xarray/plot/utils.py
Outdated
assert darray.ndim == 3 | ||
not_none = [a for a in (x, y, rgb) if a is not None] | ||
if len(set(not_none)) < len(not_none): | ||
raise ValueError('Dimensions passed as x, y, and rgb must be unique.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you include the values for these dimensions in the error message?
xarray/plot/plot.py
Outdated
# there isn't one, and set it to transparent where data is masked. | ||
if z.shape[-1] == 3: | ||
z = np.ma.concatenate((z, np.ma.ones(z.shape[:2] + (1,))), 2) | ||
z[np.sum(z.mask, axis=-1, dtype=bool),-1] = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're sure this doesn't mutate the inputs, right?
style note: add a space after the comma.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would also be clearer as with np.any(z.mask, axis=-1)
rather than np.sum
.
xarray/plot/plot.py
Outdated
# All data will be masked, so skip percentile calculation | ||
vmin, vmax = 0, 1 | ||
if vmin is None: | ||
vmin = np.percentile(flat, ROBUST_PERCENTILE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still feels strange to me. With colormaps, all robust=True
does is explicitly set vmin and vmax based on the data. But vmin and vmax are entirely ignored here.
What if we interpreted vmin/vmax with rgb plots as setting the bounds for normalization? The defaults would be (0, 1) for floats and (smallest int, largest int) for int dtypes. Then, robust=True could again simply set vmin/vmax based on the data.
More review fixes 😄 I'm pretty sure the build failure is just Travis being flaky - it passes on my machine. Could someone restart it? |
Just restarted them, let's see how it goes |
Okay... I've now seen three Travis runs. In every one there's been a pass, a fail, and three errors... but different jobs each time. At this point I'm ready to give up and wait for the Travis team to fix it 😕 The code is ready to go though! 🎉 |
@shoyer makes some good points about things that should be fixed upstream in Matplotlib - namely normalization of RGB images, but I'm also going to move my color fixes (clipping instead of modulo) upstream instead. Timeline:
|
@Zac-HD Sounds good. At this point I don't think there are any blockers left for the 0.10.1 release (I wanted to get the pydap fix in, which we recently merged). |
Looks like |
Should be fixed by #1814 which I just merged |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me now. If there are no other comments I will merge tomorrow.
Includes new `rgb` keyword to tell imshow about that dimension, and much error handling in inference.
Thanks @Zac-HD ! |
git diff upstream/master **/*py | flake8 --diff
whats-new.rst
for all changesThis patch brings
xarray.plot.imshow
up to parity withmatplotlib.pyplot.imshow
:Usingrobust=True
for easy saturation is really nice. Having it adjust each channel and facet in the same way is essential for this to work, which it does.Matplotlib wraps out-of-range colors, leading to crazy maps and serious interpretation problems if it's only a small region. Xarray clips (ie saturates) to the valid range instead.I'm going to implement clip-to-range and color normalization upstream in matplotlib, then open a second PR here so that Xarray can use the same interface.
And that's the commit log! It's not really a big feature, but each of the parts can be fiddly so I've broken
the commits up logically 😄
Finally, a motivating example: visible-light Landsat data before, during (top-right), and after a fire at Sampson's Flat, Australia: