-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Normalisation for RGB imshow #1819
Conversation
CC @shoyer - hopefully I'm in time for Xarray 0.10.1 😉 |
xarray/plot/plot.py
Outdated
q=[ROBUST_PERCENTILE, 1 - ROBUST_PERCENTILE]) | ||
vmin = vmin if vmin is not None else vmin_tmp | ||
vmax = vmax if vmax is not None else vmax_tmp | ||
del vmin_tmp, vmax__tmp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo. vmax_tmp
xarray/plot/plot.py
Outdated
if vmin is not None or vmax is not None: | ||
vmin = vmin if vmin is not None else darray.min() | ||
vmax = vmax if vmax is not None else darray.max() | ||
darray = (darray.astype(np.float32) - vmin) / (vmax - vmin) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We ran into dynamic range issues w/ uint32
being converted to float32
, particularly as some people use an extreme high value as a bad-data flag. matplotlib/matplotlib#10133
xarray/plot/plot.py
Outdated
@@ -625,6 +641,10 @@ def imshow(x, y, z, ax, **kwargs): | |||
dimension can be interpreted as RGB or RGBA color channels and | |||
allows this dimension to be specified via the kwarg ``rgb=``. | |||
|
|||
Unlike matplotlib, Xarray can apply ``vmin`` and ``vmax`` to RGB or RGBA | |||
data, by applying a single scaling factor and offset to all bands. | |||
Passing ``robust=True`` infers ``vmin`` and ``vmax`` in the usual way. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... infers vmin and vmax from the 96-th percentile of the data distribution. (Or reference the “usual way” somehow).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks for this. Can you add an additional note to the whats new page in the docs?
xarray/plot/plot.py
Outdated
vmin = vmin if vmin is not None else vmin_tmp | ||
vmax = vmax if vmax is not None else vmax_tmp | ||
del vmin_tmp, vmax__tmp | ||
robust=False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 nits: 1) let garbage collection handle deleting your temp variables, 2) space in robust = False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we use locals()
somewhere below, which is why this is necessary (but yes it's pretty ugly)
xarray/tests/test_plot.py
Outdated
# Test for crash when normalising RGB data. (further tests require a | ||
# reference image, as the args used to be silently ignored instead) | ||
da = easy_array((5, 5, 3), start=-0.6, stop=1.4) | ||
da.plot.imshow(**kwds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can test that the range of the values in this plot are correct.
xarray/plot/plot.py
Outdated
vmin_tmp, vmax_tmp = darray.quantile( | ||
q=[ROBUST_PERCENTILE, 1 - ROBUST_PERCENTILE]) | ||
vmin = vmin if vmin is not None else vmin_tmp | ||
vmax = vmax if vmax is not None else vmax_tmp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps we should skip the quantile calculation when both vmin/vmax are set. Then again, if robust is True, the behavior is poorly defined and maybe an error would be better.
007ebff
to
4d3a1b7
Compare
xarray/plot/plot.py
Outdated
vmin, vmax = None, None | ||
# There's a cyclic dependency via DataArray, so we can't | ||
# import xarray.ufuncs in global or outer scope. | ||
import xarray.ufuncs as xu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should be able to fix this.
xarray/plot/plot.py
Outdated
if vmin is not None or vmax is not None: | ||
vmin = vmin if vmin is not None else darray.min() | ||
vmax = vmax if vmax is not None else darray.max() | ||
darray = darray.astype('f4' if vmax - vmin < 2 ** 32 else 'f8') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can use np.finfo('f4').max
here instead of 2**32
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it was uint32 that gave us trouble on Matplotlib. i.e. way less than f4.max (I think?). See matplotlib/matplotlib#10133. I was too lazy to figure out the actual limit (I guess its 2^N, where N is how many bits a float has dynamic range over. I think its 32-1-8 = 2**23?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you can use np.iinfo
instead:
Out[12]: iinfo(min=0, max=4294967295, dtype=uint32)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but thats not the problem. The problem is that if you convert (a-vmin)/(1e9) then you can't tell the difference between a=1 and a=2 anymore if you only use float32.
This came up in Matplotlib because the conversion to float happened with an automatic vmin and vmax, and the user had a large value as a bad-data flag. Then the user subsequently set vmin - vmax = 10 or so, but the convert to float32 had erased all the dynamic between 0 and 10.
Maybe xarray
doesn't have this problem because you trust the user to set vmin
and vmax
during the imshow
call. But if someone grabs the image handle and tries to set the limits after the call they could have this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The more deeply I analyse matplotlib/matplotlib#10133, the more suspicious I get - there are actually two places you could get loss of precision: subtracting vmin
(if <<0, loss of precision for high values), and dividing by (vmin - vmax)
(loss if vmax is large).
I think for Xarray, the best option is actually just to use np.float64
in all cases. We already have a habit of upcasting all the time anyway*, and if that's lossy you're in trouble anyway.
* (including uint16
to float64
when decoding variables from a file, which is silly - issue incoming for that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vmin and A are still ints when vmin is removed from A.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which case you are vulnerable to underflow, and hence wraparound of values - that's why I cast to float64 here first!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I follow.
The case we were running into was
A = np.array( [[1, 2, 1e9], [3, 4, 5]], dtype=np.uint32)
im = plt.imshow(A)
im.set_clim([0., 5.])
When the data in the image is first input, vmin = 1
and vmax = 1e9
, and the underflow occurred when we cast to np.float32
, which we did to help accommodate very large images. Hence the check on vmax - vmin
to see if it is larger than 1e8 and to use np.float64
for the case instead.
Casting to float64
first is great but apparently people sometimes have huge images, hence the extra rigamarole w/ using float32
if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vmin and A are still ints when vmin is removed from A.
Going and reading the code again, it's not - a_min is subtracted from A_scaled, which has a floating dtype. In which case you're also safe from overflow/underflow, and have just added handling for loss of precision 😄
I remain happy with this solution - for Xarray, including passing non-RGB images directly to matplotlib - and suggest we take any further discussion to the upstream pull.
@jhamman - all done 😄 |
xarray/plot/plot.py
Outdated
# There's a cyclic dependency via DataArray, so we can't | ||
# import xarray.ufuncs in global or outer scope. | ||
import xarray.ufuncs as xu | ||
darray = xu.minimum(xu.maximum(darray.astype('f4'), 0), 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to type cast this back to a 4byte float?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shoyer - do you have a solution off had for the cyclic import problem here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to downcast, in that it will work even if we don't, but downcasting ~halves memory use further inside matplotlib which can be a 5-10x multiple of the input data size. Happy to remove if you don't think the performance hit is worth it.
The best alternative solution I can see is refactoring xarray.ufuncs
to do all the imports inside a function or class, and pay the performance cost when creating ufuncs and slightly increased overhead in dispatch. Personally, I prefer a slight slowdown when creating scaled RGB images - much less common, and it's probably negligible given the array operations involved anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with downcasting, let's just add a comment to explain why.
I don't think we can avoid the cycle import issue. This is a downside of our method heavy design. However, it would be nice if we moved the import statement to the top of the method.
xarray/plot/plot.py
Outdated
# Scale interval [vmin .. vmax] to [0 .. 1] and clip to bounds | ||
if vmin is not None or vmax is not None: | ||
vmin = vmin if vmin is not None else darray.min() | ||
vmax = vmax if vmax is not None else darray.max() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the default here for vmin/vmax be 0/1 rather than the min/max values of the array? That would seem a little less surprising to me. Otherwise setting one of vmin
/vmax
would cause the other to be automatically set which seems strange.
xarray/plot/plot.py
Outdated
# There's a cyclic dependency via DataArray, so we can't | ||
# import xarray.ufuncs in global or outer scope. | ||
import xarray.ufuncs as xu | ||
darray = xu.minimum(xu.maximum(darray.astype('f4'), 0), 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with downcasting, let's just add a comment to explain why.
I don't think we can avoid the cycle import issue. This is a downside of our method heavy design. However, it would be nice if we moved the import statement to the top of the method.
xarray/plot/plot.py
Outdated
@@ -449,10 +449,28 @@ def newplotfunc(darray, x=None, y=None, figsize=None, size=None, | |||
if imshow_rgb: | |||
# Don't add a colorbar when showing an image with explicit colors | |||
add_colorbar = False | |||
# Calculate vmin and vmax automatically for `robust=True` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this logic to a separate helper function, e.g., _rescale_imshow_rgb(darray, vmin, vmax, robust)
? This method is already getting pretty long.
Method extracted, default for missing bounds fixed, comment added, import moved. Also added a check that you haven't accidentally reversed the bounds while supplying only one. |
elif vmax is None: | ||
vmax = 255 if np.issubdtype(darray.dtype, np.integer) else 1 | ||
if vmax < vmin: | ||
raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These error checks look great, thanks! Can you add a test that covers them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. @jhamman any further concerns?
Ping @jhamman? |
In it goes. Thanks @Zac-HD. |
Follow-up to #1796, where normalisation and clipping of RGB[A] values were deferred so that we could match any upstream API. matplotlib/matplotlib#10220 implements clipping to the valid range, but a strong consensus against RGB normalisation in matplotlib has emerged.
This pull therefore implements normalisation, and clips values only where our normalisation has pushed them out of range.