-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new odc.mask
and odc.crop
methods for masking and cropping xarray
data by geometries
#114
Conversation
odc.crop
method for cropping and masking xarray
data to geometries
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## develop #114 +/- ##
===========================================
- Coverage 98.12% 97.91% -0.22%
===========================================
Files 25 25
Lines 4224 4315 +91
===========================================
+ Hits 4145 4225 +80
- Misses 79 90 +11 ☔ View full report in Codecov by Sentry. |
This is great, thanks Robbi. Can you please add this method to docs, here: Lines 39 to 52 in 6ce7524
|
odc/geo/_xr_interop.py
Outdated
poly = poly.to_crs(crs=xx.odc.crs) | ||
bbox = poly.boundingbox | ||
|
||
# Verify that `poly` overlaps with `xx` extent | ||
if poly.intersects(xx.odc.geobox.extent): | ||
# Crop `xx` to the bounding box of `poly` | ||
ydim, xdim = spatial_dims(xx) | ||
xx_cropped = xx.sel( | ||
{xdim: slice(bbox.left, bbox.right), ydim: slice(bbox.top, bbox.bottom)} | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is best covered by GeoBox.enclosing
followed by geobox intersect, this should give you an roi of the original image overlapping with the geometry, which you can then use with .isel
to select a sub-region.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in fact we should overload overlap_roi
to allow geometry on input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like this?
xx.odc.geobox.overlap_roi(xx.odc.geobox.enclosing(poly))
>>> (slice(118, 285, None), slice(132, 276, None))
The output tuple doesn't include coordinate names, and I get ValueError: Unsupported key-type <class 'tuple'>
if I try xx[roi]
. Do I still need to grab coord names with spatial_dims
or is there any way to pass an ROI to isel
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works, but would be nice to avoid the spatial_dims
step:
roi = xx.odc.geobox.overlap_roi(xx.odc.geobox.enclosing(poly))
ydim, xdim = spatial_dims(xx)
xx_cropped = xx.isel({ydim: roi[0], xdim: roi[1]})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of dimension names there is .odc.ydim
which tells you which tells you where y,x
dims are, always next to each other, from that one can construct roi
object that slices to all dimensions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably common enough functionality that needs to be exposed on .odc
accessor
- Input: 2d slice into spatial dimensions
- Output:
- Nd slice into data variable(s) with extra dimensions filled with
:
slices - New Dataset/DataArray sliced into geo dims
- Nd slice into data variable(s) with extra dimensions filled with
So np.s_[100:200, :30] -> np.s_[:, 100:200, :30]
in the common case, but for rgb data it would be -> np.s_[:, 100:200, :30, :]
.
odc/geo/_xr_interop.py
Outdated
# Verify that `poly` overlaps with `xx` extent | ||
if poly.intersects(xx.odc.geobox.extent): | ||
# Crop `xx` to the bounding box of `poly` | ||
ydim, xdim = spatial_dims(xx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mypy complains here about not handling the case when spatial_dims returns None.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would you recommend for that case? Just raise an exception? I 'm trying to think of a scenario where we'd expect this functionality to work with undefined spatial dims...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert is enough for mypy usualy, like
sdims = spatial_dims(xx)
assert sdims
ydim, xdim = sdims
but here we should probably use ydim
property that returns an integer index of y
dimension, instead of dimension names.
I think it might be better to expose 2 methods:
@robbibt also when happy with changes please rebase it all to a single commit:
|
I agree, both a |
odc.crop
method for cropping and masking xarray
data to geometriesodc.mask
and odc.crop
methods for masking and cropping xarray
data by geometries
) | ||
|
||
# Mask data outside rasterized `poly` | ||
xx_masked = xx.where(rasterized.data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, xx.where
in that form always changes output to float{64|32}+nan
representation. Which is a valid and often desired output format, but not always. One might also expect to retain existing pixel data type with nodata
on output.
We should certainly support both. But it's less clear to me which one should be a default
- Personally I would expect to get
int16, nodata=-9999
on output if that was given on input, but I can see howfloat32, nan
and evenfloat64, nan
can be also desired. - But I can also see that some would expect and prefer to see
float32, nan
output formask
operation.
In the code as it stands we should at least ensure that nodata
attribute is not present on xx_masked
output array, I still don't have any intuition for xarray
attributes, when they stay and when they go...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this occurred to me too - I definitely think both options should be supported. My take would be that "most" of our more general users simply want a way to quickly set pixels outside of a polygon to NaN
- which is a very common workflow for things like pixel drills or zonal statistics. Getting back int16, nodata=-9999
is definitely useful but perhaps for a smaller number of more advanced users/workflows. I would be in favour of the simpler NaN
option being the default.
The approach taken in dea_tools.datahandling.load_ard
seems like it could be a good template: https://github.com/GeoscienceAustralia/dea-notebooks/blob/develop/Tools/dea_tools/datahandling.py#L214-L221
dtype="auto"
: When 'auto' is used, the data will be converted tofloat32
if masking is used, otherwise data will be returned in the native data type of the data.dtype="native"
: Data will be returned in the native data type of the data, including native nodata valuedtype='float{16|32|64}'
: Custom dtype
I guess given mask
always applies a mask, we probably don't need "auto"... so a default of "float32" with an option for "native" could work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the moment, I've added some code to strip nodata attributes:
https://github.com/opendatacube/odc-geo/blob/crop/odc/geo/_xr_interop.py#L291-L296
odc/geo/_xr_interop.py
Outdated
# Reproject `poly` to match `xx` | ||
poly = poly.to_crs(crs=xx.odc.crs) | ||
|
||
# Verify that `poly` overlaps with `xx` extent | ||
if not poly.intersects(xx.odc.geobox.extent): | ||
raise ValueError( | ||
"The supplied `poly` must overlap spatially with the extent of `xx`." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same story as before: it's dangerous to perform coordinate transform, if it can be avoided by pushing into more basic operations like .enclosing
, then it should be. You'd then check if overlap_roi returns a null set and raise error then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, have revised to remove the coordinate transform, and simply run enclosing
> overlap_roi
then verify the ROI isn't empty using roi_is_empty
:
https://github.com/opendatacube/odc-geo/blob/crop/odc/geo/_xr_interop.py#L330-L341
@@ -657,6 +748,13 @@ def geobox(self) -> Optional[SomeGeoBox]: | |||
"""Query :py:class:`~odc.geo.geobox.GeoBox` or :py:class:`~odc.geo.gcp.GCPGeoBox`.""" | |||
return self._state.geobox | |||
|
|||
@property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where did this come from? bad merge or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think I accidently went one commit to far when running git reset --soft HEAD~<number of commits to merge into one>
. Although it confuses me that the same code is in the crop
branch and develop
but is still being detected as a diff...
https://github.com/opendatacube/odc-geo/blob/crop/odc/geo/_xr_interop.py#L768-L773
https://github.com/opendatacube/odc-geo/blob/develop/odc/geo/_xr_interop.py#L660-L665
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robbibt thanks for this, I think we should merge this as is, we can do further interface polishing, before next release.
Thanks @Kirill888 - should I combine the commits into one again? And what merge option do you use here? |
I usually go with Rebase and merge for cleaner history |
This PR introduces a new
.odc.crop
method that supports:xarray.Dataset
orxr.DataArray
to the spatial extent of a datacubeGeometry
polygonThis will be very useful for pixel drill workflows that regularly involve a sequence of loading data, cropping to a certain extent using
.sel()
, then rasterizing a polygon and applying it as a mask.I've tested that this works on both datasets and data arrays, and have included a check that the polygon overlaps with the geobox extent. Would welcome any other feedback to make this more robust!