Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new odc.mask and odc.crop methods for masking and cropping xarray data by geometries #114

Merged
merged 3 commits into from
Jan 2, 2024

Conversation

robbibt
Copy link
Contributor

@robbibt robbibt commented Dec 17, 2023

This PR introduces a new .odc.crop method that supports:

  • Clipping an xarray.Dataset or xr.DataArray to the spatial extent of a datacube Geometry polygon
  • Optionally apply the polygon as a mask by rasterizing it to the new geobox

image

This will be very useful for pixel drill workflows that regularly involve a sequence of loading data, cropping to a certain extent using .sel(), then rasterizing a polygon and applying it as a mask.

I've tested that this works on both datasets and data arrays, and have included a check that the polygon overlaps with the geobox extent. Would welcome any other feedback to make this more robust!

@robbibt robbibt changed the title Add new crop method for cropping xr data to geometries Add new odc.crop method for cropping and masking xarray data to geometries Dec 17, 2023
@robbibt robbibt requested a review from Kirill888 December 17, 2023 23:06
Copy link

codecov bot commented Dec 17, 2023

Codecov Report

Attention: 11 lines in your changes are missing coverage. Please review.

Comparison is base (eb12c48) 98.12% compared to head (40330a7) 97.91%.
Report is 12 commits behind head on develop.

Files Patch % Lines
odc/geo/_xr_interop.py 88.37% 5 Missing ⚠️
odc/geo/overlap.py 33.33% 4 Missing ⚠️
odc/geo/geobox.py 95.12% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #114      +/-   ##
===========================================
- Coverage    98.12%   97.91%   -0.22%     
===========================================
  Files           25       25              
  Lines         4224     4315      +91     
===========================================
+ Hits          4145     4225      +80     
- Misses          79       90      +11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

github-actions bot commented Dec 17, 2023

@github-actions github-actions bot temporarily deployed to pull request December 17, 2023 23:07 Inactive
@Kirill888
Copy link
Member

Kirill888 commented Dec 17, 2023

This is great, thanks Robbi.

Can you please add this method to docs, here:

odc-geo/docs/api.rst

Lines 39 to 52 in 6ce7524

assign_crs
rasterize
spatial_dims
wrap_xr
xr_coords
xr_reproject
xr_zeros
colorize
to_rgba
add_to
rio_reproject
to_cog
write_cog
compress

Comment on lines 273 to 319
poly = poly.to_crs(crs=xx.odc.crs)
bbox = poly.boundingbox

# Verify that `poly` overlaps with `xx` extent
if poly.intersects(xx.odc.geobox.extent):
# Crop `xx` to the bounding box of `poly`
ydim, xdim = spatial_dims(xx)
xx_cropped = xx.sel(
{xdim: slice(bbox.left, bbox.right), ydim: slice(bbox.top, bbox.bottom)}
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is best covered by GeoBox.enclosing followed by geobox intersect, this should give you an roi of the original image overlapping with the geometry, which you can then use with .isel to select a sub-region.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combination of these two should give you a slice into original image covering given geometry:

def enclosing(self, region: Union[Geometry, BoundingBox]) -> "GeoBox":

def overlap_roi(self, other: "GeoBox", tol: float = 1e-8) -> NormalizedROI:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in fact we should overload overlap_roi to allow geometry on input.

Copy link
Contributor Author

@robbibt robbibt Dec 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this?

xx.odc.geobox.overlap_roi(xx.odc.geobox.enclosing(poly))
>>> (slice(118, 285, None), slice(132, 276, None))

The output tuple doesn't include coordinate names, and I get ValueError: Unsupported key-type <class 'tuple'> if I try xx[roi]. Do I still need to grab coord names with spatial_dims or is there any way to pass an ROI to isel directly?

Copy link
Contributor Author

@robbibt robbibt Dec 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works, but would be nice to avoid the spatial_dims step:

roi = xx.odc.geobox.overlap_roi(xx.odc.geobox.enclosing(poly))
ydim, xdim = spatial_dims(xx)
xx_cropped = xx.isel({ydim: roi[0], xdim: roi[1]})

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah... the problem is that you can only use an ROI to index a xr.DataArray, not an xr.Dataset...

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of dimension names there is .odc.ydim which tells you which tells you where y,x dims are, always next to each other, from that one can construct roi object that slices to all dimensions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably common enough functionality that needs to be exposed on .odc accessor

  • Input: 2d slice into spatial dimensions
  • Output:
    • Nd slice into data variable(s) with extra dimensions filled with : slices
    • New Dataset/DataArray sliced into geo dims

So np.s_[100:200, :30] -> np.s_[:, 100:200, :30] in the common case, but for rgb data it would be -> np.s_[:, 100:200, :30, :].

odc/geo/_xr_interop.py Outdated Show resolved Hide resolved
# Verify that `poly` overlaps with `xx` extent
if poly.intersects(xx.odc.geobox.extent):
# Crop `xx` to the bounding box of `poly`
ydim, xdim = spatial_dims(xx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mypy complains here about not handling the case when spatial_dims returns None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you recommend for that case? Just raise an exception? I 'm trying to think of a scenario where we'd expect this functionality to work with undefined spatial dims...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert is enough for mypy usualy, like

sdims = spatial_dims(xx)
assert sdims
ydim, xdim = sdims

but here we should probably use ydim property that returns an integer index of y dimension, instead of dimension names.

@github-actions github-actions bot temporarily deployed to pull request December 17, 2023 23:55 Inactive
@github-actions github-actions bot temporarily deployed to pull request December 18, 2023 00:21 Inactive
@Kirill888
Copy link
Member

I think it might be better to expose 2 methods:

  • .mask(geom, ...options) -> xr.DataArray | xr.Dataset returning masked result in the source geobox
  • .crop(geom, ...options) -> xr.DataArray | xr.Dataset returning cropped and possibly masked result

@robbibt also when happy with changes please rebase it all to a single commit:

  1. git reset --soft HEAD~<number of commits to merge into one>
  2. commit everything again as a single change this time
  3. push to github: git push --force-with-lease

@robbibt
Copy link
Contributor Author

robbibt commented Dec 19, 2023

I think it might be better to expose 2 methods:

  • .mask(geom, ...options) -> xr.DataArray | xr.Dataset returning masked result in the source geobox
  • .crop(geom, ...options) -> xr.DataArray | xr.Dataset returning cropped and possibly masked result

@robbibt also when happy with changes please rebase it all to a single commit:

  1. git reset --soft HEAD~<number of commits to merge into one>
  2. commit everything again as a single change this time
  3. push to github: git push --force-with-lease

I agree, both a mask and crop method would be useful. I'll have another shot soon and will rebase everything after.

@robbibt robbibt changed the title Add new odc.crop method for cropping and masking xarray data to geometries Add new odc.mask and odc.crop methods for masking and cropping xarray data by geometries Dec 28, 2023
@github-actions github-actions bot temporarily deployed to pull request December 28, 2023 05:58 Inactive
odc/geo/_xr_interop.py Outdated Show resolved Hide resolved
)

# Mask data outside rasterized `poly`
xx_masked = xx.where(rasterized.data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, xx.where in that form always changes output to float{64|32}+nan representation. Which is a valid and often desired output format, but not always. One might also expect to retain existing pixel data type with nodata on output.

We should certainly support both. But it's less clear to me which one should be a default

  • Personally I would expect to get int16, nodata=-9999 on output if that was given on input, but I can see how float32, nan and even float64, nan can be also desired.
  • But I can also see that some would expect and prefer to see float32, nan output for mask operation.

In the code as it stands we should at least ensure that nodata attribute is not present on xx_masked output array, I still don't have any intuition for xarray attributes, when they stay and when they go...

Copy link
Contributor Author

@robbibt robbibt Dec 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this occurred to me too - I definitely think both options should be supported. My take would be that "most" of our more general users simply want a way to quickly set pixels outside of a polygon to NaN - which is a very common workflow for things like pixel drills or zonal statistics. Getting back int16, nodata=-9999 is definitely useful but perhaps for a smaller number of more advanced users/workflows. I would be in favour of the simpler NaN option being the default.

The approach taken in dea_tools.datahandling.load_ard seems like it could be a good template: https://github.com/GeoscienceAustralia/dea-notebooks/blob/develop/Tools/dea_tools/datahandling.py#L214-L221

  • dtype="auto": When 'auto' is used, the data will be converted to float32 if masking is used, otherwise data will be returned in the native data type of the data.
  • dtype="native": Data will be returned in the native data type of the data, including native nodata value
  • dtype='float{16|32|64}': Custom dtype

I guess given mask always applies a mask, we probably don't need "auto"... so a default of "float32" with an option for "native" could work.

Copy link
Contributor Author

@robbibt robbibt Dec 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the moment, I've added some code to strip nodata attributes:
https://github.com/opendatacube/odc-geo/blob/crop/odc/geo/_xr_interop.py#L291-L296

odc/geo/_xr_interop.py Outdated Show resolved Hide resolved
Comment on lines 312 to 319
# Reproject `poly` to match `xx`
poly = poly.to_crs(crs=xx.odc.crs)

# Verify that `poly` overlaps with `xx` extent
if not poly.intersects(xx.odc.geobox.extent):
raise ValueError(
"The supplied `poly` must overlap spatially with the extent of `xx`."
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same story as before: it's dangerous to perform coordinate transform, if it can be avoided by pushing into more basic operations like .enclosing, then it should be. You'd then check if overlap_roi returns a null set and raise error then.

Copy link
Contributor Author

@robbibt robbibt Dec 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, have revised to remove the coordinate transform, and simply run enclosing > overlap_roi then verify the ROI isn't empty using roi_is_empty:

https://github.com/opendatacube/odc-geo/blob/crop/odc/geo/_xr_interop.py#L330-L341

@@ -657,6 +748,13 @@ def geobox(self) -> Optional[SomeGeoBox]:
"""Query :py:class:`~odc.geo.geobox.GeoBox` or :py:class:`~odc.geo.gcp.GCPGeoBox`."""
return self._state.geobox

@property
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where did this come from? bad merge or something?

Copy link
Contributor Author

@robbibt robbibt Dec 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think I accidently went one commit to far when running git reset --soft HEAD~<number of commits to merge into one>. Although it confuses me that the same code is in the crop branch and develop but is still being detected as a diff...
https://github.com/opendatacube/odc-geo/blob/crop/odc/geo/_xr_interop.py#L768-L773
https://github.com/opendatacube/odc-geo/blob/develop/odc/geo/_xr_interop.py#L660-L665

@github-actions github-actions bot temporarily deployed to pull request December 30, 2023 03:03 Inactive
@github-actions github-actions bot temporarily deployed to pull request December 30, 2023 04:31 Inactive
Copy link
Member

@Kirill888 Kirill888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robbibt thanks for this, I think we should merge this as is, we can do further interface polishing, before next release.

@robbibt
Copy link
Contributor Author

robbibt commented Jan 1, 2024

@robbibt thanks for this, I think we should merge this as is, we can do further interface polishing, before next release.

Thanks @Kirill888 - should I combine the commits into one again? And what merge option do you use here?

image

@Kirill888
Copy link
Member

I usually go with Rebase and merge for cleaner history

@robbibt robbibt merged commit a9fbfc5 into develop Jan 2, 2024
20 checks passed
@Kirill888 Kirill888 deleted the crop branch May 7, 2024 07:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants