How can I create a mask based on distance from a given result? #8604

tomchor · 2024-01-11T17:36:30Z

I'd like to create a mask based on the distance from points in a given DataArray that have a specific value.

To clarify: consider the example below, where I create a random DataArray on an irregularly spaced x dimension and then set all the values between x=40 and x=60 to 0:

import numpy as np
import xarray as xr

da = xr.DataArray(np.random.randn(30), dims=["x"], coords=dict(x=np.logspace(0, 2, 30)))
da.loc[dict(x=slice(40, 60))] = 0

I'd like to create a mask (or something similar) to eventually zero out all values within a certain distance (say, 20 units) of any exactly-zero value. Since I know that the zero values lie in [40, 60], in this particular MWE I could just set everything in [20, 80] to zero.

But assuming I don't know where the zero values are, what's a good way to do this?
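For the 1D MWE, a simple option (a sketch, not something proposed in the thread) is to compute, for every x, the distance in coordinate units to the nearest exactly-zero point with NumPy broadcasting, and then zero out everything closer than the radius. Because it uses the x coordinate values directly, it respects the irregular spacing; the catch is that it builds a (points x zero points) array, so memory grows with both. The radius is the 20-unit example from the question, and a seeded default_rng replaces np.random.randn only for reproducibility.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
x = np.logspace(0, 2, 30)
da = xr.DataArray(rng.standard_normal(30), dims=["x"], coords=dict(x=x))
da.loc[dict(x=slice(40, 60))] = 0

radius = 20.0  # distance threshold in coordinate units (example value from the question)

# Coordinates of the exactly-zero points (assumes there is at least one)
zero_x = da.x.values[da.values == 0]

# |x_i - z_j| for every point i and zero point j, then the nearest zero per point
dist = np.abs(da.x.values[:, None] - zero_x[None, :]).min(axis=1)
dist_da = xr.DataArray(dist, dims=["x"], coords=dict(x=da.x))

# Keep values farther than `radius` from every zero point, set the rest to 0
result = da.where(dist_da > radius, 0)
```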

The way I've been doing this is by convolving with a kernel sized to the distance I want (reusing the code from here), but as the dimensions get larger, the convolution gets really expensive. I'm positive there's a better way to do this, but I haven't been able to find it 😬.
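For comparison, the convolution idea can be sketched generically (this is not the code linked above): on a regular grid with spacing dx, convolve the zero indicator with a disk-shaped kernel of radius "radius"; any cell with a nonzero result lies within that radius of a zero. The kernel has roughly (2·radius/dx + 1)^d cells in d dimensions, which is why this gets expensive as the radius, the grid, or the number of dimensions grows. The grid size, spacing, and patch location are made-up values, and SciPy is assumed to be installed.

```python
import numpy as np
from scipy import ndimage

# Hypothetical regular 2D grid: 50 x 50 cells with spacing dx (coordinate units)
dx = 2.0
radius = 20.0
field = np.random.default_rng(1).standard_normal((50, 50))
field[20:30, 20:30] = 0.0  # an exactly-zero patch

is_zero = (field == 0).astype(float)

# Disk-shaped kernel: offsets (in coordinate units) whose length is <= radius
r_cells = int(radius // dx)
offsets = np.arange(-r_cells, r_cells + 1) * dx
kx, ky = np.meshgrid(offsets, offsets, indexing="ij")
kernel = (np.sqrt(kx**2 + ky**2) <= radius).astype(float)

# A cell is within `radius` of a zero iff the kernel overlaps at least one zero cell
near_zero = ndimage.convolve(is_zero, kernel, mode="constant", cval=0.0) > 0
result = np.where(near_zero, 0.0, field)
```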

Thanks in advance!

PS: this needs to work for n-dimensional DataArrays! So in a way I guess my MWE is kinda misleading, since this is a much easier problem in 1D...

Here's an equivalent MWE in 2D that's closer to my actual situation:

import numpy as np
import xarray as xr

x = np.logspace(0, 2, 30)
y = np.linspace(0, 100, 30)
da = xr.DataArray(np.random.randn(30, 30), dims=["x", "y"], coords=dict(x=x, y=y))
da.loc[dict(x=slice(60, None), y=slice(40, 60))] = 0

Ockenfuss · 2024-01-16T14:00:33Z

Hello @tomchor,

Unfortunately, I don't quite understand your example: you draw values from a normal distribution, so you will never get exactly zero?

Let's assume you want to set all regions around sufficiently close-to-zero values to zero as well (do I understand this correctly?).
You can do this by first extracting all of those near-zero values and setting the rest of the array to NaN:

import numpy as np
import xarray as xr
np.random.seed(123)
x = np.logspace(0, 2, 30)
y = np.linspace(0, 100, 30)
da = xr.DataArray(np.random.randn(30, 30), dims=["x", "y"], coords=dict(x=x, y=y))

da_near_zero=da.where(abs(da)<0.01)
da_zero=da_near_zero*0
da_zero.plot()

If the region size is just a fixed number of grid cells, you could probably use a rolling maximum at this point to expand the non-NaN regions in da_zero.
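A minimal sketch of that rolling-maximum idea, assuming the expansion is measured in grid cells (here a hypothetical n_cells = 3) rather than coordinate units. Instead of the NaN/0 da_zero array, it uses a 0/1 indicator of the exactly-zero cells in the 2D MWE (seeded for reproducibility). Rolling along x and then along y gives a square neighborhood of n_cells in each direction.

```python
import numpy as np
import xarray as xr

x = np.logspace(0, 2, 30)
y = np.linspace(0, 100, 30)
rng = np.random.default_rng(123)
da = xr.DataArray(rng.standard_normal((30, 30)), dims=["x", "y"], coords=dict(x=x, y=y))
da.loc[dict(x=slice(60, None), y=slice(40, 60))] = 0

n_cells = 3  # hypothetical half-width of the expansion, in grid cells (not coordinate units)

# 1.0 where the data is exactly zero, 0.0 elsewhere
is_zero = (da == 0).astype(float)

# A centered rolling max over 2 * n_cells + 1 cells dilates the zero region by
# n_cells in each direction; min_periods=1 keeps the edges from becoming NaN
window = 2 * n_cells + 1
expanded = (
    is_zero.rolling(x=window, center=True, min_periods=1).max()
    .rolling(y=window, center=True, min_periods=1).max()
)

result = da.where(expanded == 0, 0)
```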

However, if you have an irregular grid and the region size is defined in coordinate units, things are more difficult. The interpolate_na() function in xarray has a limit parameter. A few weeks ago, I opened a pull request that would allow specifying the limit in coordinate units as well (#8577). If it gets reviewed and merged at some point, you could use this functionality to ffill/bfill the zero values 20 units in every direction. Right now, you can have a look at one of the functions I wrote in this PR: _get_gap_masks. It is available if you check out the corresponding branch. Then, your problem could be solved like this:

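# _get_gap_masks is a private helper that only exists on the branch of PR #8577
from xarray.core.missing import _get_gap_masks

# %% X direction: expand the zero region up to 20 coordinate units along x
limit_mask, _, _ = _get_gap_masks(da_zero, 'x', limit=20, limit_use_coordinate=True)
zero_expanded = xr.zeros_like(da).where(limit_mask)
zero_expanded.plot()
# %% Y direction: expand the x-expanded region up to 20 coordinate units along y
limit_mask, _, _ = _get_gap_masks(zero_expanded, 'y', limit=20, limit_use_coordinate=True)
zero_expanded = xr.zeros_like(da).where(limit_mask)
zero_expanded.plot()
# %% Set the original values to 0 inside the expanded region
result = xr.where(zero_expanded == 0, 0, da)
result.plot()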
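Until something like #8577 is available, the same per-dimension, coordinate-aware expansion can be sketched in plain NumPy: for each cell, find the coordinate of the nearest zero cell before and after it along one dimension and keep the smaller gap. Expanding along x and then along y, as in the snippet above, marks exactly the cells that are within 20 units along x and within 20 units along y of some zero cell, i.e. a square neighborhood in coordinate units rather than a Euclidean disk. distance_along_axis is a hypothetical helper written for this sketch (not part of xarray), and it assumes each coordinate is sorted in increasing order.

```python
import numpy as np
import xarray as xr


def distance_along_axis(mask, coord, axis):
    """Distance (coordinate units) from each cell to the nearest True cell along
    `axis` only; inf if that line has no True cell. Hypothetical helper that
    assumes `coord` is sorted in increasing order."""
    m = np.moveaxis(np.asarray(mask, dtype=bool), axis, -1)
    coord = np.asarray(coord, dtype=float)
    n = m.shape[-1]
    pos = np.arange(1, n + 1)  # 1-based positions; 0 and n + 1 mean "none"
    padded = np.concatenate([[-np.inf], coord, [np.inf]])
    # Position of the last True cell at or before each cell (0 if none)
    last = np.maximum.accumulate(np.where(m, pos, 0), axis=-1)
    # Position of the next True cell at or after each cell (n + 1 if none)
    nxt = np.minimum.accumulate(np.where(m, pos, n + 1)[..., ::-1], axis=-1)[..., ::-1]
    dist = np.minimum(coord - padded[last], padded[nxt] - coord)
    return np.moveaxis(dist, -1, axis)


x = np.logspace(0, 2, 30)
y = np.linspace(0, 100, 30)
rng = np.random.default_rng(123)
da = xr.DataArray(rng.standard_normal((30, 30)), dims=["x", "y"], coords=dict(x=x, y=y))
da.loc[dict(x=slice(60, None), y=slice(40, 60))] = 0

limit = 20.0  # expansion distance in coordinate units, applied per dimension
is_zero = (da == 0).values
# Expand along x first, then expand that result along y
step_x = distance_along_axis(is_zero, da.x.values, da.get_axis_num("x")) <= limit
step_xy = distance_along_axis(step_x, da.y.values, da.get_axis_num("y")) <= limit

keep = xr.DataArray(~step_xy, dims=da.dims, coords=da.coords)
result = da.where(keep, 0)
```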

Best,
Paul


scottyhq · 2024-01-16T16:51:09Z · Maintainer

Thanks @tomchor for the interesting problem. Agreed, it would be great to clarify whether you (1) have a single continuous 'zero' region, (2) have irregular coordinate spacing in different dimensions, and (3) do in fact have >2D datasets.

mask or something to eventually zero out all values at a distance (say 20 units) from any exactly-zero values here

Just want to add that this sounds like the 'dilation' or 'buffer' methods for vector geometries, which take the CRS (and therefore distance) into account but, as far as I know, only work in 2D. SciPy also has https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.binary_dilation.html, but I'm not sure about making it aware of irregular grid spacing (perhaps what @Ockenfuss has implemented is the way forward for something like this!)
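For grids whose spacing is uniform along each axis (possibly different per axis), scipy.ndimage.distance_transform_edt with its sampling argument gives the Euclidean distance, in coordinate units, from every cell to the nearest zero cell, which amounts to a dilation by a physical radius. It does not handle irregular spacing such as the logspace x axis in the MWE, so the grid below (shape, spacings, radius) is a made-up regular example.

```python
import numpy as np
from scipy import ndimage

# Hypothetical regular grid: 40 x 60 cells, spacing 2.5 along axis 0 and 1.0 along axis 1
dx, dy = 2.5, 1.0
radius = 20.0
field = np.random.default_rng(2).standard_normal((40, 60))
field[15:25, 25:35] = 0.0  # an exactly-zero patch

is_zero = field == 0

# distance_transform_edt gives, for every nonzero input element, the distance to the
# nearest zero element; passing ~is_zero makes the zero patch the "background"
dist = ndimage.distance_transform_edt(~is_zero, sampling=(dx, dy))

# Dilate the zero patch by `radius` coordinate units (Euclidean)
result = np.where(dist <= radius, 0.0, field)
```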


tomchor · Jan 17, 2024 · Author

Interesting references! I'll look them up. Here are the clarifications:

Yes, I have a single continuous exactly-zero region (well, exact to machine precision...).
Yes, exactly one of the dimensions has irregular spacing.
Yes, I do in fact have >2D datasets. In particular, I have both 2D and 3D datasets. The 2D datasets are the most important, though, so I'd be okay with a solution that only works in 2D.
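Given these constraints (one zero region, one irregular dimension, 2D or 3D), another option not raised in the thread is a k-d tree: put the coordinates of the zero points into scipy.spatial.cKDTree and query the distance from every grid point to its nearest zero point. This gives the exact Euclidean distance in coordinate units on any rectilinear grid, irregular or not, and the same code should work for 3D since it loops over da.dims. The sketch reuses the 2D MWE with a 20-unit radius and assumes at least one zero point exists and that all dimensions share the same units. For large grids, passing distance_upper_bound=radius to tree.query can speed things up (points farther away then get a distance of inf).

```python
import numpy as np
import xarray as xr
from scipy.spatial import cKDTree

x = np.logspace(0, 2, 30)
y = np.linspace(0, 100, 30)
rng = np.random.default_rng(123)
da = xr.DataArray(rng.standard_normal((30, 30)), dims=["x", "y"], coords=dict(x=x, y=y))
da.loc[dict(x=slice(60, None), y=slice(40, 60))] = 0

radius = 20.0  # example distance from the question, in coordinate units

# Coordinates of every grid point, one column per dimension (any number of dims)
grids = np.meshgrid(*[da[d].values for d in da.dims], indexing="ij")
points = np.column_stack([g.ravel() for g in grids])

# Build the tree from the zero points only, then query every grid point
is_zero = (da.values == 0).ravel()
tree = cKDTree(points[is_zero])
dist, _ = tree.query(points, k=1)

dist_da = xr.DataArray(dist.reshape(da.shape), dims=da.dims, coords=da.coords)
result = da.where(dist_da > radius, 0)
```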

tomchor · 2024-01-17T02:54:13Z · Author

I actually came up with an algorithm that does what I described in 2D, but unfortunately it doesn't generalize super well (I think). Basically, it identifies the borders along a given dimension (by diffing), gets their locations, creates a DataArray of distances from the border, and the rest is easy.

Here's a snippet of my code (the snippet doesn't run by itself). The DataArray ds.land_mask is pretty much what it sounds like: it is 1 over land and 0 over water, and that's it. xC and yC are the x and y coordinates.

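import numpy as np
import xarray as xr

# `ds` (not shown) holds land_mask (1 = land, 0 = water), water_mask (assumed to be
# its complement) and the coordinates xC, yC
squared_distances = []

# x: location of the land/water border in each row (diff is nonzero at the border);
# rows without a border get a fill value of 400
x_boundary_locations = ds.xC.where(ds.land_mask.astype(float).diff("xC")).max("xC")
x_boundary_locations = x_boundary_locations.where(np.isfinite(x_boundary_locations), other=400)
x_squared_distances = (ds.xC - x_boundary_locations)**2
squared_distances.append(x_squared_distances)

# y: northern and southern borders in each column (0 where there is no border)
y_boundary_locations_north = ds.yC.where(ds.land_mask.astype(float).diff("yC")).max("yC")
y_boundary_locations_north = y_boundary_locations_north.where(np.isfinite(y_boundary_locations_north), other=0)

y_boundary_locations_south = ds.yC.where(ds.land_mask.astype(float).diff("yC")).min("yC")
y_boundary_locations_south = y_boundary_locations_south.where(np.isfinite(y_boundary_locations_south), other=0)

# Squared distance to the nearer y border (no sqrt here: the terms are summed as squares below)
y_squared_distances = xr.concat([(ds.yC - y_boundary_locations_south)**2, (ds.yC - y_boundary_locations_north)**2], dim="aux").min("aux")
squared_distances.append(y_squared_distances)

# Euclidean distance from the border over water, 0 over land
ds["distance_from_boundary"] = np.sqrt(xr.concat(squared_distances, dim="aux").sum("aux")).where(ds.water_mask, other=0)
# Take the smaller of that distance and the x-only distance
ds["distance_from_boundary"] = xr.concat([np.sqrt(x_squared_distances), ds.distance_from_boundary], dim="aux").min("aux").where(ds.water_mask, other=0)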

It seems to work okay. Here's my region expanding as you go left (I also included the distance from the left boundary in the expansion):
