Question regarding hazard data subsetting given asset's lat/lon pairs #263

emileten · 2024-04-04T08:06:00Z

This is a question about how the code works, not a feature request or a bug report. Let me know if there is a better place for this (I didn't find a discussion section in the repo).

I've been diving in physrisk yesterday. I am trying to understand how the logic of selecting pixels in the hazard data for given assets locations provided by the client works. In other words, I am trying to find where the mapping : assets lat/lons -> hazard data pixels lives and understand it.

I believe I've identified the location : in zarr_reader, specifically in the get_curves method. The precise line of code where the spatial subsetting happens seems to be this one where we make use of the zarr-py get_coordinate_selection method. We pass index values (iz,iy,ix) corresponding to the pixels we're selecting in the data.

So, the mapping logic must happen right before. I am not able though to understand how the mapping works. Where exactly are the lat/lons mapped to the zarr indices and how does that logic work in get_curves ? I first thought it's in the call to the _get_coordinates method, but this one doesn't access the zarr store at all.

Could you shed a little bit of light on this ? Note that I may be looking at the wrong place entirely, let me know if so !

Thank you.

cc @geohacker

The text was updated successfully, but these errors were encountered:

tomalrussell · 2024-04-04T09:28:09Z

Hi @emileten - I was curious about this recently too (but haven't worked directly on physrisk so Joe or colleagues might want to jump in).

The key metadata is pulled from the store a few lines earlier in get_curves:

physrisk/src/physrisk/data/zarr_reader.py

Lines 96 to 99 in 0feb814

    
           # OSC-specific attributes contain transform and return periods 
        
           t = z.attrs["transform_mat3x3"]  # type: ignore 
        
           transform = Affine(t[0], t[1], t[2], t[3], t[4], t[5]) 
        
           crs = z.attrs.get("crs", "epsg:4326")

the Affine transform is a 3x3 matrix that transforms zarr indices to coordinates. _get_coordinates finds the inverse and uses it to transform the coordinates to zarr indices. #L348-L350
the crs defaults to EPSG:4326 latitude/longitude, but could be another coordinate reference system. _get_coordinates transforms the lat/lons (for example into eastings/northings in a Molleweide projection) if necessary. #L342-L346

emileten · 2024-04-04T11:04:07Z

Thanks @tomalrussell that's very helpful !

So, if I am understanding correctly, we're not storing the coordinate labels in the zarr store, but only this transform_mat3x3 matrix that is used to convert indices to coordinate labels.

My deeper question sort of remains. What actually happens if the input lat/lon wasn't in the hazard zarr store in the first place (i.e there is no exact match) ? I would have expected the behavior of the code to be equivalent to a 'nearest neighbor match' -- i.e, we pick the zarr coordinate that's the closest to the provided lat/lon.

The answer to my question, I guess, is in how this matrix is defined?

Edit: or rather, is this the role of the interpolation method ?

joemoorhouse · 2024-04-04T14:45:43Z

Hi @emileten and thanks for the help @tomalrussell!

The affine matrix lets us transform from the geographical space e.g. latitudes and longitudes to pixel space directly, so no labels needed. In general the result of the transform is fractional pixel indices so then we can decide whether to calculate the point by bilinear interpolation or take the containing pixel or other variations.

See, e.g.
https://rasterio.readthedocs.io/en/stable/topics/transforms.html

Geotiffs support affine matrices so nothing new under the sun here actually (i.e. it is pretty widely adopted). NetCDF and XArrays do have coordinate labels of course. In the hazard repo we have an OscZarr class that reads the Zarr arrays as Xarrays (e.g. for generating map tiles) - that is, we can convert between representations as required; coordinate labels can be inferred from affine matrix and array dimensions.

Xarray has its own format for saving arrays in Zarr format but there we end up with separate arrays for the coordinates. Downsides of this are performance (need always to load up coordinates before doing anything and cache that in certain circumstances) and having a nested structure. Hence we went more the Geotiff way. Zarr may set a standard for such things in the future that we would presumably want to align to but has none yet as far as I know.

Hope that helps!
Joe

emileten · 2024-04-05T09:00:37Z

Ok great I think I understand what's happening now, thanks for the additional input @joemoorhouse. So, when we try to subset a zarr to a latitude/longitude pair that don't have an exact match in the zarr coordinates (per the transform matrix definition), this lat/lon pair results in fractional pixel indices. No data point corresponds to that of course, and we have to make a decision. We allow for the following decisions :

interpolation="floor" : we take the 'previous' pixel data. This is the default.
interpolation="linear": we take the average of the surrounding pixels data (weighted by fractional distance)
interpolation="max": we take the max of the surrounding pixels data
... "min" ... same as above but minimum

Does this sound correct?

joemoorhouse · 2024-04-08T11:15:05Z

Hi @emileten, yes that's right. It turns out that how to do linear interpolation depends on how pixels are interpreted, i.e. pixel-is-area or pixel-is-point. We assume the former (pixel-is-area) - I think that is true for all data sets we are going to need. But actually the difference is only 0.5 pixels. I mention just in case you see 'pixel-is-area' and are wondering what that means.

emileten · 2024-04-08T14:27:11Z

Thanks @joemoorhouse that helps enough I think, I'll close this conversation

tomalrussell · 2024-04-22T15:45:25Z

Hi both - in case of interest, there's some discussion around standardising this way of storing the affine geotransform as metadata (rather than the full coordinate dimension arrays) over at zarr-developers/geozarr-spec#17

emileten closed this as completed Apr 8, 2024

emileten mentioned this issue Apr 16, 2024

Discussion : writing xarray-compatible zarr stores os-climate/hazard#66

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question regarding hazard data subsetting given asset's lat/lon pairs #263

Question regarding hazard data subsetting given asset's lat/lon pairs #263

emileten commented Apr 4, 2024

tomalrussell commented Apr 4, 2024

emileten commented Apr 4, 2024 •

edited

Loading

joemoorhouse commented Apr 4, 2024

emileten commented Apr 5, 2024

joemoorhouse commented Apr 8, 2024

emileten commented Apr 8, 2024

tomalrussell commented Apr 22, 2024

Question regarding hazard data subsetting given asset's lat/lon pairs #263

Question regarding hazard data subsetting given asset's lat/lon pairs #263

Comments

emileten commented Apr 4, 2024

tomalrussell commented Apr 4, 2024

emileten commented Apr 4, 2024 • edited Loading

joemoorhouse commented Apr 4, 2024

emileten commented Apr 5, 2024

joemoorhouse commented Apr 8, 2024

emileten commented Apr 8, 2024

tomalrussell commented Apr 22, 2024

emileten commented Apr 4, 2024 •

edited

Loading