-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Question regarding hazard data subsetting given asset's lat/lon pairs #263
Comments
Hi @emileten - I was curious about this recently too (but haven't worked directly on The key metadata is pulled from the store a few lines earlier in physrisk/src/physrisk/data/zarr_reader.py Lines 96 to 99 in 0feb814
|
Thanks @tomalrussell that's very helpful ! So, if I am understanding correctly, we're not storing the coordinate labels in the zarr store, but only this My deeper question sort of remains. What actually happens if the input lat/lon wasn't in the hazard zarr store in the first place (i.e there is no exact match) ? I would have expected the behavior of the code to be equivalent to a 'nearest neighbor match' -- i.e, we pick the zarr coordinate that's the closest to the provided lat/lon. The answer to my question, I guess, is in how this matrix is defined? Edit: or rather, is this the role of the interpolation method ? |
Hi @emileten and thanks for the help @tomalrussell! The affine matrix lets us transform from the geographical space e.g. latitudes and longitudes to pixel space directly, so no labels needed. In general the result of the transform is fractional pixel indices so then we can decide whether to calculate the point by bilinear interpolation or take the containing pixel or other variations. See, e.g. Geotiffs support affine matrices so nothing new under the sun here actually (i.e. it is pretty widely adopted). NetCDF and XArrays do have coordinate labels of course. In the hazard repo we have an OscZarr class that reads the Zarr arrays as Xarrays (e.g. for generating map tiles) - that is, we can convert between representations as required; coordinate labels can be inferred from affine matrix and array dimensions. Xarray has its own format for saving arrays in Zarr format but there we end up with separate arrays for the coordinates. Downsides of this are performance (need always to load up coordinates before doing anything and cache that in certain circumstances) and having a nested structure. Hence we went more the Geotiff way. Zarr may set a standard for such things in the future that we would presumably want to align to but has none yet as far as I know. Hope that helps! |
Ok great I think I understand what's happening now, thanks for the additional input @joemoorhouse. So, when we try to subset a zarr to a latitude/longitude pair that don't have an exact match in the zarr coordinates (per the transform matrix definition), this lat/lon pair results in fractional pixel indices. No data point corresponds to that of course, and we have to make a decision. We allow for the following decisions :
Does this sound correct? |
Hi @emileten, yes that's right. It turns out that how to do linear interpolation depends on how pixels are interpreted, i.e. pixel-is-area or pixel-is-point. We assume the former (pixel-is-area) - I think that is true for all data sets we are going to need. But actually the difference is only 0.5 pixels. I mention just in case you see 'pixel-is-area' and are wondering what that means. |
Thanks @joemoorhouse that helps enough I think, I'll close this conversation |
Hi both - in case of interest, there's some discussion around standardising this way of storing the affine geotransform as metadata (rather than the full coordinate dimension arrays) over at zarr-developers/geozarr-spec#17 |
Hi @joemoorhouse,
This is a question about how the code works, not a feature request or a bug report. Let me know if there is a better place for this (I didn't find a discussion section in the repo).
I've been diving in
physrisk
yesterday. I am trying to understand how the logic of selecting pixels in the hazard data for given assets locations provided by the client works. In other words, I am trying to find where the mapping :assets lat/lons -> hazard data pixels
lives and understand it.I believe I've identified the location : in
zarr_reader
, specifically in theget_curves
method. The precise line of code where the spatial subsetting happens seems to be this one where we make use of thezarr-py
get_coordinate_selection
method. We pass index values (iz,iy,ix
) corresponding to the pixels we're selecting in the data.So, the mapping logic must happen right before. I am not able though to understand how the mapping works. Where exactly are the lat/lons mapped to the zarr indices and how does that logic work in
get_curves
? I first thought it's in the call to the_get_coordinates
method, but this one doesn't access the zarr store at all.Could you shed a little bit of light on this ? Note that I may be looking at the wrong place entirely, let me know if so !
Thank you.
cc @geohacker
The text was updated successfully, but these errors were encountered: