Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API for multi-dimensional resampling/regridding #486

Open
shoyer opened this issue Jul 21, 2015 · 32 comments
Open

API for multi-dimensional resampling/regridding #486

shoyer opened this issue Jul 21, 2015 · 32 comments

Comments

@shoyer
Copy link
Member

shoyer commented Jul 21, 2015

This notebook by @kegl shows a nice example of how to use pyresample with xray:
https://www.lri.fr/~kegl/Ramps/edaElNino.html#Downsampling

It would nice to build a wrapper for this machinery directly into xray in some way.

xref #475

cc @jhamman @rabernat

@rabernat
Copy link
Contributor

Pyresample is probably overkill for that case. Aggregating / decimating regular lat-lon grids could probably be done much more simply. For example

N = 10
fac = 2
x = np.arange(N, dtype=np.float64)
np.add.reduceat(x, np.arange(0,N,fac)) / fac

This gives array([ 0.5, 2.5, 4.5, 6.5, 8.5])

This type of resampling has the advantage of preserving certain integral invariants, as opposed to the nearest neighbor resampling in the example above. (Imagine if there had been lots of spatial variance below the 5 degree scale in that example--it would have been aliased horribly. That was only avoided because the original field was very smooth.) It is also very fast.

Pyresample seems best for complicated transformations from one map projection to another. I'm not sure I fully understand how xray handles grids where the coordinates are themselves 2d fields, as in this example.

@shoyer
Copy link
Member Author

shoyer commented Jul 21, 2015

Indeed, SciPy's ndimage.interpolation.zoom would probably be more appropriate (and faster) for regular grids.

Xray currently doesn't have any built-in support for handling projected data, but basic selection and regridding (from explicit arrays of 2D coordinates) seems in scope for the project.

@jhamman
Copy link
Member

jhamman commented Jul 21, 2015

I'd be interested in helping build a wrapper / api that supports the following types of regridding operations. I don't think pyresample handles all of these:

  1. Bilinear interpolation
  2. Bicubic interpolation
  3. Distance-weighted average remapping
  4. Nearest neighbor remapping
  5. Conservative (area) remapping
  6. Largest area fraction remapping

I would also like to see some better support for 2D coordinate variables. These are specifically outlined in the cf-conventions in section: 5.2. Two-Dimensional Latitude, Longitude, Coordinate Variables. Maybe we can open an additional issue for that...

@forman
Copy link

forman commented Apr 8, 2016

@jhamman: any progress on this? Our team would be happy to contribute as we have similar requirements in our project.

@rabernat
Copy link
Contributor

rabernat commented Apr 8, 2016

@forman I am starting to suspect that this might be possible to implement through #818

@jhamman
Copy link
Member

jhamman commented Apr 8, 2016

@forman - no progress on my end and no plans to work on this in the next 6 months. If your team is interested in working on this, we can discuss the api further and I'm happy to provide input and guidance along the way. One option is to wrap pyresample and possibly fill in some of the missing pieces in their api. I'd like to see a user interface that looks something like this:

da.resample_like(other, kind='bilinear', dims=('lat', 'lon'))

As @rabernat mentions, for some remapping/resampling, the multi-dimensional groupby will help with some of these applications.

@rabernat
Copy link
Contributor

rabernat commented Apr 8, 2016

I feel like the biggest application of the multi-dimensional groupby will be with "conservative resampling" and coarse-graining, where you want to make sure to conserve certain integrals (e.g. total heat content) while changing coordinates.

pyresample will be more useful for fine-graining and interpolating missing data.

@godfrey4000
Copy link

I have an immediate need in this area. My objective is to create a tool that will enable arithmetic on variables defined on lattices whose points don't coincide. Through my attempts thus far, it has become clear that I need data structures that incorporate spacial indexing and lattice indexing.

Since I have to tackle this issue to proceed, I thought I should follow the thinking discussed in this forum, so that it may be useful to others.

@rabernat
Copy link
Contributor

@godfrey4000: lots of us in the climate community would like xarray-backed regridding. It is a hard problem, however. It wold be great if you wanted to work on it.

At a recent workshop, a group of xarray users developed a draft design document for a regridding package.
https://aospy.hackpad.com/Regridding-Design-Document-tENJARIeg83

Your comments on this would be very welcome. An open question is whether the regridding belongs within xarray or in a standalone package (which would of course have xarray as a dependency).

@jhamman
Copy link
Member

jhamman commented Jan 10, 2017

@rabernat - I've more or less settled that this belongs in a separate package that uses xarray's accessor features.

I've setup a little project (xmap) to get this started that I hope we can garner some volunteers to help push forward (nudge @godfrey4000 and @forman): https://github.com/jhamman/xmap

There will likely be some overlap between features that xmap and xarray. In those cases, we can try to push relevant features toward xarray.

@godfrey4000
Copy link

I'm ready to start working on this project. I already have a prototype regridding class that I developed as part of another project. Working on that, I discovered these points:

  • regridding takes a long time because the lattices can be huge
  • the design should accomodate parallel processing on a cluster
  • data needs to be normalized first (deal with missing values, etc.)
  • the user will want choices

Some of these choices are:

  • the destination lattice
  • the interpolation algorithm
  • subset of the dimension space
     
    As the first step in a strategy to achieve this with a sequence of realizable goals, I plan to implement a regridding of just the latitude and longitude dimensions.

Is there a style guide that I can/should follow? Something like this: https://google.github.io/styleguide/pyguide.html? Does it or something else define naming conventions?

@shoyer
Copy link
Member Author

shoyer commented Jan 14, 2017

Is there a style guide that I can/should follow? Something like this: https://google.github.io/styleguide/pyguide.html? Does it or something else define naming conventions?

It's pretty standard to follow Python's PEP8 with NumPy-style docstrings.

I generally like the Google style guide, too, but it leans towards being overly strict -- sometimes it's OK to be more relaxed (e.g., it's rules for valid import statements). Also, the public facing version has gotten out of date (I think there are plans to update it).

@duncanwp
Copy link
Contributor

duncanwp commented Apr 5, 2017

@jhamman @godfrey4000 - I'm not sure of the status of this, but I'm the lead developer on a package called CIS which might be useful/relevant.

It was designed as a command line tool to allow easy collocation (resampling) between different model and observation datasets, but is now also a Python library. We spent a fair amount of time thinking about the various permutations and you can see some of the details in our paper here. Internally we currently use Iris Cube-like objects but it would be pretty easy to operate on xarray Datasets since they share a similar design.

The basic syntax is:

from CIS import read_data
X = read_data('some_obs_data.nc')
Y = read_data('some_other_data.nc')
X.sampled_from(Y)
# or...
Y.collocated_onto(X)

Happy to discuss further here, or in xmap.

@PeterDSteinberg
Copy link

Regridding is of interest to NASA and other clients of ours. It is important to them to be able to do broadcast operations between rasters that differ in resolution or are otherwise offset. We'll follow the XMap repo mentioned above ( @jhamman ) and see about building on that style. Our clients and open source tools like datashader for viz and elm for ML could use XMap and benefit from coordinate transformations and regridding. We have a meeting internally to discuss approaches to the coordinates' metadata and resampling / regridding and I'll be in touch further soon about how we can help here (see also the issues on this experimental earthio repo).

@forman
Copy link

forman commented May 31, 2017

@PeterDSteinberg please have a look at module gridtools.resampling of repo https://github.com/CAB-LAB/gridtools. There are various up- and downsampling methods, which can deal with NaNs, and which are fast as C thanks to JIT through Numba. We use this package successfully in two projects.

@JiaweiZhuang
Copy link

JiaweiZhuang commented Aug 29, 2017

I've wrapped ESMF/ESMPy by xarray: https://github.com/JiaweiZhuang/xESMF

It supports remapping between arbitrary quadrilateral grids, using ESMF's regridding algorithms including bilinear, conservative, nearest neighbour, etc... See this notebook for an example.

The package is still preliminary but it already works. See "Issues & Plans" in the main page for more details.

@rabernat
Copy link
Contributor

Awesome work @JiaweiZhuang! This could be a great way forward for this important need. I think lots of us would be keen to contribute to your project. I encourage you to add tests and docs...that will help other contributors feel comfortable getting involved.

@kegl
Copy link

kegl commented Aug 29, 2017

Super cool, thanks!

@JiaweiZhuang
Copy link

JiaweiZhuang commented Aug 30, 2017

@rabernat Thanks for the suggestion! I'll add tests&docs when time allows.

If you want to look into details: The package contains the two layers (explained in the "Design Idea" section). The first layer has nothing to do with xarray, but just provides a convenient way (only with numpy) to access a useful subset of ESMPy functions. This layer is important because ESMPy's API is too complicated, but once it is done it doesn't need to be changed too often. The second layer wraps the first layer using xarray. Most of the crafts will be added to the second layer.

As a temporary workaround, I've added another notebook for using the low-level wrapper, for interested developers.

@darothen
Copy link

If ESMF is the way to go, then some effort needs to be made to build conda recipes and other infrastructure for distributing and building the platform. It's a heavy dependency to haul around.

@ocefpaf
Copy link
Contributor

ocefpaf commented Aug 30, 2017

then some effort needs to be made to build conda recipes and other infrastructure for distributing and building the platform.

Like https://github.com/conda-forge/esmf-feedstock 😉

(Windows is still a problem b/c of the Fortran compiler.)

@darothen
Copy link

@ocefpaf Awesome, good to know that hurdle has already been leaped :)

@JiaweiZhuang
Copy link

@ocefpaf Any plan for Python3-compatible ESMPy? I only see Python2.7 here: https://github.com/conda-forge/esmpy-feedstock

@ocefpaf
Copy link
Contributor

ocefpaf commented Aug 30, 2017

@JiaweiZhuang let's discuss that in the feedstock issue tracker to avoid cluttering xarray's.

@JiaweiZhuang
Copy link

I am thinking about the API design for xESMF (JiaweiZhuang/xESMF#9). Any comments are welcome 😃

@shoyer
Copy link
Member Author

shoyer commented Nov 30, 2017

@mraspaud of pyresample expressed interest to me offline about bringing some of pyresample's resampling features into xarray -- welcome to the conversation!

@mraspaud
Copy link
Contributor

thanks @shoyer

@jhamman
Copy link
Member

jhamman commented Dec 1, 2017

@mraspaud - What functionality are you interested in bringing over? I've been watching pyresample for a while and would love to see our two packages leverage each other's functionality (where possible).

@mraspaud
Copy link
Contributor

mraspaud commented Dec 4, 2017

@jhamman One possibility would be to have a .resample on a DataArray (or equivalent independent function) that would be provided also a set of new coordinates, and that would return a new DataArray resampled to the new coordinates. One step further would be to implement this in sel or isel directly somehow.

@shoyer
Copy link
Member Author

shoyer commented Dec 4, 2017

For nearest-neighbor style resampling, we already have support for 1-dimensional resampling in .reindex()/.sel(). It would feel pretty natural to add support for N-dimensional resampling, too, if those lookups can use an index of some sort (i.e., a KDTree) to do things efficiently.

@mraspaud
Copy link
Contributor

mraspaud commented Dec 6, 2017

@shoyer absolutely, I will look into it, soon I hope

@stale
Copy link

stale bot commented Nov 6, 2019

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests