Pangeo TEM example #145

Closed
tomwhite opened this issue Oct 21, 2022 · 4 comments · Fixed by #476
Labels
benchmarks Example benchmark problem xarray-integration Uses or required for cubed-xarray integration

Comments

@tomwhite
Member

This issue is to explore running the "Transformed Eulerian Mean diagnostic" example in https://github.com/dcherian/ncar-challenge-suite/blob/main/tem.ipynb using Cubed.

It uses Xarray, so needs pydata/xarray#7019

@tomwhite
Member Author

Here's a jupyter notebook: https://github.com/tomwhite/cubed/blob/f5ece5b068db014f828bf6f3afcf6b05280af52a/examples/pangeo-tem.ipynb

There are a couple of pieces missing. The first is fairly minor: mean doesn't work with NaNs yet, so I've added skipna=False just to make progress. It's tracked in pydata/xarray#7243.
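
To illustrate what the skipna=False workaround changes, here is a minimal NumPy sketch (an analogy only, not the xarray or Cubed code path): without NaN skipping, a single NaN propagates into the mean, whereas the NaN-skipping variant ignores it. The sample values are made up for illustration.

```python
import numpy as np

# Illustrative data containing one missing value.
values = np.array([1.0, np.nan, 3.0])

# Analogue of skipna=False: the NaN propagates into the result.
mean_propagate = np.mean(values)

# Analogue of skipna=True: the NaN is ignored.
mean_skip = np.nanmean(values)

print(mean_propagate, mean_skip)
```

So the workaround only gives the same answer as the original notebook where the data has no NaNs along the reduced dimension.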

The main thing that's missing is a Cubed-compatible path for xarray groupby. I think it's probably worth using the Flox path in xarray here, but then that will require a Cubed implementation for xarray apply_ufunc, which is being tracked in #67, and has been started by @TomNicholas in #119. Does this sound right @dcherian, or is there a reason to start by supporting the default internal xarray groupby?

@dcherian

dcherian commented Oct 31, 2022

Yes you'll want apply_ufunc anyway.

We'll also want to support Cubed in Flox, which should be mostly easy since I'm using dask primitives. The main issue is how to deal with the intermediates. ATM Flox uses dictionaries like this:

{
    "groups": np.array([1, 2, 3, 4]),  # group labels, shape (ngroups,)
    "intermediates": (np.array([4, 5, 6, 7]), np.array([1, 1, 1, 1])),  # tuple (sum, counts), each with shape (..., ngroups)
}

for the mean reduction
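
As a rough sketch of how such (sum, counts) intermediates could be produced per chunk, combined, and finalized into a grouped mean, assuming every chunk reports the same set of groups: the helper names chunk_intermediates, combine and finalize are hypothetical and are not Flox's API.

```python
import numpy as np

def chunk_intermediates(values, labels, ngroups):
    """Per-chunk partial results for a grouped mean (hypothetical helper)."""
    sums = np.bincount(labels, weights=values, minlength=ngroups)
    counts = np.bincount(labels, minlength=ngroups)
    return {"groups": np.arange(ngroups), "intermediates": (sums, counts)}

def combine(a, b):
    """Merge two partial results; assumes both carry identical "groups"."""
    sums = a["intermediates"][0] + b["intermediates"][0]
    counts = a["intermediates"][1] + b["intermediates"][1]
    return {"groups": a["groups"], "intermediates": (sums, counts)}

def finalize(partial):
    """Turn the combined (sum, counts) into the mean of each group."""
    sums, counts = partial["intermediates"]
    return sums / counts

values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
labels = np.array([0, 1, 0, 1, 0, 1])

# Pretend the data is split into two chunks of three elements each.
left = chunk_intermediates(values[:3], labels[:3], ngroups=2)
right = chunk_intermediates(values[3:], labels[3:], ngroups=2)

group_means = finalize(combine(left, right))
print(group_means)
```

The point of carrying sum and counts separately is that both combine by plain addition, so the division can be deferred to the very end.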

That said, for this particular problem I could write the groupby as an efficient indexing + reduction. That's probably better since we'll have a "pure array" version of the problem, which will be easy to experiment with and compare across libraries.
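
One way the "indexing + reduction" idea could look in plain NumPy, as a sketch under the strong assumption that every group has the same number of members (the function name and the sample data are made up; this is not the code from the TEM notebook):

```python
import numpy as np

def grouped_mean_equal_groups(values, labels, ngroups):
    """Grouped mean along the last axis via indexing + reshape + reduction.

    Assumption: every group has the same number of members.
    """
    # A stable sort makes the members of each group contiguous.
    order = np.argsort(labels, kind="stable")
    # Integer array indexing along the last axis.
    gathered = np.take(values, order, axis=-1)
    per_group = gathered.shape[-1] // ngroups
    # Split the last axis into (group, member) and reduce over members.
    blocks = gathered.reshape(gathered.shape[:-1] + (ngroups, per_group))
    return blocks.mean(axis=-1)

values = np.arange(12.0).reshape(2, 6)
labels = np.array([0, 1, 2, 0, 1, 2])

result = grouped_mean_equal_groups(values, labels, ngroups=3)
print(result)
```

Because this uses only indexing, reshape and mean, the same expression can in principle be run against any array library that provides those operations.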

@tomwhite
Member Author

tomwhite commented Nov 1, 2022

Thanks @dcherian.

For intermediates, Cubed uses structured arrays (e.g. mean), which seems similar to Flox.
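
A minimal NumPy sketch of the structured-array idea for a mean: each partial result carries a count and a running total in one array. The field names "n" and "total" and the helper functions are illustrative assumptions here, not a statement of Cubed's internals.

```python
import numpy as np

# One record holds both pieces of state needed for a mean.
mean_dtype = np.dtype([("n", np.int64), ("total", np.float64)])

def to_intermediate(chunk):
    """Summarise one chunk as a single (n, total) record."""
    out = np.empty(1, dtype=mean_dtype)
    out["n"] = chunk.size
    out["total"] = chunk.sum()
    return out

def combine(a, b):
    """Merge two records field by field."""
    out = np.empty(1, dtype=mean_dtype)
    out["n"] = a["n"] + b["n"]
    out["total"] = a["total"] + b["total"]
    return out

def finalize(record):
    """Compute the mean from the combined record."""
    return record["total"] / record["n"]

first = to_intermediate(np.array([1.0, 2.0, 3.0]))
second = to_intermediate(np.array([4.0, 5.0, 6.0]))

merged = combine(first, second)
result = finalize(merged)
print(merged, result)
```

Compared with a dictionary holding a tuple of arrays, the structured array keeps the intermediate state as a single array object, which is convenient when the framework expects every stage to produce an array.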

> That said, for this particular problem I could write the groupby as an efficient indexing + reduction.

Agreed, that would be good, although we'll probably need to add integer array indexing to Cubed. See data-apis/array-api#177 for a bit of discussion about that from a standardisation point of view.

@tomwhite
Member Author

tomwhite commented Nov 4, 2022

I've added integer array indexing (and take) to Cubed now (0b59b32), which should help for group by.
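
For reference, this is how integer array indexing and take relate in NumPy, and how they select the members of one group before a reduction. It is shown with NumPy only; whether Cubed's versions accept exactly the same arguments is not checked here, and the data is made up.

```python
import numpy as np

x = np.array([[10.0, 11.0, 12.0, 13.0],
              [20.0, 21.0, 22.0, 23.0]])
labels = np.array([0, 1, 1, 0])

# Integer positions of the members of group 1 along the last axis.
members = np.flatnonzero(labels == 1)

# Two equivalent ways to gather those members.
by_take = np.take(x, members, axis=1)
by_index = x[:, members]

# Reducing over the gathered axis gives the mean for that group.
group_mean = by_take.mean(axis=1)
print(members, group_mean)
```

Unlike a reshape-based approach, gathering one group at a time also works when groups have different sizes.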

I've updated the example notebook at https://github.com/tomwhite/cubed/blob/5eb5f23c25c37ec8634eb35a71711f4eaaffd643/examples/pangeo-tem.ipynb (this needed a few xarray changes). It doesn't use Flox group by yet though.

@TomNicholas TomNicholas added the xarray-integration Uses or required for cubed-xarray integration label May 24, 2023
@TomNicholas TomNicholas added the benchmarks Example benchmark problem label Jun 22, 2023