
Return a 3D object alongside 1D object in apply_ufunc #8695

Closed
ahuang11 opened this issue Feb 2, 2024 · 7 comments
Labels: enhancement, plan to close

Comments

ahuang11 (Contributor) commented Feb 2, 2024

Is your feature request related to a problem?

Currently, I have something similar to this, where input_lat is transformed to new_lat (here, +0.25, but in the real use case it's non-deterministic).

Since apply_ufunc doesn't return a dataset with the actual coordinate values, I had to return a second output to retain new_lat and properly update the coordinate values. But this second output is shaped (time, lat, lon), so I have to do ds["lat"] = new_lat.isel(lon=0, time=0).values, which I think is inefficient; I simply need it to be shaped (lat,).

Any ideas on how I can modify this to make it more efficient?

import xarray as xr
import numpy as np

air = xr.tutorial.open_dataset("air_temperature")["air"]
input_lat = np.arange(20, 45)

def interp1d_np(data, base_lat, input_lat):
    new_lat = input_lat + 0.25
    return np.interp(new_lat, base_lat, data), new_lat

ds, new_lat = xr.apply_ufunc(
    interp1d_np,  # first the function
    air,
    air.lat,  # as above
    input_lat,  # as above
    input_core_dims=[["lat"], ["lat"], ["lat"]],  # list with one entry per arg
    output_core_dims=[["lat"], ["lat"]],  # list with one entry per output
    exclude_dims=set(("lat",)),  # dimensions allowed to change size. Must be a set!
    vectorize=True,  # loop over non-core dims
)
new_lat = new_lat.isel(lon=0, time=0).values
ds["lat"] = new_lat

Describe the solution you'd like

Either be able to automatically assign new_lat to the returned xarray object, or allow a 1D output to be returned alongside the 3D one.

Describe alternatives you've considered

No response

Additional context

No response

slevang (Contributor) commented Feb 6, 2024

If new_lat is really 1-dimensional, meaning it doesn't vary by lat/lon point, why can't it be calculated or defined prior to running anything through apply_ufunc, and then passed through as an input to your interp function?
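For the case where new_lat really is known up front, the original example could be restructured along these lines. This is a sketch using synthetic data in place of the air_temperature tutorial dataset; the variable names mirror the original post.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the air_temperature example: dims (time, lat, lon)
air = xr.DataArray(
    np.random.rand(4, 25, 3),
    dims=["time", "lat", "lon"],
    coords={"lat": np.arange(20, 45)},
)
new_lat = np.arange(20, 45) + 0.25  # computed once, outside apply_ufunc


def interp1d_np(data, base_lat, new_lat):
    # only the interpolated data comes back; new_lat is an input, not an output
    return np.interp(new_lat, base_lat, data)


interped = xr.apply_ufunc(
    interp1d_np,
    air,
    air.lat,
    new_lat,
    input_core_dims=[["lat"], ["lat"], ["lat"]],
    output_core_dims=[["lat"]],
    exclude_dims={"lat"},  # "lat" is allowed to change size
    vectorize=True,
)
# new_lat is already 1D, so the coordinate can be assigned directly
interped = interped.assign_coords(lat=new_lat)
```

No isel(lon=0, time=0) is needed because new_lat never picks up the broadcast dimensions in the first place.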

ahuang11 (Contributor, Author) commented Feb 6, 2024

It's non-deterministic; it's decided inside apply_ufunc:

import xarray as xr
from tsdownsample import MinMaxLTTBDownsampler


def _help_downsample(data, time, n_out):
    indices = MinMaxLTTBDownsampler().downsample(time, data, n_out=n_out)
    return data[indices], indices


def apply_downsample(ts_ds, n_out):
    ts_ds_downsampled, indices = xr.apply_ufunc(
        _help_downsample,
        ts_ds["data"],
        ts_ds["time"],
        kwargs=dict(n_out=n_out),
        input_core_dims=[["time"], ["time"]],
        output_core_dims=[["time"], ["indices"]],
        exclude_dims=set(("time",)),
        vectorize=True,
        dask="parallelized",
        dask_gufunc_kwargs=dict(output_sizes={"time": n_out, "indices": n_out}),
    )
    ts_ds_downsampled["time"] = ts_ds["time"].isel(time=indices.values[0])
    return ts_ds_downsampled.rename("data")

slevang (Contributor) commented Feb 6, 2024

If indices is non-deterministic for every time _help_downsample is called for new data, doesn't that mean you want a unique vector of indices at all non-core input dims? You're calling apply_ufunc(..., vectorize=True) so the helper function is being called once per 1D timeseries, and selecting indices[0] would be throwing away unique index vectors in that case.
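A toy illustration of that point: with vectorize=True the helper runs once per 1D slice, so each channel can come back with a different index vector, and indices[0] only describes the first one. The argsort-based picker here is a deliberately data-dependent stand-in for a real downsampler like MinMaxLTTBDownsampler.

```python
import numpy as np
import xarray as xr


def pick_indices(data, n_out):
    # keep the positions of the n_out largest samples -- data-dependent on purpose
    return np.sort(np.argsort(data)[-n_out:])


da = xr.DataArray([[0, 5, 1, 4], [9, 2, 8, 3]], dims=["channel", "time"])

indices = xr.apply_ufunc(
    pick_indices,
    da,
    kwargs=dict(n_out=2),
    input_core_dims=[["time"]],
    output_core_dims=[["indices"]],
    vectorize=True,  # helper is called once per channel
)
# indices is (channel, indices)-shaped and differs per channel,
# so indexing time with indices[0] would discard channel 1's picks
```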

ahuang11 (Contributor, Author) commented Feb 6, 2024

You're right. I suppose I misunderstood; I thought it was stacking all the channels to calculate the indices.

Perhaps what I really want to do is just the following without apply_ufunc

new_da = da.stack({"temp": ["time", "channel"]})
MinMaxLTTBDownsampler().downsample(new_da["temp"], new_da["data"], n_out=n_out)

dcherian (Contributor) commented Feb 6, 2024

I've run into this before. There's no way to limit the broadcasting behaviour, so if there are broadcast dimensions on the input variables, those are assumed to be the broadcast dimensions on all output variables.

I believe the best solution is #1618. In the meantime, if you don't need vectorize, you can specify all input and output dims as core dimensions, and then there are no broadcast dimensions at all.
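A minimal sketch of that workaround, again assuming a synthetic (time, lat, lon) array in place of the tutorial data: with every dimension declared as a core dimension there is no broadcasting, so the two outputs can have completely different shapes (a 3D result plus a genuinely 1D coordinate).

```python
import numpy as np
import xarray as xr

air = xr.DataArray(
    np.random.rand(4, 25, 3),
    dims=["time", "lat", "lon"],
    coords={"lat": np.arange(20, 45)},
)


def interp_full(data, base_lat, input_lat):
    # data arrives here as the full (time, lat, lon) array -- no vectorize loop
    new_lat = input_lat + 0.25
    out = np.apply_along_axis(
        lambda col: np.interp(new_lat, base_lat, col), 1, data
    )
    return out, new_lat  # 3D data plus a 1D coordinate


interped, new_lat = xr.apply_ufunc(
    interp_full,
    air,
    air.lat,
    np.arange(20, 45),
    input_core_dims=[["time", "lat", "lon"], ["lat"], ["lat"]],
    output_core_dims=[["time", "lat", "lon"], ["lat"]],
    exclude_dims={"lat"},
)
interped = interped.assign_coords(lat=new_lat)  # plain 1D assignment
```

The trade-off is that the function now handles the looping (or array-level work) itself instead of relying on vectorize.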

dcherian added the plan to close label on Feb 6, 2024
ahuang11 (Contributor, Author) commented
Is there a way to do this without a loop?

import numpy as np
import xarray as xr
t0 = [0, 1, 2]
t1 = [3, 4, 5]

ts = [t0, t1]
z = np.array([[6, 7, 8], [9, 10, 11]])

ds = xr.DataArray(z, dims=["channel", "time"]).to_dataset("channel")
for i, channel in enumerate(ds.data_vars):
    dim = f"time_{i}"
    ds[channel] = (dim, ds[channel].values)
    ds[channel] = ds[channel].assign_coords({dim: ts[i]})
ds

slevang (Contributor) commented Feb 15, 2024

If the time vectors are all the same length, you can just store as a multi-dimensional coordinate:

ds = xr.DataArray(
    z, 
    dims=["channel", "t_idx"], 
    coords={"time": (("channel", "t_idx"), ts)}
)
