Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

apply_ufunc should preemptively broadcast #3032

Open
OriolAbril opened this issue Jun 18, 2019 · 11 comments
Open

apply_ufunc should preemptively broadcast #3032

OriolAbril opened this issue Jun 18, 2019 · 11 comments

Comments

@OriolAbril
Copy link
Contributor

Code Sample

I am having some troubles understanding apply_ufunc broadcasting rules. As I had some trouble understanding the docs, I am not 100% sure it is a bug, but I am quite sure. I will try to explain why with the following really simple example.

import xarray as xr
import numpy as np

a = xr.DataArray(data=np.random.normal(size=(7, 3)), dims=["dim1", "dim2"])
c = xr.DataArray(data=np.random.normal(size=(5, 6)), dims=["dim3", "dim4"])

def func(x,y):
    print(x.shape)
    print(y.shape)
    return

The function defined always raises an error when trying to call apply_ufunc, but this is intended, as the shapes have already been printed by then, and this keeps the example as simple as possible.

Problem description

xr.apply_ufunc(func, a, c)
# Out
# (7, 3, 1, 1)
# (5, 6)

Here, a has been kind of broadcasted, but I would expect the shapes of a and c to be the same as when calling xr.broadcast, as there are no input core dims, so all dimensions are broadcasted. However:

print([ary.shape for ary in xr.broadcast(a,c)])
# [(7, 3, 5, 6), (7, 3, 5, 6)]

Using different input core dims does not get rid of the problem, instead I believe it shows some more issues:

xr.apply_ufunc(func, a, c, input_core_dims=[["dim1"],[]])
# (3, 1, 1, 7), expected (3, 5, 6, 7)
# (5, 6), expected (3, 5, 6)

xr.apply_ufunc(func, a, c, input_core_dims=[[],["dim3"]])
# (7, 3, 1), expected (7, 3, 6)
# (6, 5), expected (7, 3, 6, 5)

xr.apply_ufunc(func, a, c, input_core_dims=[["dim1"],["dim3"]])
# (3, 1, 7), expected (3, 6, 7)
# (6, 5), expected (3, 6, 5)

Is this current behaviour what should be expected?

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.8 (default, Jan 14 2019, 11:02:34) [GCC 8.0.1 20180414 (experimental) [trunk revision 259383]] python-bits: 64 OS: Linux OS-release: 4.15.0-52-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.2 libnetcdf: 4.6.3

xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.4
scipy: 1.3.0
netCDF4: 1.5.1.2
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.1.0
cartopy: None
seaborn: None
setuptools: 41.0.0
pip: 19.1.1
conda: None
pytest: 4.5.0
IPython: 7.5.0
sphinx: 2.0.1

@max-sixty
Copy link
Collaborator

Thanks for the issue & code sample @OriolAbril

IIUC, func needs to do this broadcasting itself; from the apply_ufunc docstring:

func : callable
    Function to call like ``func(*args, **kwargs)`` on unlabeled arrays
    (``.data``) that returns an array or tuple of arrays. If multiple
    arguments with non-matching dimensions are supplied, this function is
    expected to vectorize (broadcast) over axes of positional arguments in
    the style of NumPy universal functions [1]_ (if this is not the case,
    set ``vectorize=True``). If this function returns multiple outputs, you
    must set ``output_core_dims`` as well.

@OriolAbril
Copy link
Contributor Author

Then shouldn't a in the first example keep its original shape?

@max-sixty
Copy link
Collaborator

max-sixty commented Jun 18, 2019

Because func receives unlabelled arrays, apply_ufunc aligns the axes order. So if they share a dimension:

In [8]: import xarray as xr
   ...: import numpy as np
   ...:
   ...: a = xr.DataArray(data=np.random.normal(size=(7, 3)), dims=["dim1", "dim2"])
   ...: c = xr.DataArray(data=np.random.normal(size=(7, 6)), dims=["dim1", "dim4"])  # <- change here
   ...:
   ...: def func(x,y):
   ...:     print(x.shape)
   ...:     print(y.shape)
   ...:     return x
   ...:

In [9]: xr.apply_ufunc(func, a, c)
(7, 3, 1)
(7, 1, 6)

...otherwise func wouldn't know how to align.

Another option would be for your original example to put lengths of 1 in all axes, rather than only 'forward filling', e.g.

xr.apply_ufunc(func, a, c)
# Out
# (7, 3, 1, 1)
# (1, 1, 5, 6) # <- change here

I think it operates without that step because functions 'in the wild' generally will handle that themselves, but that's a guess and needs someone who knows this better to weight in

@shoyer
Copy link
Member

shoyer commented Jun 19, 2019

For what it's worth, I agree that this behavior is a little surprising. We should probably make apply_ufunc() explicitly broadcast arrays first.

@max-sixty
Copy link
Collaborator

We should probably make apply_ufunc() explicitly broadcast arrays first.

To confirm, so that we have something like this?

xr.apply_ufunc(func, a, c)
# Out
# (7, 3, 5, 6)
# (7, 3, 5, 6) 

@shoyer
Copy link
Member

shoyer commented Jun 19, 2019

Yes, exactly.

@max-sixty
Copy link
Collaborator

I'm trying to think whether there would be any performance cost there - i.e. are there any arrays where preemptive broadcasting would be both expensive and unnecessary?

@OriolAbril
Copy link
Contributor Author

I'm trying to think whether there would be any performance cost there - i.e. are there any arrays where preemptive broadcasting would be both expensive and unnecessary?

Even if there were a performance cost (compared to the actual behaviour), it could be easily avoided by using all dims as input_core_dims couldn't it? IIUC, all dims should be broadcasted unless they are in input core dims, so it broadcasting could still be avoided without problem.

@shoyer
Copy link
Member

shoyer commented Jun 19, 2019 via email

@max-sixty
Copy link
Collaborator

@shoyer thanks for the clarity

@OriolAbril would you mind if we changed this issue to "apply_ufunc should preemptively broadcast"

@OriolAbril
Copy link
Contributor Author

@max-sixty Not at all, whatever is best. I actually opened the issue without being 100% it was one.

@max-sixty max-sixty changed the title apply_ufunc broadcasting issues apply_ufunc should preemptively broadcast Jun 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants