-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
apply_ufunc gives wrong dtype with dask=parallelized and vectorized=True #4015
Comments
It works when you set da2 = xr.apply_ufunc(
func,
da,
vectorize=False,
dask="parallelized",
output_dtypes=[da.dtype],
)
assert da2.dtype == da.dtype, "wrong dtype" or when you pass your own da2 = xr.apply_ufunc(
func,
da,
vectorize=True,
dask="parallelized",
output_dtypes=[da.dtype],
meta=da
)
assert da2.dtype == da.dtype, "wrong dtype" This also goes wrong if the DataArray has another dtype, e.g. da = xr.DataArray(np.arange(2*3*4).reshape(2,3,4)
da = da.chunk(dict(dim_1=1))
da2 = xr.apply_ufunc(
func,
da,
vectorize=True,
dask="parallelized",
output_dtypes=[da.dtype],
)
assert da2.dtype == da.dtype, "wrong dtype" Indeed, the dtype of meta takes precedence over the dtype (https://github.com/dask/dask/blob/25005e19cc30e8b2877d4dadbaef378ee912bdc0/dask/array/core.py#L1022):
|
@mathuse is right. The solution is to use xarray/xarray/core/computation.py Lines 1008 to 1011 in 3820fb7
@ulijh or @mathause , are either of you up for sending in a PR? For now the workaround is to pass your own |
* ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc` for multiple outputs when `dask='parallelized'`, add/fix tests * DOC: Update docstring and whats-new.rst * WIP: apply_gufunc * WIP: apply_gufunc -> reinstate dask='allowed' as per @mathause, adapting tests * WIP: apply_gufunc -> add test for GH #4015, fix test for sparse meta checking * WIP: apply_gufunc -> remove unused `input_dims` * Update xarray/core/computation.py Co-authored-by: Mathias Hauser <[email protected]> * Update xarray/core/computation.py Co-authored-by: Mathias Hauser <[email protected]> * Update xarray/core/computation.py Co-authored-by: Mathias Hauser <[email protected]> * WIP: use dask_gufunc_kwargs, keep vectorize first but only for non-dask-gufunc, rework docstrings, adapt tests * DOC: add reference to internal changes in whats-new.rst * FIX: mypy * FIX: vectorize inside `apply_variable_ufunc` * TST: add tests from #4022 from @mathause * FIX: address black issue * FIX: xfail test for dask < 2.3 * WIP: apply changes in response to @mathause's review comments * WIP: remove line * WIP: catch different chunksize error and allow_rechunk, docstring fixes * WIP: remove comment * WIP: style issues * WIP: revert catch, revert test, add tests without output_dtypes * WIP: fix signature in apply_ufunc->apply_gufunc, handle output_sizes, handle dask version, fix tests * WIP: fix tuple * WIP: add dims_map to _UFuncSignature, adapt output_sizes to fit for apply_gufunc * WIP: black * WIP: raise ValueError if output_sizes dimension mismatch * WIP: raise ValueError if output_sizes is missing for given output_core_dims * WIP: simplify if/else * FIX: resolve conflicts prior merge with master * FIX: combine if's as per review * FIX: pass `vectorize` and `output_dtypes` kwargs explicitely into `apply_variable_ufunc` as per review suggestion * FIX: pass `vectorize` and `output_dtypes` kwargs explicitely into `da.apply_gufunc` * FIX: address review comments of @keewis and @mathause * FIX: black * FIX: `vectorize` not needed in if-clause * FIX: set DeprecationWarning and stacklevel=2 * FIX: use FutureWarning for user visibility * FIX: remove comment as suggested Co-authored-by: Deepak Cherian <[email protected]> Co-authored-by: Mathias Hauser <[email protected]> Co-authored-by: Deepak Cherian <[email protected]>
Applying a function to a data array with
dtype = complex
returns one withdtype = float
. It seems to work before commit 17b70ca.MCVE Code Sample
Expected Output
da
andda2
should both have the samedtype=complex
.Problem Description
To me it seems to me that the kwarg
meta=None
somehow causes dask to allocate a float array and ignore the dtype kwargs (which seems to be carried through correctly down todask.array.blockwise.blockwise()
) . I'm not familiar with the apply_ufing and the dask code, so I can't tell on which end the bug sits.Versions
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.8.2 (default, Apr 8 2020, 14:31:25)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.6.5-arch3-1
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: de_DE.utf8
LOCALE: de_DE.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.15.2.dev47+g33a66d63
pandas: 1.0.3
numpy: 1.18.3
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.7.4
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.1.1.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.12.0
distributed: 2.14.0
matplotlib: 3.2.1
cartopy: 0.17.0
seaborn: 0.10.0
numbagg: None
pint: None
setuptools: 46.1.3
pip: 20.0.2
conda: None
pytest: 5.4.1
IPython: 7.13.0
sphinx: 3.0.2
The text was updated successfully, but these errors were encountered: