Variable.stack constructs extremely large chunks #5754
Comments
Ah, this is dask/dask#5544 again. It looks like dask needs to break up the potentially very large intermediate chunks. That said, our strategy of transposing first means that the optimization implemented in dask/dask#5544 (comment) doesn't kick in in this case.
Fixed upstream
Sorry, is this fixed?
It was fixed in dask, but we're still sub-optimal. Do you have an example of a problem? Please open a new issue with a reproducible example if you do.
I simply tested var.stack(new=("x", "y")) and got the above message. I don't understand why lines 1521 to 1527 of xarray/core/variable.py do a reshape.
This is fine. That warning says they're fixing the issue reported here.
Minimal Complete Verifiable Example:
Here's a small array with too-small chunk sizes just as an example
Now stack two dimensions, this is a 100x increase in chunk size (in my actual code, 85MB chunks become 8.5GB chunks =) )
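To make the 100x figure concrete, here is a back-of-the-envelope sketch in plain Python. It is not the code from this report: the shape, the chunk sizes, and the helper name `stacked_chunk_elements` are all made up for illustration. It assumes uniform chunks, and that the reshape consolidates every stacked axis except the first into a single chunk.

```python
from math import prod


def stacked_chunk_elements(shape, chunks, stacked_axes):
    """Estimate elements per chunk before and after stacking.

    Hypothetical model: every stacked axis except the first one is
    consolidated into a single chunk spanning the whole axis.
    """
    before = prod(chunks)
    after_chunks = list(chunks)
    for axis in stacked_axes[1:]:
        after_chunks[axis] = shape[axis]  # axis becomes one full-length chunk
    return before, prod(after_chunks)


# Illustrative (x, y, z) sizes, not the ones from the report.
shape = (10, 100, 50)
chunks = (1, 1, 50)

before, after = stacked_chunk_elements(shape, chunks, stacked_axes=(0, 1))
growth = after // before
print(before, after, growth)  # 50 5000 100

# The same factor applied to 85 MB chunks gives 8.5 GB.
print(85 * growth / 1000, "GB")  # 8.5 GB
```

Under this model the growth factor is just the number of chunks along the consolidated axis, which is why many tiny chunks along one stacked dimension blow up so badly.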
But calling reshape on the dask array preserves the original chunk size.
Solution
Ah, found it: we transpose then reshape in Variable._stack_once.
xarray/xarray/core/variable.py, lines 1521 to 1527 in f915515
Writing those steps with pure dask yields the same 100x increase in chunksize
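For anyone who wants to compare the two paths, a minimal sketch with dask alone might look like the following. The array shape and chunks are illustrative assumptions, not the ones from this report. The chunk sizes printed depend on the installed dask version, since the reshape logic has changed upstream, so no expected output is shown.

```python
import dask.array as da

# Illustrative (x, y, z) array; sizes and chunks are assumptions.
arr = da.zeros((10, 100, 50), chunks=(1, 1, 50))

# Path 1: reshape directly, merging the two leading axes x and y.
direct = arr.reshape(10 * 100, 50)

# Path 2: mimic stacking x and y as described above:
# move the stacked axes to the end, then merge them.
transposed = arr.transpose(2, 0, 1)         # (z, x, y)
stacked = transposed.reshape(50, 10 * 100)  # (z, x*y)

# chunksize reports the largest chunk along each axis.
print("original chunksize:", arr.chunksize)
print("direct reshape:    ", direct.chunksize)
print("transpose+reshape: ", stacked.chunksize)
```

Comparing the last two lines on a given dask version shows whether the transposed path still produces larger chunks than the direct reshape.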
Anything else we need to know?:
Environment:
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Jan 25 2021, 23:21:18)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1127.18.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.19.0
pandas: 1.3.1
numpy: 1.21.1
scipy: 1.5.3
netCDF4: 1.5.6
pydap: installed
h5netcdf: 0.11.0
h5py: 3.3.0
Nio: None
zarr: 2.8.3
cftime: 1.5.0
nc_time_axis: 1.3.1
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: 3.0.4
bottleneck: 1.3.2
dask: 2021.07.2
distributed: 2021.07.2
matplotlib: 3.4.2
cartopy: 0.19.0.post1
seaborn: 0.11.1
numbagg: None
pint: 0.17
setuptools: 49.6.0.post20210108
pip: 21.2.2
conda: 4.10.3
pytest: 6.2.4
IPython: 7.26.0
sphinx: 4.1.2