Specify chunks in bytes #8021
Comments
I like this suggestion! The trick will be to find a general way to map the chunk specification efficiently over the underlying storage backend's "preferred chunks" (e.g. #7948). Note that you can get most of what you want today with the following syntax:
xr.open_dataset(..., chunks=None).chunk("1 GiB")
In the future, I think it would be quite nice if we supported:
xr.open_dataset(..., chunks="1 GiB")
where the resulting chunks were constructed, as best as possible, to align with the chunks of the underlying dataset (e.g. Zarr, HDF5).
xref: #1440
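To make the alignment idea above concrete, here is a minimal pure-Python sketch of one way a byte budget could be turned into read chunks that are whole multiples of the on-disk chunks. The helper name `aligned_chunks` and its growth order (last axis first) are assumptions made for illustration; this is not how xarray or Dask implement it.

```python
import math


def aligned_chunks(shape, disk_chunks, itemsize, limit_bytes):
    """Hypothetical helper: pick read chunks that are whole multiples of the
    on-disk chunks while keeping each read chunk within limit_bytes.

    If a single on-disk chunk already exceeds the budget, one on-disk chunk
    is returned anyway, since it cannot be split without partial reads.
    """
    # Start from one on-disk chunk per axis, clipped to the array shape.
    chunks = [min(c, s) for c, s in zip(disk_chunks, shape)]

    # Grow the last axis first (C order), then work towards the first axis.
    for axis in reversed(range(len(shape))):
        # Number of elements contributed by all other axes.
        others = math.prod(chunks[:axis] + chunks[axis + 1:])
        # Longest extent along this axis that still fits the budget.
        max_len = limit_bytes // (itemsize * others)
        # Round down to a whole number of on-disk chunks (at least one).
        n = max(1, max_len // disk_chunks[axis])
        chunks[axis] = min(shape[axis], n * disk_chunks[axis])

    return tuple(chunks)


# Example: float64 data stored in (10, 100, 100) chunks, 64 MiB budget.
result = aligned_chunks((1000, 1000, 1000), (10, 100, 100), 8, 64 * 2**20)
print(result)
```

Each returned extent is either a multiple of the corresponding on-disk chunk or the full length of that axis, so no on-disk chunk is split across two read chunks.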
Hi @mrocklin
me too. What happens in the null case, with no chunking?
shows odd results on my old iMac.
The first thing to consider is whether your netCDF4 file is chunked or contiguous on disk. If it is not chunked on disk, Xarray and Dask cannot do much to optimize partial array decompression. If it is chunked on disk, you'll likely find the best performance by aligning your read chunks to the chunks on disk. #7948 recently added support
I'd like to leave this issue here because the feature described above still applies. I would encourage you to open a discussion for a more detailed conversation.
@jhamman, apart from chunking, this test case shows that the multilevel caches (SSD, L2, L1) vary a lot
Is your feature request related to a problem?
I'm playing around with xarray performance and would like a way to easily tweak chunk sizes. I'm able to do this by backing out what xarray chooses in an open_zarr call and then providing the right chunks= argument. I'll admit, though, that I wouldn't mind giving Xarray a value like "1 GiB" and having it use that when determining "auto" chunk sizes.
Dask array does this in two ways. We can provide a value in chunks like the following:
We also refer to a value in Dask config
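A minimal sketch of the two Dask mechanisms mentioned above. The array shape and the "1 MiB" target are arbitrary values chosen for illustration and are not taken from the original post.

```python
import math

import dask
import dask.array as da

# 1) Pass a byte-size string directly as the chunks argument.
x = da.zeros((4000, 4000), dtype="float64", chunks="1 MiB")

# 2) With chunks="auto", Dask reads its target size from the
#    "array.chunk-size" config value (overridden temporarily here).
with dask.config.set({"array.chunk-size": "1 MiB"}):
    y = da.zeros((4000, 4000), dtype="float64", chunks="auto")


def largest_chunk_bytes(arr):
    """Size in bytes of the largest chunk of a Dask array."""
    return math.prod(max(c) for c in arr.chunks) * arr.dtype.itemsize


print(x.chunks, largest_chunk_bytes(x))
print(y.chunks, largest_chunk_bytes(y))
```

In both cases Dask treats the byte size as a target for automatic chunking rather than as an exact chunk shape.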
This is not very important (I'm unblocked) but I thought I'd mention it in case someone is looking for some fun work 🙂
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response