Follow-up to #507: Lazy loading of chunked xarray Datasets #544
Comments
@emanuel-schmid, @chahank, how do you feel about adding sparse as a dependency?
Hmm, I do not know this package. The call is with @emanuel-schmid. But if yes, we should be careful in how it is imported, as we often import `scipy.sparse`.
Yes, but it only affects the import statements, e.g. `import sparse as sp` vs. `import scipy.sparse as sparse`.
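For illustration, a minimal sketch of that point (the aliases are just one possible convention): both packages can be imported side by side without clashing.

```python
import numpy as np
import sparse                           # PyData sparse: n-dimensional COO arrays
from scipy import sparse as sp_sparse   # SciPy sparse matrices, e.g. csr_matrix

dense = np.eye(3)
coo = sparse.COO.from_numpy(dense)   # PyData sparse array
csr = sp_sparse.csr_matrix(dense)    # SciPy CSR matrix; the two coexist fine
```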
To hopefully ease your mind a bit: Using the […]. See the note on the latest […].
Closed by #578.
During the review of #507, the missing support for chunked (lazily-evaluated) datasets was considered a major issue because the entire dataset has to fit into memory (see #507 (comment)).
I am currently investigating a way to load the data lazily with dask arrays (a feature automatically supported by xarray, see https://docs.xarray.dev/en/stable/user-guide/dask.html). Dask arrays are distributed into smaller blocks or "chunks", which are designed to fit into memory. By default, such a block is implemented as a `numpy.ndarray`.

Xarray offers functions that are agnostic to whether the underlying array type is a `dask.array` or a `numpy.ndarray`. This enables writing a function that takes the dask chunks and returns "sparse" array chunks, along the lines of the sketch below.
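A minimal sketch of what such a function could look like (the file name, variable name, and chunking are hypothetical; `scipy.sparse.csr_matrix` is the chunk type attempted here):

```python
import xarray as xr
from scipy import sparse as sp_sparse

# Open the dataset lazily: each variable is backed by a chunked dask array
# instead of a single in-memory numpy.ndarray.
ds = xr.open_dataset("hazard.nc", chunks="auto")
intensity = ds["intensity"].data  # a dask.array, not a numpy.ndarray

# Try to convert every dask block into a scipy.sparse matrix.
# map_blocks applies the given function lazily to each chunk.
sparse_chunks = intensity.map_blocks(sp_sparse.csr_matrix)
```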
However, the above example does not work because `scipy.sparse.csr_matrix` does not support the required `numpy.ndarray` API for this operation. One workaround would be to use the sparse library:
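A sketch of that workaround, following the chunk-conversion pattern from the dask documentation (file and variable names are again hypothetical):

```python
import sparse
import xarray as xr

# Open the dataset lazily as before.
ds = xr.open_dataset("hazard.nc", chunks="auto")
intensity = ds["intensity"].data  # dask.array with numpy.ndarray chunks

# sparse.COO implements enough of the numpy.ndarray API to serve as a dask
# chunk type, so every block can be converted without breaking dask.
lazy_sparse = intensity.map_blocks(sparse.COO)

# Everything stays lazy until compute(); only one dense block at a time has
# to exist in memory during the conversion. For 2-D data the result can be
# turned into a scipy-style CSR matrix afterwards.
csr = lazy_sparse.compute().tocsr()
```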
If we don't want a new dependency, we would need to operate on the xarray data types directly, which probably means stitching the chunks together ourselves. Is it okay to add sparse as a new dependency only for this particular use case, or should I dig further to see if we can work around it?