Parallelization via dask #7
Labels: performance, references generation (Reading byte ranges from archival files), xarray (Requires changes to xarray upstream)
There are two places we could use xarray's machinery for parallelization to potentially speed up the generation of references.

1. Using `parallel=True` in `xr.open_mfdataset`, which would then use `dask.delayed` to parallelize the generation of the byte ranges from each file. This could be a big speedup, as it would parallelize the opening of the legacy files.
2. In theory we could also wrap the `ManifestArray` objects with `dask.Array`, then use dask's tree-reduce to do the concatenation. I think this is roughly what `kerchunk.combine.auto_dask` is approximating. However, I'm not totally confident that (a) this is set up to work right now in dask.array or (b) this is actually a performance bottleneck in practice.
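To make the second idea concrete, here is a minimal sketch of a pairwise tree reduction over per-file manifests. It is plain Python, not the actual `ManifestArray` or `kerchunk.combine.auto_dask` code: `scan_file`, `concat`, `tree_reduce`, and the `(path, offset, length)` entry layout are all hypothetical stand-ins, and each file is assumed to contribute one chunk along a single concatenation axis.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical chunk entry: (file path, byte offset, byte length).
ChunkEntry = Tuple[str, int, int]
Manifest = List[ChunkEntry]


def scan_file(path: str) -> Manifest:
    """Stand-in for extracting one file's byte ranges (no real I/O here)."""
    return [(path, 0, 100)]


def concat(left: Manifest, right: Manifest) -> Manifest:
    """Concatenate two manifests along one axis, preserving order."""
    return left + right


def tree_reduce(items: Sequence, combine: Callable, wrap: Callable = lambda f: f):
    """Combine adjacent pairs level by level until one result remains.

    `wrap` is applied to `combine` once; the default leaves it unchanged,
    so the reduction runs eagerly.
    """
    combine = wrap(combine)
    level = list(items)
    if not level:
        raise ValueError("tree_reduce needs at least one item")
    while len(level) > 1:
        # Pair up neighbours so the original file order is kept.
        paired = [combine(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])  # odd one out moves up unchanged
        level = paired
    return level[0]


paths = [f"file_{i}.nc" for i in range(5)]
manifest = tree_reduce([scan_file(p) for p in paths], concat)
```

With dask installed, the same function should build a lazy graph instead if called as `tree_reduce([dask.delayed(scan_file)(p) for p in paths], concat, wrap=dask.delayed).compute()`, which also covers the per-file parallelism from the first idea. Whether the tree shape beats a flat concatenation in practice is exactly the open question in (b).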