This repository has been archived by the owner on Oct 24, 2024. It is now read-only.

open_datatree() keeps the hdf file open preventing writes #325

Closed
KareemShalabi opened this issue Mar 30, 2024 · 2 comments

Comments

@KareemShalabi

Consider this analysis pipeline:
Multiple arrays for the same data variable are organized in a group hierarchy inside an HDF file according to some attributes. A datatree is a perfect container for that. I can read all the arrays as chunked dask datasets, map the function over the datatree, and collect the results along the way.

Because the final result of the function is far too large to fit in memory, I tried saving the intermediate results (the result of the computation in a single iteration) to the same file and group path, then returning the new chunked DataArray after reloading it. An exception is thrown because the file is held open by the datatree object. This does not happen when I create the datatree object myself (from a dict of group paths and DataArray objects).
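
The dict-based construction mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the reporter's actual code: the helper name `load_groups`, the `open_group` parameter, and the group paths are all hypothetical, and it assumes each individual group is small enough to load into memory before the file is closed. The default opener uses `xarray.open_dataset` with its `group` argument.

```python
from contextlib import closing


def _open_with_xarray(path, group):
    # Default opener (assumption): read one group with xarray.
    # Imported lazily so the helper has no hard dependency on xarray.
    import xarray as xr

    return xr.open_dataset(path, group=group)


def load_groups(path, group_paths, open_group=_open_with_xarray):
    """Return {group_path: in-memory dataset} without keeping the file open.

    Hypothetical helper: each group is opened, loaded eagerly, and closed
    before the next one is read, so no handle on `path` outlives the call.
    """
    loaded = {}
    for group in group_paths:
        # closing() calls .close() on exit, releasing the file handle.
        with closing(open_group(path, group)) as ds:
            # .load() pulls the data into memory so it survives the close.
            loaded[group] = ds.load()
    return loaded
```

The resulting dict could then be handed to `DataTree.from_dict`, which matches the construction path reported here as not holding the file open. The trade-off is that the inputs are no longer lazy, so this only helps when each group fits in memory.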

@TomNicholas
Member

Thanks for raising this. I think this issue is a duplicate of #93. A PR was opened to fix it, but realistically, given that we're currently integrating datatree into Xarray main, we'll probably prioritize fixing it there instead of in this package.

@TomNicholas
Member

Closing in favour of pydata/xarray#9337

2 participants