open_virtual_dataset fails when there is a subgroup #336
Hmm, a very similar error was raised in #159, and I thought it was fixed in #165. Tagging @scottyhq in case he has any insights. There might be something about this file that causes kerchunk to return references in a slightly different format than for the other files we have tested (you can see it's failing at the point of trying to translate kerchunk's in-memory references format to virtualizarr's).
We decided it wasn't mature enough to make the default yet, but you can turn it on by importing the backend class and passing it in:

```python
from virtualizarr import open_virtual_dataset
from virtualizarr.readers.hdf import HDFVirtualBackend

vds = open_virtual_dataset('file.nc', backend=HDFVirtualBackend)
```
Also, is there a reason you're passing the group argument?
Because I want to make a virtual dataset for the root group, not the subgroup. If I leave out the group argument entirely, it complains about "multiple groups found."
Right. Now that you say that, it seems obvious that the behaviour should instead match xarray's, where the default is to return the contents of the root group. I'm looking into what went wrong here now.
(This issue is inspired by the NAS GES DISC GPM_3IMERGHH_07 dataset, which has this same structure. cc @abarciauskas-bgse)
Consider a NetCDF dataset that has a valid NetCDF group at one level of the hierarchy and then a sub-group beneath that. We can make one like this:
Xarray can open either group fine.
For the parent group, it just ignores the sub-group.
However, VirtualiZarr doesn't like it.
It looks like VirtualiZarr is assuming that all child nodes in the hierarchy are arrays, not groups.
I'm also curious why we are not going through the new non-kerchunk backend for HDF5/netCDF4 files instead of the kerchunk backend. How do you turn that on? (I'm on 1.2.0.)
Related to #84