HDF5 support for compound datasets, character string datasets #1348
Comments
Yes. I would be interested in assisting in expanding the functionality. The HDF5 driver supports reading one specific kind of compound type, the kind that mimics complex datasets written by h5py, i.e., a structure with 2 entries of the same type.
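To make the "structure with 2 entries of the same type" idea concrete, here is a minimal sketch in plain Python. It is not the GDAL driver's code. The helper names `is_complex_like` and `decode_complex`, the type-name strings, and the little-endian Float32 layout are all assumptions made for illustration. The field names `r` and `i` follow the convention h5py uses when it writes complex numbers.

```python
import struct

def is_complex_like(fields):
    """Return True if a compound type looks like a complex number.

    `fields` is a hypothetical description of the compound type:
    a list of (field_name, type_name) pairs. The check mirrors the
    rule from the comment above: exactly 2 entries of the same type.
    """
    return len(fields) == 2 and fields[0][1] == fields[1][1]

def decode_complex(buf, fmt="<ff"):
    """Decode packed (real, imag) records into Python complex values.

    `fmt` is an assumed record layout: two little-endian Float32 values.
    """
    pair = struct.Struct(fmt)
    return [complex(re, im) for re, im in pair.iter_unpack(buf)]

# A compound type shaped like the ones h5py writes for complex data.
h5py_style = [("r", "Float32"), ("i", "Float32")]
# A general compound type that the rule above would reject.
mixed = [("a", "Int32"), ("b", "Float64")]

if is_complex_like(h5py_style):
    raw = struct.pack("<ffff", 1.0, 2.0, 3.0, -4.0)
    values = decode_complex(raw)
```

A compound type like `mixed` fails the check, which is the gap this issue is about.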
Thanks @piyushrpt, I think I see some of the code for supporting complex datasets here. Thinking about the purpose of GDAL, I'm wondering how to best approach this. The goal would not be to just have general-purpose support for HDF5 files, but I think some wiggle room for pulling data from tables to support the various ways that spatial metadata can be stored in HDF files would be helpful.
@mkoohafkan I agree with you that the implementation can be made more general. Here are a couple of things that I thought about but did not find enough time to implement and issue a PR. Maybe you and others can add to this list of thoughts:
One option that could be considered for compound datatypes is to return each element as a separate band of a dataset. For example, a dataset that is a 2D array of compound types could be interpreted as a GDAL dataset where each band has a different type. This model gets complicated with 3D arrays. One can access individual elements of a compound type as shown in this example: https://support.hdfgroup.org/ftp/HDF5/examples/misc-examples/chgfield.c For the example in the link above, band 1 would be Int32, band 2 would be Float64 and band 3 would be Float32.
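As a rough illustration of the "one band per compound member" model, here is a minimal sketch in plain Python. It reads neither HDF5 nor GDAL. It assumes each compound record is packed little-endian as (Int32, Float64, Float32), matching the band types listed above. The names `RECORD`, `split_compound_into_bands` and `make_sample` are hypothetical.

```python
import struct

# Assumed record layout: Int32, Float64, Float32, little-endian, no padding.
RECORD = struct.Struct("<idf")
BAND_TYPES = ("Int32", "Float64", "Float32")

def split_compound_into_bands(buf, rows, cols):
    """Split a rows x cols buffer of packed compound records into
    one 2D band (a list of rows) per compound member."""
    expected = rows * cols * RECORD.size
    if len(buf) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(buf)}")
    bands = [[[None] * cols for _ in range(rows)] for _ in BAND_TYPES]
    for idx, members in enumerate(RECORD.iter_unpack(buf)):
        r, c = divmod(idx, cols)  # row-major pixel position
        for b, value in enumerate(members):
            bands[b][r][c] = value
    return bands

def make_sample(rows, cols):
    """Build a synthetic buffer of compound records for demonstration."""
    return b"".join(
        RECORD.pack(i, i * 0.5, float(i)) for i in range(rows * cols)
    )

bands = split_compound_into_bands(make_sample(2, 3), rows=2, cols=3)
for band_type, band in zip(BAND_TYPES, bands):
    print(band_type, band)
```

The 3D case mentioned above is where this breaks down: the extra array dimension and the compound members would both compete for the band axis.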
Linking an old request/query for compound types with a specific interpretation.
gdalinfo can't see the single scalar variable in this file, but gdalmdiminfo can: gdalmdiminfo /vsizip//vsicurl/https://github.com/user-attachments/files/17415359/NSIDC0051_SEAICE_PS_S25km_19781027_v2.0.nc.zip | grep crs -A 5 -B 2
Perhaps we can modify the description and scope of this issue and limit it to compound types, given the multidimensional support: "classic" 2D raster mode is not suitable for scalars or 1D vars. I feel like compound types are really a "vector" issue, fwiw. Also, making a gratuitous link to a related discussion in VirtualiZarr: zarr-developers/VirtualiZarr#260
Using gdalinfo to list HDF5 subdatasets currently does not support compound or scalar string datasets. This is a limitation as certain spatial metadata such as labels, projections, etc. may be stored in these types of datasets. I presume that reading these types of datasets is not supported either. However, I find it interesting that gdalinfo is able to return string attributes of HDF5 groups just fine (although it does not return attributes for subdatasets), which suggests gdal already has some ability to handle strings.
Is the HDF5 driver still being supported? If so, is there any interest or capacity to expand the functionality of the HDF5 driver?