HDF5 support for compound datasets, character string datasets #1348

Open
mkoohafkan opened this issue Mar 7, 2019 · 6 comments

@mkoohafkan

Listing HDF5 subdatasets with gdalinfo currently does not include compound or scalar string datasets. This is a limitation, because spatial metadata such as labels and projections may be stored in these types of datasets. I presume that reading these types of datasets is not supported either. However, I find it interesting that gdalinfo returns string attributes of HDF5 groups just fine (although it does not return attributes for subdatasets), which suggests GDAL already has some ability to handle strings.

Is the HDF5 driver still supported? If so, is there any interest or capacity to expand its functionality?

@piyushrpt
Contributor

piyushrpt commented Mar 7, 2019

Yes. I would be interested in assisting in expanding the functionality.

The HDF5 driver supports reading one specific kind of compound type: the kind that mimics complex datasets written by h5py, i.e., a structure with 2 entries of the same type.
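
The rule described above can be sketched in a few lines. This is only an illustration of the "two entries of the same type" test, not the driver's actual code: the `looks_like_complex` function and the `(name, type)` tuple representation are hypothetical. The member names `r` and `i` are the ones h5py uses by default when it writes complex numbers.

```python
from typing import Sequence, Tuple

# Hypothetical, simplified description of a compound type:
# a sequence of (member_name, member_type_name) pairs.
Member = Tuple[str, str]


def looks_like_complex(members: Sequence[Member]) -> bool:
    """Return True when a compound type has exactly two members of the same type."""
    if len(members) != 2:
        return False
    (_, first_type), (_, second_type) = members
    return first_type == second_type


# h5py-style complex compound: accepted.
print(looks_like_complex([("r", "float32"), ("i", "float32")]))
# Two members of different types: rejected.
print(looks_like_complex([("id", "int32"), ("value", "float64")]))
# Three members: rejected.
print(looks_like_complex([("a", "int32"), ("b", "float64"), ("c", "float32")]))
```

Any compound type that fails this test, such as a table row with mixed member types, falls outside what is described as supported here.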

@mkoohafkan
Author

mkoohafkan commented Mar 19, 2019

Thanks @piyushrpt, I think I see some of the code for supporting complex datasets here.

Thinking about the purpose of GDAL, I'm wondering how best to approach this. The goal would not be general-purpose support for HDF5 files, but I think some wiggle room for pulling data from tables would help support the various ways that spatial metadata can be stored in HDF5 files.

@piyushrpt
Contributor

@mkoohafkan I agree with you that the implementation can be made more general. Here are a couple of things that I thought about but did not find enough time to implement and open a PR for. Maybe you and others can add to this list of thoughts:

  1. I think the best method might be to have a generic HDF5 driver, just like the "raw" dataset driver, with specializations derived from it. The basic driver only interprets data types and provides read/write functionality for datasets and attributes.

  2. I see that there is COSMO-SkyMed-specific code baked into the driver. I think this is useful functionality, but it should probably be its own driver rather than being baked into the HDF5 driver. This could be a specialization of the generic HDF5 driver.

  3. I think there are quite a few projects that use CF conventions within HDF5 files (not only netCDF). That might be a good starting point for including spatial metadata. This could be a CF specialization of the HDF5 driver. Such data is already supported, I believe, by ESRI and by software like Panoply.
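
The layering proposed in the list above could look roughly like the following sketch. Everything here is hypothetical: the class names `GenericHDF5Reader` and `CFHDF5Reader` are invented for illustration, and plain dictionaries stand in for an opened HDF5 file. The only convention assumed is the CF one in which a data variable's `grid_mapping` attribute names the variable that carries the projection attributes.

```python
from typing import Any, Dict, Optional


class GenericHDF5Reader:
    """Hypothetical base layer: exposes datasets and attributes, no conventions."""

    def __init__(self, datasets: Dict[str, Any], attributes: Dict[str, Dict[str, Any]]):
        # Plain dictionaries stand in for an opened HDF5 file.
        self.datasets = datasets
        self.attributes = attributes

    def read(self, name: str) -> Any:
        return self.datasets[name]

    def attrs(self, name: str) -> Dict[str, Any]:
        return self.attributes.get(name, {})

    def spatial_reference(self, name: str) -> Optional[Dict[str, Any]]:
        # The generic layer knows nothing about spatial metadata.
        return None


class CFHDF5Reader(GenericHDF5Reader):
    """Hypothetical specialization that follows the CF grid_mapping attribute."""

    def spatial_reference(self, name: str) -> Optional[Dict[str, Any]]:
        mapping_name = self.attrs(name).get("grid_mapping")
        if mapping_name is None:
            return None
        # Return the projection attributes, or None if the target has none.
        return self.attrs(mapping_name) or None


file_contents = dict(
    datasets={"ice_conc": [[0, 1], [2, 3]], "crs": ""},
    attributes={
        "ice_conc": {"grid_mapping": "crs"},
        "crs": {"grid_mapping_name": "polar_stereographic"},
    },
)
print(GenericHDF5Reader(**file_contents).spatial_reference("ice_conc"))
print(CFHDF5Reader(**file_contents).spatial_reference("ice_conc"))
```

The point of the split is that the base class never needs to change when a new convention is added. A product-specific reader would be another subclass alongside the CF one.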

@piyushrpt
Contributor

One option that could be considered for compound datatypes is to return each element as a separate band of a dataset. For example, if a dataset is a 2D array of compound types, this could be interpreted as a GDAL dataset where each band has a different type. This model gets complicated with 3D arrays. One can access individual elements of a compound type as shown in this example: https://support.hdfgroup.org/ftp/HDF5/examples/misc-examples/chgfield.c

For the example in the link above, band 1 would be Int32, band 2 would be Float64, and band 3 would be Float32.
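
A minimal sketch of the band-per-element idea, using only Python's `struct` module rather than HDF5 or GDAL. It assumes a packed little-endian record of int32, float64, float32 (matching the three types named above) with no padding, which a real HDF5 compound type need not satisfy. The `split_into_bands` function and the sample values are hypothetical.

```python
import struct

# Assumed record layout: int32, float64, float32, little-endian, no padding.
RECORD = struct.Struct("<idf")
BAND_TYPES = ("Int32", "Float64", "Float32")


def split_into_bands(buffer: bytes, rows: int, cols: int):
    """Turn a row-major buffer of packed records into one 2D list per member."""
    expected = rows * cols * RECORD.size
    if len(buffer) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(buffer)}")
    bands = [[[None] * cols for _ in range(rows)] for _ in BAND_TYPES]
    for index, record in enumerate(RECORD.iter_unpack(buffer)):
        row, col = divmod(index, cols)
        for band_index, value in enumerate(record):
            bands[band_index][row][col] = value
    return bands


# Build a 2 x 2 array of records for demonstration.
records = [(1, 0.5, 1.5), (2, 1.5, 2.5), (3, 2.5, 3.5), (4, 3.5, 4.5)]
raw = b"".join(RECORD.pack(*r) for r in records)
bands = split_into_bands(raw, rows=2, cols=2)
for type_name, band in zip(BAND_TYPES, bands):
    print(type_name, band)
```

Each member becomes its own 2D plane with its own type, which is why the model fits a 2D array of records but has no obvious place for a third array dimension.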

@mdsumner
Contributor

Linking an old request/query for compound types with a specific interpretation:

https://trac.osgeo.org/gdal/ticket/6551

@mdsumner
Contributor

mdsumner commented Oct 18, 2024

gdalinfo can't see the single scalar variable in this file, but gdalmdiminfo can:

gdalmdiminfo /vsizip//vsicurl/https://github.com/user-attachments/files/17415359/NSIDC0051_SEAICE_PS_S25km_19781027_v2.0.nc.zip | grep crs -A 5 -B 2
  ],
  "arrays": {
    "crs": {
      "datatype": "String",
      "attributes": {
        "grid_mapping_name": "polar_stereographic",
        "straight_vertical_longitude_from_pole": 0,
        "false_easting": 0,
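
Since gdalmdiminfo emits JSON, output like the fragment above can be searched for grid-mapping variables with the standard library alone. The sketch below is hedged: `SAMPLE` is a hand-written, abbreviated stand-in that keeps only the keys visible in the excerpt, the `hypothetical_data` array is invented for contrast, and `grid_mapping_arrays` is a hypothetical helper that looks only at top-level arrays.

```python
import json

# Hand-written stand-in for gdalmdiminfo output, limited to the keys shown above.
SAMPLE = """
{
  "arrays": {
    "crs": {
      "datatype": "String",
      "attributes": {
        "grid_mapping_name": "polar_stereographic",
        "straight_vertical_longitude_from_pole": 0,
        "false_easting": 0
      }
    },
    "hypothetical_data": {
      "datatype": "Float32"
    }
  }
}
"""


def grid_mapping_arrays(info: dict) -> dict:
    """Map array name to grid_mapping_name for arrays that carry that attribute."""
    found = {}
    for name, array in info.get("arrays", {}).items():
        attributes = array.get("attributes", {})
        if "grid_mapping_name" in attributes:
            found[name] = attributes["grid_mapping_name"]
    return found


result = grid_mapping_arrays(json.loads(SAMPLE))
print(result)
```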

Perhaps we can modify the description and scope of this issue and limit it to compound types, given the multidimensional support: "classic" 2D raster mode is not suitable for scalars or 1D variables. I feel like compound types are really a "vector" issue, fwiw.

Also, making a gratuitous link to a related discussion in VirtualiZarr: zarr-developers/VirtualiZarr#260
