Documentation - variable-length datatype description seems incomplete (doesn't mention global heap) #1682
I'm trying to decode an HDF5 file (attached) manually using the specification, with the aim of writing a Rust decoder that quickly extracts raw signal data from nanopore FAST5 files, and I'm getting tripped up by the definition of variable-length data. I'm currently looking at the attribute message starting at offset 0x14e62 in the file, which I've expanded out here in 8-byte chunks for clarity:
I can understand the first two lines of this:
The first 4 bytes of the third line define a v1 variable-length datatype; type string; null-terminated; with UTF-8 encoding. Beyond this I think the specification indicates a length should follow, so that's another 0x10 bytes for the datatype (consistent with the attribute information)... but then I get lost.
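For anyone following along in code, here is a minimal Rust sketch of the decoding described above. It assumes the layout given in the datatype message section of the specification: one byte holding the version (top 4 bits) and class (bottom 4 bits), three bytes of class bit field, and a 4-byte little-endian size. The names `DatatypeHeader`, `parse_datatype_header` and `vlen_flags` are made up for illustration, and the example bytes in the usage note are reconstructed from the description in this issue rather than copied from the file.

```rust
/// First 8 bytes of a datatype message (hypothetical helper type):
/// class+version (1 byte), class bit field (3 bytes), size (4 bytes).
#[derive(Debug, PartialEq)]
pub struct DatatypeHeader {
    pub version: u8,
    pub class: u8,
    pub bit_field: u32, // only the low 24 bits are used
    pub size: u32,
}

/// Decode the fixed part of a datatype message; `None` if `buf` is too short.
pub fn parse_datatype_header(buf: &[u8]) -> Option<DatatypeHeader> {
    if buf.len() < 8 {
        return None;
    }
    Some(DatatypeHeader {
        version: buf[0] >> 4,  // top 4 bits
        class: buf[0] & 0x0f,  // bottom 4 bits (9 = variable-length)
        bit_field: u32::from_le_bytes([buf[1], buf[2], buf[3], 0]),
        size: u32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]),
    })
}

/// Split a variable-length (class 9) bit field into
/// (type, padding type, character set):
///   type:      0 = sequence, 1 = string
///   padding:   0 = null terminate, 1 = null pad, 2 = space pad
///   character: 0 = ASCII, 1 = UTF-8
pub fn vlen_flags(bit_field: u32) -> (u8, u8, u8) {
    (
        (bit_field & 0xf) as u8,
        ((bit_field >> 4) & 0xf) as u8,
        ((bit_field >> 8) & 0xf) as u8,
    )
}
```

Feeding it `[0x19, 0x01, 0x01, 0x00, 0x10, 0x00, 0x00, 0x00]` gives version 1, class 9, size 16, and `vlen_flags` returns `(1, 0, 1)`, i.e. string, null-terminated, UTF-8. Since the specification says the parent type is described recursively, the same function could presumably be applied to the properties that follow the size field, but that is an inference on my part.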
The datatype message specification suggests the next information that follows is "Properties", and for variable-length datatypes it states, "Each variable-length type is based on some parent type. The information for that parent type is described recursively by this field." Unfortunately, I don't understand what this means. The next four bytes in the datatype definition are `10 00 00 00`, and I can't find anything in the specification to help decode them. If I assume this is a variable class/version information segment, then I would expect it to decode to v1, fixed point. If I assume this is the property of a string, the specification tells me, "There are no properties defined for the string class." If I assume this is the property of an array, I end up with a dimensionality of 16, and there's not enough space in the datatype to define 16 dimensions.
Moving on to the first eight bytes of the dataspace section (from 0x14e9a), I get v1, 0 dimensions (i.e. a scalar value), and no flags set (with the reserved bytes set to zero). That all seems fine, and consistent with what I expect. Following on from this (fifth line), I get lost again. The attribute message section tells me that what should follow is the data itself, but that can't be a direct encoding of the value: this is not a null-terminated string.
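The dataspace reading above can be sketched in Rust as follows. This assumes the version 1 dataspace message layout from the specification: version, dimensionality, flags, one reserved byte, then four more reserved bytes, with the dimension sizes following. The struct and function names are hypothetical.

```rust
/// Fixed 8-byte header of a version 1 dataspace message (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct DataspaceV1Header {
    pub rank: u8,              // 0 means a scalar dataspace
    pub has_max_dims: bool,    // flag bit 0
    pub has_permutation: bool, // flag bit 1
}

/// Decode the header; `None` if `buf` is too short or is not version 1.
pub fn parse_dataspace_v1_header(buf: &[u8]) -> Option<DataspaceV1Header> {
    if buf.len() < 8 || buf[0] != 1 {
        return None;
    }
    Some(DataspaceV1Header {
        rank: buf[1],
        has_max_dims: (buf[2] & 0x01) != 0,
        has_permutation: (buf[2] & 0x02) != 0,
    })
    // buf[3] and buf[4..8] are reserved; this sketch does not check them.
}
```

For the scalar case described here, `[1, 0, 0, 0, 0, 0, 0, 0]` decodes to rank 0 with both flags clear, so no dimension sizes follow and the attribute data should start immediately after these eight bytes.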
After a lot of hunting through the file, and comparing with the output of h5dump, and checking h5debug, I found the version string I was looking for, in the global heap starting in the file at position 0x800, index 115 (0x73).
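The lookup I ended up doing by hand could be sketched in Rust like this. It assumes a version 1 global heap collection as laid out in the specification ("GCOL" signature, version, 3 reserved bytes, collection size, then a run of objects each with a 2-byte index, 2-byte reference count, 4 reserved bytes, an object size, and data padded to a multiple of 8 bytes), and it further assumes 8-byte "size of lengths" as set in the superblock. `find_heap_object` and `read_u64_le` are names invented for this sketch.

```rust
/// Read a little-endian u64 at `at`, or `None` if out of bounds.
fn read_u64_le(b: &[u8], at: usize) -> Option<u64> {
    let mut a = [0u8; 8];
    a.copy_from_slice(b.get(at..at.checked_add(8)?)?);
    Some(u64::from_le_bytes(a))
}

/// Return the data of the object with the given index in a global heap
/// collection. `collection` must start at the "GCOL" signature.
/// Assumption: lengths are 8 bytes wide.
pub fn find_heap_object(collection: &[u8], index: u16) -> Option<&[u8]> {
    if collection.len() < 16 || !collection.starts_with(b"GCOL") || collection[4] != 1 {
        return None;
    }
    // Skip signature (4) + version (1) + reserved (3) + collection size (8).
    let mut pos = 16;
    while pos + 16 <= collection.len() {
        let idx = u16::from_le_bytes([collection[pos], collection[pos + 1]]);
        if idx == 0 {
            return None; // object 0 is the free-space marker: end of objects
        }
        let size = read_u64_le(collection, pos + 8)? as usize;
        let start = pos + 16;
        let end = start.checked_add(size)?;
        if end > collection.len() {
            return None; // truncated or corrupt collection
        }
        if idx == index {
            return Some(&collection[start..end]);
        }
        // Object data is padded to a multiple of 8 bytes.
        pos = start + ((size + 7) & !7);
    }
    None
}
```

With the collection bytes read from offset 0x800, `find_heap_object(bytes, 0x73)` would be the call corresponding to the lookup described above. The sketch bounds the walk by the slice length rather than the stored collection size, which is a simplification.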
I found a little nugget of information in the specification for the global heap, which stated, "For example, data of variable-length datatype elements is stored in the global heap and is accessed via a global heap ID. The format for global heap IDs is described at the end of this section." It would have been really helpful for me if this information were in the section on variable-length datatype elements (it's not, from what I can tell). This clued me into realising that the last 12 bytes of this attribute section were probably the address of the global heap and the index within that heap of the data (following what is described in the specification as, "The format for the ID used to locate an object in the global heap is described here:"). But I have no idea what the first 4 bytes on that line relate to (`03 00 00 00`). Are these flags relating to the heap, or the data? Are there any other variable types that use the global heap, or is the variable-length datatype the only one?
There seems to be additional information in this attribute section that is being parsed by h5dump / h5debug (e.g. `CTYPE H5T_C_S1`; `H5T_LOC_0`), but I can't see how it's defined in the HDF5 specification. Could someone please help me understand this?
Attachment: perfect_guppy.tar.gz
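To make the 16-byte layout concrete, here is a Rust sketch of one plausible reading of that final line. The global heap ID part (an address of "size of offsets" bytes followed by a 4-byte object index) follows the specification; treating the leading 4 bytes as an element length is purely my assumption, suggested by the 16-byte datatype size being 4 + 12, and is exactly the part I would like confirmed. The names are hypothetical, and 8-byte offsets are assumed.

```rust
/// One stored variable-length element, under the assumptions stated above.
#[derive(Debug, PartialEq)]
pub struct VlenElement {
    pub length: u32,       // ASSUMPTION: element length, not documented here
    pub heap_address: u64, // global heap ID: collection address
    pub object_index: u32, // global heap ID: object index
}

/// Decode 16 bytes as length + global heap ID (8-byte offsets assumed).
pub fn parse_vlen_element(buf: &[u8]) -> Option<VlenElement> {
    let b = buf.get(..16)?;
    let mut addr = [0u8; 8];
    addr.copy_from_slice(&b[4..12]);
    Some(VlenElement {
        length: u32::from_le_bytes([b[0], b[1], b[2], b[3]]),
        heap_address: u64::from_le_bytes(addr),
        object_index: u32::from_le_bytes([b[12], b[13], b[14], b[15]]),
    })
}
```

Applied to `03 00 00 00`, then `00 08 00 00 00 00 00 00`, then `73 00 00 00` (bytes reconstructed from the values quoted in this issue), this yields length 3, heap address 0x800 and object index 0x73, which at least matches where I found the version string.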