HDF5 Limit on attribute creation order tracking #2054

gsjaardema · 2021-08-05T19:00:20Z

[This discussion was taking place over email, but I think I should put it here for easier searching and tracking]

Original Issue:

It looks like I am hitting a limit of 65,536 for creating attributes with attribute creation order tracking enabled. Is that the limit in HDF5? If so, is it possible to increase this number by changing a #define, or is it an inherent limit in the data format?

This is happening in a NetCDF-4 format file, so the HDF5 is being created via the NetCDF API:

HDF5-DIAG: Error detected in HDF5 (1.10.7) thread 0:
#000: /scratch/gdsjaar/seacas/TPL/hdf5/hdf5-1.10.7/src/H5A.c line 285 in H5Acreate2(): unable to create attribute
major: Attribute
minor: Unable to initialize object
#1: /scratch/gdsjaar/seacas/TPL/hdf5/hdf5-1.10.7/src/H5Aint.c line 275 in H5A__create(): unable to create attribute in object header
major: Attribute
minor: Unable to insert object
#2: /scratch/gdsjaar/seacas/TPL/hdf5/hdf5-1.10.7/src/H5Oattribute.c line 296 in H5O__attr_create(): attribute creation index can't be incremented
major: Attribute
minor: Unable to increment reference count

HDF5 Response:

According to section IV.A.2.v, "The Attribute Info Message", of the File Format Spec, the maximum creation index is stored in 2 bytes, so this is a file format issue. I think we do have a problem, since there is otherwise no limit on the number of attributes. We will need to introduce changes to the file format. We need to have a conversation about HDF5 size limitations and how much work it will be to lift them.
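To make the arithmetic concrete: a 2-byte unsigned field can hold 2^16 = 65,536 distinct values (0 through 65,535), which matches the limit reported above. The sketch below is only an illustration of that arithmetic. The struct and function names are hypothetical, it is not HDF5's actual bookkeeping, and the real library may stop one index earlier or later than this toy counter does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 2-byte creation index field. */
typedef struct {
    uint16_t next_crt_idx; /* next creation index to hand out */
    bool     exhausted;    /* set once the 16-bit range is used up */
} attr_index_counter;

/* Store the next index in *out and return 0, or return -1 when the
 * 16-bit range has been exhausted. */
int assign_creation_index(attr_index_counter *c, uint16_t *out)
{
    assert(c != NULL && out != NULL);
    if (c->exhausted)
        return -1;
    *out = c->next_crt_idx;
    if (c->next_crt_idx == UINT16_MAX)
        c->exhausted = true;   /* no representable index is left */
    else
        c->next_crt_idx++;
    return 0;
}

/* Count how many indices can be assigned before the counter fails. */
unsigned long count_until_failure(void)
{
    attr_index_counter c = {0, false};
    uint16_t idx;
    unsigned long n = 0;

    while (assign_creation_index(&c, &idx) == 0)
        n++;
    return n;
}
```

Widening the field (for example to 32 bits) would raise the ceiling, but as noted above that requires a change to the file format rather than to a `#define`.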

gsjaardema · 2021-08-05T19:02:54Z

The issue is that NetCDF-4 uses "attribute creation order" tracking, as described in docs/file_format_specification.md:

\subsection creation_order Creation Order

The netCDF API maintains the creation order of objects that are
created in the file. The same is not true in HDF5, which maintains the
objects in alphabetical order. Starting in version 1.8 of HDF5, the
ability to maintain creation order was added. This must be explicitly
turned on in the HDF5 data file in several ways.

Each group must have link and attribute creation order set. The
following code (from libsrc4/nc4hdf.c) shows how the netCDF-4 library
sets these when creating a group.

\code
           /* Create group, with link_creation_order set in the group
            * creation property list. */
           if ((gcpl_id = H5Pcreate(H5P_GROUP_CREATE)) < 0)
              return NC_EHDFERR;
           if (H5Pset_link_creation_order(gcpl_id, H5P_CRT_ORDER_TRACKED|H5P_CRT_ORDER_INDEXED) < 0)
              BAIL(NC_EHDFERR);
           if (H5Pset_attr_creation_order(gcpl_id, H5P_CRT_ORDER_TRACKED|H5P_CRT_ORDER_INDEXED) < 0)
              BAIL(NC_EHDFERR);
           if ((grp->hdf_grpid = H5Gcreate2(grp->parent->hdf_grpid, grp->name,
                                            H5P_DEFAULT, gcpl_id, H5P_DEFAULT)) < 0)
              BAIL(NC_EHDFERR);
           if (H5Pclose(gcpl_id) < 0)
              BAIL(NC_EHDFERR);
\endcode

Each dataset in the HDF5 file must be created with a property list for
which the attribute creation order has been set to creation
ordering. The H5Pset_attr_creation_order function is used to set the
creation ordering of attributes of a variable.

The following example code (from libsrc4/nc4hdf.c) shows how the
creation ordering is turned on by the netCDF library.

\code
        /* Turn on creation order tracking. */
        if (H5Pset_attr_creation_order(plistid, H5P_CRT_ORDER_TRACKED|
                                       H5P_CRT_ORDER_INDEXED) < 0)
           BAIL(NC_EHDFERR);
\endcode

gsjaardema · 2021-08-05T19:12:36Z

I recently hit this issue again and, due to limited long-term memory, spent some time triaging it before remembering the earlier discussion...

I pinged THG (The HDF Group) again and they are willing to look into increasing the maximum attribute creation index from 16 bits to a larger value, but that probably won't happen until 1.14.0 at the earliest...

Since I had users who needed to write their files now (although writing to netcdf-3 / netcdf-5 format works), and I will probably have users hitting the limit more frequently as model complexity increases, I decided to experiment. I removed all calls to H5Pset_attr_creation_order() and was able to "successfully" create the file, and it seems to run through subsequent read/write calls with no discernible issues.

Question

What is the disadvantage of writing a netCDF-4 file with the attribute creation order turned off? Will this cause issues? If so, what issues and are they problem-specific (i.e., do some applications/uses of netCDF need this turned on and some can function fine without it?)

Proposal

If it is OK for some applications to work with netCDF files with the attribute creation ordering turned off, would it be possible to add a configuration option (preferably run-time, or less desirably compile-time) to the netCDF library that disables the setting of the attribute creation ordering? This would give me a solution now instead of in a year or two (plus the time spent waiting for associated applications to catch up and be able to handle hdf5-1.14.X format files...)

I can work up a PR for consideration if this seems like a possibility...

gsjaardema · 2021-08-05T19:14:50Z

Note that it is probably OK to retain the attribute creation order tracking for the groups. The main area I am hitting the limit on is with datasets.

DennisHeimbigner · 2021-08-05T19:27:06Z

The difference is that when opening an existing dataset, the assigned attribute numbers
would differ from at creation time. It is possible, I suppose that some users do a bad
thing and access attributes by attribute number rather than name when reading a dataset.
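The difference between the two access styles can be shown with a small stand-in. This is a sketch under stated assumptions: `demo_att` and both lookup functions are hypothetical and do not call the netCDF API, and the "alphabetical" array simply mirrors the ordering this thread says results when creation order is not tracked. It shows that a lookup by attribute number depends on the order in which attributes are presented, while a lookup by name does not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for one variable's attribute list. */
typedef struct {
    const char *name;
    const char *value;
} demo_att;

/* Lookup by position: the answer depends on the order of the list. */
const char *att_value_by_number(const demo_att *atts, size_t natts, size_t attnum)
{
    assert(atts != NULL);
    return attnum < natts ? atts[attnum].value : NULL;
}

/* Lookup by name: the answer is the same for any ordering of the list. */
const char *att_value_by_name(const demo_att *atts, size_t natts, const char *name)
{
    assert(atts != NULL && name != NULL);
    for (size_t i = 0; i < natts; i++)
        if (strcmp(atts[i].name, name) == 0)
            return atts[i].value;
    return NULL;
}

/* The same three attributes, once in creation order and once sorted by name. */
const demo_att by_creation[] = {
    {"units", "m/s"}, {"long_name", "velocity"}, {"coordinates", "x y"}
};
const demo_att alphabetical[] = {
    {"coordinates", "x y"}, {"long_name", "velocity"}, {"units", "m/s"}
};
```

Here attribute number 0 is `units` in one ordering and `coordinates` in the other, whereas asking for `"units"` by name returns `"m/s"` in both cases.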

edwardhartnett · 2021-08-05T20:17:58Z

This is unfortunately very important. If we don't turn on creation ordering, the varids will change. They will be reordered into alphabetical order.

Plenty of codes out there depend on var 0 being something, var 1 being something else, etc. So reordering the vars will break all kinds of user code and be a major violation of backwards compatibility.

gsjaardema · 2021-08-05T20:23:09Z

I am not suggesting this as a change to default behavior; I'm asking whether it would be possible to provide an option that the application / library could set to disable the use of the ordering if and only if the application / library knew that it could work correctly with non-deterministically ordered variables...

edwardhartnett · 2021-08-06T06:48:09Z

I believe the attribute table that keeps track of dimensions may know about varids - I don't know how what you propose can work, but I am certainly open to suggestions...

gsjaardema · 2021-08-06T18:35:40Z

I have run a few tests on my netCDF library that has the attribute creation order tracking disabled and other than a different ordering for a few attributes, I can see no differences in the files. All of the vars appear in the exact same ordering in the file as they do with the attribute creation order tracking enabled (which I think makes sense...)

I'm not sure where to look to verify that the attribute table bookkeeping does not get changed, but the files that I am creating seem to be valid and are readable by tools linked with either a "pre-change" netcdf library or a "post-change" library.

If I run "ctest", I get some failures, but they all seem to be related to attribute ordering. I can see that if an application relied on the attributes appearing in a certain order, this change would break them, but in my uses, I always query the attribute by name and not position, so unless internally the library is relying on a specific ordering of the "hidden" attributes, I'm not sure how this would affect my subset of files. [I do agree that this breaks backward-compatibility so must be selectable at run-time and not the default behavior]

Work in progress / Proof of concept: Add a capability to disable the tracking of attribute creation order. See Unidata#2054 for details. This PR adds a `NC_NOATTCREORD` define which can be passed in the `mode` argument to `nc_create`. If it is present, the calls that set the attribute creation order tracking are skipped. This should only be used for files in which you *know* that the ordering of the attributes does not matter to *any* potential readers of this database.
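As a rough illustration of how a create-mode bit could gate the creation order setting, here is a self-contained sketch. It is not the code from the PR: every `DEMO_` constant is a placeholder whose value is chosen for illustration only (the real definitions live in `netcdf.h` and the HDF5 headers), and `demo_attr_order_flags` is a hypothetical helper.

```c
#include <assert.h>

/* Placeholder values for illustration only. */
#define DEMO_NC_NETCDF4        0x1000
#define DEMO_NC_NOATTCREORD    0x20000
#define DEMO_CRT_ORDER_TRACKED 0x0001u
#define DEMO_CRT_ORDER_INDEXED 0x0002u

/* Return the attribute creation order flags that would be requested for
 * a file created with the given mode. A return value of 0 means the call
 * that enables tracking would be skipped entirely. */
unsigned demo_attr_order_flags(int cmode)
{
    assert(cmode >= 0); /* create modes are non-negative bit masks */
    if (cmode & DEMO_NC_NOATTCREORD)
        return 0u;
    return DEMO_CRT_ORDER_TRACKED | DEMO_CRT_ORDER_INDEXED;
}
```

With the real flag, the caller side would presumably be a call such as `nc_create(path, NC_NETCDF4 | NC_NOATTCREORD, &ncid)`, so the default behavior stays unchanged for every application that does not opt in.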

gsjaardema · 2021-08-06T19:36:43Z

See #2056 for a proof-of-concept implementation of what I am proposing.

gsjaardema · 2021-08-06T21:09:01Z

@edwardhartnett

> This is unfortunately very important. If we don't turn on creation ordering, the varids will change.

I guess I am not understanding how the attribute creation ordering affects the varids. Could you explain? In my tests, it looks like the only thing that changes is the order of the attributes themselves.

DennisHeimbigner · 2021-08-06T23:55:07Z

Do you really have a variable with 2^16 attributes?

gsjaardema · 2021-08-06T23:57:22Z

No, but I have a file that has more than 2^16 attributes since each variable has a few hidden attributes and the count is global over the file and not local to a particular variable.

DennisHeimbigner · 2021-08-07T00:35:56Z

Is there a separate counter for variables and for attributes? That would lessen the problem
since it would only affect users who access attributes by attribute number (rather than name).
Not sure how common that is.

edwardhartnett · 2021-08-07T03:32:27Z

OK, I thought you were talking about changing the order of varids.

I agree that changing the order of attids has a far smaller impact.

One way to do this would be as a mode flag for the whole file at nc_create time. Another would be a variable setting. Unfortunately we don't have a mode flag for vars, so this would require a new function nc_def_var_att_ordering() or something.

In either case, I think old versions of netCDF-4 would still be able to work with the file. The user would have to know that attribute ordering is alphabetical not by creation.

DennisHeimbigner · 2021-08-08T20:21:48Z

Another solution is for netcdf to track creation order something like we track dimension ids.
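One way to picture this idea is a library-maintained sequence number stored with each attribute and used to re-sort on read. The sketch below is hypothetical throughout: `tracked_att`, `creation_seq`, and `restore_creation_order` are invented names, and how such a counter would be persisted in the file is not specified in this thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical attribute record carrying a library-maintained counter. */
typedef struct {
    const char   *name;
    unsigned long creation_seq; /* wider than 16 bits, so no 65,536 cap */
} tracked_att;

/* qsort comparator: ascending by creation sequence number. */
static int cmp_by_creation(const void *a, const void *b)
{
    const tracked_att *x = a;
    const tracked_att *y = b;
    return (x->creation_seq > y->creation_seq) -
           (x->creation_seq < y->creation_seq);
}

/* Reorder attributes, as presented by the storage layer, back into the
 * order in which they were created. */
void restore_creation_order(tracked_att *atts, size_t natts)
{
    assert(atts != NULL || natts == 0);
    if (natts > 1)
        qsort(atts, natts, sizeof *atts, cmp_by_creation);
}
```

A reader that does not know about the stored counter would skip the re-sort and see the attributes in whatever order the storage layer returns them.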

edhartnett · 2021-08-09T07:29:40Z

Good idea! We would lose some minor backward compatibility, though. Suppose we turn off HDF5 creation ordering, rely on our own tracking of creation order, and do a new release with that code (4.8.2). In this release we keep track of attribute creation order in our hidden attribute metadata. Files created by 4.8.2 would then be readable by previous versions, but the attribute order might differ from what was intended, because versions of netcdf-c before 4.8.2 would not be able to interpret the attribute ordering information we store in our hidden metadata attributes. This could actually be pretty serious. For example, if NOAA upgraded to 4.8.2, all downstream users of NOAA data would see their attribute order change until they also upgraded to 4.8.2. For that reason, maybe this should still be something that has to be explicitly turned on via a mode flag or some var function?


DennisHeimbigner · 2021-08-09T22:30:32Z

We also need to consider the reverse situation:

1. A library with creation order disabled creates a file.
2. Someone else reads the file, but their library has creation order enabled.

Will they experience problems?

edwardhartnett · 2021-08-10T06:43:20Z

No, I don't think that would be a problem. The read code takes the attributes in whatever order the HDF5 library presents them. Creation order is determined at dataset creation time for the HDF5 file.

gsjaardema mentioned this issue Aug 6, 2021

Attribute creation order on/off #2056

Merged

kmuehlbauer mentioned this issue Jan 9, 2022

Files written by h5netcdf cannot be edited by netcdf4-python h5netcdf/h5netcdf#128

Closed

hmaarrfk mentioned this issue Jan 10, 2022

Monitor netCDF4 requirements for track_order h5netcdf/h5netcdf#130

Closed
