HDF5 Limit on attribute creation order tracking #2054
Comments
The issue is that NetCDF-4 uses "attribute creation order" tracking as described in
|
I recently hit this issue again and, due to limited long-term memory, spent some time triaging it before remembering the issue... I pulsed the THG group again and they are willing to look into increasing the maximum attribute creation index from 16 bits to a larger value, but that probably won't happen until 1.14.0 at the earliest...

Since I had users who needed to write their files now (although writing to netcdf-3 / netcdf-5 format works), and I will probably have users start hitting the limit more frequently as model complexity increases... I decided to experiment. I removed all calls to

Question

What is the disadvantage of writing a netCDF-4 file with the attribute creation order turned off? Will this cause issues? If so, what issues and are they problem-specific (i.e., do some applications/uses of netCDF need this turned on and some can function fine without it?)

Proposal

If it is OK for some applications to work with netCDF files with the attribute creation ordering turned off, would it be possible to add a configuration (preferably run-time or less desirably compile-time) to the netCDF library which would disable the setting of the attribute creation ordering. This would give me a solution now instead of in a year or two (plus the time waiting for associated applications to catch up and be able to handle hdf5-1.14.X format files...)

I can work up a PR for consideration if this seems like a possibility... |
Note that it is probably OK to retain the attribute creation order tracking for the groups. The main area I am hitting the limit on is with datasets. |
The difference is that when opening an existing dataset, the assigned attribute numbers |
This is unfortunately very important. If we don't turn on creation ordering, the varids will change. They will be reordered into alphabetical order. Plenty of codes out there depend on var 0 being something, var 1 being something else, etc. So reordering the vars will break all kinds of user code and be a major violation of backwards compatibility. |
I am not suggesting this as a change to default behavior; I'm asking whether it would be possible to provide an option that the application / library could set to disable the use of the ordering if and only if the application / library knew that it could work correctly with non-deterministically-ordered variables... |
I believe the attribute table that keeps track of dimensions may know about varids - I don't know how what you propose can work, but I am certainly open to suggestions... |
I have run a few tests on my netCDF library that has the attribute creation order tracking disabled and other than a different ordering for a few attributes, I can see no differences in the files. All of the I'm not sure where to look to verify that the attribute table bookkeeping does not get changed, but the files that I am creating seem to be valid and are readable by tools linked with either a "pre-change" netcdf library or a "post-change" library. If I run "ctest", I get some failures, but they all seem to be related to attribute ordering. I can see that if an application relied on the attributes appearing in a certain order, this change would break them, but in my uses, I always query the attribute by name and not position, so unless internally the library is relying on a specific ordering of the "hidden" attributes, I'm not sure how this would affect my subset of files. [I do agree that this breaks backward-compatibility so must be selectable at run-time and not the default behavior] |
Work in progress / Proof of concept: Add a capability to disable the tracking of attribute creation order. See Unidata#2054 for details. This PR adds a `NC_NOATTCREORD` define which can be passed in the `mode` argument to `nc_create`. If it is present, the calls that set attribute creation order tracking are skipped. This should only be used for files in which you *know* that the ordering of the attributes does not matter to *any* potential readers of this database.
See #2056 for a proof-of-concept implementation of what I am proposing. |
I guess I am not understanding how the attribute creation ordering affects the |
Do you really have a variable with 2^16 attributes? |
No, but I have a file that has more than 2^16 attributes since each variable has a few hidden attributes and the count is global over the file and not local to a particular variable. |
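The arithmetic behind this can be sketched as follows. This is a rough illustration, not netCDF or HDF5 code: the function names and the per-variable counts are hypothetical, and it assumes, as described above, that every attribute in the file (user-visible and hidden) consumes one value of a single 16-bit creation index.

```python
# Hypothetical back-of-the-envelope check; not part of netCDF or HDF5.

# A 2-byte creation index can take 2**16 distinct values (0..65535).
CREATION_INDEX_VALUES = 2 ** 16


def total_attributes(num_vars, user_attrs_per_var, hidden_attrs_per_var):
    """Attributes written to the file, counting the hidden ones too."""
    return num_vars * (user_attrs_per_var + hidden_attrs_per_var)


def exceeds_creation_index(num_vars, user_attrs_per_var, hidden_attrs_per_var):
    """True if the attributes would need more index values than 16 bits offer."""
    total = total_attributes(num_vars, user_attrs_per_var, hidden_attrs_per_var)
    return total > CREATION_INDEX_VALUES
```

With made-up numbers such as 20,000 variables carrying 3 user attributes and 2 hidden ones each, the file holds 100,000 attributes, well past 65,536, even though no single variable has more than five.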
Is there a separate counter for variables and for attributes? That would lessen the problem |
OK, I thought you were talking about changing the order of varids. I agree that changing the order of attids has a far smaller impact. One way to do this would be as a mode flag for the whole file at nc_create time. Another would be a variable setting. Unfortunately we don't have a mode flag for vars, so this would require a new function nc_def_var_att_ordering() or something. In either case, I think old versions of netCDF-4 would still be able to work with the file. The user would have to know that attribute ordering is alphabetical not by creation. |
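A small model of what "alphabetical, not by creation" would mean for a reader is sketched below. It is only an illustration: plain Python lists stand in for a variable's attributes, the function names are hypothetical, and Python's `sorted` is assumed to be a fair stand-in for the name order the library would return.

```python
# Illustrative model only; plain lists stand in for a variable's attributes.


def attribute_order(names_in_creation_order, track_creation_order):
    """Return attribute names in the order a reader would enumerate them."""
    if track_creation_order:
        return list(names_in_creation_order)
    # Assumption: without tracking, names come back sorted by name.
    return sorted(names_in_creation_order)


def attribute_index(names_in_creation_order, name, track_creation_order):
    """Position (attid-like index) of one attribute under the chosen ordering."""
    return attribute_order(names_in_creation_order, track_creation_order).index(name)
```

For attributes created as `units`, `long_name`, `axis`, the index of `units` is 0 with tracking and 2 without it. Code that looks attributes up by name is unaffected; code that relies on the position is not.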
Another solution is for netcdf to track creation order something like we track dimension ids. |
Good idea!
We would lose a minor backward compatibility. Let's suppose we turn off
HDF5 creation ordering, and rely on our own tracking of creation order, and
we do a new release with that code (4.8.2). In this release we keep track
of attribute creation order in our hidden attribute metadata.
Then, files created by 4.8.2 would be readable by previous versions, but
the attribute order might differ from what was intended, because versions of
netcdf-c before 4.8.2 would not be able to interpret the attribute ordering
information we are storing in our hidden metadata attributes.
This could actually be pretty serious. For example, if NOAA upgraded to
4.8.2 all downstream users of NOAA data would see their attribute order
change, until they upgraded to 4.8.2.
For that reason, maybe this should still be something that has to be
explicitly turned on via a mode flag or some var function?
On Sun, Aug 8, 2021 at 2:21 PM Dennis Heimbigner wrote:
Another solution is for netcdf to track creation order something like we
track dimension ids.
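The compatibility trade-off in this exchange can be sketched with a toy model. Nothing here is netcdf-c code: `_AttOrder` is an invented hidden-attribute name, a dict stands in for the file, and the "old reader" is assumed to simply skip metadata it does not understand.

```python
# Hypothetical sketch; "_AttOrder" is an invented hidden-attribute name.
ORDER_KEY = "_AttOrder"


def write_attributes(attrs_in_creation_order):
    """Store attributes name-sorted, plus a hidden record of creation order."""
    by_name = sorted(attrs_in_creation_order, key=lambda item: item[0])
    stored = {name: value for name, value in by_name}
    stored[ORDER_KEY] = [name for name, _ in attrs_in_creation_order]
    return stored


def read_new(stored):
    """A reader that understands the hidden record restores creation order."""
    return list(stored[ORDER_KEY])


def read_old(stored):
    """An older reader ignores the hidden record and sees name order."""
    return sorted(name for name in stored if name != ORDER_KEY)
```

Both readers can open the same file, but for attributes created as `units`, `long_name`, `axis` the newer reader reports that order while the older one reports `axis`, `long_name`, `units`.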
|
We also need to consider the reverse situation:
Will they experience problems? |
No I don't think that would be a problem, the read code takes the attributes in whatever order the HDF5 library presents them. Creation order is determined at dataset creation time for the HDF5 file. |
[This discussion was taking place over email, but I think I should put it here for easier searching and tracking]
Original Issue:
HDF5 Response:
According to the section IV.A.2.v. "The Attribute Info Message" in the File Format Spec maximum creation index is 2 bytes, so it is a file format issue. I think we do have an issue since there is no limit now on the number of attributes. We will need to introduce changes to the file format. We need to have a conversation on HDF5 limitations on sizes and how much work it will be.
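The consequence of a 2-byte field can be shown with a small sketch. It is not HDF5 code: `fits_in_two_bytes` is a hypothetical helper, and the little-endian layout is assumed only for illustration.

```python
import struct


def fits_in_two_bytes(creation_index):
    """True if the value can be encoded as an unsigned 2-byte integer."""
    try:
        struct.pack("<H", creation_index)  # "<H": little-endian unsigned 16-bit
    except struct.error:
        return False
    return True
```

The largest value that fits is 65535; 65536 cannot be represented, which is why lifting the limit means changing the file format rather than just the library.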