-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Usage of Enums instead of flag #238
Comments
Thanks for the question, @bzah. I think the answer is that it's not CF-compliant to use netCDF4 Enums at the moment. CF has not yet adopted all features which were introduced in netCDF4. Of course it can be discussed whether Enums should be supported by CF. Probably it would be similar to
Jonathan |
The question is a bit more subtle than that and I will provide context below. It is clear that currently enumerations are not part of the convention. So we cannot expect any reading tool relying on CF to do anything with the enumeration data type information (raw data decoding is handled by the base library though, so it just appears like a normal variable). An enumerated variable may have additional flag_values and flag_meanings attributes, in which case it seems it would be CF compliant. It's an inelegant solution though as it would result in duplicating the information, with the risk of unconsistencies. The use case we are having now at EUMETSAT, is that we have netCDF4 products using a lot of enumerations as this was deemed useful regardless of the CF-compliance aspect. Considering that changing formats at this stage would be massively inconvenient, we are trying to identify the best route to make them more CF-friendly in a pragmatic way. One way is to add the missing attributes (which won't break our existing interfaces). Another one is to adapt the tools we make available to our users (e.g. python readers) to be able to parse the enumeration datatype and create the same metadata as if the CF attributes were present. The latter case is more of a short term workaround though and the first one might be needed for compatibility with other tools. |
Could this be a topic for the cf-workshop in October ? |
I think this sounds like a good topic for the workshop. |
We discussed this issue during the last CF workshop.
Additionally we would like to see the support for Enums improved in various tools (The following is merely a wish list, not requirements):
Notes:
Please feel free to comment, I might have missed/forgotten something. |
Dear Abel @bzah Thanks for the useful summary. My impression from the discussion is that For data-readers, and software that supports reading the data, Because of the extra complexity, I think this point from our discussion is very important:
If this were allowed, existing software could read the I wonder what the prospects are for the netCDF lib being altered in this way. Cheers Jonathan |
I think it this is a more compelling argument for inclusion than either the data checking or ease for writers. It would could be helpful for consumers of the data as well to have it defined in one place. (that is, of there are use-cases -- I've never seen the same flag values used in multiple variables, but that's sut my limited experience)
I'm confused -- can't you do that now? if it's stored as an int with flag-value, it can be read and cast to an Enum, yes? At least I'm pretty sure that's what we do in some of our code now. |
Dear Chris
I have no experience of this myself, but our discussion group understood that the API would not let you obtain an Enum netCDF variable with the same function as you use for an integer netCDF variable, even though that is actually how it's stored on disk. That's my understanding of what was said, anyway. Abel @bzah or others more experienced can correct me! Best wishes Jonathan |
Ahh -- got it. I'm getting out of my area of expertise, but I think you can do essentially anything with C, but it would require custom code not built in to the netcdf C lib -- that makes sense. That could be a nice utility, yes. |
Ideally, we were thinking, existing code would require no change to read Enums. If I understand correctly, that implies a change in the netCDF lib or API. If that were done, and if Enum variables also had |
If a client code that know nothing about the netCDF4 enum type enounters a file that stores an enum variable, then it will probably fail if it tries to read the enum. The reason is that the client would first need to use the API to figure out what the base type (integer, unsigned int, byte, etc) of the enum is so that the client can allocate enough memory before getting the variable. In language like C, all this needs to be done explicitly through the API. A smart interpreted language (e.g., Python) wrapper would do all this memory querying and allocation automatically but would probably still fail unless it had been specially coded to retrieve and treat the enum as the underlying base type rather than as a netcdf4 enum. My guess is that the functionality @JonathanGregory is wondering about would require new API functions e.g., |
Dear Charlie @czender Thanks for your explanation. That's helpful. Yes, I was hoping (in my comment here, and in the discussion) that no change would be required to existing code. I suppose that means changing the effect of the existing get functions so that an Enum is fetched as its storage type. Then a different API enquiry would be needed to discover if this basic type is being used as an Enum. However, if only a minimal change to existing code is required, such as you suggest, that would not be too bad as a price for supporting Enums in a minimal way. Best wishes Jonathan |
@JonathanGregory After further thought, I need to partially walk back my opinion (above) from yesterday. If a client program is "cautious enough" to allocate enough memory to hold the base type values of a netCDF4 enum variable, then it might succeed without any code modifications. Yesterday I said the program must know the base type of the enum. However, if the client just blindly reads the enum the standard nc_get_var() API and provides enough memory then it might work assuming that the netCDF library does not do a safety check that matches the type of the on-disk variable against the type of the buffer that the variable will be read into (the API requires specifying the type of destination buffer and it might use this information for safety checks). Whether netCDF does such a check, I don't know. Someone would need to test this to be sure. The library could, under the hood, just prevent the API from reading enums directly into integer types. Of course the attempt will definitely be buggy and likely fail if the client does not provide enough memory to hold the base type of the enum, e.g., if the client expects unsigned int32 and the base type is actually unsigned int64. Anyway, the situation is not quite as pessimistic as I thought yesterday. |
This strikes me as simply the nature of C :-) However, it seems the question, to some extent, is to what degree the netcdf-C lib should support CF out of the box -- I have no idea what Unidata thinks of this. Is there a C CF lib out there? Of course, Charlie has done a lot in NCO -- which is great! |
I haven't been working on this topic as much as I should have, but from my understanding Enums do not yet meet the criteria to be added to CF. For this, we would need to have:
[1] and have been met in the python xarray package, but this is merely one library and it required quite some work to get it merged. [2] was initially on the road plan for the same xarray PR, but we thought it would fit better on cf-xarray, so it's not yet implemented. So, I suggest we close this issue for now, perhaps to revisit if Enums gain interest from other parties (or if EUMETSAT is willing to invest a lot in open-source contributions to make [1] and [2] possible @MTG-Formats) |
I could not find in the current CF convention or in the discussions here if netCDF4's Enums are forbidden or not. I know that CF currently recommend to use flag_meanings|flag_values|flag_mask (see here or there), but there is a dedicated mean to define this kind of values in netCDF4 through Enum.
In short: I would like to know: is it CF valid to use netCDF4's Enum instead of a combination of flag_meanings and flag_values to describe an enumerated type ?
For reference:
Edit After some archeology (thanks Lars!), there are existing issues about this topic:
The text was updated successfully, but these errors were encountered: