Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for netCDF4.EnumType #8147
Add support for netCDF4.EnumType #8147
Changes from 33 commits
f1bc33b
4da8938
ab53970
75e00c7
e1d51e3
95e30b2
a3160c5
59ef686
d135be2
8c12e50
e8f4872
f481e1f
951ea32
ec3c90a
dcc1254
55927f1
9e9c62c
5f1bffc
5189c74
2410c2e
4b966ba
9273a1d
cbfadad
ca043a7
ee3dc00
892b2b6
9ab1ad1
7219b99
2aa119f
da43a10
096f021
26bb8ce
d21d73a
81a4bec
b114ccc
6376a13
89a8751
ac20a40
d515e0d
5c66563
d62ac29
f834ede
9a3980a
2a3103f
f22046d
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no
enum
attribute on-disk for plain CF style enum. Testing forflag_meaning
,flag_values
should be sufficient. Problem: We would need to invent an enum name.If we can decode from
flag_meaning
,flag_values
to Python enum, it would be good to be able to roundtrip (implement.encoding
Python enum toflag_meaning
,flag_values
).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But do we always want to decode CF flag_* into python enums ? Having this test also on "attrs['enum']" would avoid decoding them when this is not wanted.
As for the encoding, I had previously a
EnumCoder::encode
method that turns an python Enum into CF flag_*, but encoders are called before thenetCDF4_::open_store_variable
method. So within open_store_variable I would need to decode the flag_* again and turn them into a netCDF4 enum, which defies the goal of having a python Enum, no ?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, only when
decode_enum=True
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my point of view, there are two things here. Encoding/decoding python enum <-> CF flag_* and being able to read/write netCDF4 enum type.
The current approach using attrs["enum"] to represent netCDF4 enum in xarray is straightforward and works very well.
We might just keep the CF stuff for another PR and get the netCDF4 enum backend feature in. @dcherian, do you have suggestions to move forward here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with limiting to just the netCDF type, and skipping the
flag_*
stuff.Another thought I just had is whether we can use the
dtype
attribute instead. But we need two pieces of information: theEnum
dict and thedtype
of the values. Shall we stick theEnum
dict indtype
and preserve thedtype
of the array as the dtype on disk? Is there a usecase for encoding the values as a different dtype at all?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we can store the dtype within the enum with the
type
argument:encoding["dtype"] = Enum(e_name, e_dict, type=data.dtype.type)
but I don't know what are the caveats of this approach.Alright, sorry for all these back and forths. I will try to focus on the netCDF Enum encoding/decoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bzah No need to apologize. :-) This stuff soon gets very complex and finding the best approach is based on multiple iterations.
I've thought a while on @dcherians comment above. We might be able to do something like this in
open_store_variable
:This will use numpy's metadata to store the enum at the dtype. This is essentially the same as h5py is doing (without the
enum_name
). We could even provide that dtype to the variable itself. It's not that problematic to create a python enum from that dtype.In
prepare_variable
we would just have to do the same thing analog to your current datatype extraction, but now from the variable.dtypeFor that to work we would either need to implement EnumCoder.encoding to add back the metadata (it might have been stripped by processing) to the variable.dtype or fix
NonStringCoder
to do this (preferred, so we can reserve the EnumCoder for the flag_* stuff).By using that approach we get the dtype and the enum (in dtype.metadata).
Update: fixed example code