-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CF: also decode time bounds when available #2571
Conversation
Hello @fmaussion! Thanks for submitting the PR.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting this together @fmaussion! This seems like the right place to handle this.
One minor design question here is: how strictly do we want to adhere to roundtripping variable attributes in Datasets? With this implementation, if one decodes the times, and then saves things back out to a file, a "time bounds" variable will contain units
and calendar
attributes even though it might not have them initially. I'm not sure if this is a big deal (in this case it is simply adding redundant metadata), but I just wanted to bring it up in case it was a concern.
xarray/tests/test_coding_times.py
Outdated
@@ -624,6 +624,62 @@ def test_decode_cf(calendar): | |||
assert ds.test.dtype == np.dtype('M8[ns]') | |||
|
|||
|
|||
def test_decode_cf_time_bounds(): | |||
|
|||
da = DataArray(np.arange(6).reshape((3, 2)), coords={'time': [1, 2, 3]}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might it make sense to make this DataArray a fixture
and split these cases into separate tests?
xarray/conventions.py
Outdated
new_attrs['calendar'] = attrs['calendar'] | ||
to_update[attrs['bounds']] = new_attrs | ||
# For all bound variables, update their time attribute if not present | ||
for k, new_attrs in to_update.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could update the attributes of the potential bounds variables immediately within the for v in variables.values()
loop rather than use another loop here (this would remove the need for the to_update
dictionary).
xarray/conventions.py
Outdated
@@ -350,6 +350,25 @@ def stackable(dim): | |||
drop_variables = [] | |||
drop_variables = set(drop_variables) | |||
|
|||
# Time bounds coordinates might miss the decoding attributes | |||
# https://github.com/pydata/xarray/issues/2565 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it also make sense to add a comment with the URL for the "Cell Boundaries" section of the CF conventions here?
xarray/conventions.py
Outdated
to_update = dict() | ||
for v in variables.values(): | ||
attrs = v.attrs | ||
if ('units' in attrs and 'since' in attrs['units'] and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be cleaner to write something like:
has_date_units = 'units' in attrs and 'since' in attrs['units']
if has_date_units and 'bounds' in attrs:
Could we move this logic into a separate helper function? That would make things a little better organized and easier to test. |
I addressed all comments except the fixtures, which seemed a bit overkill (but I simplified the tests). Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @fmaussion, just a minor optional naming suggestion, otherwise this looks good to me!
xarray/conventions.py
Outdated
has_date_units = 'units' in attrs and 'since' in attrs['units'] | ||
if has_date_units and 'bounds' in attrs: | ||
if attrs['bounds'] in variables: | ||
to_update = variables[attrs['bounds']].attrs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a clearer name would be bounds_attrs
?
to_update = variables[attrs['bounds']].attrs | |
bounds_attrs = variables[attrs['bounds']].attrs |
Thanks! Made the change as requested. Regarding the general design:
I agree - I don't think it is a big deal either. It must also be noted that this might break some code in downstream libraries (including mine) which has built workarounds for this and will now error because the bounds are already decoded. Here also, I'm quite confident that this is an edge case but its worth mentioning. Should I put my what's new in "Breaking changes" instead? |
I for sure see this perspective. I also think a plausible case could be made that this change is like a "bug fix," that is something that people may have needed to work around before in various ways, but ultimately should not have needed to. So I think it's up to you what you think is best. If you do decide to shift it to the breaking changes section, I would suggest being a little more specific about under what circumstances the behavior is changing (i.e. this only impacts the behavior for time bounds variables defined via CF conventions that do not already have "units" and "calendar" attributes). |
+1 for putting this in "Breaking changes" (we try to be pretty conservative with the definition of a strict "Enhancement"), but otherwise this looks good to me. |
Sorry for being so slow. I just done it, will merge tomorrow once the tests pass. |
* master: DEP: drop python 2 support and associated ci mods (pydata#2637) TST: silence warnings from bottleneck (pydata#2638) revert to dev version DOC: fix docstrings and doc build for 0.11.1 Source encoding always set when opening datasets (pydata#2626) Add flake check to travis (pydata#2632) Fix dayofweek and dayofyear attributes from dates generated by cftime_range (pydata#2633) silence import warning (pydata#2635) fill_value in shift (pydata#2470) Flake fixed (pydata#2629) Allow passing of positional arguments in `apply` for Groupby objects (pydata#2413) Fix failure in time encoding for pandas < 0.21.1 (pydata#2630) Fix multiindex selection (pydata#2621) Close files when CachingFileManager is garbage collected (pydata#2595) added some logic to deal with rasterio objects in addition to filepaths (pydata#2589) Get 0d slices of ndarrays directly from indexing (pydata#2625) FIX Don't raise a deprecation warning for xarray.ufuncs.{angle,iscomplex} (pydata#2615) CF: also decode time bounds when available (pydata#2571)
whats-new.rst
for all changes andapi.rst
for new APINot sure if this is the best way to handle it, but it seems to work