Allow _FillValue and missing_value to differ (Fixes #1749) #2016
Test failures look unrelated--they also happen to me locally. Installing SciPy 1.0.0 locally fixes them, so I'm guessing they're caused by SciPy 1.0.1.
Please merge in master, which fixes the failing tests.
Rebased on current master.
Fixed flake8 (unnecessary import).
Reworked due to a corner case with PyNio on Python 2.7. The final solution looks simpler to me.
Generally looks good to me!
xarray/coding/variables.py
Outdated
    if raw_fill_values:
        encoded_fill_values = [fv for option in raw_fill_values
                               for fv in np.ravel(option)
                               if not pd.isnull(fv)]

        if len(encoded_fill_values) > 1:
            warnings.warn("variable {!r} has multiple fill values {}, "
Is this warning useful? Or should it be removed?
Personally, I don't need it. I'm not sure whether others would like to be made aware of this case, though. I think my reasoning for leaving it was that it might be good to know, since it would affect round-tripping.
One thing I notice now is that this will fire whenever missing_value and _FillValue are both present, regardless of their values. That's probably a bad thing. So I guess I'm 👍 on removing it.
Yeah, I think it probably makes sense to keep if it's an indication that round-tripping will likely fail.
Ok, leaving the warning. I updated the comprehensions to make a set, so that should suppress false positives with the warning.
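To illustrate why a set suppresses the false positive, here is a minimal sketch. The helper name `collect_fill_values`, its `name` parameter, and the warning text are hypothetical and not taken from the PR; it also uses `np.isnan` in place of `pd.isnull` to keep the example NumPy-only.

```python
import warnings

import numpy as np


def collect_fill_values(raw_fill_values, name="var"):
    """Hypothetical helper: gather the distinct non-NaN fill values."""
    # A set collapses duplicates, so an identical missing_value and
    # _FillValue count as one fill value and do not trigger the warning.
    encoded_fill_values = {
        fv
        for option in raw_fill_values
        for fv in np.ravel(option)
        if not np.isnan(fv)
    }
    if len(encoded_fill_values) > 1:
        # Illustrative message only; the PR's actual wording may differ.
        warnings.warn(
            "variable {!r} has multiple fill values {}".format(
                name, sorted(encoded_fill_values)
            )
        )
    return encoded_fill_values


# Same value given twice: one distinct fill value, no warning.
same = collect_fill_values([-999, -999])
# Two different values: both are kept and a warning is emitted.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    different = collect_fill_values([-999, -1])
```

With a list comprehension, the first call would have produced two entries and warned even though both attributes agree.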
xarray/tests/test_conventions.py
Outdated
@@ -156,12 +156,16 @@ def test(self):


def test_decode_cf_with_conflicting_fill_missing_value():
    var = Variable(['t'], np.arange(10),
    expected = Variable(['t'], [np.nan, np.nan, 2], {'units': 'foobar'})
    expected[[0, 1]] = np.nan
I think this line is unnecessary, since you already set these values to NaN when you construct the variable.
Oops, leftover from trying to get the thing to work. 😁
The CF standard permits both attributes to be present and permits them to have different values, so we should not treat this as an error--just mask out all of them.
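As a rough illustration of "mask out all of them", the sketch below replaces every listed fill value with NaN. The helper `mask_fill_values` is hypothetical and is not xarray's implementation; it assumes numeric data that can be cast to float.

```python
import numpy as np


def mask_fill_values(data, fill_values):
    """Hypothetical helper: replace every listed fill value with NaN."""
    # Cast to float so that NaN can be stored in the result.
    data = np.asarray(data, dtype=float)
    # True wherever an element equals any of the fill values.
    mask = np.isin(data, list(fill_values))
    return np.where(mask, np.nan, data)


# Assume missing_value=0 and _FillValue=1 on a variable holding [0, 1, 2].
decoded = mask_fill_values([0, 1, 2], fill_values={0, 1})
```

Here the first two elements are masked and only the last value survives, which mirrors the expectation in the test under review.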
OK, this looks good to me. I'll merge this in a day or two assuming no other comments.
Thanks!
It seems that this PR exposed a new encoding key 'missing_value', and this makes the encoding fail to round-trip, as it is not explicitly allowed in
@shoyer Happy to put in a PR to address... provided you can tell me what that should actually look like. 😁
Well, we're not going to perfectly round-trip data that uses both missing_value and _FillValue. But we should certainly not raise errors when round-tripping. I see a few reasonable options:
As a regression test, we should verify that
whats-new.rst for all changes and api.rst for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)