Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow _FillValue and missing_value to differ (Fixes #1749) #2016

Merged
merged 2 commits into from
Mar 31, 2018

Conversation

dopplershift
Copy link
Contributor

@dopplershift dopplershift commented Mar 26, 2018

The CF standard permits both values, and them to have different values, so we should not be treating this as an error--just mask out all of them.

  • Closes _FillValue and missing_value not allowed for the same variable #1749 (remove if there is no corresponding issue, which should only be the case for minor changes)
  • Tests added (for all bug fixes or enhancements)
  • Tests passed (for all non-documentation changes)
  • Fully documented, including whats-new.rst for all changes and api.rst for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)

@dopplershift
Copy link
Contributor Author

Test failures look unrelated--they also happen to me locally. Installing SciPy 1.0.0 locally fixes them, so I'm guessing they're caused by SciPy 1.0.1.

@shoyer
Copy link
Member

shoyer commented Mar 28, 2018

please merge in master, which fixes the failing tests.

@dopplershift
Copy link
Contributor Author

Rebased on current master.

@dopplershift
Copy link
Contributor Author

Fixed flake8 (unnecessary import)

@dopplershift
Copy link
Contributor Author

Reworked due to corner case with PyNio on Python 2.7. Final solution looks simpler to me.

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good to me!

if raw_fill_values:
encoded_fill_values = [fv for option in raw_fill_values
for fv in np.ravel(option)
if not pd.isnull(fv)]

if len(encoded_fill_values) > 1:
warnings.warn("variable {!r} has multiple fill values {}, "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this warning this useful? Or should it be removed?

Copy link
Contributor Author

@dopplershift dopplershift Mar 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I don't need it. I'm not sure if others would like to be made of this case, though. I think my thoughts on leaving it were that it might be good to know since it would affect round-tripping.

One thing I notice now is that this will now fire if missing_value and _FillValue are both present, regardless of their values. That's probably a bad thing. So I guess I'm 👍 on removing it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it probably makes sense to keep if it's an indication that round-tripping will likely fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, leaving the warning. I updated the comprehensions to make a set, so that should suppress false positives with the warning.

@@ -156,12 +156,16 @@ def test(self):


def test_decode_cf_with_conflicting_fill_missing_value():
var = Variable(['t'], np.arange(10),
expected = Variable(['t'], [np.nan, np.nan, 2], {'units': 'foobar'})
expected[[0, 1]] = np.nan
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line is unnecessary, since you already set these values to NaN when you construct the variable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, leftover from trying to get the thing to work. 😁

The CF standard permits both values, and them to have different values,
so we should not be treating this as an error--just mask out all of
them.
Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this looks good to me. I'll merge this in a day or two assuming no other comments.

@shoyer shoyer merged commit 1b48ac8 into pydata:master Mar 31, 2018
@shoyer
Copy link
Member

shoyer commented Mar 31, 2018

thanks!

@dopplershift dopplershift deleted the fix-1749 branch March 31, 2018 19:59
@himikof
Copy link

himikof commented Apr 19, 2018

It seems that this PR exposed a new encoding key 'missing_value', and this makes the encoding fail to round-trip, as it is not explicitly allowed in xarray.backends.netCDF4_._extract_nc4_variable_encoding function (similar to #1869).
This leads to the following error when saving with an explicitly preserved encoding:
ValueError: unexpected encoding parameters for 'netCDF4' backend: ['missing_value']

@dopplershift
Copy link
Contributor Author

@shoyer Happy to put in a PR to address...provided you can tell me what that should actually look like. 😁

@shoyer
Copy link
Member

shoyer commented Apr 20, 2018

Well, we're not going to perfectly round-trip data that uses both _FillValue and missing_value. That isn't possible with xarray's data model.

But we should certainly not raise errors when round-tripping. I see a few reasonable options:

  • Don't store missing_value in encoding. This would entail removing the use of pop_to with missing_value in decode().
  • Add some logic to encode() to check which of _FillValue and/or missing_value are defined. Use the attribute is defined and write it to attribute. If both are defined, ignore/drop missing_value.

As a regression test, we should verify that code.encode(coder.decode(var)) == var (including explicitly checking encoding).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

_FillValue and missing_value not allowed for the same variable
3 participants