-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rewrite quantify and dequantify #17
Conversation
I still have to update the docstrings, but other than that this should be ready for review. The failing tests are due to a design question: should we raise for I have been answering both questions with "no", but I didn't think too much before I wrote the code in obj = ds.pint.dequantify()
... # more code
ds.pint.dequantify() # or always dequantify in a function, e.g. before saving to a file |
I would say yes. There is a distinction between specifying a dimensionless unit and not specifying a unit. @jthielen how do the CF conventions store "dimensionless" under the
Similarly I think yes. The user thought there were units on the data, but were mistaken, so should be informed. If they want to cover both possibilities then they perhaps they should use a try except? I am imagining the end-goal use of this package as always assuming the user is following this pattern:
This approach would also make round-tripping simpler. |
I think the design decision comes down to what
I've always imagined it to be the latter due to its ease of use, and that is exactly why I proposed the names "quantify" and "dequantify" in hgrecco/pint#849 (and implemented it this way in MetPy's accessor). My responses to the questions below assume this point-of-view. That being said, if it is instead decided upon to be the former, then I could definitely see the point in erroring on
I was going to say no here, since in Pint's view there is no distinction between specifying a dimensionless unit and not specifying a unit... The CF canonical unit for dimensionless is "1". However, in many netCDF files I've seen "out in the wild" that do not fully conform to the CF conventions, they represent dimensionless variables with an empty unit string or no units attribute at all. Since this is still
Again, I would say no...that instead of erroring, |
Thanks for the detailed reply @jthielen . I think your points are much stronger than mine, and now think that might be the best way to go!
I did not realise this.
That's a good point. Quirks with CF conventions should be handled by CF-specific code instead of pre-empted. Perhaps the better way to think about this (and what you seem to be proposing) is that |
Indeed! That's a much clearer way of phrasing it. |
the update of the docstrings is done and I fixed / removed the failing tests. Something else we should probably discuss is whether or not to remove the "units" attribute in I'm leaning towards removing since that way the attributes cannot be out-of-sync, but we could also add a keyword argument to control that. |
Yes 👍 The units are now on the array itself so I think this is sensible. And as you point out, it won't be out of sync. Also |
Codecov Report
@@ Coverage Diff @@
## master #17 +/- ##
==========================================
+ Coverage 82.41% 86.91% +4.50%
==========================================
Files 7 7
Lines 614 688 +74
==========================================
+ Hits 506 598 +92
+ Misses 108 90 -18
Continue to review full report at Codecov.
|
Yes it does! Thanks for adding that. While it would be nice to not have to worry about the issue in the first place, making |
then this should be ready for review and merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation-wise, this looks good to me! I just have two documentation questions:
- how should we clarify to users that non-index coordinates are being quantified, but index coordinates are not at this point?
- should probably document removal of units attribute (see comments below)
pint_xarray/accessors.py
Outdated
dict-like, it should map a variable name to the desired | ||
unit (use the DataArray's name to refer to its data). If | ||
not provided, will try to read them from | ||
``DataArray.attrs['units']`` using pint's parser. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps good to comment something to the effect of "this will remove the 'units' attribute from DataArray.attrs"?
pint_xarray/accessors.py
Outdated
Dataset. It should map variable names to units (unit names | ||
or ``pint.Unit`` objects). If not provided, will try to | ||
read them from ``Dataset[var].attrs['units']`` using | ||
pint's parser. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise here.
I was thinking about this, but then I thought I'd leave that to the PR that adds pseudo-support for indexes. Rereading the code, this not what happens: right now, it tries to quantify every variable, even if it is a dimension variable. This is something we want in the future, but right now that would mean we discard the units attribute (and get a UnitStrippedWarning). I'll make sure the attribute is properly set/updated and mention that in the documentation.
same here, I'll update the documentation |
which is important because modifying in-place makes it difficult to test.
done, I think? I updated the docstrings and modified the attribute extraction function to never change |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Just one little pedantic comment that doesn't need to hold up merging if you don't want to.
Co-authored-by: Jon Thielen <[email protected]>
that's a simple fix, so no problem. We have lots of these in |
Since we now have the utils from #11,
quantify
anddequantify
can be extended to also support coordinates.I'm also hoping to fix #9, either by changing the check in
Variable.data
(I need to think more about the implications of that) or by stating that this will load non-dask arrays into memory.