support for units #525
Comments
Eventually, I hope so! Unfortunately, doing this in a feasible and maintainable way will probably require upstream fixes in NumPy. In particular, better support for duck-array types (numpy/numpy#4164) and/or the ability to write units as custom NumPy dtypes. Both of these are on the NumPy roadmap, though there is no timeframe for when that will happen. |
Astropy has pretty good units support: |
Unfortunately, the astropy approach uses a numpy.ndarray subclass, which means it's mutually exclusive with dask.array. Otherwise, it does look very nice, though. On Mon, Aug 17, 2015 at 8:38 AM, Ryan Abernathey [email protected]
|
@shoyer - as one who thinks unit support is probably the single best thing astropy has (and is co-maintainer of Alternatively, maybe it is easier to tag on the outside rather than the inside. This would also not seem to be that hard, given that astropy's |
@mhvk It would certainly be possible to extend dask.array to handle units, in either of the ways you suggest. Although you could allow So far, so good -- but with the current state of duck array typing in NumPy, it's really hard to be happy with this. Until Once we have all that duck-array stuff, then yes, you certainly could write all a duck-array |
@shoyer - fair enough, and sad we don't have For the outside method, from the dask perspective, it would indeed be easiest if units were done as a dtype, since then you can punt all the decisions to helper routines. My guess, though, is that it will be a while before numpy will include what is required to tell, e.g., that if I add something in
In |
p.s. For |
I am one of the authors of Pint and I was just pointed here by @arsenovic. Pint does not subclass ndarray; rather, it wraps any numerical type, dispatching to the wrapped value any attribute access that it does not understand. By defining
>>> import numpy as np
>>> import pint
>>> ureg = pint.UnitRegistry()
>>> [1., 4., 9.] * ureg.meter # a list is interpreted as an ndarray
<Quantity([1. 4. 9.], 'meter')>
>>> np.sqrt(_)
<Quantity([ 1. 2. 3.], 'meter ** 0.5')>
>>> _.sum()
<Quantity(6.0, 'meter ** 0.5')>
I think something similar can be done for xarray. |
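The wrap-instead-of-subclass design described in this comment can be illustrated with a minimal pure-Python sketch. This is not Pint's implementation: the `Quantity` class, its `units` dictionary of exponents, and its `sqrt` method are hypothetical names chosen only to show how unknown attribute access can be forwarded to the wrapped value.

```python
import math


class Quantity:
    """Hypothetical minimal wrapper: a magnitude plus a dict of unit exponents."""

    def __init__(self, magnitude, units):
        self.magnitude = magnitude
        self.units = dict(units)  # e.g. {"meter": 1}

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails, so anything the
        # wrapper does not define itself is forwarded to the wrapped value.
        if name in ("magnitude", "units"):
            # Avoid infinite recursion if the instance is not initialised yet.
            raise AttributeError(name)
        return getattr(self.magnitude, name)

    def sqrt(self):
        # Taking a square root halves every unit exponent.
        halved = {unit: exp / 2 for unit, exp in self.units.items()}
        return Quantity(math.sqrt(self.magnitude), halved)

    def __repr__(self):
        return f"Quantity({self.magnitude!r}, {self.units!r})"


q = Quantity(9.0, {"meter": 1})
root = q.sqrt()
# float.is_integer is not defined on Quantity; it is reached via __getattr__.
is_whole = q.is_integer()
```

Because the wrapper holds the value rather than inheriting from its type, the same class can in principle wrap a float, a list-like, or an array, which is the property the comment highlights.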
@hgrecco - for astropy's Aside: at some point I'd hope to get the various implementations of units to talk together: it would be good to have an API that works such that units are inter-operable. |
@hgrecco Are you suggesting that pint could wrap xarray objects, or that xarray could wrap pint? Either is certainly possible, though I'm a bit pessimistic that we can come up with a complete solution without the numpy fixes we've been discussing. Also, just to note, xarray contains a |
I'd just like to chime in and say that this feature would really be sweet. I always find myself doing a lot of work to handle/convert different units. |
@shoyer When we prototyped Pint we tried putting Quantity objects inside a numpy array. It was working fine, but the performance and memory hit was too large. We were convinced that our current design was right when we wrote the first code using it. The case might be different with xarray. It would be nice to see some code using xarray and units (as if this was an already implemented feature). @mhvk I do agree with your views. We also mention these limitations in the Pint documentation. Wrapping (instead of subclassing) adds another issue: some NumPy functions do not recognize a Quantity object as an array. Therefore any function that calls In any case, as was mentioned before in the thread, custom dtypes and duck typing will be great for this. In spite of these limitations, we chose wrapping because we want to support quantities even if NumPy is not installed. It has worked really well for us, working in most of the common cases even for numpy arrays. Regarding interoperating, it will be great. It will be even better if we can move into one, blessed, solution under the pydata umbrella (or similar). |
Not to be pedantic, but just one more 👍 on ultimately implementing units support within xarray -- that would be huge. |
I agree that custom dtypes is the right solution (and I'll go dig some more there). In the meantime, I'm not sure why you couldn't wrap an xarray |
#988 describes a possible approach for allowing third-party libraries to add units to xarray. It's not as comprehensive as a custom dtype, but might be enough to be useful. |
+1 for units support. I agree, parametrised dtypes would be the preferred solution, but I don't want to wait that long (I would be willing to contribute to that end, but I'm afraid that would exceed my knowledge of numpy). I have never used dask. I understand that the support for dask arrays is a central feature for xarray. However, the way I see it, if one would put a (unit-aware) ndarray subclass into an xarray, then units should work out of the box. As you discussed, this seems not so easy to make work together with dask (particularly in a generic way). However, shouldn't that be an issue that the dask community anyway has to solve (i.e.: currently there is no way to use any units package together with dask, right)? In that sense, allowing such arrays inside xarrays would force users to choose between dask and units, which is something they have to do anyway. But for a big part of users, that would be a very quick way to units! Or am I missing something here? I'll just try to monkeypatch xarray to that end, and see how far I get... |
@burnpanck Take a look at the approach described in #988 and let me know if you think that sounds viable. NumPy subclasses inside xarray objects would probably mostly work, if we changed some internal uses of |
#988 would certainly allow me to implement unit functionality on xarray, probably by leveraging an existing units package. |
Or another way to put it: While typical metadata/attributes are only relevant if you eventually read them (which is where you will notice if they were lost on the way), units are different: They work silently behind the scenes at all times, even if you do not explicitly look for them. You want an addition to fail if units don't match, without having to explicitly first test if the operands have units. So what should the ufunc_hook do if it finds two Variables that don't seem to carry units, raise an exception? Most probably not, as that would prevent using xarray without units at the same time. So if the units are lost on the way, you might never notice, but end up with wrong data. To me, that is just not unlikely enough to happen given the damage it can do (e.g. the time it takes to find out what's going on once you realise you get wrong data). |
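The failure mode described in this comment can be made concrete with a small pure-Python sketch. The `Tagged` class and `UnitMismatchError` are hypothetical names introduced only for illustration; they are not part of xarray, Pint, or the proposal in #988.

```python
class UnitMismatchError(TypeError):
    """Raised when two operands carry different units (hypothetical)."""


class Tagged:
    """Hypothetical value whose unit is an integral part of arithmetic."""

    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

    def __add__(self, other):
        if not isinstance(other, Tagged):
            # A bare number has no unit: let Python raise TypeError.
            return NotImplemented
        if self.unit != other.unit:
            raise UnitMismatchError(f"cannot add {self.unit} and {other.unit}")
        return Tagged(self.value + other.value, self.unit)


# Units kept as detached metadata and lost on the way:
# 1 m + 100 cm silently becomes a meaningless 101.0.
silently_wrong = 1.0 + 100.0

# Units as an integral property: the same addition fails loudly,
# without the caller having to inspect the operands first.
try:
    Tagged(1.0, "m") + Tagged(100.0, "cm")
    mismatch_caught = False
except UnitMismatchError:
    mismatch_caught = True

same_unit_sum = Tagged(1.0, "m") + Tagged(2.0, "m")
```

The sketch deliberately does no unit conversion; it only shows why a lost unit is harder to notice than a lost attribute.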
So for now, I'm hunting for |
@burnpanck - thanks for the very well-posed description of why units are so useful not as some meta-data, but as an integral property. Of course, this is also why making them part of a new dtype is a great idea! But failing that, I'd agree that it has to be part of something like an Now, off-topic but still: what is a little less wonderful is that there seem to be quite a few independent units implementations around (even just in astronomy, there is that of |
Based on the points raised by @crusaderky in hgrecco/pint#878 (comment) about how much special case handling xarray has for dask arrays, I was thinking recently about what it might take for the xarray > pint > dask.array wrapping discussed here and elsewhere to work as fluidly as xarray > dask.array currently does. Would it help for this integration to have pint Quantities implement the dask custom collections interface for when it wraps a dask array? I would think that this would allow a pint Quantity to behave in a "dask-array-like" way rather than just an "array-like" way. Then, instead of xarray checking for Also, if I'm incorrect with this line of thought, or there is a better way forward for implementing this wrapping pattern, please do let me know! |
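A rough sketch of what forwarding the dask custom collections interface from a unit wrapper might look like is given here. It is an assumption-laden illustration, not Pint's code: `LazyQuantity`, `_reattach_units`, and the stand-in `FakeLazyArray` are hypothetical, and only three of the protocol methods (`__dask_graph__`, `__dask_keys__`, `__dask_postcompute__`) are shown; a real collection would need the rest of the interface as well. The last lines imitate by hand what a scheduler would do, so the sketch runs without dask installed.

```python
class LazyQuantity:
    """Hypothetical unit wrapper that forwards part of the dask protocol."""

    def __init__(self, magnitude, units):
        self.magnitude = magnitude
        self.units = units

    def __dask_graph__(self):
        # Assumed convention: return None when the wrapped value is not lazy.
        graph = getattr(self.magnitude, "__dask_graph__", None)
        return graph() if graph is not None else None

    def __dask_keys__(self):
        return self.magnitude.__dask_keys__()

    def __dask_postcompute__(self):
        # Wrap the inner finalizer so the computed result gets its units back.
        finalize, extra_args = self.magnitude.__dask_postcompute__()
        return _reattach_units, (finalize, extra_args, self.units)


def _reattach_units(results, finalize, extra_args, units):
    return LazyQuantity(finalize(results, *extra_args), units)


class FakeLazyArray:
    """Stand-in for a lazy array; its 'graph' already holds the result."""

    def __dask_graph__(self):
        return {"total": 6}

    def __dask_keys__(self):
        return ["total"]

    def __dask_postcompute__(self):
        return (lambda results: results[0]), ()


lazy = LazyQuantity(FakeLazyArray(), "meter")
graph = lazy.__dask_graph__()
results = [graph[key] for key in lazy.__dask_keys__()]
finalize, extra_args = lazy.__dask_postcompute__()
computed = finalize(results, *extra_args)
```

With this shape, code that only asks "is this a dask collection?" could treat the wrapper like the array it holds, while the units survive the compute step.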
Yes, great idea! |
Hi all, I was just pointed at this by someone who went to the NumFOCUS summit. I'm the lead developer for |
sorry for the late reply, @ngoldbaum. For There were a few unresolved design questions with how unit libraries should implement certain |
Hi! I'm just popping in as a very interested user of both xarray and unit packages to ask: since there's been some awesome progress made here and pint-xarray is now enough of A Thing to have documentation, though obviously experimental - how much work would you expect a corresponding package for astropy's Quantities to take, given the current state of things? Are there any limitations that would prevent that? I saw the discussion above about Quantities being more problematic due to taking the subclass-from-numpy-arrays route, but I'm not sure how much of a roadblock that still is. I would suspect the API could be shared with pint-xarray (which, obviously, is experimental for now). |
I would expect For reference, the currently remaining issues for all duck arrays (except obviously |
actually, that turns out to be wrong. Since xarray/xarray/core/variable.py Lines 231 to 245 in 7dbbdca
Adding |
I believe this can be closed via #5734 @andersy005? |
xarray/xarray/core/variable.py Lines 294 to 331 in 5fd50c5
@keewis it looks like adding an if block before line 323 would allow astropy quantities to be wrapped as you said, i.e.,
what else is needed to make this happen? I'd love to make some progress towards |
We've since made some progress with the automatic generation of tests for arbitrary duck arrays in xarray-array-testing, so it might be better to work on that instead of trying to generalize the massive (and probably stale) testsuite for testing unit support with That said, that's for testing the support, not the support itself. Implementing that should be fairly easy, we should just make sure to continue banning Edit: actually, |
One thing that bothers me a lot is that pandas lacks good unit support (e.g. quantities, pint, ...)
Is there a chance that xray will support it?