
Ensuring interoperability of pint with other (non-unit) array-like types #845

Closed
jthielen opened this issue Aug 21, 2019 · 8 comments · Fixed by #963
Labels: numpy (Numpy related bug/enhancement)

jthielen (Contributor) commented Aug 21, 2019

While working with the new `__array_function__` implementation in #764, seeing what it takes to get pint Quantities working inside xarray (pydata/xarray#525), and trying to wrap my head around NEP 18, I realized that (as far as I could find) pint's expected behavior with other ndarray-like types is neither well documented nor tested. So, I hope that this issue can be a discussion on clarifying those expected behaviors and determining how to document and test them.

To start that discussion, I think it would make sense to split other array-like types into two categories: those that should wrap pint, and those that should be wrapped by pint; in other words, to determine where pint should fall in the "type casting hierarchy". Based on pydata/xarray#525 (comment) and other comments I've seen, a split like the following seems to be desired:

Wrap pint Quantity

  • xarray DataArray

Wrapped by pint Quantity

  • dask array
  • cupy array
  • sparse array
  • numpy masked array (and of course ndarray)

(Pandas seems to have special handling through pint-pandas, and hopefully others that I'm missing can be added to either list as appropriate)

Does this make sense as a division, and should a list like this be documented anywhere?
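To make the division concrete, the intended nesting might look like the following sketch (this assumes the xarray integration discussed here and in pydata/xarray#525 is in place; constructing a DataArray around a Quantity like this was not yet fully supported at the time of writing):

```python
import numpy as np
import pint
import xarray as xr

ureg = pint.UnitRegistry()

# "Wrapped by pint": the magnitude is a plain ndarray (or, eventually, a
# dask, CuPy, sparse, or masked array)
q = ureg.Quantity(np.arange(4.0), "meter")

# "Wraps pint": an xarray DataArray whose underlying data is the Quantity
da = xr.DataArray(q, dims=("x",))

# The nesting is DataArray -> Quantity -> ndarray, never the reverse
print(type(da.data))            # the pint Quantity
print(type(da.data.magnitude))  # the wrapped ndarray
```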

Also, once #764 is in place, it seems like there will be a need for explicit identification of types in each category: types that should wrap pint are currently identified in checks like `if other.__class__.__name__ in ["PintArray", "Series"]:`, and types that should be wrapped by pint would be used when implementing the functionality mentioned in #764. However, what should be done about array-like types that are unknown to pint, and how can consistent behavior between different Quantity-related operations be assured?
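To illustrate the deferral behavior in question, here is a minimal, self-contained sketch; `MiniQuantity` and `DataArray` are toy stand-ins (not pint's or xarray's actual implementations), and `UPCAST_TYPES` mirrors the name-based check quoted above:

```python
UPCAST_TYPES = ("DataArray", "PintArray", "Series")  # names of types that should wrap pint

class MiniQuantity:
    """Toy stand-in for pint.Quantity, illustrating only the deferral pattern."""

    def __init__(self, magnitude, units):
        self.magnitude = magnitude
        self.units = units

    def __add__(self, other):
        # Defer to upcast (wrapping) types: returning NotImplemented makes
        # Python try other.__radd__(self), so the outer type stays in control.
        if other.__class__.__name__ in UPCAST_TYPES:
            return NotImplemented
        if isinstance(other, MiniQuantity):
            return MiniQuantity(self.magnitude + other.magnitude, self.units)
        # Anything else is treated as a bare magnitude to be wrapped
        return MiniQuantity(self.magnitude + other, self.units)


class DataArray:
    """Toy stand-in for xarray.DataArray that knows how to wrap MiniQuantity."""

    def __init__(self, data):
        self.data = data

    def __radd__(self, other):
        return DataArray(other + self.data)


result = MiniQuantity(3.0, "m") + DataArray(MiniQuantity(1.0, "m"))
print(type(result).__name__, result.data.magnitude)  # DataArray 4.0
```

The open question above is exactly what should happen in the final branch when `other` is an array-like type that pint does not recognize.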

Finally, should pint implement tests for all the types it expects to be able to wrap? I got the impression from pydata/xarray#525 and pydata/xarray#2956 that tests fit best with the type higher in the type casting hierarchy (the one doing the wrapping).

Please do correct me on anything I am mistaken about with the above rambling 😄.

xref pydata/xarray#525, pydata/xarray#2956, #764, #633, Unidata/MetPy#996

EDIT: Discussion about arbitrarily nested metadata with NEP 18 implementers is ongoing at dask/dask#5329, which will be good to keep in mind here too.

shoyer commented Aug 27, 2019

From a coordination perspective, I think the main task is to ensure that there's a well defined type casting hierarchy among all NEP-18 duck arrays, so it's clear which project is responsible for handling which interaction. This will help avoid incompatible nestings, e.g., dask wrapping pint and pint wrapping dask.

This might be something that makes sense as a continuously updated "Informational NEP" defining a de facto community standard.

Here's a partial attempt, with most of the duck arrays I can think of offhand. The arrow X -> Y means that X handles interactions between X and Y arrays:

The general principle here is that if library X can wrap Y arrays, then it should also handle their interaction. This keeps the logic about how different array libraries interact in one place, as much as possible.

Black lines indicate existing relationships; gray lines indicate proposed relationships.

hgrecco (Owner) commented Aug 28, 2019

looks good to me

shoyer commented Aug 28, 2019

Now generated with graphviz, which should be a little more maintainable:
[graph image: duck-array type casting hierarchy, rendered with graphviz]

crusaderky (Contributor) commented:

@shoyer I changed the style of the arrows to dotted where there is only planned or untested support, and added arrows pint -> sparse and pint -> cupy:

https://gist.github.com/crusaderky/1b00fd68ae1ce79109b6af546c88bd4a
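For reference, the subset of the hierarchy spelled out in this thread could be expressed with the graphviz Python package roughly as follows (a sketch listing only the edges mentioned above, not the full graph from the gist):

```python
import graphviz  # Python bindings for Graphviz

dot = graphviz.Digraph(comment="Partial duck-array type casting hierarchy")

# X -> Y means that X handles interactions between X and Y arrays
dot.edge("xarray", "pint")
dot.edge("pint", "numpy")
dot.edge("pint", "numpy.ma")
dot.edge("pint", "dask.array")

# Dotted edges for planned or untested support
dot.edge("pint", "sparse", style="dotted")
dot.edge("pint", "cupy", style="dotted")

print(dot.source)  # DOT text; dot.render("hierarchy") would produce an image
```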

pentschev commented:

Those are nice graphs @shoyer, thanks for doing that.

> The general principle here is that if library X can wrap Y arrays, then it should also handle their interaction. This keeps the logic about how different array libraries interact in one place, as much as possible.

I'm wondering if this would be a guideline worth adding to NEP-18? I feel it's quite a good idea and worth making more visible, even if elsewhere.

bors bot added a commit that referenced this issue Dec 11, 2019
905: NEP-18 Compatibility r=hgrecco a=jthielen

Building off of the implementation of `__array_function__` in #764, this PR adds compatibility with NEP-18 in Pint (mostly in the sense of Quantity having `__array_function__` and being wrappable as a duck array; for Quantity wrapping other duck arrays, see #845). Many tests are added of NumPy functions being used with Pint Quantities by way of `__array_function__`.

Accompanying changes that were needed as a part of this implementation include:
- a complete refactor of `__array_ufunc__` and ufunc attribute fallbacks to work in parallel with `__array_function__`
- promoting `_eq` in `quantity` to `eq` in `compat`
- preliminary handling of array-like compatibility by defining upcast types and attempting to wrap and defer to all others (a follow-up PR, or set of PRs, will be needed to completely address #845 / #878)

Closes #126 
Closes #396 
Closes #424
Closes #547 
Closes #553
Closes #617
Closes #619
Closes #682 
Closes #700
Closes #764
Closes #790
Closes #821

Co-authored-by: Jon Thielen <[email protected]>
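As background, the `__array_function__` protocol that this PR builds on follows the pattern sketched below (a generic duck-array example in the spirit of NEP 18's reference implementation, not pint's actual code; `UnitArray` and `HANDLED_FUNCTIONS` are illustrative names):

```python
import numpy as np

HANDLED_FUNCTIONS = {}

def implements(numpy_function):
    """Register a unit-aware implementation for a NumPy function."""
    def decorator(func):
        HANDLED_FUNCTIONS[numpy_function] = func
        return func
    return decorator

class UnitArray:
    """Minimal duck array carrying units alongside an ndarray magnitude."""

    def __init__(self, magnitude, units):
        self.magnitude = np.asarray(magnitude)
        self.units = units

    def __array_function__(self, func, types, args, kwargs):
        # NumPy (>= 1.17) routes calls like np.concatenate here instead of
        # coercing UnitArray to a plain ndarray.
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)

@implements(np.concatenate)
def _concatenate(arrays, axis=0):
    units = arrays[0].units
    return UnitArray(np.concatenate([a.magnitude for a in arrays], axis=axis), units)

a = UnitArray([1.0, 2.0], "m")
b = UnitArray([3.0], "m")
print(np.concatenate([a, b]).magnitude)  # [1. 2. 3.]
```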
jthielen (Contributor, Author) commented Dec 11, 2019

Some progress towards this goal (at least within Pint) has been made with #905; however, I think the following are the high-level tasks remaining:

hgrecco (Owner) commented Dec 11, 2019

Thanks @jthielen for summarizing this. I think it is important to have input from people who use those packages heavily, to make this not only correct but also feel right. We can start by collecting some tests.

Another question I have: should we have related packages (like pint-pandas) as a way to speed up development?

> Some progress towards this goal (at least within Pint) has been made with #905,

A lot of progress has been made in #905; don't underestimate your work! 😄

jthielen (Contributor, Author) commented Dec 26, 2019

In light of #955, I hope to be working on most of the areas listed above in the next couple of days. However, the tests with downcast types are not all likely to be ready for 0.10.

However, I think testing Sparse and masked arrays as downcast types should be sufficient for now.

Also, for simplicity, I think I'll just do the upcast type tests with xarray, and not worry about Pandas for now.
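A deferral/commutativity test of the kind discussed here might look like the following sketch (it assumes a pint version that treats `DataArray` as an upcast type and an xarray version that preserves duck arrays; it is not an excerpt from the actual test suite):

```python
import numpy as np
import pint
import xarray as xr

ureg = pint.UnitRegistry()

def test_quantity_defers_to_dataarray():
    q = ureg.Quantity(np.arange(3.0), "meter")
    da = xr.DataArray(ureg.Quantity(np.ones(3), "meter"), dims=("x",))
    # Whichever operand comes first, the result should be the upcast type
    # (DataArray), showing that Quantity defers to it rather than wrapping it.
    assert isinstance(q + da, xr.DataArray)
    assert isinstance(da + q, xr.DataArray)
```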

bors bot added a commit that referenced this issue Dec 27, 2019
959: Add tests and improve upcast type compatibility (part of #845) r=hgrecco a=jthielen

As a part of #845, this PR adds tests for upcast type compatibility with xarray (just tests of deferral/commutativity; for full integration tests, see @keewis's work in xarray's test suite). Along the way came a check for upcast types on Quantity creation (closing #479), a switch to checking actual types rather than names (otherwise xarray's and uncertainties' `Variable` conflict), exposing the upcast type collection, and adding the `@check_implemented` decorator to several arithmetic operations where it was missing.

Also, I hope that running the xarray tests only on the latest available Python and xarray versions is sufficient. If that should be changed, please let me know (if a more complete matrix is desired, it may be worth reconfiguring the Travis configuration to use a [build matrix](https://docs.travis-ci.com/user/build-matrix/)).

- [x] Closes #479; Progress towards #845
- [x] Executed ``black -t py36 . && isort -rc . && flake8`` with no errors
- [x] The change is fully covered by automated unit tests
- [x] Documented in docs/ as appropriate
- [x] Added an entry to the CHANGES file


Co-authored-by: Jon Thielen <[email protected]>