Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: wrap any array API standard compatible array #2101

Open
lucascolley opened this issue Dec 29, 2024 · 21 comments
Open

RFC: wrap any array API standard compatible array #2101

lucascolley opened this issue Dec 29, 2024 · 21 comments

Comments

@lucascolley
Copy link

Preface: I'm not familiar with the history behind https://pint.readthedocs.io/en/stable/user/numpy.html#Array-Type-Support, but I have experience adding support for alternative array types to SciPy. For context, see https://numpy.org/neps/nep-0056-array-api-main-namespace.html#abstract, which superseded NEP 30.


The array API standard is being utilised in libraries like SciPy and scikit-learn to add support for array libraries like CuPy, PyTorch, and JAX. It seems feasible that Pint could wrap any library which is compatible with the standard.

The model for how this can be done is demonstrated in https://github.com/mdhaber/marray. Marray adds masked array support to standard-compatible arrays. Similar to Pint's Quantity, an MArray is some data together with a mask:

MArray(
    Array([1, 2], dtype=array_api_strict.int64),
    Array([False,  True], dtype=array_api_strict.bool)
)

Marray exposes one function, get_namespace, which takes as input a standard-compatible namespace, and returns a new standard-compatible namespace with the additional capabilities of creating and correctly processing masked arrays. Under the hood, this namespace wraps each function to handle anything specific to masks, and otherwise dispatches to the underlying array namespace. So for pint, pint.get_namespace(torch) would return a namespace for pint-wrapped PyTorch tensors.

MArrays also expose this namespace via the __array_namespace__ method, which allows libraries to operate on them like any other standard array:

# get the namespace for input pint(torch) array x
xp = x.__array_namespace__()
y = xp.exp(x) # returns another pint(torch) array
z = xp.asarray([1, 2, 3]) # also a pint(torch) array
...

Marray currently passes almost the entirety of https://github.com/data-apis/array-api-tests. It would be cool to see if Pint could do the same!

This would bring a few benefits:

  • for users, they can use pint with whatever standard-compatible library they like
  • for libraries which depend on pint, they can also then work on supporting any standard-compatible arrays
  • users can pass Pint Quantitys to any library which supports standard-compatible arrays, like SciPy (eventually!)

@nabobalis I know you had expressed interest in this.

@lucascolley lucascolley changed the title RFC: wrap any array API standard compatible arrays RFC: wrap any array API standard compatible array Dec 29, 2024
@andrewgsavage
Copy link
Collaborator

yes, there is interest. there have been a few issues relating to this in the past but it requires someone to start it! I suggest we create a seperate repo for this as it allows the array code to be iterated and released faster and independently to pint. maybe pint-array or pint-array-api?

@lucascolley
Copy link
Author

alright! Extremely rough POC at https://github.com/lucascolley/pint-array:

In [1]: from pint import UnitRegistry

In [2]: from pint_array import get_namespace

In [3]: import array_api_strict as xp

In [4]: pxp = get_namespace(xp)

In [5]: ureg = UnitRegistry()

In [6]: x = pxp.asarray(xp.arange(6), units=ureg.meter)

In [7]: pxp.sum(x)
Out[7]: <Quantity(15, 'meter')>

In [9]: x = pxp.asarray(x, dtype=xp.float64)

In [10]: pxp.var(x)
Out[10]: <Quantity(2.9166666666666665, 'meter ** 2')>

In [11]: x = pxp.asarray(x, units=ureg.degC)

In [12]: pxp.sum(x)
---------------------------------------------------------------------------
OffsetUnitCalculusError                   Traceback (most recent call last)
Cell In[12], line 1
----> 1 pxp.sum(x)

File ~/programming/pint-array/pint_array/__init__.py:42, in get_namespace.<locals>.sum(x, axis, dtype, keepdims)
     40 units = x.units
     41 magnitude = xp.sum(x, axis=axis, dtype=dtype, keepdims=keepdims)
---> 42 units = (1 * units + 1 * units).units
     43 return UnitRegistry.Quantity(magnitude, units)

File ~/programming/pint-array/.pixi/envs/default/lib/python3.13/site-packages/pint/facets/plain/quantity.py:849, in PlainQuantity.__add__(self, other)
    846 if isinstance(other, datetime.datetime):
    847     return self.to_timedelta() + other
--> 849 return self._add_sub(other, operator.add)

File ~/programming/pint-array/.pixi/envs/default/lib/python3.13/site-packages/pint/facets/plain/quantity.py:101, in check_implemented.<locals>.wrapped(self, *args, **kwargs)
     99 elif isinstance(other, list) and other and isinstance(other[0], type(self)):
    100     return NotImplemented
--> 101 return f(self, *args, **kwargs)

File ~/programming/pint-array/.pixi/envs/default/lib/python3.13/site-packages/pint/facets/plain/quantity.py:825, in PlainQuantity._add_sub(self, other, op)
    823     units = other._units
    824 else:
--> 825     raise OffsetUnitCalculusError(self._units, other._units)
    827 return self.__class__(magnitude, units)

OffsetUnitCalculusError: Ambiguous operation with offset unit (degree_Celsius, degree_Celsius). See https://pint.readthedocs.io/en/stable/user/nonmult.html for guidance.

With a bit of work it should be relatively straightforward based on the work already done to wrap NumPy (I hope).

@andrewgsavage
Copy link
Collaborator

Thanks, that's very helpful to understand how it works.

Assuming the function signatures are the same for numpy asn the array API, it may be possible to change the implements decorator to also create functions for pint_namespace
That would be fine to do in pint.

@lucascolley
Copy link
Author

lucascolley commented Dec 30, 2024

Assuming the function signatures are the same for numpy asn the array API, it may be possible to change the implements decorator to also create functions for pint_namespace
That would be fine to do in pint.

Yes, it seems like we should be able to reuse a lot of that code. Given that NumpyWrapper supports functions (and ufuncs) which aren't in the array API standard, I think it will be easier to keep them separate for now.

I just pushed a change to the repo such that we can define an ArrayUnitQuantity, which is a pint.Quantity with additional methods from ArrayQuantity. Since the return value of the __array_namespace__ method is the pint_namespace which is itself defined using ArrayUnitQuantitys, I'm not sure yet how ArrayQuantity could become a standalone facet to integrate into pint.UnitRegistry. But maybe that isn't needed!

@mhvk
Copy link

mhvk commented Dec 30, 2024

xref astropy/astropy#17588

This is probably as good a moment as any to mention that @nstarman and I have been working on a new Quantity implementation that aims to be array API compatible and agnostic of the unit system - see https://github.com/astropy/quantity-2.0 and the design considerations at astropy/astropy-APEs#91 (especially the PDF linked on top).

As our hope in part is to replace astropy's Quantity, our minimal goal is to provide full coverage of the array API; it would be the full numpy API for ndarray (i.e., including out arguments, specializations of ufuncs like np.add.{at, reduce, reduceat, accumulate}, and gufuncs.

Anyway, the part that is perhaps most relevant here is the unit-agnostic part - this would need some standardization of how units interact with arrays, perhaps new dunder methods of some agreement on operators (e.g., astropy can attach a unit via value << unit - maybe that could become a standard?)

@nstarman
Copy link

A shared units & Quantity API would be fantastic!
I would vote for defining a quantity-api libraries with some runtime-checkable Protocols.
I mocked up something along these lines in https://github.com/nstarman/units/tree/main/src/units/api

@andrewgsavage
Copy link
Collaborator

@mhvk , when you say unit system, do you mean the unit module, pint or astropy.units etc?

the linked pdf refers to a Quantity API. If I am understanding it correctly, if pint, astropy.units and any other unit library followed it, then there could be a quantity-array library that works for both pint and astropy.units.

could then make pint-pandas and pint-xarray work for astropy and others too

@lucascolley
Copy link
Author

Yep, that sounds perfect. The namespaces I'm building in pint-array just need an array library, a quantity class from which to inherit, and some units compatible with that quantity. Currently, that's pint.Quantity and pint.UnitRegistry, but it should be relatively simple to swap those out for a new standard if you folks can figure one out!

That's probably the hard part; in the meantime, I'll work on getting these array-quantity namespaces up to standard-compliance, while attempting to do the right thing with units. I could probably use some help writing tests for correct unit-handling at some point!

@mhvk
Copy link

mhvk commented Dec 30, 2024

@lucascolley - spurred on by this conversation, I've pushed my implementation of ufucs/element-wise operations to astropy/quantity-2.0#22 - tests/test_element_wise_functions.py may be fairly directly useful to you.

@andrewgsavage
Copy link
Collaborator

Yes, there aren't all that many classes and functions needed for the array API, based on pint's numpy array_function implementation. I think a Quantity API defining the classes and functions needed to write the pint-array library would be a good first version. Is https://github.com/astropy/quantity-2.0 the best place for such a discussion on a Quantity API?

There are a load of numpy tests that have been added as people implement numpy functions.

@lucascolley
Copy link
Author

What do we think about creation-like functions such as ones_like? Should the result be dimensionless or have the same units as the input array? Pint currently returns a dimensionless array for NumPy arrays.

@andrewgsavage
Copy link
Collaborator

that was being changed to have the same units as the input array, but has got stuck #882

@lucascolley
Copy link
Author

lucascolley commented Dec 31, 2024

Up until this point everything has been working with pint.UnitRegistry. I'm now trying to implement https://data-apis.org/array-api/latest/API_specification/generated/array_api.concat.html#array_api.concat, my implementation of which makes use of https://pint.readthedocs.io/en/stable/api/facets.html#pint.facets.plain.PlainQuantity.m_as. That takes us through to this generic _convert function:

# factor is type float and if our magnitude is type Decimal then
# must first convert to Decimal before we can '*' the values
if isinstance(value, Decimal):
factor = Decimal(str(factor))
elif isinstance(value, Fraction):
factor = Fraction(str(factor))
if inplace:
value *= factor
else:
value = value * factor

In the array API standard:

Behavior is not specified when mixing a Python float and an array with an integer data type; this may give float32, float64, or raise an exception.

and so the multiplication here can throw an exception for integer arrays.

Do you think we could try to address this problem in the plain registry @andrewgsavage ? It will be a bit hacky without adding proper array API awareness to Pint, but I think something like

if hasattr(value, __array_namespace__) and value.__array_namespace__().isdtype(value.dtype, "integral"):
    factor = int(factor)

might fix the integer array case.

I suppose the alternative is for me to subclass UnitRegistry and override the _convert method. But it might be nice to avoid that if possible?

@andrewgsavage
Copy link
Collaborator

The conversion factor should not be converted to an int - that would lead to things like 4 inches converting to 8cm instead of 10cm. Can this be handled within the concat implementation by special casing the integer arrays to check consistent units?

@lucascolley
Copy link
Author

ah okay, so if consolidating the units would require a non-integral conversion factor and we have integer arrays, we can raise. Could you point me to a helper in Pint which checks for such a factor, if it exists already? Or otherwise, how would be best to go about writing it?

@andrewgsavage
Copy link
Collaborator

I don't think we have that. I'd either raise or return a float array. Do you want to use your repo for these discussions to keep this cleaner?

@lucascolley
Copy link
Author

lucascolley commented Jan 1, 2025

I was just made aware of @SimonHeybrock's https://pydims.github.io/pydims/index.html, which adds named dimensions and units to array API standard compatible arrays. It supports units from both astropy.units and pint, so it sounds like you should be involved in any discussion of standardising a unit/quantity API @SimonHeybrock.

Since pydims adds named dimensions, its arrays can't comply with the standard for the same reasons as xarray. So the scopes of pydims and (what is currently called) pint-array are orthogonal in that regard. pint-array should definitely look towards supporting both forms of units like pydims does, but perhaps that will be easier or superseded if we wait for the work that @nstarman and @mhvk have in mind.

@lucascolley
Copy link
Author

alright, https://github.com/lucascolley/pint-array is now passing array-api-tests 🎉 ! The behaviour with units is currently untested and known to be quite buggy, but all of the building blocks are there.

I probably won't have time to work on proper tests, documentation etc. until this summer following my university exams, but contributions are very welcome!

@lucascolley
Copy link
Author

lucascolley commented Jan 4, 2025

Example of unit-aware integration now working with SciPy 1.15.0 (released yesterday)!1

EDIT: I'm not sure the unit behaviour is correct yet here - I think we would want pint^2 for the first result and pint^3 for the second? EDIT 2: I think this is because I haven't correctly defined the units for __gt__ yet

EDIT 3: fixed now!

In [1]: import pint_array; import numpy as np; pnp = pint_array.pint_namespace(np); from pint import UnitRegistry; ureg = UnitRegistry()

In [2]: from scipy.integrate import cubature

In [3]: def f(x, n):
   ...:     """f(x) = n * x"""
   ...:     return n*x
   ...: 

In [4]: a = pnp.asarray([0], units=ureg.pint)

In [5]: b = pnp.asarray([1], units=ureg.pint)

In [6]: n = pnp.arange(10)

In [7]: res = cubature(
   ...:     f,
   ...:     a=a,
   ...:     b=b,
   ...:     args=(n,),
   ...: )

In [8]: res.estimate
Out[8]: 
<Quantity(
  array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5]),
  'pint ** 2'
)>

In [9]: n.magnitude * (1**2) / 2 # closed form
Out[9]: array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])

In [10]: res.error
Out[10]: 
<Quantity(
  array([0.00000000e+00, 9.02056208e-17, 1.80411242e-16, 3.46944695e-16,
         3.60822483e-16, 1.11022302e-16, 6.93889390e-16, 2.49800181e-16,
         7.21644966e-16, 1.66533454e-15]),
  'pint ** 2'
)>

In [11]: def f(x, n):
    ...:     """f(x) = n * (x * x)"""
    ...:     return n * (x * x)
    ...: 

In [12]: res = cubature(
    ...:     f,
    ...:     a=a,
    ...:     b=b,
    ...:     args=(n,),
    ...: )

In [13]: res.estimate
Out[13]: 
<Quantity(
  array([0.        , 0.33333333, 0.66666667, 1.        , 1.33333333,
         1.66666667, 2.        , 2.33333333, 2.66666667, 3.        ]),
  'pint ** 3'
)>

In [14]: n.magnitude * (1**3) / 3 # closed form
Out[14]: 
array([0.        , 0.33333333, 0.66666667, 1.        , 1.33333333,
       1.66666667, 2.        , 2.33333333, 2.66666667, 3.        ])

In [15]: res.error
Out[15]: 
<Quantity(
  array([0.00000000e+00, 2.77555756e-17, 5.55111512e-17, 3.33066907e-16,
         1.11022302e-16, 4.44089210e-16, 6.66133815e-16, 2.49800181e-16,
         2.22044605e-16, 3.33066907e-16]),
  'pint ** 3'
)>

Still need to fix the unit behaviour of quite a few functions/methods, but this is promising :)

Footnotes

  1. you'll need to set the env var SCIPY_ARRAY_API=1

@SimonHeybrock
Copy link

I was just made aware of @SimonHeybrock's https://pydims.github.io/pydims/index.html, which adds named dimensions and units to array API standard compatible arrays. It supports units from both astropy.units and pint, so it sounds like you should be involved in any discussion of standardising a unit/quantity API @SimonHeybrock.

Since pydims adds named dimensions, its arrays can't comply with the standard for the same reasons as xarray. So the scopes of pydims and (what is currently called) pint-array are orthogonal in that regard. pint-array should definitely look towards supporting both forms of units like pydims does, but perhaps that will be easier or superseded if we wait for the work that @nstarman and @mhvk have in mind.

Thanks for the shout-out @lucascolley! I would like to note that I am not actively working on PyDims. It was (for now) meant to gauge the communities interest in the approach, including in particular a "Python Units API Standard".

As an aside, there is now yet another Python units library, since @phlptp added Python bindings to https://github.com/LLNL/units.

@andrewgsavage
Copy link
Collaborator

@SimonHeybrock I have a made a topic proposing collabration on a Quantity API astropy/quantity-2.0#23

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants