Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for NumPy 2.0 #77

Closed
asmeurer opened this issue Jan 8, 2024 · 13 comments · Fixed by #79
Closed

Support for NumPy 2.0 #77

asmeurer opened this issue Jan 8, 2024 · 13 comments · Fixed by #79

Comments

@asmeurer
Copy link
Member

asmeurer commented Jan 8, 2024

To support NumPy 2.0, we should

  • Add CI to test 2.0 nightlies. We can use python -m pip install --pre --upgrade --extra-index https://pypi.anaconda.org/scientific-python-nightly-wheels/simple numpy.

  • Decide if we want to support just NumPy 2.0 or both NumPy 2.0 and NumPy 1.26.

  • If we decide to support both, we should check if any of the wrappers here will break with 2.0 having fixed the underlying behavior. Otherwise, I don't think it's that important to wrap conditionally, unless there is a good reason to remove the wrapper when it isn't needed for performance.

  • If we aren't supporting both we can remove a good number of wrappers.

Personally I think we should support both, at least for the time being.

@rgommers
Copy link
Member

rgommers commented Jan 9, 2024

+1 for supporting both. Probably just following the timeline of https://scientific-python.org/specs/spec-0000/, since that's what downstream users of this package use. So the minimum version that is in use right now is around 1.23.x. There won't be differences of interest between 1.23.x and 1.26.x probably, but it may be useful to document that SPEC 0 is followed.

IIRC there is already one CI job with NumPy 1.21

@rgommers
Copy link
Member

I just noticed that this code now bypasses the compat layer completely, since numpy.ndarray gained a __array_namespace__ method for 2.0:

if hasattr(x, '__array_namespace__'):
namespaces.add(x.__array_namespace__(api_version=api_version))
elif _is_numpy_array(x):
_check_api_version(api_version)
if _use_compat:
from .. import numpy as numpy_namespace
namespaces.add(numpy_namespace)
else:
import numpy as np
namespaces.add(np)

The order of elif's should probably be reversed?

@rgommers
Copy link
Member

The order of elif's should probably be reversed?

This fix seems indeed necessary and somewhat urgent. Bypassing of the compat layer otherwise causes numpy main to not work at all with SciPy, and I assume scikit-learn too. @asmeurer WDYT?

@lucascolley
Copy link
Member

lucascolley commented Jan 17, 2024

I would suggest moving all of the _is_*_array conditions above the hasattr(x, '__array_namespace__') but add a comment to remove each branch once 100% compat is achieved. I imagine the initial motivation was to discourage libraries from implementing __array_namespace__ if they are not actually compliant.

@asmeurer
Copy link
Member Author

I've fixed that at #79. I plan to implement the other things there too, but if we need that fix specifically soon we can merge it now and do a release.

@rgommers
Copy link
Member

That would be nice I think - once CI on gh-79 is green and it's ready for testing with SciPy please ping me. Happy to help move this along.

@asmeurer
Copy link
Member Author

I'm trying my best with the CI (#80), but my point was that if this is blocking this, we should just ignore the CI issues for now.

@asmeurer
Copy link
Member Author

So ideally, we should be able to just leave most of the helper intact and use them for both 1.26 and 2.0. There will be extra redundant wrapping for most of the functionality in 2.0, but it will make things a lot easier for us to maintain, since we won't have to do version checks, and this will also let us keep the same code for cupy (no idea what the cupy plans for migrating to numpy 2.0), and dask.

I read through the numpy wrapper code to see if there's anything that popped out at me as something that could potentially break with this. Here's what I noticed:

  • asarray currently raises NotImplementedError if copy=False (

    def isdtype(
    dtype: Dtype, kind: Union[Dtype, str, Tuple[Union[Dtype, str], ...]], xp,
    *, _tuple=True, # Disallow nested tuples
    ) -> bool:
    """
    Returns a boolean indicating whether a provided dtype is of a specified data type ``kind``.
    Note that outside of this function, this compat library does not yet fully
    support complex numbers.
    See
    https://data-apis.org/array-api/latest/API_specification/generated/array_api.isdtype.html
    for more details
    """
    if isinstance(kind, tuple) and _tuple:
    return any(isdtype(dtype, k, xp, _tuple=False) for k in kind)
    elif isinstance(kind, str):
    if kind == 'bool':
    return dtype == xp.bool_
    elif kind == 'signed integer':
    return xp.issubdtype(dtype, xp.signedinteger)
    elif kind == 'unsigned integer':
    return xp.issubdtype(dtype, xp.unsignedinteger)
    elif kind == 'integral':
    return xp.issubdtype(dtype, xp.integer)
    elif kind == 'real floating':
    return xp.issubdtype(dtype, xp.floating)
    elif kind == 'complex floating':
    return xp.issubdtype(dtype, xp.complexfloating)
    elif kind == 'numeric':
    return xp.issubdtype(dtype, xp.number)
    else:
    raise ValueError(f"Unrecognized data type kind: {kind!r}")
    else:
    # This will allow things that aren't required by the spec, like
    # isdtype(np.float64, float) or isdtype(np.int64, 'l'). Should we be
    # more strict here to match the type annotation? Note that the
    # numpy.array_api implementation will be very strict.
    return dtype == kind
    ). It seems like copy=False still isn't implemented in numpy main. Is this planned for 2.0? I also don't know if the _CopyMode parts will need to be updated, or if we should even have those there in the first place.

  • I'm assuming numpy 2.0 is also going to use the string "cpu" for the device. If this isn't the case, we will need to update things.

  • It's possible my isdtype implementation doesn't match the numpy one for some corner cases not handled by the standard.

  • I don't know if we are using any of the NumPy APIs that were deprecated or moved around. We'll have to use the test suite to check that. I tried the ruff numpy linter and it didn't come up with anything. The main one I noticed that worried me is the aforementioned np._CopyMode.

That's all I saw from a cursory look at the code, but I haven't looked at the tests yet.

Perhaps a good middle ground would be to leave the wrappers for existing functions but use the numpy implementation for functions that are completely implemented from scratch in array-api-compat, like isdtype, vector_norm, and vecdot, whenever they are found to be present in the namespace. Those functions are all new in 2.0 and should be completely standard compliant.

@rgommers
Copy link
Member

  • It seems like copy=False still isn't implemented in numpy main. Is this planned for 2.0? I also don't know if the _CopyMode parts will need to be updated, or if we should even have those there in the first place.

Yes, PR is up: numpy/numpy#25168

Yes, that should be a safe assumption I think.

  • It's possible my isdtype implementation doesn't match the numpy one for some corner cases not handled by the standard.

Let's deal with that if an issue pops up? It's brand new, so not in heavy use yet.

I don't know if we are using any of the NumPy APIs that were deprecated or moved around.

Probably okay as is, since you wrote pretty idiomatic code in this package.

Perhaps a good middle ground would be to leave the wrappers for existing functions but use the numpy implementation for functions that are completely implemented from scratch in array-api-compat

Sounds like a good plan to me.

@asmeurer
Copy link
Member Author

Let's deal with that if an issue pops up? It's brand new, so not in heavy use yet.

I was thinking specifically maybe the numpy one does something extra with non-standard dtypes? Anyway this is moot given my other suggestion.

Probably okay as is, since you wrote pretty idiomatic code in this package.

The one thing I saw that worries me is _CopyMode. Is that staying? Maybe we should just get rid of it? I admittedly don't quite understand the point of that. Is supposed to be internal only? We also support it in numpy.array_api but I don't know if that was a good idea.

@rgommers
Copy link
Member

I think so, but @mtsokol will know better what (if anything) may happen to np._CopyMode.

@asmeurer
Copy link
Member Author

I don't know if we are using any of the NumPy APIs that were deprecated or moved around.

Actually it looks like we don't have to worry about that because it was already fixed in #63

@mtsokol
Copy link
Contributor

mtsokol commented Jan 19, 2024

_CopyMode is just an internal enum for representing copy options. I think it's been in the NumPy for some time. While working on copy keyword for asarray there was no discussion on removing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants