Implement NA.__array_ufunc__ #30245

Merged: 29 commits, merged on Jan 5, 2020
Commits
bf8680f · Implement NA.__array_ufunc__ · TomAugspurger, Dec 12, 2019
0c69bd0 · Merge remote-tracking branch 'upstream/master' into na-array-ufunc · TomAugspurger, Dec 13, 2019
46f2327 · ndarrays · TomAugspurger, Dec 13, 2019
075d58a · move · TomAugspurger, Dec 16, 2019
b72dd1c · Merge remote-tracking branch 'upstream/master' into na-array-ufunc · TomAugspurger, Dec 16, 2019
0371cf4 · setup, prints · TomAugspurger, Dec 16, 2019
878ef70 · docs · TomAugspurger, Dec 16, 2019
97af2e9 · fixup · TomAugspurger, Dec 16, 2019
f175a34 · fixups · TomAugspurger, Dec 16, 2019
f2ac945 · Merge remote-tracking branch 'upstream/master' into na-array-ufunc · TomAugspurger, Dec 17, 2019
0f4e121 · lint · TomAugspurger, Dec 17, 2019
fe04554 · Merge remote-tracking branch 'upstream/master' into na-array-ufunc · TomAugspurger, Dec 18, 2019
cf9ac10 · Merge remote-tracking branch 'upstream/master' into na-array-ufunc · TomAugspurger, Dec 20, 2019
72e2b67 · Fixups · TomAugspurger, Dec 20, 2019
8d90e9d · Merge branch 'master' of https://github.com/pandas-dev/pandas into na… · TomAugspurger, Dec 28, 2019
6a2fc68 · Merge remote-tracking branch 'upstream/master' into na-array-ufunc · TomAugspurger, Dec 30, 2019
db4cc40 · Merge remote-tracking branch 'upstream/master' into na-array-ufunc · TomAugspurger, Dec 30, 2019
7b1585a · fix bug · TomAugspurger, Dec 30, 2019
b79e07f · update · TomAugspurger, Dec 30, 2019
567c584 · test special · TomAugspurger, Dec 30, 2019
b27470d · doc · TomAugspurger, Dec 30, 2019
c7c9184 · move · TomAugspurger, Dec 30, 2019
a0dbca8 · Merge remote-tracking branch 'upstream/master' into na-array-ufunc · TomAugspurger, Jan 2, 2020
ce209f9 · fixup · TomAugspurger, Jan 2, 2020
e4ecadb · fixup · TomAugspurger, Jan 2, 2020
8d2763d · Merge remote-tracking branch 'upstream/master' into na-array-ufunc · TomAugspurger, Jan 4, 2020
f68e178 · restore · TomAugspurger, Jan 4, 2020
d8c23e9 · fixup · TomAugspurger, Jan 4, 2020
4c30bb4 · Merge remote-tracking branch 'upstream/master' into na-array-ufunc · TomAugspurger, Jan 5, 2020
4 changes: 2 additions & 2 deletions doc/source/getting_started/dsintro.rst
@@ -676,11 +676,11 @@ similar to an ndarray:
    # only show the first 5 rows
    df[:5].T
 
+.. _dsintro.numpy_interop:
+
 DataFrame interoperability with NumPy functions
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-.. _dsintro.numpy_interop:
-
 Elementwise NumPy ufuncs (log, exp, sqrt, ...) and various other NumPy functions
 can be used with no issues on Series and DataFrame, assuming the data within
 are numeric:
26 changes: 26 additions & 0 deletions doc/source/user_guide/missing_data.rst
@@ -920,3 +920,29 @@ filling missing values beforehand.
 
 A similar situation occurs when using Series or DataFrame objects in ``if``
 statements, see :ref:`gotchas.truth`.
+
+NumPy ufuncs
+------------
+
+:attr:`pandas.NA` implements NumPy's ``__array_ufunc__`` protocol. Most ufuncs
+work with ``NA``, and generally return ``NA``:
+
+.. ipython:: python
+
+   np.log(pd.NA)
+   np.add(pd.NA, 1)
+
+.. warning::
+
+   Currently, ufuncs involving an ndarray and ``NA`` will return an
+   object-dtype filled with NA values.
+
+   .. ipython:: python
+
+      a = np.array([1, 2, 3])
+      np.greater(a, pd.NA)
+
+   The return type here may change to return a different array type
+   in the future.
+
+See :ref:`dsintro.numpy_interop` for more on ufuncs.
53 changes: 48 additions & 5 deletions pandas/_libs/missing.pyx
@@ -14,6 +14,7 @@ from pandas._libs.tslibs.np_datetime cimport (
     get_timedelta64_value, get_datetime64_value)
 from pandas._libs.tslibs.nattype cimport (
     checknull_with_nat, c_NaT as NaT, is_null_datetimelike)
+from pandas._libs.ops_dispatch import maybe_dispatch_ufunc_to_dunder_op
 
 from pandas.compat import is_platform_32bit
 
@@ -290,16 +291,29 @@ cdef inline bint is_null_period(v):
 # Implementation of NA singleton
 
 
-def _create_binary_propagating_op(name, divmod=False):
+def _create_binary_propagating_op(name, is_divmod=False):
 
     def method(self, other):
         if (other is C_NA or isinstance(other, str)
-                or isinstance(other, (numbers.Number, np.bool_))):
-            if divmod:
+                or isinstance(other, (numbers.Number, np.bool_))
+                or isinstance(other, np.ndarray) and not other.shape):
Member: This seems a bit strange, but I suppose it is to follow numpy behaviour of 0-dim arrays returning scalars from comparison operations? (if so, maybe add a comment about that)

Contributor Author, quoting "to follow numpy behaviour of 0-dim arrays returning scalars from comparison operations?":

It's for NumPy scalars. Without this, we'd have

np.int64(1) == pd.NA

raise with

>   out[:] = NA
E   IndexError: too many indices for array

Member: A numpy scalar (which is different from a 0-dim array) is not an ndarray, so wouldn't pass the isinstance(other, np.ndarray) test. So I don't fully understand how that example is related.

Member: could call lib.item_from_zerodim at the top?

Member: @TomAugspurger can you check this thread? I still don't understand how this relates to scalars, as they shouldn't pass the isinstance(other, np.ndarray) check. So not sure the below comment is correct.

Contributor Author: With this

diff --git a/pandas/_libs/missing.pyx b/pandas/_libs/missing.pyx
index 8d4d2c5568..e2bb6448cd 100644
--- a/pandas/_libs/missing.pyx
+++ b/pandas/_libs/missing.pyx
@@ -295,8 +295,7 @@ def _create_binary_propagating_op(name, is_divmod=False):
 
     def method(self, other):
         if (other is C_NA or isinstance(other, str)
-                or isinstance(other, (numbers.Number, np.bool_))
-                or isinstance(other, np.ndarray) and not other.shape):
+                or isinstance(other, (numbers.Number, np.bool_))):
             # Need the other.shape clause to handle NumPy scalars,
             # since we do a setitem on `out` below, which
             # won't work for NumPy scalars.

I have

In [3]: np.int64(1) == pd.NA
/Users/taugspurger/.virtualenvs/pandas-dev/bin/ipython:1: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
  #!/Users/taugspurger/Envs/pandas-dev/bin/python
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-3-209227154583> in <module>
----> 1 np.int64(1) == pd.NA

~/sandbox/pandas/pandas/_libs/missing.pyx in pandas._libs.missing._create_binary_propagating_op.method()
    307         elif isinstance(other, np.ndarray):
    308             out = np.empty(other.shape, dtype=object)
--> 309             out[:] = NA
    310
    311             if is_divmod:

IndexError: too many indices for array

So when we get there, it really does seem like np.int64(1) is an ndarray. Is cython doing something?

Member, quoting "So when we get there, it really does seem like np.int64(1) is an ndarray. Is cython doing something?":

Possibly, no idea. But I trust you this is indeed needed to get it working
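The `IndexError` in the traceback can be reproduced with plain NumPy. One plausible reading, offered as an assumption rather than something the thread confirms: in NumPy 1.x, comparing a NumPy scalar with an unrecognised object appears to convert the scalar to a 0-dim array before the `np.equal` ufunc runs, so `NA.__array_ufunc__`, and from there `NA.__eq__`, receives a 0-dim `ndarray` rather than an `np.int64`. That would explain why the `not other.shape` clause is needed without Cython being involved. The sketch below only demonstrates the shape mechanics; it does not import pandas, and `None` stands in for `NA`.

```python
import numpy as np

# A NumPy scalar is not an ndarray...
scalar = np.int64(1)
scalar_is_ndarray = isinstance(scalar, np.ndarray)

# ...but converting it gives a 0-dim ndarray whose shape is the empty tuple,
# so `not other.shape` is True for it.
zero_dim = np.asarray(scalar)
zero_dim_shape = zero_dim.shape

# The ndarray branch allocates `out` with the operand's shape and then does
# `out[:] = NA`; a slice is one index too many for a 0-dim array.
out = np.empty(zero_dim.shape, dtype=object)
try:
    out[:] = None  # None stands in for NA here
    slice_failed = False
except IndexError:
    slice_failed = True

# An Ellipsis index works for 0-dim arrays, and .item() unwraps them.
out[...] = None
unwrapped = zero_dim.item()

print(scalar_is_ndarray, zero_dim_shape, slice_failed, out.item(), unwrapped)
```

The diff keeps the `not other.shape` guard; the `lib.item_from_zerodim` call suggested in the thread would instead unwrap 0-dim arrays up front, much like `.item()` does here.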

+            # Need the other.shape clause to handle NumPy scalars,
+            # since we do a setitem on `out` below, which
+            # won't work for NumPy scalars.
+            if is_divmod:
                 return NA, NA
             else:
                 return NA
 
+        elif isinstance(other, np.ndarray):
+            out = np.empty(other.shape, dtype=object)
+            out[:] = NA
+
+            if is_divmod:
+                return out, out.copy()
+            else:
+                return out
+
         return NotImplemented
 
     method.__name__ = name
@@ -369,8 +383,8 @@ class NAType(C_NAType):
     __rfloordiv__ = _create_binary_propagating_op("__rfloordiv__")
     __mod__ = _create_binary_propagating_op("__mod__")
     __rmod__ = _create_binary_propagating_op("__rmod__")
-    __divmod__ = _create_binary_propagating_op("__divmod__", divmod=True)
-    __rdivmod__ = _create_binary_propagating_op("__rdivmod__", divmod=True)
+    __divmod__ = _create_binary_propagating_op("__divmod__", is_divmod=True)
+    __rdivmod__ = _create_binary_propagating_op("__rdivmod__", is_divmod=True)
     # __lshift__ and __rshift__ are not implemented
 
     __eq__ = _create_binary_propagating_op("__eq__")
@@ -397,6 +411,8 @@ class NAType(C_NAType):
                 return type(other)(1)
             else:
                 return NA
+        elif isinstance(other, np.ndarray):
+            return np.where(other == 0, other.dtype.type(1), NA)
 
         return NotImplemented
 
@@ -408,6 +424,8 @@ class NAType(C_NAType):
                 return other
             else:
                 return NA
+        elif isinstance(other, np.ndarray):
+            return np.where((other == 1) | (other == -1), other, NA)
 
         return NotImplemented
 
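The two `np.where` branches can be tried in isolation. In this minimal sketch `NA` is a plain `object()` placeholder (an assumption, so pandas is not needed), and `na_pow` and `na_rpow` are hypothetical names for the ndarray branches of `NA.__pow__` and `NA.__rpow__`: an exponent of 0 gives a 1 of the array's dtype, a base of 1 or -1 is passed through, and every other position becomes `NA`. Because `NA` is an arbitrary Python object, `np.where` returns an object-dtype array.

```python
import numpy as np

NA = object()  # hypothetical placeholder for pandas.NA


def na_pow(other):
    # Models `NA ** other` for an ndarray: positions where the exponent
    # is 0 become 1, everything else becomes NA.
    return np.where(other == 0, other.dtype.type(1), NA)


def na_rpow(other):
    # Models `other ** NA` for an ndarray: bases equal to 1 or -1 are kept,
    # as in the diff; everything else becomes NA.
    return np.where((other == 1) | (other == -1), other, NA)


exponents = np.array([0, 2, 0])
bases = np.array([1, -1, 3])
print(na_pow(exponents))
print(na_rpow(bases))
```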
@@ -440,6 +458,31 @@ class NAType(C_NAType):
 
     __rxor__ = __xor__
 
+    __array_priority__ = 1000
+    _HANDLED_TYPES = (np.ndarray, numbers.Number, str, np.bool_)
+
+    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
+        types = self._HANDLED_TYPES + (NAType,)
+        for x in inputs:
+            if not isinstance(x, types):
+                return NotImplemented
+
+        if method != "__call__":
+            raise ValueError(f"ufunc method '{method}' not supported for NA")
+        result = maybe_dispatch_ufunc_to_dunder_op(
+            self, ufunc, method, *inputs, **kwargs
+        )
+        if result is NotImplemented:
+            # For a NumPy ufunc that's not a binop, like np.logaddexp
+            index = [i for i, x in enumerate(inputs) if x is NA][0]
+            result = np.broadcast_arrays(*inputs)[index]
+            if result.ndim == 0:
+                result = result.item()
+            if ufunc.nout > 1:
+                result = (NA,) * ufunc.nout
+
+        return result
+
 
 C_NA = NAType() # C-visible
 NA = C_NA # Python-visible
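Outside pandas, the same protocol can be sketched on a small class. This is a hypothetical `Missing` sentinel, not pandas code: it keeps the type check, the `method != "__call__"` rejection, and the broadcast fallback from `NAType.__array_ufunc__`, but it sends every ufunc (binary operators included) through the broadcast path instead of first calling `maybe_dispatch_ufunc_to_dunder_op`, and it declines calls that pass `out=`.

```python
import numbers

import numpy as np


class Missing:
    """Hypothetical propagating sentinel, loosely modelled on NAType."""

    _HANDLED_TYPES = (np.ndarray, numbers.Number, str, np.bool_)

    def __repr__(self):
        return "MISSING"

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Returning NotImplemented lets NumPy try other overrides and
        # raise TypeError if none of them handle the call.
        if "out" in kwargs:
            return NotImplemented
        for x in inputs:
            if not isinstance(x, self._HANDLED_TYPES + (Missing,)):
                return NotImplemented
        if method != "__call__":
            raise ValueError(f"ufunc method '{method}' not supported")

        # Broadcast all inputs and keep the slot that held the sentinel:
        # scalars stay 0-dim, while an ndarray operand yields an object
        # array of sentinels. Copy so the result is not a broadcast view.
        index = [i for i, x in enumerate(inputs) if x is self][0]
        result = np.broadcast_arrays(*inputs)[index].copy()
        if result.ndim == 0:
            result = result.item()

        if ufunc.nout > 1:
            # One value per ufunc output (e.g. np.divmod); arrays are
            # copied so the outputs do not share memory.
            return tuple(
                result.copy() if isinstance(result, np.ndarray) else result
                for _ in range(ufunc.nout)
            )
        return result


MISSING = Missing()
```

With this class, `np.log(MISSING)` returns the sentinel and `np.add(np.array([1, 2, 3]), MISSING)` returns an object-dtype array of three sentinels, matching the behaviour the documentation warning describes for `NA`. For multi-output ufuncs the diff returns `(NA,) * ufunc.nout`; the sketch repeats the broadcast result instead, which only differs when an array operand reaches that fallback.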
94 changes: 94 additions & 0 deletions pandas/_libs/ops_dispatch.pyx
@@ -0,0 +1,94 @@
+DISPATCHED_UFUNCS = {
+    "add",
+    "sub",
+    "mul",
+    "pow",
+    "mod",
+    "floordiv",
+    "truediv",
+    "divmod",
+    "eq",
+    "ne",
+    "lt",
+    "gt",
+    "le",
+    "ge",
+    "remainder",
+    "matmul",
+    "or",
+    "xor",
+    "and",
+}
+UFUNC_ALIASES = {
+    "subtract": "sub",
+    "multiply": "mul",
+    "floor_divide": "floordiv",
+    "true_divide": "truediv",
+    "power": "pow",
+    "remainder": "mod",
+    "divide": "div",
+    "equal": "eq",
+    "not_equal": "ne",
+    "less": "lt",
+    "less_equal": "le",
+    "greater": "gt",
+    "greater_equal": "ge",
+    "bitwise_or": "or",
+    "bitwise_and": "and",
+    "bitwise_xor": "xor",
+}
+
+# For op(., Array) -> Array.__r{op}__
+REVERSED_NAMES = {
+    "lt": "__gt__",
+    "le": "__ge__",
+    "gt": "__lt__",
+    "ge": "__le__",
+    "eq": "__eq__",
+    "ne": "__ne__",
+}
+
+
+def maybe_dispatch_ufunc_to_dunder_op(
+    object self, object ufunc, str method, *inputs, **kwargs
+):
Member: for functions that aren't cdef or cpdef, we can use python syntax for annotations

+    """
+    Dispatch a ufunc to the equivalent dunder method.
+
+    Parameters
+    ----------
+    self : ArrayLike
+        The array whose dunder method we dispatch to
+    ufunc : Callable
+        A NumPy ufunc
+    method : {'reduce', 'accumulate', 'reduceat', 'outer', 'at', '__call__'}
+    inputs : ArrayLike
+        The input arrays.
+    kwargs : Any
+        The additional keyword arguments, e.g. ``out``.
+
+    Returns
+    -------
+    result : Any
+        The result of applying the ufunc
+    """
+    # special has the ufuncs we dispatch to the dunder op on
+
+    op_name = ufunc.__name__
+    op_name = UFUNC_ALIASES.get(op_name, op_name)
+
+    def not_implemented(*args, **kwargs):
+        return NotImplemented
+
+    if (method == "__call__"
+            and op_name in DISPATCHED_UFUNCS
+            and kwargs.get("out") is None):
+        if isinstance(inputs[0], type(self)):
+            name = f"__{op_name}__"
+            return getattr(self, name, not_implemented)(inputs[1])
+        else:
+            name = REVERSED_NAMES.get(op_name, f"__r{op_name}__")
+            result = getattr(self, name, not_implemented)(inputs[0])
+            return result
+    else:
+        return NotImplemented
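The name resolution in `maybe_dispatch_ufunc_to_dunder_op` can be isolated as a pure function. This is a hypothetical helper, `dunder_for`, that works on ufunc names (strings) with trimmed copies of the three tables; it leaves out the `method == "__call__"` and `out` checks and the actual `getattr` call, returning `None` where the real function would return `NotImplemented`.

```python
# Trimmed copies of the lookup tables in ops_dispatch.pyx (subset only).
DISPATCHED_UFUNCS = {"add", "sub", "mul", "truediv", "eq", "ne", "lt", "gt", "le", "ge"}
UFUNC_ALIASES = {
    "subtract": "sub",
    "multiply": "mul",
    "true_divide": "truediv",
    "equal": "eq",
    "not_equal": "ne",
    "less": "lt",
    "less_equal": "le",
    "greater": "gt",
    "greater_equal": "ge",
}
REVERSED_NAMES = {
    "lt": "__gt__",
    "le": "__ge__",
    "gt": "__lt__",
    "ge": "__le__",
    "eq": "__eq__",
    "ne": "__ne__",
}


def dunder_for(ufunc_name, self_is_first):
    """Return the dunder name a ufunc would dispatch to, or None.

    ``self_is_first`` says whether the dispatching object is inputs[0].
    """
    op_name = UFUNC_ALIASES.get(ufunc_name, ufunc_name)
    if op_name not in DISPATCHED_UFUNCS:
        return None  # e.g. np.log: not a dispatched binary op
    if self_is_first:
        return f"__{op_name}__"
    # self is the right operand: comparisons flip (a > self <=> self < a),
    # everything else uses the reflected method.
    return REVERSED_NAMES.get(op_name, f"__r{op_name}__")
```

For the documentation example `np.greater(a, pd.NA)`, `NA` is the second input, so the lookup yields `__lt__`, consistent with `a > NA` being evaluated as `NA < a`.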
2 changes: 1 addition & 1 deletion pandas/core/ops/__init__.py
@@ -10,6 +10,7 @@
 import numpy as np
 
 from pandas._libs import Timedelta, Timestamp, lib
+from pandas._libs.ops_dispatch import maybe_dispatch_ufunc_to_dunder_op # noqa:F401
 from pandas.util._decorators import Appender
 
 from pandas.core.dtypes.common import is_list_like, is_timedelta64_dtype
@@ -31,7 +32,6 @@
 )
 from pandas.core.ops.array_ops import comp_method_OBJECT_ARRAY # noqa:F401
 from pandas.core.ops.common import unpack_zerodim_and_defer
-from pandas.core.ops.dispatch import maybe_dispatch_ufunc_to_dunder_op # noqa:F401
 from pandas.core.ops.dispatch import should_series_dispatch
 from pandas.core.ops.docstrings import (
     _arith_doc_FRAME,
95 changes: 1 addition & 94 deletions pandas/core/ops/dispatch.py
@@ -1,12 +1,10 @@
 """
 Functions for defining unary operations.
 """
-from typing import Any, Callable, Union
+from typing import Any, Union
 
 import numpy as np
 
-from pandas._typing import ArrayLike
-
 from pandas.core.dtypes.common import (
     is_datetime64_dtype,
     is_extension_array_dtype,
@@ -126,94 +124,3 @@ def dispatch_to_extension_op(
     # on the ExtensionArray
     res_values = op(left, right)
     return res_values
-
-
-def maybe_dispatch_ufunc_to_dunder_op(
-    self: ArrayLike, ufunc: Callable, method: str, *inputs: ArrayLike, **kwargs: Any
-):
-    """
-    Dispatch a ufunc to the equivalent dunder method.
-
-    Parameters
-    ----------
-    self : ArrayLike
-        The array whose dunder method we dispatch to
-    ufunc : Callable
-        A NumPy ufunc
-    method : {'reduce', 'accumulate', 'reduceat', 'outer', 'at', '__call__'}
-    inputs : ArrayLike
-        The input arrays.
-    kwargs : Any
-        The additional keyword arguments, e.g. ``out``.
-
-    Returns
-    -------
-    result : Any
-        The result of applying the ufunc
-    """
-    # special has the ufuncs we dispatch to the dunder op on
-    special = {
-        "add",
-        "sub",
-        "mul",
-        "pow",
-        "mod",
-        "floordiv",
-        "truediv",
-        "divmod",
-        "eq",
-        "ne",
-        "lt",
-        "gt",
-        "le",
-        "ge",
-        "remainder",
-        "matmul",
-        "or",
-        "xor",
-        "and",
-    }
-    aliases = {
-        "subtract": "sub",
-        "multiply": "mul",
-        "floor_divide": "floordiv",
-        "true_divide": "truediv",
-        "power": "pow",
-        "remainder": "mod",
-        "divide": "div",
-        "equal": "eq",
-        "not_equal": "ne",
-        "less": "lt",
-        "less_equal": "le",
-        "greater": "gt",
-        "greater_equal": "ge",
-        "bitwise_or": "or",
-        "bitwise_and": "and",
-        "bitwise_xor": "xor",
-    }
-
-    # For op(., Array) -> Array.__r{op}__
-    flipped = {
-        "lt": "__gt__",
-        "le": "__ge__",
-        "gt": "__lt__",
-        "ge": "__le__",
-        "eq": "__eq__",
-        "ne": "__ne__",
-    }
-
-    op_name = ufunc.__name__
-    op_name = aliases.get(op_name, op_name)
-
-    def not_implemented(*args, **kwargs):
-        return NotImplemented
-
-    if method == "__call__" and op_name in special and kwargs.get("out") is None:
-        if isinstance(inputs[0], type(self)):
-            name = f"__{op_name}__"
-            return getattr(self, name, not_implemented)(inputs[1])
-        else:
-            name = flipped.get(op_name, f"__r{op_name}__")
-            return getattr(self, name, not_implemented)(inputs[0])
-    else:
-        return NotImplemented