EA: support basic 2D operations #27142

Closed
wants to merge 36 commits into from
Changes from 12 commits
Commits
36 commits
d673008
un-xfail tests, xfail instead of skip, minor cleanup
jbrockmendel Jun 28, 2019
b2a837b
Merge branch 'master' of https://github.com/pandas-dev/pandas into xf…
jbrockmendel Jun 28, 2019
c2fd1b1
Merge branch 'master' of https://github.com/pandas-dev/pandas into xf…
jbrockmendel Jun 29, 2019
bed5563
REF: derive __len__ from shape instead of vice-versa
jbrockmendel Jun 30, 2019
29d9dc2
Merge branch 'master' of https://github.com/pandas-dev/pandas into ar…
jbrockmendel Jun 30, 2019
f3ce13c
update docstring
jbrockmendel Jun 30, 2019
4fd24c1
remove duplicated methods
jbrockmendel Jun 30, 2019
c0505ee
Merge branch 'master' of https://github.com/pandas-dev/pandas into ar…
jbrockmendel Jul 1, 2019
c81daeb
implement shape in terms of size, with implement_2d decorator
jbrockmendel Jul 1, 2019
4d77dbe
move implement_2d, implement view
jbrockmendel Jul 1, 2019
d43ef30
port tests
jbrockmendel Jul 2, 2019
91c979b
Merge branch 'master' of https://github.com/pandas-dev/pandas into ar…
jbrockmendel Jul 2, 2019
bc220c3
shape patching, tests
jbrockmendel Jul 2, 2019
421b5a3
Merge branch 'master' of https://github.com/pandas-dev/pandas into ar…
jbrockmendel Jul 2, 2019
7c6df89
Merge branch 'master' of https://github.com/pandas-dev/pandas into ar…
jbrockmendel Jul 5, 2019
203504c
blackify
jbrockmendel Jul 5, 2019
bc12f01
Merge branch 'master' of https://github.com/pandas-dev/pandas into ar…
jbrockmendel Jul 8, 2019
2f18dae
shape fixups
jbrockmendel Jul 8, 2019
eb0645d
blackify+isort
jbrockmendel Jul 8, 2019
2540933
property read-write
jbrockmendel Jul 8, 2019
5b60fb5
add docstring
jbrockmendel Jul 9, 2019
96f3ae2
Merge branch 'master' of https://github.com/pandas-dev/pandas into ar…
jbrockmendel Jul 22, 2019
92a0a56
implement base class view
jbrockmendel Jul 22, 2019
81191bf
Merge branch 'master' of https://github.com/pandas-dev/pandas into ar…
jbrockmendel Jul 22, 2019
91639dd
use base view
jbrockmendel Jul 22, 2019
34cc9e9
patch take, getitem
jbrockmendel Jul 22, 2019
444f9f7
blackify
jbrockmendel Jul 22, 2019
7c15b74
isort fixup
jbrockmendel Jul 22, 2019
768d75d
patch iter
jbrockmendel Jul 22, 2019
3b7b2b2
slice handling cleanup
jbrockmendel Jul 22, 2019
177bfb0
Merge branch 'master' of https://github.com/pandas-dev/pandas into ar…
jbrockmendel Jul 29, 2019
f5cba22
Merge branch 'master' of https://github.com/pandas-dev/pandas into ar…
jbrockmendel Aug 2, 2019
174a1da
Merge branch 'master' of https://github.com/pandas-dev/pandas into ar…
jbrockmendel Aug 3, 2019
1d78fbe
Merge branch 'master' of https://github.com/pandas-dev/pandas into ar…
jbrockmendel Aug 5, 2019
41b49d9
Merge branch 'master' of https://github.com/pandas-dev/pandas into ar…
jbrockmendel Aug 9, 2019
fc331b8
dummy to force CI
jbrockmendel Aug 9, 2019
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.25.0.rst
@@ -567,6 +567,7 @@ Other API changes
- Using an unsupported version of Beautiful Soup 4 will now raise an ``ImportError`` instead of a ``ValueError`` (:issue:`27063`)
- :meth:`Series.to_excel` and :meth:`DataFrame.to_excel` will now raise a ``ValueError`` when saving timezone aware data. (:issue:`27008`, :issue:`7056`)
- :meth:`DataFrame.to_hdf` and :meth:`Series.to_hdf` will now raise a ``NotImplementedError`` when saving a :class:`MultiIndex` with extention data types for a ``fixed`` format. (:issue:`7775`)
- :meth:`Categorical.ravel` will now return a :class:`Categorical` instead of a NumPy array. (:issue:`27153`)
TomAugspurger marked this conversation as resolved.
Contributor

Is this change in 0.25? (IIRC yes?) If not, can you break it out?

Member Author

I think we only recently deprecated the old behavior. Will double-check and see what's appropriate.


.. _whatsnew_0250.deprecations:

1 change: 1 addition & 0 deletions pandas/core/arrays/__init__.py
@@ -1,3 +1,4 @@
from ._reshaping import implement_2d # noqa:F401
from .array_ import array # noqa: F401
from .base import ( # noqa: F401
ExtensionArray, ExtensionOpsMixin, ExtensionScalarOpsMixin)
91 changes: 91 additions & 0 deletions pandas/core/arrays/_reshaping.py
@@ -0,0 +1,91 @@
"""
Utilities for implementing 2D compatibility for 1D ExtensionArrays.
"""
from typing import Tuple

import numpy as np

from pandas._libs.lib import is_integer


def implement_2d(cls):
Contributor

What is the reasoning behind making this a decorator as opposed to just having a base class method? It seems much simpler and more obvious.

Member Author

The main pushback on this idea was the onus put on EA authors who only support 1D. ATM this decorator mainly just renames __len__ to size so that it can re-define shape in a 2D-compatible way. So asking authors to use the decorator instead of doing that re-definition themselves is kind of a wash.

But the step after this is to patch __getitem__, __setitem__, take, __iter__, all of which are easier to do with the decorator than by asking authors to do it themselves.
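
A minimal sketch of the renaming described above, for readers following the thread. `BaseArray`, `implement_2d_sketch`, and `OneDimArray` are hypothetical stand-ins, not the pandas classes; the sketch only shows how an author-defined `__len__` can be reused as `size` so that `__len__` can then be derived from a patchable `shape`.

```python
import numpy as np


class BaseArray:
    """Hypothetical stand-in for ExtensionArray (not the pandas class)."""

    _shape = None

    @property
    def size(self):
        raise NotImplementedError

    @property
    def shape(self):
        # Use a patched-in shape if one was set, otherwise default to 1D.
        if self._shape is not None:
            return self._shape
        return (self.size,)

    def __len__(self):
        return self.shape[0]


def implement_2d_sketch(cls):
    """Reuse an author-defined __len__ as size; derive __len__ from shape."""
    has_size = cls.size is not BaseArray.size
    has_len = cls.__len__ is not BaseArray.__len__
    if not has_size and has_len:
        cls.size = property(cls.__len__)
        cls.__len__ = BaseArray.__len__
    return cls


@implement_2d_sketch
class OneDimArray(BaseArray):
    """A 1D-only array whose author defined __len__ but not size."""

    def __init__(self, values):
        self._values = np.asarray(values)

    def __len__(self):
        return len(self._values)


arr = OneDimArray([1, 2, 3])
size_before = arr.size      # 3, served by the author's original __len__
arr._shape = (1, 3)         # patch in a 2D shape
len_after = len(arr)        # 1, now derived from shape[0]
size_after = arr.size       # still 3
```

The author writes nothing beyond the decorator line; the cost is that `__len__` on the class is silently replaced.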

Contributor

Is the plan still to have an "opt me out of this, I natively support 2D arrays"? If we go down the route of putting every Block.values inside an EA (with PandasArray for the current NumPy-backed Blocks), then we'll want that, right?

Contributor

Oh, I see you're applying this to subclasses, rather than ExtensionArray itself. Motivation for that?

Member Author

> Motivation for that?

less magic than a metaclass (and my first attempt at using a metaclass failed)

> Is the plan still to have an "opt me out of this, I natively support 2D arrays"?

Defined EA._allows_2d = False. Authors set that to True if they implement this themselves. This decorator should be updated to check that and be a no-op in that case.
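
A rough sketch of the opt-out check mentioned here, assuming the decorator simply returns the class untouched when `_allows_2d` is true. `BaseArray`, `implement_2d_sketch`, and the `_uses_default_2d` marker are hypothetical names used only for illustration.

```python
class BaseArray:
    """Hypothetical stand-in for ExtensionArray."""

    # Authors who support 2D natively would set this to True.
    _allows_2d = False


def implement_2d_sketch(cls):
    """Apply default 2D patches unless the class opts out."""
    if cls._allows_2d:
        # Native 2D support: leave the class exactly as written.
        return cls
    # Placeholder for the real patching; the marker is illustrative only.
    cls._uses_default_2d = True
    return cls


@implement_2d_sketch
class Native2D(BaseArray):
    _allows_2d = True


@implement_2d_sketch
class Only1D(BaseArray):
    pass
```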

Contributor

Oh, right, because simply decorating

@implement_2d
class ExtensionArray:

wouldn't work for subclasses, right? It has to be a metaclass. I'll poke around at the metaclass approach to see what's going on. May be too magical.

But if we do go with a

    """
    A decorator to take a 1-dimension-only ExtensionArray subclass and make
    it support limited 2-dimensional operations.
    """
    from pandas.core.arrays import ExtensionArray

    # For backwards-compatibility, if an EA author implemented __len__
    # but not size, we use that __len__ method to get an array's size.
    has_size = cls.size is not ExtensionArray.size
    has_shape = cls.shape is not ExtensionArray.shape
    has_len = cls.__len__ is not ExtensionArray.__len__

    if not has_size and has_len:
        cls.size = property(cls.__len__)
        cls.__len__ = ExtensionArray.__len__

    elif not has_size and has_shape:
        @property
        def size(self) -> int:
            return np.prod(self.shape)

        cls.size = size

    return cls
Contributor

Not sure if we are still doing this approach, but if we are

I would break out each of the implementations into a well-named function that takes cls. Then implement_2d is pretty straightforward to read without having to understand the details: you can immediately see what is being changed, and the details live in the functions.
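
One possible shape for that refactor, as a sketch only. The helper names (`_overrides`, `_derive_size_from_len`, `_derive_size_from_shape`) and the `BaseArray` stand-in are assumptions for illustration and do not come from the PR.

```python
import numpy as np


class BaseArray:
    """Hypothetical stand-in for ExtensionArray."""

    @property
    def size(self):
        raise NotImplementedError

    @property
    def shape(self):
        return (self.size,)

    def __len__(self):
        return self.shape[0]


def _overrides(cls, name):
    """True if cls resolves `name` to something other than BaseArray's."""
    return getattr(cls, name) is not getattr(BaseArray, name)


def _derive_size_from_len(cls):
    """Reuse the author's __len__ as size, then fall back to base __len__."""
    cls.size = property(cls.__len__)
    cls.__len__ = BaseArray.__len__


def _derive_size_from_shape(cls):
    """Compute size as the product of the author's shape."""
    cls.size = property(lambda self: int(np.prod(self.shape)))


def implement_2d_sketch(cls):
    # The decorator body now reads as a list of the patches applied.
    if not _overrides(cls, "size"):
        if _overrides(cls, "__len__"):
            _derive_size_from_len(cls)
        elif _overrides(cls, "shape"):
            _derive_size_from_shape(cls)
    return cls


@implement_2d_sketch
class LenOnly(BaseArray):
    def __len__(self):
        return 4


@implement_2d_sketch
class ShapeOnly(BaseArray):
    @property
    def shape(self):
        return (2, 3)
```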

Member Author

Agreed.

ATM the relevant discussion about how to proceed is in Tom's PR here



def tuplify_shape(size: int, shape) -> Tuple[int, ...]:
    """
    Convert a passed shape into a valid tuple.
    Following ndarray.reshape, we accept either `reshape(a, b)` or
Contributor

I don't suppose NumPy has a function we can borrow here? I'm not aware of one.

Member Author

numpy/numpy#13768 is the closest thing I found, but I agree it seems like the kind of thing that should exist.

Contributor

Here's a somewhat dumb approach:

def tuplify_shape(size, shape):
    return np.broadcast_to(None, size).reshape(*shape).shape

Tested as:

In [20]: tuplify_shape(100, (2, -1, 5))
Out[20]: (2, 10, 5)

In [21]: tuplify_shape(100, ((2, -1, 5)))
Out[21]: (2, 10, 5)

In [22]: tuplify_shape(100, ((2, 11, 5)))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-22-f71a2f488ea8> in <module>()
----> 1 tuplify_shape(100, ((2, 11, 5)))

<ipython-input-19-6b4139eccdff> in tuplify_shape(size, shape)
      1 def tuplify_shape(size, shape):
----> 2     return np.broadcast_to(None, size).reshape(*shape).shape

ValueError: cannot reshape array of size 100 into shape (2,11,5)

    `reshape((a, b))`, the latter being canonical.

    Parameters
    ----------
    size : int
    shape : tuple

    Returns
    -------
    tuple[int, ...]
    """
    if len(shape) == 0:
        raise ValueError("shape must be a non-empty tuple of integers",
                         shape)

    if len(shape) == 1:
        if is_integer(shape[0]):
            pass
        else:
            shape = shape[0]
            if not isinstance(shape, tuple):
                raise ValueError("shape must be a non-empty tuple of integers",
                                 shape)

    if not all(is_integer(x) for x in shape):
        raise ValueError("shape must be a non-empty tuple of integers", shape)

    if any(x < -1 for x in shape):
        raise ValueError("Invalid shape {shape}".format(shape=shape))

    if -1 in shape:
        if shape.count(-1) != 1:
            raise ValueError("Invalid shape {shape}".format(shape=shape))
        idx = shape.index(-1)
        others = [n for n in shape if n != -1]
        prod = np.prod(others)
        dim = size // prod
        shape = shape[:idx] + (dim,) + shape[idx + 1:]

    if np.prod(shape) != size:
        raise ValueError("Product of shape ({shape}) must match "
                         "size ({size})".format(shape=shape,
                                                size=size))

    num_gt1 = len([x for x in shape if x > 1])
    if num_gt1 > 1:
        raise ValueError("The default `reshape` implementation is limited to "
                         "shapes (N,), (N,1), and (1,N), not {shape}"
                         .format(shape=shape))
    return shape
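
For reference, a condensed standalone sketch of the validation rules in `tuplify_shape`, with a few sample calls. It is an illustration rather than the PR's code: the name `tuplify_shape_sketch` is hypothetical, and `isinstance(x, (int, np.integer))` is substituted for pandas' `is_integer` so the snippet runs with NumPy alone.

```python
import numpy as np


def tuplify_shape_sketch(size, shape):
    """Normalize `shape` (as received via *args) to a 1D-compatible tuple."""
    # Accept reshape((a, b)) as well as reshape(a, b).
    if len(shape) == 1 and isinstance(shape[0], tuple):
        shape = shape[0]
    if len(shape) == 0 or not all(
        isinstance(x, (int, np.integer)) for x in shape
    ):
        raise ValueError("shape must be a non-empty tuple of integers", shape)
    if any(x < -1 for x in shape) or shape.count(-1) > 1:
        raise ValueError("Invalid shape {shape}".format(shape=shape))
    if -1 in shape:
        # Infer the single unknown dimension from the remaining ones.
        idx = shape.index(-1)
        known = int(np.prod([n for n in shape if n != -1]))
        shape = shape[:idx] + (size // known,) + shape[idx + 1:]
    if int(np.prod(shape)) != size:
        raise ValueError("Product of shape must match size")
    # Only (N,), (N, 1) and (1, N) style shapes are allowed by default.
    if sum(1 for x in shape if x > 1) > 1:
        raise ValueError("Only shapes (N,), (N,1), and (1,N) are supported")
    return shape


flat = tuplify_shape_sketch(5, (5,))            # (5,)
row = tuplify_shape_sketch(5, ((1, 5),))        # (1, 5)
column = tuplify_shape_sketch(5, (-1, 1))       # (5, 1)
```

A genuinely two-dimensional request such as `tuplify_shape_sketch(6, (2, 3))` raises `ValueError`, which matches the restriction stated in the final error message of the function above.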
69 changes: 65 additions & 4 deletions pandas/core/arrays/base.py
@@ -23,6 +23,7 @@

from pandas._typing import ArrayLike
from pandas.core import ops
from pandas.core.arrays._reshaping import tuplify_shape

_not_implemented_message = "{} does not implement {}."

@@ -47,12 +48,13 @@ class ExtensionArray:
* _from_sequence
* _from_factorized
* __getitem__
* __len__
* __len__ *or* size
* dtype
* nbytes
* isna
* take
* copy
* view
* _concat_same_type

A default repr displaying the type, (truncated) data, length,
@@ -123,6 +125,12 @@ class ExtensionArray:
# Don't override this.
_typ = 'extension'

# Whether this class supports 2D arrays natively. If so, set _allows_2d
# to True and override reshape, ravel, and T. Otherwise, apply the
# `implement_2d` decorator to use default implementations of limited
# 2D functionality.
_allows_2d = False

# ------------------------------------------------------------------------
# Constructors
# ------------------------------------------------------------------------
@@ -283,7 +291,7 @@ def __len__(self) -> int:
-------
length : int
"""
raise AbstractMethodError(self)
return self.shape[0]

def __iter__(self):
"""
@@ -298,6 +306,7 @@ def __iter__(self):
# ------------------------------------------------------------------------
# Required attributes
# ------------------------------------------------------------------------
_shape = None

@property
def dtype(self) -> ExtensionDtype:
@@ -311,14 +320,26 @@ def shape(self) -> Tuple[int, ...]:
"""
Return a tuple of the array dimensions.
"""
return (len(self),)
if self._shape is not None:
return self._shape

# Default to 1D
length = self.size
return (length,)

@property
def ndim(self) -> int:
"""
Extension Arrays are only allowed to be 1-dimensional.
"""
return 1
return len(self.shape)

@property
def size(self) -> int:
"""
The number of elements in this array.
"""
raise AbstractMethodError(self)

@property
def nbytes(self) -> int:
@@ -841,6 +862,22 @@ def copy(self) -> ABCExtensionArray:
"""
raise AbstractMethodError(self)

def view(self, dtype=None) -> ABCExtensionArray:
"""
Return a view on the array.

Returns
-------
ExtensionArray

Notes
-----
- This must return a *new* object, not self.
- The only case that *must* be implemented is with dtype=None,
giving a view with the same dtype as self.
Contributor

Can / should we make any requirements on this being no-copy? i.e.

a = my_array(...)
b = a.view(dtype=int)
b[0] = 10
assert a[0] == b[0]

Member Author

definitely needs to be no-copy; i'll add that to the docstring

Member Author

there is a test for the no-copy bit

"""
raise AbstractMethodError(self)
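
To make the no-copy requirement discussed in this thread concrete, here is a small sketch with a toy class. `ToyArray` is hypothetical and only wraps a NumPy array; it is not a pandas ExtensionArray, and it implements just the `dtype=None` case that the docstring calls mandatory.

```python
import numpy as np


class ToyArray:
    """Hypothetical minimal array wrapper used to illustrate view()."""

    def __init__(self, data):
        self._data = data

    def view(self, dtype=None):
        if dtype is not None:
            raise NotImplementedError(dtype)
        # A new wrapper object that shares the same underlying buffer.
        return type(self)(self._data)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value


a = ToyArray(np.array([1, 2, 3]))
b = a.view()
b[0] = 10  # the write through the view is visible in the original
```

Here `b is not a`, yet `a[0]` reads back `10`, which is the behavior a no-copy test would assert.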

# ------------------------------------------------------------------------
# Printing
# ------------------------------------------------------------------------
@@ -908,6 +945,30 @@ def _formatting_values(self) -> np.ndarray:
# Reshaping
# ------------------------------------------------------------------------

def reshape(self, *shape):
# numpy accepts either a single tuple or an expanded tuple
shape = tuplify_shape(self.size, shape)
result = self.view()
result._shape = shape
return result

@property
def T(self) -> ABCExtensionArray:
"""
Return a transposed view on self.
"""
shape = self.shape[::-1]
return self.reshape(shape)

def ravel(self, order=None) -> ABCExtensionArray:
"""
Return a flattened view on self.
"""
# Note: we ignore `order`, keep the argument for compat with
# numpy signature.
shape = (self.size,)
return self.reshape(shape)
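
A self-contained sketch of how the three methods above fit together when `reshape` is implemented as "take a view, then override `_shape`". `ToyArray` is a hypothetical stand-in, and its shape check is a simplified substitute for `tuplify_shape` (no `-1` inference), so treat it as an illustration of the approach rather than the PR's behavior.

```python
import numpy as np


class ToyArray:
    """Hypothetical 1D-backed array supporting (N,), (N, 1) and (1, N)."""

    _shape = None

    def __init__(self, data):
        self._data = np.asarray(data)

    @property
    def size(self):
        return self._data.size

    @property
    def shape(self):
        return self._shape if self._shape is not None else (self.size,)

    @property
    def ndim(self):
        return len(self.shape)

    def __len__(self):
        return self.shape[0]

    def view(self):
        # New object, same underlying data.
        return type(self)(self._data)

    def reshape(self, *shape):
        # Accept reshape((a, b)) as well as reshape(a, b).
        if len(shape) == 1 and isinstance(shape[0], tuple):
            shape = shape[0]
        if int(np.prod(shape)) != self.size or sum(x > 1 for x in shape) > 1:
            raise ValueError("unsupported shape {}".format(shape))
        result = self.view()
        result._shape = shape
        return result

    @property
    def T(self):
        return self.reshape(self.shape[::-1])

    def ravel(self):
        return self.reshape((self.size,))


arr = ToyArray([1, 2, 3])
col = arr.reshape(3, 1)   # shape (3, 1), ndim 2
row = col.T               # shape (1, 3), so len(row) == 1
flat = row.ravel()        # back to shape (3,)
```

Because the data stays one-dimensional underneath, transposing is only a relabeling of the shape, which is why the default implementation cannot support shapes with more than one dimension larger than 1.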

@classmethod
def _concat_same_type(
cls,
45 changes: 9 additions & 36 deletions pandas/core/arrays/categorical.py
@@ -37,6 +37,7 @@

from pandas.io.formats import console

from ._reshaping import implement_2d
from .base import ExtensionArray, _extension_array_shared_docs

_take_msg = textwrap.dedent("""\
@@ -208,6 +209,7 @@ def contains(cat, key, container):
"""


@implement_2d
class Categorical(ExtensionArray, PandasObject):
"""
Represent a categorical variable in classic R / S-plus fashion.
@@ -490,20 +492,6 @@ def astype(self, dtype, copy=True):
return self._set_dtype(dtype)
return np.array(self, dtype=dtype, copy=copy)

@cache_readonly
def ndim(self):
"""
Number of dimensions of the Categorical
"""
return self._codes.ndim

@cache_readonly
def size(self):
"""
return the len of myself
"""
return len(self)

@cache_readonly
def itemsize(self):
"""
@@ -1230,8 +1218,7 @@ def shape(self):
-------
shape : tuple
"""

return tuple([len(self._codes)])
return self._codes.shape

def shift(self, periods, fill_value=None):
"""
@@ -1697,19 +1684,7 @@ def _values_for_rank(self):
)
return values

def ravel(self, order='C'):
"""
Return a flattened (numpy) array.

For internal compatibility with numpy arrays.

Returns
-------
numpy.array
"""
return np.array(self)

def view(self):
def view(self, dtype=None):
"""
Return a view of myself.

@@ -1720,7 +1695,11 @@ def view(self):
view : Categorical
Returns `self`!
"""
return self
if dtype is not None:
raise NotImplementedError(dtype)
return self._constructor(values=self._codes,
dtype=self.dtype,
fastpath=True)

def to_dense(self):
"""
@@ -1935,12 +1914,6 @@ def _slice(self, slicer):
codes = self._codes[slicer]
return self._constructor(values=codes, dtype=self.dtype, fastpath=True)

def __len__(self):
"""
The length of this Categorical.
"""
return len(self._codes)

def __iter__(self):
"""
Returns an Iterator over the values of this Categorical.
12 changes: 6 additions & 6 deletions pandas/core/arrays/datetimelike.py
@@ -36,6 +36,7 @@
from pandas.tseries import frequencies
from pandas.tseries.offsets import DateOffset, Tick

from ._reshaping import implement_2d
from .base import ExtensionArray, ExtensionOpsMixin


@@ -324,6 +325,7 @@ def ceil(self, freq, ambiguous='raise', nonexistent='raise'):
return self._round(freq, RoundTo.PLUS_INFTY, ambiguous, nonexistent)


@implement_2d
class DatetimeLikeArrayMixin(ExtensionOpsMixin,
AttributesMixin,
ExtensionArray):
@@ -402,12 +404,8 @@ def __array__(self, dtype=None):
return self._data

@property
def size(self) -> int:
"""The number of elements in this array."""
return np.prod(self.shape)

def __len__(self):
return len(self._data)
def shape(self):
jbrockmendel marked this conversation as resolved.
return self._data.shape

def __getitem__(self, key):
"""
@@ -557,6 +555,8 @@ def view(self, dtype=None):
ndarray
With the specified `dtype`.
"""
if dtype is None:
return type(self)(self._data, dtype=self.dtype, freq=self.freq)
return self._data.view(dtype=dtype)

# ------------------------------------------------------------------