Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Allow storing ExtensionArrays in containers #19520

Merged
merged 139 commits into from
Feb 23, 2018
Merged
Show file tree
Hide file tree
Changes from 138 commits
Commits
Show all changes
139 commits
Select commit Hold shift + click to select a range
b7069ef
ENH: non-interval changes
TomAugspurger Feb 2, 2018
9cd92c7
COMPAT: py2 Super
TomAugspurger Feb 3, 2018
9211bbd
BUG: Use original object for extension array
TomAugspurger Feb 3, 2018
80f83a6
Consistent boxing / unboxing
TomAugspurger Feb 3, 2018
ca004d8
32-bit compat
TomAugspurger Feb 3, 2018
5d4a686
Add a test array
TomAugspurger Feb 5, 2018
9f4ad42
linting
TomAugspurger Feb 5, 2018
b1db4e8
Default __iter__
TomAugspurger Feb 5, 2018
00d6bb3
Tests for value_counts
TomAugspurger Feb 5, 2018
1608e3d
Implement value_counts
TomAugspurger Feb 5, 2018
52e2180
Py2 compat
TomAugspurger Feb 5, 2018
e6d06e2
Fixed dropna
TomAugspurger Feb 5, 2018
d356f19
Test fixups
TomAugspurger Feb 6, 2018
a6ae340
Started setitem
TomAugspurger Feb 6, 2018
41f09d8
REF/Clean: Internal / External values
TomAugspurger Feb 3, 2018
29cfd7c
Move to index base
TomAugspurger Feb 6, 2018
05eb9f3
Merge remote-tracking branch 'upstream/master' into pandas-array-upst…
TomAugspurger Feb 6, 2018
82fd0c6
Setitem tests, decimal example
TomAugspurger Feb 7, 2018
8b1e7d6
Compat
TomAugspurger Feb 7, 2018
10af4b6
Fixed extension block tests.
TomAugspurger Feb 7, 2018
cd5f1eb
Clarify binop tests
TomAugspurger Feb 7, 2018
0a9d9fd
TST: Removed ops tests
TomAugspurger Feb 7, 2018
3185f4e
Cleanup unique handling
TomAugspurger Feb 7, 2018
ffa888a
Merge remote-tracking branch 'upstream/master' into pandas-array-upst…
TomAugspurger Feb 7, 2018
5a59591
Merge remote-tracking branch 'upstream/master' into index-values
TomAugspurger Feb 7, 2018
476f75d
Simplify object concat
TomAugspurger Feb 7, 2018
b15ee5a
Use values for intersection
TomAugspurger Feb 7, 2018
659073f
hmm
TomAugspurger Feb 7, 2018
b15ecac
More failing tests
TomAugspurger Feb 7, 2018
88b8f4f
remove bad test
TomAugspurger Feb 7, 2018
349ac1a
better setitem
TomAugspurger Feb 8, 2018
27ab045
Dropna works.
TomAugspurger Feb 8, 2018
8358fb1
Restore xfail test
TomAugspurger Feb 8, 2018
8ef34a9
Test Categorical
TomAugspurger Feb 8, 2018
340d11b
Xfail setitem tests
TomAugspurger Feb 8, 2018
8297888
TST: Skip JSON tests on py2
TomAugspurger Feb 8, 2018
7accb67
Merge remote-tracking branch 'upstream/master' into index-values
TomAugspurger Feb 8, 2018
9b8d2a5
Additional testing
TomAugspurger Feb 8, 2018
9fbac29
More tests
TomAugspurger Feb 8, 2018
55305dc
ndarray_values
TomAugspurger Feb 8, 2018
0e63708
API: Default ExtensionArray.astype
TomAugspurger Feb 8, 2018
fbbbc8a
Simplify concat_as_object
TomAugspurger Feb 8, 2018
46a0a49
Py2 compat
TomAugspurger Feb 8, 2018
2c4445a
Set-ops ugliness
TomAugspurger Feb 8, 2018
5612cda
better docstrings
TomAugspurger Feb 8, 2018
b012c19
tolist
TomAugspurger Feb 8, 2018
d49e6aa
linting
TomAugspurger Feb 8, 2018
d7d31ee
Moved dtypes
TomAugspurger Feb 9, 2018
7b89f1b
clean
TomAugspurger Feb 9, 2018
b0dbffd
cleanup
TomAugspurger Feb 9, 2018
66b936f
NumPy compat
TomAugspurger Feb 9, 2018
32ee0ef
Use base _values for CategoricalIndex
TomAugspurger Feb 9, 2018
a9882e2
Update dev docs
TomAugspurger Feb 9, 2018
f53652a
Merge remote-tracking branch 'upstream/master' into index-values
TomAugspurger Feb 9, 2018
2425621
cleanup
TomAugspurger Feb 9, 2018
fc337de
Merge remote-tracking branch 'upstream/master' into pandas-array-upst…
TomAugspurger Feb 9, 2018
6abe9da
cleanup
TomAugspurger Feb 9, 2018
512fb89
Merge remote-tracking branch 'upstream/master' into index-values
TomAugspurger Feb 9, 2018
c5db5da
Merge branch 'index-values' into pandas-array-upstream+fu1
TomAugspurger Feb 9, 2018
0b112f2
cleanup
TomAugspurger Feb 9, 2018
170d0c7
Linting
TomAugspurger Feb 9, 2018
402620f
Precision in tests
TomAugspurger Feb 9, 2018
b556f83
Merge branch 'index-values' into pandas-array-upstream+fu1
TomAugspurger Feb 9, 2018
268aabc
Linting
TomAugspurger Feb 9, 2018
d671259
Move to extension
TomAugspurger Feb 9, 2018
cb740ed
Merge remote-tracking branch 'upstream/master' into pandas-array-upst…
TomAugspurger Feb 9, 2018
d9e8dd6
Merge remote-tracking branch 'upstream/master' into index-values
TomAugspurger Feb 9, 2018
815d202
Push _ndarray_values to ExtensionArray
TomAugspurger Feb 11, 2018
a727b21
Clean up tolist
TomAugspurger Feb 11, 2018
f368c29
Move test locations
TomAugspurger Feb 11, 2018
d74c5c9
Fixed test
TomAugspurger Feb 12, 2018
0ec6958
Merge branch 'index-values' into pandas-array-upstream+fu1
TomAugspurger Feb 12, 2018
8104ee5
REF: Update per comments
TomAugspurger Feb 12, 2018
4d08218
Merge branch 'index-values' into pandas-array-upstream+fu1
TomAugspurger Feb 12, 2018
f8e29b9
lint
TomAugspurger Feb 12, 2018
0cd9faa
REF: Use _values for size and shape
TomAugspurger Feb 12, 2018
8fcdb70
PERF: Implement size, shape for IntervalIndex
TomAugspurger Feb 12, 2018
34a6a22
PERF: Avoid materializing values for PeriodIndex shape, size
TomAugspurger Feb 12, 2018
b4de5c4
Merge branch 'index-values' into pandas-array-upstream+fu1
TomAugspurger Feb 12, 2018
c233c28
Merge remote-tracking branch 'upstream/master' into index-values
TomAugspurger Feb 13, 2018
d6e8051
Cleanup
TomAugspurger Feb 13, 2018
3af8a21
Override nbytes
TomAugspurger Feb 13, 2018
3a5118b
Merge branch 'index-values' into pandas-array-upstream+fu1
TomAugspurger Feb 13, 2018
1e8e87e
Remove unused change
TomAugspurger Feb 13, 2018
0f5e4f0
Docs
TomAugspurger Feb 13, 2018
1436d1d
Merge remote-tracking branch 'upstream/master' into pandas-array-upst…
TomAugspurger Feb 13, 2018
c4dab88
Test cleanpu
TomAugspurger Feb 13, 2018
a312ba5
Always set PANDAS_TESTING_MODE
TomAugspurger Feb 13, 2018
758689f
Revert "Always set PANDAS_TESTING_MODE"
TomAugspurger Feb 13, 2018
02c3d40
Explicitly catch warnings or not
TomAugspurger Feb 13, 2018
9e17037
fastparquet warnings
TomAugspurger Feb 13, 2018
4599db4
Unicode literals strikes again.
TomAugspurger Feb 14, 2018
d34d9ca
Restore circle env var
TomAugspurger Feb 14, 2018
29d2528
More parquet test catching
TomAugspurger Feb 14, 2018
905d72e
Merge remote-tracking branch 'upstream/master' into pandas-array-upst…
TomAugspurger Feb 14, 2018
412c951
No stacklevel
TomAugspurger Feb 14, 2018
78834f1
Lower bound on FP
TomAugspurger Feb 15, 2018
f8eac55
Exact bound for FP
TomAugspurger Feb 15, 2018
f09c863
Don't use fastpath for ExtensionBlock make_block
TomAugspurger Feb 15, 2018
cedb63d
Consistently use _values
TomAugspurger Feb 16, 2018
cae2c26
TST: Additional constructor tests
TomAugspurger Feb 16, 2018
8088096
CLN: de-nested a bit
TomAugspurger Feb 16, 2018
8aed325
_fill_value handling
TomAugspurger Feb 16, 2018
453728a
Handle user provided dtype in constructors.
TomAugspurger Feb 16, 2018
cc13c8d
Document ExtensionBlock._maybe_coerce_values
TomAugspurger Feb 16, 2018
f90ac07
Created ABCExtensionArray
TomAugspurger Feb 16, 2018
4a03b26
TST: Tests for is_object_dtype and is_string_dtype and EAs
TomAugspurger Feb 16, 2018
635223f
fixup! Handle user provided dtype in constructors.
TomAugspurger Feb 17, 2018
cf423a7
Doc for setitem
TomAugspurger Feb 17, 2018
2d1a66c
Split base tests
TomAugspurger Feb 17, 2018
c849865
Revert test_parquet changes
TomAugspurger Feb 17, 2018
c3ec822
API: Removed _fill_value from the interface
TomAugspurger Feb 17, 2018
f4cf45c
Push coercion to extension dtype till later
TomAugspurger Feb 17, 2018
9c5d479
Linting
TomAugspurger Feb 17, 2018
1175c0d
ERR: Better error message for coercion to 3rd party dtypes
TomAugspurger Feb 17, 2018
c816d99
CLN: Make take_nd EA aware
TomAugspurger Feb 17, 2018
9c9f59e
Revert sparse changes
TomAugspurger Feb 17, 2018
08af9a3
Other _typ for ABCExtensionArray
TomAugspurger Feb 17, 2018
2e992f7
Test cleanup and expansion.
TomAugspurger Feb 17, 2018
cc5cc3e
Copy if copy
TomAugspurger Feb 17, 2018
704ee67
TST: remove self param for fixture
TomAugspurger Feb 18, 2018
c262968
Merge remote-tracking branch 'upstream/master' into pandas-array-upst…
TomAugspurger Feb 19, 2018
8bf0334
Remove unnescessary EA handling in Series ctor
TomAugspurger Feb 20, 2018
c8d88da
API: Removed value_counts
TomAugspurger Feb 20, 2018
24f3b60
More doc notes
TomAugspurger Feb 20, 2018
50bd5dd
Handle expanding a DataFrame with an EA
TomAugspurger Feb 20, 2018
879bc84
Added ExtensionDtype.__eq__
TomAugspurger Feb 20, 2018
33c9d1f
linting
TomAugspurger Feb 21, 2018
f07c166
REF: is_dtype_equal refactor
TomAugspurger Feb 21, 2018
79d43b1
Remove reference to dtype being a class
TomAugspurger Feb 21, 2018
91c629b
Merge remote-tracking branch 'upstream/master' into pandas-array-upst…
TomAugspurger Feb 21, 2018
a1ebf53
move
TomAugspurger Feb 22, 2018
aa57cad
Moved sparse check to take_nd
TomAugspurger Feb 22, 2018
c82748c
Docstring
TomAugspurger Feb 22, 2018
e919343
Split tests
TomAugspurger Feb 22, 2018
1ea74da
Revert index change
TomAugspurger Feb 22, 2018
0c41a34
Copy changes
TomAugspurger Feb 22, 2018
009bece
Simplify EA implementation names
TomAugspurger Feb 22, 2018
ea5562b
Linting
TomAugspurger Feb 22, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 19 additions & 7 deletions pandas/core/algorithms.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,11 +15,12 @@
is_unsigned_integer_dtype, is_signed_integer_dtype,
is_integer_dtype, is_complex_dtype,
is_object_dtype,
is_extension_array_dtype,
is_categorical_dtype, is_sparse,
is_period_dtype,
is_numeric_dtype, is_float_dtype,
is_bool_dtype, needs_i8_conversion,
is_categorical, is_datetimetz,
is_datetimetz,
is_datetime64_any_dtype, is_datetime64tz_dtype,
is_timedelta64_dtype, is_interval_dtype,
is_scalar, is_list_like,
Expand Down Expand Up @@ -547,7 +548,7 @@ def value_counts(values, sort=True, ascending=False, normalize=False,
if is_categorical_dtype(values) or is_sparse(values):

# handle Categorical and sparse,
result = Series(values).values.value_counts(dropna=dropna)
result = Series(values)._values.value_counts(dropna=dropna)
result.name = name
counts = result.values

Expand Down Expand Up @@ -1292,10 +1293,13 @@ def take_nd(arr, indexer, axis=0, out=None, fill_value=np.nan, mask_info=None,
"""
Specialized Cython take which sets NaN values in one pass

This dispatches to ``take`` defined on ExtensionArrays. It does not
currently dispatch to ``SparseArray.take`` for sparse ``arr``.

Parameters
----------
arr : ndarray
Input array
arr : array-like
Input array.
indexer : ndarray
1-D array of indices to take, subarrays corresponding to -1 value
indicies are filed with fill_value
Expand All @@ -1315,17 +1319,25 @@ def take_nd(arr, indexer, axis=0, out=None, fill_value=np.nan, mask_info=None,
If False, indexer is assumed to contain no -1 values so no filling
will be done. This short-circuits computation of a mask. Result is
undefined if allow_fill == False and -1 is present in indexer.

Returns
-------
subarray : array-like
May be the same type as the input, or cast to an ndarray.
"""

# TODO(EA): Remove these if / elifs as datetimeTZ, interval, become EAs
# dispatch to internal type takes
if is_categorical(arr):
return arr.take_nd(indexer, fill_value=fill_value,
allow_fill=allow_fill)
if is_extension_array_dtype(arr):
return arr.take(indexer, fill_value=fill_value, allow_fill=allow_fill)
elif is_datetimetz(arr):
return arr.take(indexer, fill_value=fill_value, allow_fill=allow_fill)
elif is_interval_dtype(arr):
return arr.take(indexer, fill_value=fill_value, allow_fill=allow_fill)

if is_sparse(arr):
arr = arr.get_values()

if indexer is None:
indexer = np.arange(arr.shape[axis], dtype=np.int64)
dtype, fill_value = arr.dtype, arr.dtype.type()
Expand Down
90 changes: 69 additions & 21 deletions pandas/core/arrays/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,14 +25,13 @@ class ExtensionArray(object):
* isna
* take
* copy
* _formatting_values
* _concat_same_type

Some additional methods are required to satisfy pandas' internal, private
Some additional methods are available to satisfy pandas' internal, private
block API.

* _concat_same_type
* _can_hold_na
* _formatting_values

This class does not inherit from 'abc.ABCMeta' for performance reasons.
Methods and properties required by the interface raise
Expand All @@ -53,13 +52,14 @@ class ExtensionArray(object):
Extension arrays should be able to be constructed with instances of
the class, i.e. ``ExtensionArray(extension_array)`` should return
an instance, not error.

Additionally, certain methods and interfaces are required for proper
this array to be properly stored inside a ``DataFrame`` or ``Series``.
"""
# '_typ' is for pandas.core.dtypes.generic.ABCExtensionArray.
# Don't override this.
_typ = 'extension'
# ------------------------------------------------------------------------
# Must be a Sequence
# ------------------------------------------------------------------------

def __getitem__(self, item):
# type (Any) -> Any
"""Select a subset of self.
Expand Down Expand Up @@ -92,7 +92,46 @@ def __getitem__(self, item):
raise AbstractMethodError(self)

def __setitem__(self, key, value):
# type: (Any, Any) -> None
# type: (Union[int, np.ndarray], Any) -> None
"""Set one or more values inplace.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some notes on the semantics of setitem if people want to take a look.


This method is not required to satisfy the pandas extension array
interface.

Parameters
----------
key : int, ndarray, or slice
When called from, e.g. ``Series.__setitem__``, ``key`` will be
one of

* scalar int
* ndarray of integers.
* boolean ndarray
* slice object

value : ExtensionDtype.type, Sequence[ExtensionDtype.type], or object
value or values to be set of ``key``.

Returns
-------
None
"""
# Some notes to the ExtensionArray implementor who may have ended up
# here. While this method is not required for the interface, if you
# *do* choose to implement __setitem__, then some semantics should be
# observed:
#
# * Setting multiple values : ExtensionArrays should support setting
# multiple values at once, 'key' will be a sequence of integers and
# 'value' will be a same-length sequence.
#
# * Broadcasting : For a sequence 'key' and a scalar 'value',
# each position in 'key' should be set to 'value'.
#
# * Coercion : Most users will expect basic coercion to work. For
# example, a string like '2018-01-01' is coerced to a datetime
# when setting on a datetime64ns array. In general, if the
# __init__ method coerces that value, then so should __setitem__
raise NotImplementedError(_not_implemented_message.format(
type(self), '__setitem__')
)
Expand All @@ -107,6 +146,16 @@ def __len__(self):
# type: () -> int
raise AbstractMethodError(self)

def __iter__(self):
"""Iterate over elements of the array.

"""
# This needs to be implemented so that pandas recognizes extension
# arrays as list-like. The default implementation makes successive
# calls to ``__getitem__``, which may be slower than necessary.
for i in range(len(self)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you not just raising AbstractMethodError?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is tied intimately with the boxing concept of scalars, but this should be decoupled from getitem (yes its a valid implementation, but you would never actually do it this way), rather you would iterate on some underlying value and box it, not index into and as a side effect use getitem to box.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather you would iterate on some underlying value

We don't know how the subclass is storing things though, so there's no .values to iterate over.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so my point is that you cannot implement a default here, make the sub-classes do it. yes it might repeat some code, but its much better there i think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you cannot

Yes, we can (see the code). You "don't want" it, that something different :)
But I don't really understand why the above is not a good default implementation. I think a lot subclasses will do exactly this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And some context: If I remove __iter__, the object will still be iterable with essentially this implementation since it defines __getitem__ and __len__. We just need to implement it explicitly so that is_iterable works correctly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it still might be better raise here by default and force the subclass to implement this (even if this impl). It is SO important for subclasses to pay attention to this, it should just be required.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't do much better than this implementation.
To give you some context: in geopandas, I have a cython method that converts a GeometryArray to numpy object array of Geometry objects, which takes ca 80ms on an array of 10000 points. Simply doing np.array([a[i] for i in range(len(a))], dtype=object) is ca 108 ms. This small difference is because by definition in each iteration I need to create a python object.

yield self[i]

# ------------------------------------------------------------------------
# Required attributes
# ------------------------------------------------------------------------
Expand All @@ -132,9 +181,9 @@ def nbytes(self):
# type: () -> int
"""The number of bytes needed to store this object in memory.

If this is expensive to compute, return an approximate lower bound
on the number of bytes needed.
"""
# If this is expensive to compute, return an approximate lower bound
# on the number of bytes needed.
raise AbstractMethodError(self)

# ------------------------------------------------------------------------
Expand Down Expand Up @@ -184,8 +233,8 @@ def take(self, indexer, allow_fill=True, fill_value=None):
will be done. This short-circuits computation of a mask. Result is
undefined if allow_fill == False and -1 is present in indexer.
fill_value : any, default None
Fill value to replace -1 values with. By default, this uses
the missing value sentinel for this type, ``self._fill_value``.
Fill value to replace -1 values with. If applicable, this should
use the sentinel missing value for this type.

Notes
-----
Expand All @@ -198,17 +247,20 @@ def take(self, indexer, allow_fill=True, fill_value=None):

Examples
--------
Suppose the extension array somehow backed by a NumPy structured array
and that the underlying structured array is stored as ``self.data``.
Then ``take`` may be written as
Suppose the extension array is backed by a NumPy array stored as
``self.data``. Then ``take`` may be written as

.. code-block:: python

def take(self, indexer, allow_fill=True, fill_value=None):
mask = indexer == -1
result = self.data.take(indexer)
result[mask] = self._fill_value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, is there a reason we don't define _na_value? (e.g we do on indexes) as a property?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally there was a self._fill_value, but because this was not used internally, we left it out for the moment. The discussion about that is a bit spread, but see eg #19520 (comment) (4th bullet point is about fill_value, and comment below with Tom's answer) and #19520 (comment) and the comments below that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok that's fine. I think we do need / want a way to set this and a property is good, and _na_value is used elsewhere (eg. in Index)

result[mask] = np.nan # NA for this type
return type(self)(result)

See Also
--------
numpy.take
"""
raise AbstractMethodError(self)

Expand All @@ -230,17 +282,12 @@ def copy(self, deep=False):
# ------------------------------------------------------------------------
# Block-related methods
# ------------------------------------------------------------------------
@property
def _fill_value(self):
# type: () -> Any
"""The missing value for this type, e.g. np.nan"""
return None

def _formatting_values(self):
# type: () -> np.ndarray
# At the moment, this has to be an array since we use result.dtype
"""An array of values to be printed in, e.g. the Series repr"""
raise AbstractMethodError(self)
return np.array(self)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NO, this should be implemented only by subclasses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very sensible default IMO, subclasses can always override if needed


@classmethod
def _concat_same_type(cls, to_concat):
Expand All @@ -257,6 +304,7 @@ def _concat_same_type(cls, to_concat):
"""
raise AbstractMethodError(cls)

@property
def _can_hold_na(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interface change: This should have been a property.

# type: () -> bool
"""Whether your array can hold missing values. True by default.
Expand Down
57 changes: 47 additions & 10 deletions pandas/core/dtypes/base.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,7 @@
"""Extend pandas with custom array types"""
import numpy as np

from pandas import compat
from pandas.errors import AbstractMethodError


Expand All @@ -23,6 +26,32 @@ class ExtensionDtype(object):
def __str__(self):
return self.name

def __eq__(self, other):
"""Check whether 'other' is equal to self.

By default, 'other' is considered equal if

* it's a string matching 'self.name'.
* it's an instance of this type.

Parameters
----------
other : Any

Returns
-------
bool
"""
if isinstance(other, compat.string_types):
return other == self.name
elif isinstance(other, type(self)):
return True
else:
return False

def __ne__(self, other):
return not self.__eq__(other)

@property
def type(self):
# type: () -> type
Expand Down Expand Up @@ -102,11 +131,12 @@ def construct_from_string(cls, string):

@classmethod
def is_dtype(cls, dtype):
"""Check if we match 'dtype'
"""Check if we match 'dtype'.

Parameters
----------
dtype : str or dtype
dtype : object
The object to check.

Returns
-------
Expand All @@ -118,12 +148,19 @@ def is_dtype(cls, dtype):

1. ``cls.construct_from_string(dtype)`` is an instance
of ``cls``.
2. 'dtype' is ``cls`` or a subclass of ``cls``.
2. ``dtype`` is an object and is an instance of ``cls``
3. ``dtype`` has a ``dtype`` attribute, and any of the above
conditions is true for ``dtype.dtype``.
"""
if isinstance(dtype, str):
try:
return isinstance(cls.construct_from_string(dtype), cls)
except TypeError:
return False
else:
return issubclass(dtype, cls)
dtype = getattr(dtype, 'dtype', dtype)

if isinstance(dtype, np.dtype):
return False
elif dtype is None:
return False
elif isinstance(dtype, cls):
return True
try:
return cls.construct_from_string(dtype) is not None
except TypeError:
return False
2 changes: 1 addition & 1 deletion pandas/core/dtypes/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -1708,9 +1708,9 @@ def is_extension_array_dtype(arr_or_dtype):
"""
from pandas.core.arrays import ExtensionArray

# we want to unpack series, anything else?
if isinstance(arr_or_dtype, (ABCIndexClass, ABCSeries)):
arr_or_dtype = arr_or_dtype._values

return isinstance(arr_or_dtype, (ExtensionDtype, ExtensionArray))


Expand Down
25 changes: 0 additions & 25 deletions pandas/core/dtypes/dtypes.py
Original file line number Diff line number Diff line change
Expand Up @@ -66,13 +66,6 @@ def __hash__(self):
raise NotImplementedError("sub-classes should implement an __hash__ "
"method")

def __eq__(self, other):
raise NotImplementedError("sub-classes should implement an __eq__ "
"method")

def __ne__(self, other):
return not self.__eq__(other)

def __getstate__(self):
# pickle support; we don't want to pickle the cache
return {k: getattr(self, k, None) for k in self._metadata}
Expand All @@ -82,24 +75,6 @@ def reset_cache(cls):
""" clear the cache """
cls._cache = {}

@classmethod
def is_dtype(cls, dtype):
""" Return a boolean if the passed type is an actual dtype that
we can match (via string or type)
"""
if hasattr(dtype, 'dtype'):
dtype = dtype.dtype
if isinstance(dtype, np.dtype):
return False
elif dtype is None:
return False
elif isinstance(dtype, cls):
return True
try:
return cls.construct_from_string(dtype) is not None
except:
return False


class CategoricalDtypeType(type):
"""
Expand Down
2 changes: 2 additions & 0 deletions pandas/core/dtypes/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,8 @@ def _check(cls, inst):
ABCDateOffset = create_pandas_abc_type("ABCDateOffset", "_typ",
("dateoffset",))
ABCInterval = create_pandas_abc_type("ABCInterval", "_typ", ("interval", ))
ABCExtensionArray = create_pandas_abc_type("ABCExtensionArray", "_typ",
("extension", "categorical",))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't Categorical a first class EA?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but we for Categorical we actually override the _type (what we say to external extension authors not to do), because we also have a ABCCategorical that needs that.



class _ABCGeneric(type):
Expand Down
Loading