-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Allow storing ExtensionArrays in containers #19520
Changes from 6 commits
b7069ef
9cd92c7
9211bbd
80f83a6
ca004d8
5d4a686
9f4ad42
b1db4e8
00d6bb3
1608e3d
52e2180
e6d06e2
d356f19
a6ae340
41f09d8
29cfd7c
05eb9f3
82fd0c6
8b1e7d6
10af4b6
cd5f1eb
0a9d9fd
3185f4e
ffa888a
5a59591
476f75d
b15ee5a
659073f
b15ecac
88b8f4f
349ac1a
27ab045
8358fb1
8ef34a9
340d11b
8297888
7accb67
9b8d2a5
9fbac29
55305dc
0e63708
fbbbc8a
46a0a49
2c4445a
5612cda
b012c19
d49e6aa
d7d31ee
7b89f1b
b0dbffd
66b936f
32ee0ef
a9882e2
f53652a
2425621
fc337de
6abe9da
512fb89
c5db5da
0b112f2
170d0c7
402620f
b556f83
268aabc
d671259
cb740ed
d9e8dd6
815d202
a727b21
f368c29
d74c5c9
0ec6958
8104ee5
4d08218
f8e29b9
0cd9faa
8fcdb70
34a6a22
b4de5c4
c233c28
d6e8051
3af8a21
3a5118b
1e8e87e
0f5e4f0
1436d1d
c4dab88
a312ba5
758689f
02c3d40
9e17037
4599db4
d34d9ca
29d2528
905d72e
412c951
78834f1
f8eac55
f09c863
cedb63d
cae2c26
8088096
8aed325
453728a
cc13c8d
f90ac07
4a03b26
635223f
cf423a7
2d1a66c
c849865
c3ec822
f4cf45c
9c5d479
1175c0d
c816d99
9c9f59e
08af9a3
2e992f7
cc5cc3e
704ee67
c262968
8bf0334
c8d88da
24f3b60
50bd5dd
879bc84
33c9d1f
f07c166
79d43b1
91c629b
a1ebf53
aa57cad
c82748c
e919343
1ea74da
0c41a34
009bece
ea5562b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,6 @@ | ||
"""An interface for extending pandas with custom arrays.""" | ||
import numpy as np | ||
|
||
from pandas.errors import AbstractMethodError | ||
|
||
_not_implemented_message = "{} does not implement {}." | ||
|
@@ -18,19 +20,20 @@ class ExtensionArray(object): | |
|
||
* __getitem__ | ||
* __len__ | ||
* __iter__ | ||
* dtype | ||
* nbytes | ||
* isna | ||
* take | ||
* copy | ||
* _formatting_values | ||
* _concat_same_type | ||
|
||
Some additional methods are required to satisfy pandas' internal, private | ||
block API. | ||
|
||
* _concat_same_type | ||
* _can_hold_na | ||
* _formatting_values | ||
* _fill_value | ||
|
||
This class does not inherit from 'abc.ABCMeta' for performance reasons. | ||
Methods and properties required by the interface raise | ||
|
@@ -51,9 +54,6 @@ class ExtensionArray(object): | |
Extension arrays should be able to be constructed with instances of | ||
the class, i.e. ``ExtensionArray(extension_array)`` should return | ||
an instance, not error. | ||
|
||
Additionally, certain methods and interfaces are required for proper | ||
this array to be properly stored inside a ``DataFrame`` or ``Series``. | ||
""" | ||
# ------------------------------------------------------------------------ | ||
# Must be a Sequence | ||
|
@@ -177,9 +177,9 @@ def take(self, indexer, allow_fill=True, fill_value=None): | |
|
||
Examples | ||
-------- | ||
Suppose the extension array somehow backed by a NumPy structured array | ||
and that the underlying structured array is stored as ``self.data``. | ||
Then ``take`` may be written as | ||
Suppose the extension array somehow backed by a NumPy array and that | ||
the underlying structured array is stored as ``self.data``. Then | ||
``take`` may be written as | ||
|
||
.. code-block:: python | ||
|
||
|
@@ -219,7 +219,7 @@ def _formatting_values(self): | |
# type: () -> np.ndarray | ||
# At the moment, this has to be an array since we use result.dtype | ||
"""An array of values to be printed in, e.g. the Series repr""" | ||
raise AbstractMethodError(self) | ||
return np.array(self) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. NO, this should be implemented only by subclasses. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a very sensible default IMO, subclasses can always override if needed |
||
|
||
@classmethod | ||
def _concat_same_type(cls, to_concat): | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,6 @@ | ||
"""Extend pandas with custom array types""" | ||
import inspect | ||
|
||
from pandas.errors import AbstractMethodError | ||
|
||
|
||
|
@@ -106,7 +108,8 @@ def is_dtype(cls, dtype): | |
|
||
Parameters | ||
---------- | ||
dtype : str or dtype | ||
dtype : str, object, or type | ||
The dtype to check. | ||
|
||
Returns | ||
------- | ||
|
@@ -118,12 +121,15 @@ def is_dtype(cls, dtype): | |
|
||
1. ``cls.construct_from_string(dtype)`` is an instance | ||
of ``cls``. | ||
2. 'dtype' is ``cls`` or a subclass of ``cls``. | ||
2. ``dtype`` is an object and is an instance of ``cls`` | ||
3. 'dtype' is a class and is ``cls`` or a subclass of ``cls``. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. shouldn't this have double-ticks? |
||
""" | ||
if isinstance(dtype, str): | ||
try: | ||
return isinstance(cls.construct_from_string(dtype), cls) | ||
except TypeError: | ||
return False | ||
else: | ||
elif inspect.isclass(dtype): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you have 2 conditions the same here There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
return issubclass(dtype, cls) | ||
else: | ||
return isinstance(dtype, cls) |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,9 +10,10 @@ | |
is_datetimelike_v_numeric, is_float_dtype, | ||
is_datetime64_dtype, is_datetime64tz_dtype, | ||
is_timedelta64_dtype, is_interval_dtype, | ||
is_complex_dtype, is_categorical_dtype, | ||
is_complex_dtype, | ||
is_string_like_dtype, is_bool_dtype, | ||
is_integer_dtype, is_dtype_equal, | ||
is_extension_array_dtype, | ||
needs_i8_conversion, _ensure_object, | ||
pandas_dtype, | ||
is_scalar, | ||
|
@@ -52,12 +53,15 @@ def isna(obj): | |
|
||
|
||
def _isna_new(obj): | ||
from ..arrays import ExtensionArray | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. pls don't do this. rather define an ABCExtensionArray. instead. This is the purpose of these, to avoid smell like this. |
||
|
||
if is_scalar(obj): | ||
return libmissing.checknull(obj) | ||
# hack (for now) because MI registers as ndarray | ||
elif isinstance(obj, ABCMultiIndex): | ||
raise NotImplementedError("isna is not defined for MultiIndex") | ||
elif isinstance(obj, (ABCSeries, np.ndarray, ABCIndexClass)): | ||
elif isinstance(obj, (ABCSeries, np.ndarray, ABCIndexClass, | ||
ExtensionArray)): | ||
return _isna_ndarraylike(obj) | ||
elif isinstance(obj, ABCGeneric): | ||
return obj._constructor(obj._data.isna(func=isna)) | ||
|
@@ -124,17 +128,20 @@ def _use_inf_as_na(key): | |
|
||
|
||
def _isna_ndarraylike(obj): | ||
|
||
values = getattr(obj, 'values', obj) | ||
dtype = values.dtype | ||
|
||
if is_string_dtype(dtype): | ||
if is_categorical_dtype(values): | ||
from pandas import Categorical | ||
if not isinstance(values, Categorical): | ||
values = values.values | ||
result = values.isna() | ||
elif is_interval_dtype(values): | ||
if is_extension_array_dtype(obj): | ||
if isinstance(obj, ABCIndexClass): | ||
values = obj._as_best_array() | ||
elif isinstance(obj, ABCSeries): | ||
values = obj._values | ||
else: | ||
values = obj | ||
result = values.isna() | ||
elif is_string_dtype(dtype): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I know it was already there, but I find the use of
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. instead of this, pls fix is_string_dtype (to exclude interval as well as period dtypes (you might be able to say object and not extension_type). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This will fix itself once IntervalArray is a proper extension type. Added tests to confirm that extension types are not True for |
||
if is_interval_dtype(values): | ||
# TODO(IntervalArray): remove this if block | ||
from pandas import IntervalIndex | ||
result = IntervalIndex(obj).isna() | ||
else: | ||
|
@@ -406,4 +413,7 @@ def remove_na_arraylike(arr): | |
""" | ||
Return array-like containing only true/non-NaN values, possibly empty. | ||
""" | ||
return arr[notna(lib.values_from_object(arr))] | ||
if is_extension_array_dtype(arr): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ultimately values_from_object just calls .get_values() on the object if it exists. This smells like something is missing from the EA interface. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think this if / elif can be avoided. |
||
return arr[notna(arr)] | ||
else: | ||
return arr[notna(lib.values_from_object(arr))] |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -39,6 +39,7 @@ | |
is_categorical_dtype, | ||
is_object_dtype, | ||
is_extension_type, | ||
is_extension_array_dtype, | ||
is_datetimetz, | ||
is_datetime64_any_dtype, | ||
is_datetime64tz_dtype, | ||
|
@@ -71,7 +72,7 @@ | |
create_block_manager_from_arrays, | ||
create_block_manager_from_blocks) | ||
from pandas.core.series import Series | ||
from pandas.core.arrays import Categorical | ||
from pandas.core.arrays import Categorical, ExtensionArray | ||
import pandas.core.algorithms as algorithms | ||
from pandas.compat import (range, map, zip, lrange, lmap, lzip, StringIO, u, | ||
OrderedDict, raise_with_traceback) | ||
|
@@ -511,7 +512,7 @@ def _get_axes(N, K, index=index, columns=columns): | |
index, columns = _get_axes(len(values), 1) | ||
return _arrays_to_mgr([values], columns, index, columns, | ||
dtype=dtype) | ||
elif is_datetimetz(values): | ||
elif (is_datetimetz(values) or is_extension_array_dtype(values)): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What happens if the dtype is an extension dtype? (like the check above for categorical) Similar for Series, we need to either define what There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In [7]: pd.Series([0, 1, 2], dtype=cyberpandas.IPType())
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-7-7fc01964da1d> in <module>()
----> 1 pd.Series([0, 1, 2], dtype=cyberpandas.IPType())
~/sandbox/pandas-ip/pandas/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
246 else:
247 data = _sanitize_array(data, index, dtype, copy,
--> 248 raise_cast_failure=True)
249
250 data = SingleBlockManager(data, index, fastpath=True)
~/sandbox/pandas-ip/pandas/pandas/core/series.py in _sanitize_array(data, index, dtype, copy, raise_cast_failure)
3214 if dtype is not None:
3215 try:
-> 3216 subarr = _try_cast(data, False)
3217 except Exception:
3218 if raise_cast_failure: # pragma: no cover
~/sandbox/pandas-ip/pandas/pandas/core/series.py in _try_cast(arr, take_fast_path)
3168 subarr = maybe_cast_to_datetime(arr, dtype)
3169 if not is_extension_type(subarr):
-> 3170 subarr = np.array(subarr, dtype=dtype, copy=copy)
3171 except (ValueError, TypeError):
3172 if is_categorical_dtype(dtype):
TypeError: data type not understood How should we handle this? I'm fine with raising with a better error message. We don't know how to cast from Although... maybe we could support Oh hey, that "works" In [10]: pd.Series(cyberpandas.IPArray([0, 1, 2]), dtype=cyberpandas.IPType())
Out[10]:
0 0.0.0.0
1 0.0.0.1
2 0.0.0.2
dtype: ip But it only works since we ignore the dtype entirely :) I will raise if There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. also for things like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. isn't a dateimetz an extension type now? why can't you remove the first part of the clauase? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nope, not yet. Just |
||
# GH19157 | ||
if columns is None: | ||
columns = [0] | ||
|
@@ -2796,15 +2797,15 @@ def reindexer(value): | |
# now align rows | ||
value = reindexer(value).T | ||
|
||
elif isinstance(value, Categorical): | ||
elif isinstance(value, ExtensionArray): | ||
value = value.copy() | ||
|
||
elif isinstance(value, Index) or is_sequence(value): | ||
from pandas.core.series import _sanitize_index | ||
|
||
# turn me into an ndarray | ||
value = _sanitize_index(value, self.index, copy=False) | ||
if not isinstance(value, (np.ndarray, Index)): | ||
if not isinstance(value, (np.ndarray, Index, ExtensionArray)): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you can use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll rust remove this change until we support array-backed indexes. |
||
if isinstance(value, list) and len(value) > 0: | ||
value = maybe_convert_platform(value) | ||
else: | ||
|
@@ -2826,7 +2827,7 @@ def reindexer(value): | |
value = maybe_cast_to_datetime(value, value.dtype) | ||
|
||
# return internal types directly | ||
if is_extension_type(value): | ||
if is_extension_type(value) or is_extension_array_dtype(value): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is an unfortunate intermediate stage where some of our internal extension types (sparse, datetime w/ tz) are not yet actual extension array types. We'll be able to clean this up this eventually. |
||
return value | ||
|
||
# broadcast across multiple columns if necessary | ||
|
@@ -3355,12 +3356,9 @@ class max type | |
new_obj = self.copy() | ||
|
||
def _maybe_casted_values(index, labels=None): | ||
if isinstance(index, PeriodIndex): | ||
values = index.astype(object).values | ||
elif isinstance(index, DatetimeIndex) and index.tz is not None: | ||
values = index | ||
else: | ||
values = index.values | ||
values = index._as_best_array() | ||
# TODO: Check if nescessary... | ||
if not isinstance(index, (PeriodIndex, DatetimeIndex)): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what are you trying to do here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is identical to what happened before in the |
||
if values.dtype == np.object_: | ||
values = lib.maybe_convert_objects(values) | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,6 +13,7 @@ | |
from pandas import compat | ||
|
||
from pandas.core.accessor import CachedAccessor | ||
from pandas.core.arrays import ExtensionArray | ||
from pandas.core.dtypes.generic import ( | ||
ABCSeries, ABCDataFrame, | ||
ABCMultiIndex, | ||
|
@@ -1038,6 +1039,31 @@ def _to_embed(self, keep_tz=False, dtype=None): | |
|
||
return self.values.copy() | ||
|
||
def _as_best_array(self): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I found a need for a method like this. It may be good to add to Series as well, and maybe make public. Essentially, we want a clear way to get the "best" There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (this will also depend on decision if we will change return value for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I think so. Maybe I should re-use that convention. Unfortunately, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes ths should be equiv of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Opened #19548 |
||
# type: () -> Union[ExtensionArray, ndarary] | ||
"""Return the underlying values as the best array type. | ||
|
||
Indexes backed by ExtensionArrays will return the ExtensionArray. | ||
Otherwise, an ndarray is returned. | ||
|
||
Examples | ||
-------- | ||
>>> pd.Index([0, 1, 2])._as_best_array() | ||
array([0, 1, 2]) | ||
|
||
>>> pd.CategoricalIndex(['a', 'a', 'b'])._as_best_array() | ||
[a, a, b] | ||
Categories (2, object): [a, b] | ||
|
||
>>> pd.IntervalIndex.from_breaks([0, 1, 2])._as_best_array() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Whoops, this example isn't true (yet TomAugspurger/pandas@pandas-array-upstream+fu1...TomAugspurger:pandas-array-upstream+fu1+interval) |
||
IntervalArray([(0, 1], (1, 2]]) | ||
""" | ||
# We need this since CategoricalIndex.values -> Categorical | ||
# but IntervalIndex.values -> ndarray[object] | ||
# TODO: IntervalIndex defines _array_values. Would be nice to | ||
# have an unambiguous way of getting an ndarray (or just use asarray?) | ||
return self.values | ||
|
||
_index_shared_docs['astype'] = """ | ||
Create an Index with values cast to dtypes. The class of a new Index | ||
is determined by dtype. When conversion is impossible, a ValueError | ||
|
@@ -1946,6 +1972,12 @@ def _format_with_header(self, header, na_rep='NaN', **kwargs): | |
|
||
if is_categorical_dtype(values.dtype): | ||
values = np.array(values) | ||
|
||
elif isinstance(values, ExtensionArray): | ||
# This is still un-exercised within pandas, since all our | ||
# extension dtypes have custom indexes. | ||
values = values._formatting_values() | ||
|
||
elif is_object_dtype(values.dtype): | ||
values = lib.maybe_convert_objects(values, safe=1) | ||
|
||
|
@@ -2525,7 +2557,7 @@ def get_value(self, series, key): | |
# if we have something that is Index-like, then | ||
# use this, e.g. DatetimeIndex | ||
s = getattr(series, '_values', None) | ||
if isinstance(s, Index) and is_scalar(key): | ||
if isinstance(s, (ExtensionArray, Index)) and is_scalar(key): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is_array_like might work here |
||
try: | ||
return s[key] | ||
except (IndexError, ValueError): | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -56,7 +56,10 @@ | |
is_null_datelike_scalar) | ||
import pandas.core.dtypes.concat as _concat | ||
|
||
from pandas.core.dtypes.generic import ABCSeries, ABCDatetimeIndex | ||
from pandas.core.dtypes.generic import ( | ||
ABCSeries, | ||
ABCDatetimeIndex, | ||
ABCIndexClass) | ||
import pandas.core.common as com | ||
import pandas.core.algorithms as algos | ||
|
||
|
@@ -1854,6 +1857,20 @@ class ExtensionBlock(NonConsolidatableMixIn, Block): | |
|
||
ExtensionArrays are limited to 1-D. | ||
""" | ||
|
||
def __init__(self, values, placement, ndim=None): | ||
values = self._maybe_coerce_values(values) | ||
super(ExtensionBlock, self).__init__(values, placement, ndim) | ||
|
||
def _maybe_coerce_values(self, values): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. xref #19492 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is this called anywhere else? (e.g. IOW I think can eliminate some code if maybe this was defined in the superclass) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, this is only called in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Eventually Block.init will call There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok |
||
# Unboxes Series / Index | ||
# Doesn't change any underlying dtypes. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why is this not just There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you make a real doc-string There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point, since there's no parent. |
||
if isinstance(values, ABCSeries): | ||
values = values.values | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, via There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry it's hit in |
||
elif isinstance(values, ABCIndexClass): | ||
values = values._as_best_array() | ||
return values | ||
|
||
@property | ||
def _holder(self): | ||
# For extension blocks, the holder is values-dependent. | ||
|
@@ -4101,7 +4118,8 @@ def set(self, item, value, check=False): | |
# FIXME: refactor, clearly separate broadcasting & zip-like assignment | ||
# can prob also fix the various if tests for sparse/categorical | ||
|
||
value_is_extension_type = is_extension_type(value) | ||
value_is_extension_type = (is_extension_type(value) or | ||
is_extension_array_dtype(value)) | ||
|
||
# categorical/spares/datetimetz | ||
if value_is_extension_type: | ||
|
@@ -4834,13 +4852,10 @@ def form_blocks(arrays, names, axes): | |
if len(items_dict['ExtensionBlock']): | ||
|
||
external_blocks = [] | ||
|
||
for i, _, array in items_dict['ExtensionBlock']: | ||
if isinstance(array, ABCSeries): | ||
array = array.values | ||
# Allow our internal arrays to chose their block type. | ||
block_type = getattr(array, '_block_type', ExtensionBlock) | ||
external_blocks.append( | ||
make_block(array, klass=block_type, | ||
make_block(array, klass=ExtensionBlock, | ||
fastpath=True, placement=[i])) | ||
blocks.extend(external_blocks) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche we removed this earlier since collections implementing
__len__
and__getitem__
are iterable. But (strangely?)isinstance(arr, collections.Iterable)
is False, so these aren't considered "list-like" unless they actually implement__iter__
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But do we use this somewhere? (the expectation that isinstance(arr, collections.Iterable) is True)
As the docs https://docs.python.org/3/library/collections.abc.html#collections.abc.Iterable says this is not a correct way to test that something is iterable, exactly for the reason you just mentioned.
Anyway, if we add it, I would add a default implementation (
return iter(self)
?) and not require subclasses to provide one manually.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sorry. We use it in
Series.__init__
, viais_list_like(data)
.Will add the default implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait,
iter(self)
will cause infinite recursion, and we can't assume anything about how the data is stored. That's unfortunate.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we could fall back to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or we could fix
is_list_like
..Why would it cause infinite recursion?
With
iter(self)
we don't assume anything about how the data is stored, I don't know how iter exactly works under the hood, but I assume it just uses__getitem__
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prior to #17654 we did that, but it was fragile to things like
str
(the class, not an instance) being considered iterable. Not sure if changing it back is an improvement.will call
self.__iter__
, which callsself.__iter__
, ...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, yes of course :-)
Anyhow, the other way is a fine default implementation