[ArrayManager] REF: Implement concat with reindexing #39612
@@ -5,14 +5,19 @@
import numpy as np

from pandas._libs import NaT, lib
from pandas._typing import ArrayLike, DtypeObj

from pandas.core.dtypes.cast import find_common_type
from pandas.core.dtypes.common import (
    is_bool_dtype,
    is_categorical_dtype,
    is_datetime64_ns_dtype,
    is_dtype_equal,
    is_extension_array_dtype,
    is_integer_dtype,
    is_sparse,
    is_timedelta64_ns_dtype,
)
from pandas.core.dtypes.generic import ABCCategoricalIndex, ABCSeries
@@ -21,11 +26,78 @@
from pandas.core.construction import array, ensure_wrapped_if_datetimelike
class NullArrayProxy:
    """
    Proxy object for an all-NA array.

    Only stores the length of the array, and not the dtype. The dtype
    will only be known when actually concatenating (after determining the
    common dtype, for which this proxy is ignored).
    Using this object avoids internals/concat.py having to determine
    the proper dtype and array type.
    """

    ndim = 1

    def __init__(self, n: int):
        self.n = n

    @property
    def shape(self):
        return (self.n,)

Reply to a suggested rewording of the docstring: This is strictly speaking not fully correct, as … Was there something unclear in the original sentence that I can try to improve otherwise?
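To make the deferral concrete, here is a standalone sketch (plain Python/NumPy, mirroring the class above): the proxy records only the length, and the same proxy can later materialize under different dtypes once the common dtype for the concatenation is known.

```python
import numpy as np

# Minimal standalone copy of the proxy above: it records only the length,
# so the eventual dtype can be chosen later by the concat machinery.
class NullArrayProxy:
    ndim = 1

    def __init__(self, n: int):
        self.n = n

    @property
    def shape(self):
        return (self.n,)

proxy = NullArrayProxy(3)

# The same proxy materializes differently depending on the common dtype
# found for the other arrays in the concat:
as_float = np.full(proxy.shape, np.nan)               # float64 result
as_object = np.full(proxy.shape, None, dtype=object)  # object result
```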
def _array_from_proxy(arr, dtype: DtypeObj, fill_value=lib.no_default):
    """
    Helper function to create the actual all-NA array from the NullArrayProxy object.

    Parameters
    ----------
    arr : NullArrayProxy
    dtype : the dtype for the resulting array
    fill_value : scalar NA-like value
        By default uses the ExtensionDtype's na_value or np.nan. For numpy
        arrays, this can be overridden to be something else (eg None).

    Returns
    -------
    np.ndarray or ExtensionArray
    """
    if is_extension_array_dtype(dtype):
        return dtype.construct_array_type()._from_sequence(
            [dtype.na_value] * arr.n, dtype=dtype
        )

Review comment: We should maybe consider adding a method to the ExtensionArray interface to create an empty/all-NA array of the given dtype, to avoid this rather inefficient "construct from list of NA scalars".

Reply: +1 on this.

Reply: IIRC this can break for dtypes that can't hold NA; the more robust version is … (in internals concat we use this in one place, and when I tried to change it to what you have here something broke, don't remember what off the top of my head).

Reply: Do we have an example dtype that cannot hold NAs?

Reply: Something like Sparse[int].

Reply: Indeed: … Will use your suggested code for now.

Reply: I opened #39776 for the idea of adding a method to the EA interface for creating an empty/all-NA array from a given dtype.
    elif is_datetime64_ns_dtype(dtype):
        from pandas.core.arrays import DatetimeArray

        return DatetimeArray._from_sequence([NaT] * arr.n, dtype=dtype)
    elif is_timedelta64_ns_dtype(dtype):
        from pandas.core.arrays import TimedeltaArray

        return TimedeltaArray._from_sequence([NaT] * arr.n, dtype=dtype)
    else:
        if is_integer_dtype(dtype):
            dtype = "float64"
            fill_value = np.nan
        elif is_bool_dtype(dtype):
            dtype = object

        if fill_value is lib.no_default:
            fill_value = np.nan

        arr = np.empty(arr.n, dtype=dtype)
        arr.fill(fill_value)
        return arr
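The dtype adjustments in the numpy branch can be illustrated with a self-contained sketch (a simplified stand-in for `_array_from_proxy`, numpy dtypes only; the function name here is illustrative): integer dtypes cannot hold NaN, so they upcast to float64, and bool upcasts to object.

```python
import numpy as np

def make_na_array(n, dtype, fill_value=np.nan):
    # Simplified numpy-only version of the helper above: integer dtypes
    # cannot hold NaN so they upcast to float64, and bool upcasts to
    # object to avoid coercing True/False to 1.0/0.0.
    dtype = np.dtype(dtype)
    if dtype.kind in "iu":   # signed/unsigned integers
        dtype = np.dtype("float64")
    elif dtype.kind == "b":  # booleans
        dtype = np.dtype(object)
    arr = np.empty(n, dtype=dtype)
    arr.fill(fill_value)
    return arr
```

For example, asking for an all-NA `int64` array yields a `float64` array of NaN, while a `bool` request yields an object array.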
def _cast_to_common_type(arr: ArrayLike, dtype: DtypeObj) -> ArrayLike:
    """
    Helper function for `arr.astype(common_dtype)` but handling all special
    cases.
    """
    if isinstance(arr, NullArrayProxy):
        return _array_from_proxy(arr, dtype)

    if (
        is_categorical_dtype(arr.dtype)
        and isinstance(dtype, np.dtype)
@@ -132,6 +204,75 @@ def is_nonempty(x) -> bool:
    return np.concatenate(to_concat, axis=axis)
def concat_arrays(to_concat):
    """
    Alternative for concat_compat but specialized for use in the ArrayManager.

    Differences: only deals with 1D arrays (no axis keyword) and does not skip
    empty arrays to determine the dtype.
    In addition ensures that all NullArrayProxies get replaced with actual
    arrays.

    Parameters
    ----------
    to_concat : list of arrays

    Returns
    -------
    np.ndarray or ExtensionArray
    """
    # ignore the all-NA proxies to determine the resulting dtype
    to_concat_no_proxy = [x for x in to_concat if not isinstance(x, NullArrayProxy)]

    kinds = {obj.dtype.kind for obj in to_concat_no_proxy}
    single_dtype = len({x.dtype for x in to_concat_no_proxy}) == 1
    any_ea = any(is_extension_array_dtype(x.dtype) for x in to_concat_no_proxy)

    if any_ea:
        if not single_dtype:
            target_dtype = find_common_type([x.dtype for x in to_concat_no_proxy])
            to_concat = [_cast_to_common_type(arr, target_dtype) for arr in to_concat]
        else:
            target_dtype = to_concat_no_proxy[0].dtype
            to_concat = [
                _array_from_proxy(arr, target_dtype)
                if isinstance(arr, NullArrayProxy)
                else arr
                for arr in to_concat
            ]

        if isinstance(to_concat[0], ExtensionArray):
            cls = type(to_concat[0])
            return cls._concat_same_type(to_concat)
        else:
            return np.concatenate(to_concat)

    elif any(kind in ["m", "M"] for kind in kinds):
        return _concat_datetime(to_concat)

    if not single_dtype:
        target_dtype = np.find_common_type(
            [arr.dtype for arr in to_concat_no_proxy], []
        )
    else:
        target_dtype = to_concat_no_proxy[0].dtype
    to_concat = [
        _array_from_proxy(arr, target_dtype) if isinstance(arr, NullArrayProxy) else arr
        for arr in to_concat
    ]

    result = np.concatenate(to_concat)

    # TODO(ArrayManager) this is currently inconsistent between Series and DataFrame
    # so we should decide whether to keep the below special case or remove it
    if len(result) == 0:
        # all empties -> check for bool to not coerce to float
        if len(kinds) != 1:
            if "b" in kinds:
                result = result.astype(object)
    return result

Review comment: types if you can.

Review comment: if we have different impls for array vs block manager then let's push this code down into those areas (so it's localized). OK for a followup.

Reply: the problem is the logic is too scattered and very hard to grok where things are happening.

Review comment: this looks really similar to the non-AM code. Can't it be shared?

Reply: This … but that resulted in quite some special cases inside …

Reply: See my original comment about it at #39612 (comment). Yes, it's really similar. But it's also slightly different in many places. Now, when I tried this originally, I only checked for the null proxy object when needed (giving lots of if/elses, making it hard to read). But of course I could also simply always check for it. And for the non-ArrayManager cases this proxy object will never be present, so apart from doing a bunch of unnecessary checks, it shouldn't matter. If you are OK with complicating the base …

Review comment: is this behavior consistent between AM and BM?

Reply: Hmm, not sure anymore where I saw this, as for both BlockManager/ArrayManager this gives object dtype with both Series and DataFrame (at least with the latest version of this PR). So will remove the comment.

Reply: Ah, found my notes about this: this is not directly related to Block vs ArrayManager, but something I noticed here that is inconsistent between Series vs DataFrame and empty vs non-empty. For Series, with non-empty, we actually coerce to numeric, while if we have empty bool + float, we get object dtype: … While DataFrame uses object dtype in both cases (will open a separate issue about it). But the reason this came up specifically for ArrayManager is that we now follow the same rules for Series/DataFrame, and thus also get numeric for non-empty in the DataFrame case.

Reply: Opened #39817 about this.

Review comment: see my comments above, the non-internals code should ideally not have these hacks here. This makes it really hard to follow anything. Can you do something about this?

Reply: Shall I simply cut and paste this function to …? (Note that my question is slightly ironic, because it doesn't change anything fundamentally, just the semantics of what is considered "internal". But I don't really understand what the problem here is.)
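The dtype-resolution flow of `concat_arrays` can be sketched end-to-end for the plain-numpy case (a simplified, self-contained version with a hypothetical name; `np.result_type` stands in for the dtype-finding helpers, and proxies are materialized as NaN, so this sketch covers numeric dtypes only):

```python
import numpy as np

class NullArrayProxy:
    # minimal stub of the proxy, for illustration only
    def __init__(self, n):
        self.n = n

def concat_arrays_sketch(to_concat):
    # 1. ignore the proxies when determining the target dtype
    no_proxy = [x for x in to_concat if not isinstance(x, NullArrayProxy)]
    target = np.result_type(*(a.dtype for a in no_proxy))
    # 2. materialize each proxy as an all-NA array of the target dtype
    materialized = [
        np.full(x.n, np.nan, dtype=target) if isinstance(x, NullArrayProxy)
        else x.astype(target, copy=False)
        for x in to_concat
    ]
    # 3. plain concatenate now that all dtypes agree
    return np.concatenate(materialized)

result = concat_arrays_sketch(
    [np.array([1, 2]), NullArrayProxy(2), np.array([3.5])]
)
```

Here the int and float inputs promote to float64, and the length-2 proxy becomes two NaNs in the middle of the result.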
def union_categoricals(
    to_union, sort_categories: bool = False, ignore_order: bool = False
):
@@ -322,20 +463,35 @@ def _concat_datetime(to_concat, axis=0):
    a single array, preserving the combined dtypes
    """
    to_concat = [ensure_wrapped_if_datetimelike(x) for x in to_concat]
    to_concat_no_proxy = [x for x in to_concat if not isinstance(x, NullArrayProxy)]

-   single_dtype = len({x.dtype for x in to_concat}) == 1
+   single_dtype = len({x.dtype for x in to_concat_no_proxy}) == 1

    # multiple types, need to coerce to object
    if not single_dtype:
        # ensure_wrapped_if_datetimelike ensures that astype(object) wraps
        # in Timestamp/Timedelta
        to_concat = [
            _array_from_proxy(arr, dtype=object, fill_value=None)
            if isinstance(arr, NullArrayProxy)
            else arr
            for arr in to_concat
        ]

        return _concatenate_2d([x.astype(object) for x in to_concat], axis=axis)

    if axis == 1:
        # TODO(EA2D): kludge not necessary with 2D EAs
        to_concat = [x.reshape(1, -1) if x.ndim == 1 else x for x in to_concat]
    else:
        to_concat = [
            _array_from_proxy(arr, dtype=to_concat_no_proxy[0].dtype)
            if isinstance(arr, NullArrayProxy)
            else arr
            for arr in to_concat
        ]

-   result = type(to_concat[0])._concat_same_type(to_concat, axis=axis)
+   result = type(to_concat_no_proxy[0])._concat_same_type(to_concat, axis=axis)

    if result.ndim == 2 and is_extension_array_dtype(result.dtype):
        # TODO(EA2D): kludge not necessary with 2D EAs
@@ -18,6 +18,7 @@
    is_extension_array_dtype,
    is_numeric_dtype,
)
+from pandas.core.dtypes.concat import NullArrayProxy
from pandas.core.dtypes.dtypes import ExtensionDtype, PandasDtype
from pandas.core.dtypes.generic import ABCDataFrame, ABCSeries
from pandas.core.dtypes.missing import isna
@@ -725,10 +726,20 @@ def reindex_indexer(
        # ignored keywords
        consolidate: bool = True,
        only_slice: bool = False,
+       # ArrayManager specific keywords
+       do_integrity_check=True,
+       use_na_proxy=False,
    ) -> T:
        axis = self._normalize_axis(axis)
        return self._reindex_indexer(
-           new_axis, indexer, axis, fill_value, allow_dups, copy
+           new_axis,
+           indexer,
+           axis,
+           fill_value,
+           allow_dups,
+           copy,
+           do_integrity_check,
+           use_na_proxy,
        )

    def _reindex_indexer(
@@ -739,6 +750,8 @@ def _reindex_indexer(
        fill_value=None,
        allow_dups: bool = False,
        copy: bool = True,
+       do_integrity_check=True,
+       use_na_proxy=False,
    ) -> T:
        """
        Parameters

Review comment: annotate.
@@ -773,7 +786,9 @@ def _reindex_indexer(
            new_arrays = []
            for i in indexer:
                if i == -1:
-                   arr = self._make_na_array(fill_value=fill_value)
+                   arr = self._make_na_array(
+                       fill_value=fill_value, use_na_proxy=use_na_proxy
+                   )
                else:
                    arr = self.arrays[i]
                new_arrays.append(arr)
@@ -793,7 +808,7 @@ def _reindex_indexer(
        new_axes = list(self._axes)
        new_axes[axis] = new_axis

-       return type(self)(new_arrays, new_axes)
+       return type(self)(new_arrays, new_axes, do_integrity_check=do_integrity_check)

    def take(self, indexer, axis: int = 1, verify: bool = True, convert: bool = True):
        """
@@ -820,10 +835,11 @@ def take(self, indexer, axis: int = 1, verify: bool = True, convert: bool = True
            new_axis=new_labels, indexer=indexer, axis=axis, allow_dups=True
        )

-   def _make_na_array(self, fill_value=None):
+   def _make_na_array(self, fill_value=None, use_na_proxy=False):
        if use_na_proxy:
            return NullArrayProxy(self.shape_proper[0])

        if fill_value is None:
            fill_value = np.nan

        dtype, fill_value = infer_dtype_from_scalar(fill_value)
        values = np.empty(self.shape_proper[0], dtype=dtype)
        values.fill(fill_value)

Review comment: do we need ArrayLike to include NullArrayProxy?

Review comment: is the rest of this method ndarray-only?

Reply: Yes
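The `-1` convention in the `_reindex_indexer` column loop can be shown with a small standalone sketch (hypothetical helper name; `None` stands in for the proxy when `use_na_proxy` is set):

```python
import numpy as np

def reindex_columns(arrays, indexer, n_rows, use_na_proxy=False):
    # Sketch of the column loop above: entries of -1 mark columns missing
    # from this manager; they become either a lazy placeholder (here None,
    # standing in for NullArrayProxy) or an eagerly filled all-NaN array.
    out = []
    for i in indexer:
        if i == -1:
            out.append(None if use_na_proxy else np.full(n_rows, np.nan))
        else:
            out.append(arrays[i])
    return out

cols = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
eager = reindex_columns(cols, [0, -1, 1], n_rows=2)
lazy = reindex_columns(cols, [0, -1, 1], n_rows=2, use_na_proxy=True)
```

With `use_na_proxy=True`, the fill column stays a cheap placeholder until concatenation decides the common dtype; without it, a NaN array is materialized immediately.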
@@ -23,7 +23,7 @@
    is_sparse,
    is_timedelta64_dtype,
)
-from pandas.core.dtypes.concat import concat_compat
+from pandas.core.dtypes.concat import concat_arrays, concat_compat
from pandas.core.dtypes.missing import isna_all

import pandas.core.algorithms as algos
@@ -37,6 +37,45 @@
from pandas.core.arrays.sparse.dtype import SparseDtype
def concatenate_array_managers(
    mgrs_indexers, axes: List[Index], concat_axis: int, copy: bool
) -> Manager:
    """
    Concatenate array managers into one.

    Parameters
    ----------
    mgrs_indexers : list of (ArrayManager, {axis: indexer,...}) tuples
    axes : list of Index
    concat_axis : int
    copy : bool

    Returns
    -------
    ArrayManager
    """
    # reindex all arrays
    mgrs = []
    for mgr, indexers in mgrs_indexers:
        for ax, indexer in indexers.items():
            mgr = mgr.reindex_indexer(
                axes[ax], indexer, axis=ax, do_integrity_check=False, use_na_proxy=True
            )
        mgrs.append(mgr)

    # concatting along the rows -> concat the reindexed arrays
    if concat_axis == 1:
        arrays = [
            concat_arrays([mgrs[i].arrays[j] for i in range(len(mgrs))])
            for j in range(len(mgrs[0].arrays))
        ]
        return ArrayManager(arrays, [axes[1], axes[0]], do_integrity_check=False)
    # concatting along the columns -> combine reindexed arrays in a single manager
    elif concat_axis == 0:
        arrays = list(itertools.chain.from_iterable([mgr.arrays for mgr in mgrs]))
        return ArrayManager(arrays, [axes[1], axes[0]], do_integrity_check=False)

Review comment: same comment as above.

Review comment: can you now remove concat_compat?

Review comment: and reading your comment below, why is this not in the array manager if it's only used there?

Reply: Because this is the code for concatting managers, which for BlockManager also resides in internals/concat.py?
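The two branches of `concatenate_array_managers` reduce to a simple pattern over lists of 1D column arrays, sketched here with plain numpy (illustrative variable names):

```python
import itertools

import numpy as np

# Two managers, each stored as a list of 1D column arrays.
mgr_a = [np.array([1.0, 2.0]), np.array([10.0, 20.0])]
mgr_b = [np.array([3.0]), np.array([30.0])]

# concat_axis == 1 (stacking rows): column j of the result is the
# concatenation of column j from every manager.
row_stacked = [
    np.concatenate([mgr[j] for mgr in (mgr_a, mgr_b)])
    for j in range(len(mgr_a))
]

# concat_axis == 0 (adding columns): the column lists are simply chained,
# no array-level concatenation is needed at all.
col_stacked = list(itertools.chain.from_iterable([mgr_a, mgr_b]))
```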
def concatenate_block_managers(
    mgrs_indexers, axes: List[Index], concat_axis: int, copy: bool
) -> Manager:

@@ -55,19 +94,7 @@ def concatenate_block_managers(
    BlockManager
    """
    if isinstance(mgrs_indexers[0][0], ArrayManager):
-
-       if concat_axis == 1:
-           # TODO for now only fastpath without indexers
-           mgrs = [t[0] for t in mgrs_indexers]
-           arrays = [
-               concat_compat([mgrs[i].arrays[j] for i in range(len(mgrs))], axis=0)
-               for j in range(len(mgrs[0].arrays))
-           ]
-           return ArrayManager(arrays, [axes[1], axes[0]])
-       elif concat_axis == 0:
-           mgrs = [t[0] for t in mgrs_indexers]
-           arrays = list(itertools.chain.from_iterable([mgr.arrays for mgr in mgrs]))
-           return ArrayManager(arrays, [axes[1], axes[0]])
+       return concatenate_array_managers(mgrs_indexers, axes, concat_axis, copy)

    concat_plans = [
        _get_mgr_concatenation_plan(mgr, indexers) for mgr, indexers in mgrs_indexers
Review comment: another option, if we don't want a whole new thing, would be to use np.empty(shape, dtype="V") (which has obj.nbytes = 0).

Reply: That's indeed an option as well. It wouldn't really simplify the code, as we would still need all of what is now in to_array and all the checking for this object/dtype in concat_arrays, but it would indeed avoid the custom class. (Personally, I find the custom class a bit more explicit than using a dtype we otherwise don't use, so I would choose that, but I am fine either way.)