-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN: remove methods of ExtensionIndex that duplicate base Index #34163
CLN: remove methods of ExtensionIndex that duplicate base Index #34163
Conversation
@@ -223,29 +219,14 @@ def __getitem__(self, key): | |||
deprecate_ndim_indexing(result) | |||
return result | |||
|
|||
def __iter__(self): | |||
return self._data.__iter__() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be equivalent to
Lines 1034 to 1051 in 9c08fe1
def __iter__(self): | |
""" | |
Return an iterator of the values. | |
These are each a scalar type, which is a Python scalar | |
(for str, int, float) or a pandas scalar | |
(for Timestamp/Timedelta/Interval/Period) | |
Returns | |
------- | |
iterator | |
""" | |
# We are explicitly making element iterators. | |
if not isinstance(self._values, np.ndarray): | |
# Check type instead of dtype to catch DTA/TDA | |
return iter(self._values) | |
else: | |
return map(self._values.item, range(self._values.size)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure (alternatively could remove L1047-1049 from the base class implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alternatively could remove L1047-1049 from the base class implementation
No, because Series also uses that
# --------------------------------------------------------------------- | ||
|
||
def __array__(self, dtype=None) -> np.ndarray: | ||
return np.asarray(self._data, dtype=dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is identical with the Index one
|
||
if self.hasnans: | ||
return self._shallow_copy(self._data[~self._isnan]) | ||
return self._shallow_copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only difference here with the Index one is the use of self._data
vs self._values
, which as far as I know shouldn't matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if it matters for this method, but the distinction would matter for MultiIndex, which does not have _data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, MultiIndex actually overrides dropna
, so that shouldn't even matter here.
But indeed, in general it's only for MI that using _values
vs _data
matters, for all the others it's the same (I think?), which I suppose is the reason in the base class there is more usage of _values
.
fill_value=fill_value, | ||
na_value=self._na_value, | ||
) | ||
return type(self)(taken, name=self.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here ( self._data
vs self._values
), and the base class just has an extra if not self._can_hold_na:
branch
pandas/pandas/core/indexes/base.py
Lines 688 to 703 in 9c08fe1
if self._can_hold_na: | |
taken = self._assert_take_fillable( | |
self._values, | |
indices, | |
allow_fill=allow_fill, | |
fill_value=fill_value, | |
na_value=self._na_value, | |
) | |
else: | |
if allow_fill and fill_value is not None: | |
cls_name = type(self).__name__ | |
raise ValueError( | |
f"Unable to fill values because {cls_name} cannot contain NA" | |
) | |
taken = self._values.take(indices) | |
return self._shallow_copy(taken) |
self._validate_index_level(level) | ||
|
||
result = self._data.unique() | ||
return self._shallow_copy(result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The base class ends up dispatching to IndexOpsMixin
unique:
Lines 1257 to 1270 in 9c08fe1
def unique(self): | |
values = self._values | |
if hasattr(values, "unique"): | |
result = values.unique() | |
if self.dtype.kind in ["m", "M"] and isinstance(self, ABCSeries): | |
# GH#31182 Series._values returns EA, unpack for backward-compat | |
if getattr(self.dtype, "tz", None) is None: | |
result = np.asarray(result) | |
else: | |
result = unique1d(values) | |
return result |
The hasattr(values, "unique")
could probably be made more explicit / cleaner to check for EA, but basically this should also be the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could probably be made more explicit / cleaner
sounds worthwhile, yah
@jbrockmendel all good? |
pandas/core/base.py
Outdated
@@ -1257,8 +1257,7 @@ def value_counts( | |||
def unique(self): | |||
values = self._values | |||
|
|||
if hasattr(values, "unique"): | |||
|
|||
if not isinstance(self._values, np.ndarray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can re-use values
here
yep |
xref #34159
cc @jbrockmendel