ENH: Add duplicated/drop_duplicates to Index #7979
@@ -443,6 +444,53 @@ def searchsorted(self, key, side='left'):
        #### needs tests/doc-string
        return self.values.searchsorted(key, side=side)

    def drop_duplicates(self, take_last=False, inplace=False):
        """
These need to raise if inplace is set and it's an Index (as they are immutable).
Side note: can you audit the existing methods in IndexOpsMixin for use of inplace?
There seems to be no function that accepts inplace other than this one.
OK, great. Still, I think putting the check in update_inplace might be good.

        if inplace:
            from pandas.core.index import Index
            if isinstance(self, Index):
I think it is better to have update_inplace in core/base.py that simply raises if it's an Index. (I think this would be overridden by the update_inplace in core/generic.py, so other subclasses won't see it.)
Sure. I think adding update_inplace to Index is clearer?
Ah yes, that would be better (though maybe add it as a NotImplemented stub to IndexOpsMixin, just as a placeholder for the abstract methods).
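The arrangement agreed on in this thread can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class names `OpsMixinSketch`, `IndexSketch` and `SeriesSketch` are hypothetical stand-ins, and the choice of `TypeError` and its message for the immutable case is an assumption.

```python
class OpsMixinSketch:
    """Hypothetical stand-in for the shared ops mixin."""

    def update_inplace(self, result):
        # Placeholder only: each concrete class decides what "in place" means.
        raise NotImplementedError


class IndexSketch(OpsMixinSketch):
    """Hypothetical immutable container."""

    def __init__(self, values):
        self._values = tuple(values)

    def update_inplace(self, result):
        # Immutable: refuse rather than silently mutate (exception type assumed).
        raise TypeError("Index can't be updated inplace")


class SeriesSketch(OpsMixinSketch):
    """Hypothetical mutable container."""

    def __init__(self, values):
        self._values = list(values)

    def update_inplace(self, result):
        # Mutable: swap in the data held by the new object.
        self._values = list(result._values)
```

With this layout the mutable class overrides the placeholder with a real update, while the immutable one fails loudly if anything ever routes an in-place request to it.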
@@ -469,6 +470,54 @@ def searchsorted(self, key, side='left'):
        #### needs tests/doc-string
        return self.values.searchsorted(key, side=side)

    def drop_duplicates(self, take_last=False, inplace=False):
        """
        Return Series or Index with duplicate values removed
About the "Series or Index", could you do something like in generic.py, with substitution of the klass name, so that only Series or Index shows up in the respective docstring?
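One way to picture the requested docstring substitution is a shared template whose `%(klass)s` placeholder is filled in per class. The sketch below is only an illustration of that idea: the decorator `substitute_doc`, the template name and the two classes are hypothetical and are not the helpers generic.py actually uses.

```python
# Shared template; %(klass)s is filled in separately for each class.
_DROP_DUPLICATES_DOC = "Return %(klass)s with duplicate values removed."


def substitute_doc(template, **params):
    """Hypothetical decorator: set a function's docstring from a template."""
    def decorator(func):
        func.__doc__ = template % params
        return func
    return decorator


class IndexSketch:
    @substitute_doc(_DROP_DUPLICATES_DOC, klass="Index")
    def drop_duplicates(self):
        raise NotImplementedError  # body omitted; only the docstring matters here


class SeriesSketch:
    @substitute_doc(_DROP_DUPLICATES_DOC, klass="Series")
    def drop_duplicates(self):
        raise NotImplementedError  # body omitted; only the docstring matters here
```

Each class then documents only itself, so `help(IndexSketch.drop_duplicates)` mentions Index and never Series.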
@jreback, @jorisvandenbossche Considered both comments and fixed. Defining So defined common logic in
        try:
            return self._constructor(duplicated,
                                     index=self.index).__finalize__(self)
        except AttributeError:
This is very awkward to do. Maybe just put the immutable definition in base and override the definition in Series. Probably simpler?
OK, fixed to centralize the logic in IndexOpsMixin. Even though update_inplace is defined in both IndexOpsMixin and Index, it will never be called in the drop_duplicates case (Index.drop_duplicates blocks the inplace kw, which is also better for a proper docstring).
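The resulting layout might look roughly like this. It is a sketch under assumptions rather than the PR's code: `_duplicated_mask`, `_drop_duplicates` and the `*Sketch` classes are hypothetical names, and plain Python sequences stand in for the real array types.

```python
def _duplicated_mask(values, take_last=False):
    """Hypothetical shared helper: True where a value repeats a kept one."""
    seen = set()
    order = reversed(range(len(values))) if take_last else range(len(values))
    mask = [False] * len(values)
    for i in order:
        if values[i] in seen:
            mask[i] = True
        else:
            seen.add(values[i])
    return mask


class OpsMixinSketch:
    def _drop_duplicates(self, take_last=False):
        # Common logic lives here; subclasses only shape the public signature.
        mask = _duplicated_mask(self._values, take_last=take_last)
        kept = [v for v, dup in zip(self._values, mask) if not dup]
        return type(self)(kept)


class IndexSketch(OpsMixinSketch):
    def __init__(self, values):
        self._values = tuple(values)

    def drop_duplicates(self, take_last=False):
        # No inplace keyword at all, so the immutable class never mutates.
        return self._drop_duplicates(take_last=take_last)


class SeriesSketch(OpsMixinSketch):
    def __init__(self, values):
        self._values = list(values)

    def drop_duplicates(self, take_last=False, inplace=False):
        result = self._drop_duplicates(take_last=take_last)
        if inplace:
            self._values = result._values
            return None
        return result
```

Because `IndexSketch.drop_duplicates` does not accept `inplace`, passing it fails at the call site, which removes the need for a try/except around the constructor call.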
Just as a usage question, what do we envisage as the 'recommended' way to drop duplicate indices from a DataFrame (where you now had to say the somewhat unintuitive
or
although these are even longer than the groupby ..
unchanged, the first is best (this is for the Index to be compatible)
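For readers following the usage question, the general idea of filtering rows by a duplicate-label mask can be sketched in plain Python. This is not the snippet either commenter wrote; the function `duplicated` here is a hypothetical pure-Python stand-in for the method being added, and the labels and row values are made-up illustration data.

```python
def duplicated(labels, take_last=False):
    """Hypothetical stand-in: mark labels that repeat an already kept label."""
    seen = set()
    mask = [False] * len(labels)
    positions = reversed(range(len(labels))) if take_last else range(len(labels))
    for i in positions:
        if labels[i] in seen:
            mask[i] = True
        seen.add(labels[i])
    return mask


# Made-up data: row labels with one repeat, and one value per row.
labels = ["a", "b", "a", "c"]
rows = [10, 20, 30, 40]

# Keep only the rows whose label is not flagged as a duplicate.
mask = duplicated(labels)
deduped = [(lab, row) for lab, row, dup in zip(labels, rows, mask) if not dup]
```

With the default `take_last=False` the first row for each label survives; with `take_last=True` the last one does.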
@jorisvandenbossche's point is #2825. Though I feel
the doc string for I would change this around. Why don't you just have Then in Index/Series put in the doc-strings (and inplace for Series)?
OK, that's fine then. Ping when ready.
Thanks for confirming. Now green.
ENH: Add duplicated/drop_duplicates to Index
Closes #4060.