Deprecated Index.get_duplicates() #20544

Merged (8 commits) on Apr 24, 2018
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.23.0.txt
@@ -831,6 +831,7 @@ Deprecations
- ``pandas.tseries.plotting.tsplot`` is deprecated. Use :func:`Series.plot` instead (:issue:`18627`)
- ``Index.summary()`` is deprecated and will be removed in a future version (:issue:`18217`)
- ``NDFrame.get_ftype_counts()`` is deprecated and will be removed in a future version (:issue:`18243`)
- ``Index.get_duplicates()`` is deprecated and will be removed in a future version (:issue:`20239`)

.. _whatsnew_0230.prior_deprecations:

13 changes: 11 additions & 2 deletions pandas/core/indexes/base.py
@@ -1824,6 +1824,9 @@ def get_duplicates(self):
        Returns a sorted list of index elements which appear more than once in
        the index.

        .. deprecated:: 0.23.0
            Use idx[idx.duplicated()].unique() instead

        Returns
        -------
        array-like
@@ -1870,14 +1873,20 @@ def get_duplicates(self):
        >>> pd.Index(dates).get_duplicates()
        DatetimeIndex([], dtype='datetime64[ns]', freq=None)
        """
        warnings.warn("'get_duplicates' is deprecated and will be removed in "
                      "a future release. You can use "
                      "idx[idx.duplicated()].unique() instead",
                      FutureWarning, stacklevel=2)

        return self._get_duplicates()

    def _get_duplicates(self):
Contributor (review comment):

I would rather change the implementation itself here as well, to the suggested one (`idx[idx.duplicated()].unique()`).

        from collections import defaultdict
        counter = defaultdict(lambda: 0)
        for k in self.values:
            counter[k] += 1
        return sorted(k for k, v in compat.iteritems(counter) if v > 1)

    _get_duplicates = get_duplicates

    def _cleanup(self):
        self._engine.clear_mapping()
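
The deprecation message recommends `idx[idx.duplicated()].unique()` in place of the counting loop. The sketch below compares the two on a small index. It is only an illustration: the helper names `duplicates_by_counting` and `duplicates_by_mask` are made up here and are not part of pandas, and `collections.Counter` stands in for the `defaultdict` loop shown in the diff.

```python
from collections import Counter

import pandas as pd


def duplicates_by_counting(idx):
    """Count occurrences and return the repeated values as a sorted list.

    Hypothetical helper that mirrors the counting loop in the diff.
    """
    counts = Counter(idx.values)
    return sorted(k for k, v in counts.items() if v > 1)


def duplicates_by_mask(idx):
    """Apply the expression suggested in the deprecation message.

    Hypothetical helper; returns an Index rather than a list.
    """
    return idx[idx.duplicated()].unique()


idx = pd.Index([3, 1, 3, 1, 2])
by_counting = duplicates_by_counting(idx)  # plain Python list, sorted
by_mask = duplicates_by_mask(idx)          # pandas Index, order follows the data

print(by_counting)
print(list(by_mask))
```

One difference worth checking before swapping implementations: the counting version returns a sorted list, while the mask-based expression returns an `Index` whose order follows the data, so callers that rely on sorted output or compare against `[]` may need adjusting.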

5 changes: 5 additions & 0 deletions pandas/tests/indexes/test_base.py
@@ -2061,6 +2061,11 @@ def test_cached_properties_not_settable(self):
        with tm.assert_raises_regex(AttributeError, "Can't set attribute"):
Contributor (review comment):

The usage needs to be removed from all tests, or the warnings need to be caught:

(pandas) bash-3.2$ grep -r get_duplicates pandas
pandas/core/reshape/concat.py:                overlap = concat_index.get_duplicates()
Binary file pandas/core/reshape/__pycache__/concat.cpython-36.pyc matches
Binary file pandas/core/__pycache__/frame.cpython-36.pyc matches
pandas/core/frame.py:            duplicates = index.get_duplicates()
Binary file pandas/core/indexes/__pycache__/datetimelike.cpython-36.pyc matches
Binary file pandas/core/indexes/__pycache__/base.cpython-36.pyc matches
pandas/core/indexes/datetimelike.py:    def get_duplicates(self):
pandas/core/indexes/datetimelike.py:        values = Index.get_duplicates(self)
pandas/core/indexes/base.py:    def get_duplicates(self):
pandas/core/indexes/base.py:        >>> pd.Index([1, 2, 2, 3, 3, 3, 4]).get_duplicates()
pandas/core/indexes/base.py:        >>> pd.Index([1., 2., 2., 3., 3., 3., 4.]).get_duplicates()
pandas/core/indexes/base.py:        >>> pd.Index(['a', 'b', 'b', 'c', 'c', 'c', 'd']).get_duplicates()
pandas/core/indexes/base.py:        >>> pd.Index(dates).get_duplicates()
pandas/core/indexes/base.py:        >>> pd.Index([1, 2, 3, 2, 3, 4, 3]).get_duplicates()
pandas/core/indexes/base.py:        >>> pd.Index([1, 2, 3, 4]).get_duplicates()
pandas/core/indexes/base.py:        >>> pd.Index(dates).get_duplicates()
pandas/core/indexes/base.py:    _get_duplicates = get_duplicates
Binary file pandas/tests/indexes/__pycache__/test_multi.cpython-36-PYTEST.pyc matches
Binary file pandas/tests/indexes/datetimes/__pycache__/test_datetime.cpython-36-PYTEST.pyc matches
pandas/tests/indexes/datetimes/test_datetime.py:    def test_get_duplicates(self):
pandas/tests/indexes/datetimes/test_datetime.py:        result = idx.get_duplicates()
Binary file pandas/tests/indexes/timedeltas/__pycache__/test_timedelta.cpython-36-PYTEST.pyc matches
pandas/tests/indexes/timedeltas/test_timedelta.py:    def test_get_duplicates(self):
pandas/tests/indexes/timedeltas/test_timedelta.py:        result = idx.get_duplicates()
pandas/tests/indexes/test_multi.py:            assert mi.get_duplicates() == []
pandas/tests/indexes/test_multi.py:                assert mi.get_duplicates() == []

            idx.is_unique = False

    def test_get_duplicates_deprecated(self):
        idx = pd.Index([1, 2, 3])
        with tm.assert_produces_warning(FutureWarning):
            idx.get_duplicates()


class TestMixedIntIndex(Base):
    # Mostly the tests from common.py for which the results differ
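
The review comment above asks that remaining callers either stop using the deprecated method or catch the warning. A minimal, standard-library-only sketch of the second option follows. `legacy_get_duplicates` and `call_and_collect_warnings` are hypothetical stand-ins written for this example, not pandas code; in the pandas test suite the equivalent role is played by `tm.assert_produces_warning`, as in the new test in this diff.

```python
import warnings


def legacy_get_duplicates(values):
    """Hypothetical stand-in for a deprecated method; not pandas code."""
    warnings.warn(
        "'get_duplicates' is deprecated and will be removed in a future release.",
        FutureWarning,
        stacklevel=2,
    )
    seen, dupes = set(), set()
    for v in values:
        # A value already seen once is a duplicate.
        (dupes if v in seen else seen).add(v)
    return sorted(dupes)


def call_and_collect_warnings(func, *args):
    """Call func and return (result, list of warnings raised during the call)."""
    with warnings.catch_warnings(record=True) as caught:
        # Make sure the warning is recorded even if it was already shown once.
        warnings.simplefilter("always")
        result = func(*args)
    return result, caught


result, caught = call_and_collect_warnings(legacy_get_duplicates, [1, 2, 2, 3, 3, 3])
print(result, [w.category.__name__ for w in caught])
```

Recording the warnings, rather than silencing them globally, lets a test assert both the return value and that the deprecation was actually signalled.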