Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecated Index.get_duplicates() #20544

Merged
merged 8 commits into from
Apr 24, 2018
Merged

Conversation

WillAyd
Copy link
Member

@WillAyd WillAyd commented Mar 30, 2018

@codecov
Copy link

codecov bot commented Mar 30, 2018

Codecov Report

Merging #20544 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20544      +/-   ##
==========================================
- Coverage   91.85%   91.84%   -0.01%     
==========================================
  Files         153      153              
  Lines       49318    49311       -7     
==========================================
- Hits        45299    45292       -7     
  Misses       4019     4019
Flag Coverage Δ
#multiple 90.24% <100%> (-0.01%) ⬇️
#single 41.89% <0%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/indexes/datetimelike.py 96.7% <ø> (-0.02%) ⬇️
pandas/core/frame.py 97.17% <100%> (ø) ⬆️
pandas/core/indexes/base.py 96.63% <100%> (-0.01%) ⬇️
pandas/core/reshape/concat.py 97.58% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update add3fbf...94376e4. Read the comment docs.

@@ -2061,6 +2061,11 @@ def test_cached_properties_not_settable(self):
with tm.assert_raises_regex(AttributeError, "Can't set attribute"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to remove the usage from all tests or catch the warnings

(pandas) bash-3.2$ grep -r get_duplicates pandas
pandas/core/reshape/concat.py:                overlap = concat_index.get_duplicates()
Binary file pandas/core/reshape/__pycache__/concat.cpython-36.pyc matches
Binary file pandas/core/__pycache__/frame.cpython-36.pyc matches
pandas/core/frame.py:            duplicates = index.get_duplicates()
Binary file pandas/core/indexes/__pycache__/datetimelike.cpython-36.pyc matches
Binary file pandas/core/indexes/__pycache__/base.cpython-36.pyc matches
pandas/core/indexes/datetimelike.py:    def get_duplicates(self):
pandas/core/indexes/datetimelike.py:        values = Index.get_duplicates(self)
pandas/core/indexes/base.py:    def get_duplicates(self):
pandas/core/indexes/base.py:        >>> pd.Index([1, 2, 2, 3, 3, 3, 4]).get_duplicates()
pandas/core/indexes/base.py:        >>> pd.Index([1., 2., 2., 3., 3., 3., 4.]).get_duplicates()
pandas/core/indexes/base.py:        >>> pd.Index(['a', 'b', 'b', 'c', 'c', 'c', 'd']).get_duplicates()
pandas/core/indexes/base.py:        >>> pd.Index(dates).get_duplicates()
pandas/core/indexes/base.py:        >>> pd.Index([1, 2, 3, 2, 3, 4, 3]).get_duplicates()
pandas/core/indexes/base.py:        >>> pd.Index([1, 2, 3, 4]).get_duplicates()
pandas/core/indexes/base.py:        >>> pd.Index(dates).get_duplicates()
pandas/core/indexes/base.py:    _get_duplicates = get_duplicates
Binary file pandas/tests/indexes/__pycache__/test_multi.cpython-36-PYTEST.pyc matches
Binary file pandas/tests/indexes/datetimes/__pycache__/test_datetime.cpython-36-PYTEST.pyc matches
pandas/tests/indexes/datetimes/test_datetime.py:    def test_get_duplicates(self):
pandas/tests/indexes/datetimes/test_datetime.py:        result = idx.get_duplicates()
Binary file pandas/tests/indexes/timedeltas/__pycache__/test_timedelta.cpython-36-PYTEST.pyc matches
pandas/tests/indexes/timedeltas/test_timedelta.py:    def test_get_duplicates(self):
pandas/tests/indexes/timedeltas/test_timedelta.py:        result = idx.get_duplicates()
pandas/tests/indexes/test_multi.py:            assert mi.get_duplicates() == []
pandas/tests/indexes/test_multi.py:                assert mi.get_duplicates() == []


return self._get_duplicates()

def _get_duplicates(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would rather change the impl itself here as well to the suggested one

@jreback jreback added the Deprecate Functionality to remove in pandas label Mar 30, 2018
@WillAyd
Copy link
Member Author

WillAyd commented Mar 31, 2018

Hmm I made that change but it looks like this actually doesn't work for MultiIndexes

mi = pd.MultiIndex.from_arrays([['a', 'b'], ['c', 'd']])
mi[mi.duplicated()].unique()
*** TypeError: Cannot infer number of levels from empty list

Not being as familiar on the MI side of things, do you think it's worth creating that as a separate issue and merging before this (I think it's a problem with the unique method there) or do you have another suggestion on how to handle?

@jreback
Copy link
Contributor

jreback commented Mar 31, 2018

this is a bug in .unique() on an empty MI

In [13]: pd.MultiIndex.from_arrays([[], []])
Out[13]: 
MultiIndex(levels=[[], []],
           labels=[[], []])

In [14]: pd.MultiIndex.from_arrays([[], []]).unique()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-14-d5be3b23b8c3> in <module>()
----> 1 pd.MultiIndex.from_arrays([[], []]).unique()

~/pandas/pandas/core/indexes/multi.py in unique(self, level)
   1071 
   1072         if level is None:
-> 1073             return super(MultiIndex, self).unique()
   1074         else:
   1075             level = self._get_level_number(level)

~/pandas/pandas/core/indexes/base.py in unique(self, level)
   4366             self._validate_index_level(level)
   4367         result = super(Index, self).unique()
-> 4368         return self._shallow_copy(result)
   4369 
   4370     def drop_duplicates(self, keep='first'):

~/pandas/pandas/core/indexes/multi.py in _shallow_copy(self, values, **kwargs)
    558             # discards freq
    559             kwargs.pop('freq', None)
--> 560             return MultiIndex.from_tuples(values, **kwargs)
    561         return self.view()
    562 

~/pandas/pandas/core/indexes/multi.py in from_tuples(cls, tuples, sortorder, names)
   1315             if names is None:
   1316                 msg = 'Cannot infer number of levels from empty list'
-> 1317                 raise TypeError(msg)
   1318             arrays = [[]] * len(names)
   1319         elif isinstance(tuples, (np.ndarray, Index)):

TypeError: Cannot infer number of levels from empty list

let's fix that one first, i'll create an issue.

@toobaz
Copy link
Member

toobaz commented Apr 1, 2018

Hmm I made that change but it looks like this actually doesn't work for MultiIndexes

@WillAyd : fixed, can you rebase?


with warnings.catch_warnings(record=True):
# Deprecated - see GH20239
assert mi.get_duplicates().equals(MultiIndex.from_arrays(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to tweak the assertion here given the return value is no longer a list, though I assume you are aware of that from the original issue. With that said, this is a different behavior for non-datetimelikes (which were returning a like-Index object) - is it worth documenting that in the whatsnew or is this getting into too technical of a distinction?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is fine

@jreback jreback added this to the 0.23.0 milestone Apr 21, 2018
@jreback
Copy link
Contributor

jreback commented Apr 21, 2018

can you rebase

@jreback
Copy link
Contributor

jreback commented Apr 21, 2018

can you just check that we don't have any warnings showing from this

@WillAyd
Copy link
Member Author

WillAyd commented Apr 23, 2018

Checked the output of test_fast.sh and there were no warnings so I think this is OK - lmk if I should be checking somewhere else

@jreback jreback merged commit ed511e9 into pandas-dev:master Apr 24, 2018
@jreback
Copy link
Contributor

jreback commented Apr 24, 2018

thanks @WillAyd nice patch!

@WillAyd WillAyd deleted the depr-get-dup branch April 24, 2018 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DEPR: Index.get_duplicates
3 participants