change in MultiIndex.has_duplicates behavior from 0.15.0 -> 0.15.2 #9075

Closed
lebedov opened this issue Dec 14, 2014 · 1 comment · Fixed by #9101

Comments

@lebedov

lebedov commented Dec 14, 2014

The following code prints False when run with Pandas 0.15.0, but True with 0.15.2; I'm using Python 2.7.6 on Ubuntu 14.04.1 in both cases. Is this change in behavior expected? (I was using has_duplicates to detect duplicate tuples in the index.)

```python
import pandas as pd
# All 18 tuples differ in their 4th and 8th fields, so none is repeated
t = [(u'x', u'out', u'z', 5, u'y', u'in', u'z', 169),
     (u'x', u'out', u'z', 7, u'y', u'in', u'z', 119),
     (u'x', u'out', u'z', 9, u'y', u'in', u'z', 135),
     (u'x', u'out', u'z', 13, u'y', u'in', u'z', 145),
     (u'x', u'out', u'z', 14, u'y', u'in', u'z', 158),
     (u'x', u'out', u'z', 16, u'y', u'in', u'z', 122),
     (u'x', u'out', u'z', 17, u'y', u'in', u'z', 160),
     (u'x', u'out', u'z', 18, u'y', u'in', u'z', 180),
     (u'x', u'out', u'z', 20, u'y', u'in', u'z', 143),
     (u'x', u'out', u'z', 21, u'y', u'in', u'z', 128),
     (u'x', u'out', u'z', 22, u'y', u'in', u'z', 129),
     (u'x', u'out', u'z', 25, u'y', u'in', u'z', 111),
     (u'x', u'out', u'z', 28, u'y', u'in', u'z', 114),
     (u'x', u'out', u'z', 29, u'y', u'in', u'z', 121),
     (u'x', u'out', u'z', 31, u'y', u'in', u'z', 126),
     (u'x', u'out', u'z', 32, u'y', u'in', u'z', 155),
     (u'x', u'out', u'z', 33, u'y', u'in', u'z', 123),
     (u'x', u'out', u'z', 12, u'y', u'in', u'z', 144)]
idx = pd.MultiIndex.from_tuples(t)
print(idx.has_duplicates)  # expected False; 0.15.2 prints True
```

Reference on StackOverflow here.

@jreback
Contributor

jreback commented Dec 15, 2014

A bug caused by the indexer overflowing the smaller integer dtypes; fixed in #9077. This code path was only covered by one test.
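To make the failure mode concrete, here is a hypothetical NumPy sketch (not pandas' actual internals) of how a "smaller dtype" overflow can produce false duplicates: each level's values are factorized into integer codes, the codes are folded into one mixed-radix key per row, and duplicates are found by comparing keys. Only the 4th and 8th tuple positions vary in the report, so the sketch keeps just those two levels (the constant levels have size 1 and add nothing to the key). The helper names `flat_keys` and `has_dupes`, and the choice of int8 for the narrow arithmetic, are assumptions for illustration.

```python
import numpy as np

# 4th and 8th fields of the 18 tuples in the report (the only levels that vary)
level3 = [5, 7, 9, 13, 14, 16, 17, 18, 20, 21, 22, 25, 28, 29, 31, 32, 33, 12]
level7 = [169, 119, 135, 145, 158, 122, 160, 180, 143, 128, 129, 111,
          114, 121, 126, 155, 123, 144]


def flat_keys(levels, dtype):
    """Fold per-level integer codes into one mixed-radix key per row.

    Hypothetical helper: all arithmetic is done in `dtype`, so a narrow
    dtype such as int8 wraps around silently on array operations.
    """
    key = np.zeros(len(levels[0]), dtype=dtype)
    for values in levels:
        # codes are each value's position among the sorted unique values
        uniques, codes = np.unique(values, return_inverse=True)
        key = key * dtype(len(uniques)) + codes.astype(dtype)
    return key


def has_dupes(keys):
    """True if any key appears more than once."""
    return len(np.unique(keys)) < len(keys)


wide = flat_keys([level3, level7], np.int64)
narrow = flat_keys([level3, level7], np.int8)
print(has_dupes(wide))    # False: 18 distinct keys, all below 324
print(has_dupes(narrow))  # True: e.g. key 276 wraps to 20, colliding with 20
```

With 18 × 18 = 324 possible combinations the true keys need more than 8 bits, so any two rows whose keys differ by exactly 256 collide once the arithmetic wraps; computing in int64 avoids that here.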

That said, the drop_duplicates/duplicated methods exist on Index objects (also as of 0.15.0).
Another way to do this is idx.duplicated().any(), which is faster in this case, though that depends on the length of the index versus the level depth.
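A minimal sketch of the alternatives above, assuming a current pandas release on Python 3 (where the overflow is long fixed). The tuples are rebuilt from the report's data with a list comprehension, and a copy of the first tuple is appended to show what a positive result looks like. The relative speed of the two approaches is not measured here.

```python
import pandas as pd

# Rebuild the 18 tuples from the report: only the 4th and 8th fields vary
level3 = [5, 7, 9, 13, 14, 16, 17, 18, 20, 21, 22, 25, 28, 29, 31, 32, 33, 12]
level7 = [169, 119, 135, 145, 158, 122, 160, 180, 143, 128, 129, 111,
          114, 121, 126, 155, 123, 144]
t = [(u'x', u'out', u'z', a, u'y', u'in', u'z', b) for a, b in zip(level3, level7)]

idx = pd.MultiIndex.from_tuples(t)
print(idx.has_duplicates)        # False
print(idx.duplicated().any())    # False: per-row flags, reduced with any()

# Append a repeat of the first tuple to get a genuine duplicate
dup = pd.MultiIndex.from_tuples(t + [t[0]])
print(dup.has_duplicates)          # True
print(dup.duplicated().any())      # True: only the later copy is flagged
print(len(dup.drop_duplicates()))  # 18
```

duplicated() also tells you which rows repeat (with the default keep='first', every occurrence after the first is flagged), which has_duplicates alone does not.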
