change in MultiIndex.has_duplicates behavior from 0.15.0 -> 0.15.2 #9075

lebedov · 2014-12-14T19:51:04Z

The following code prints False when run with Pandas 0.15.0, but True with 0.15.2; I'm using Python 2.7.6 on Ubuntu 14.04.1 in both cases. Is this change in behavior expected? (I was using has_duplicates to detect duplicate tuples in the index.)

import pandas as pd
t = [(u'x', u'out', u'z', 5, u'y', u'in', u'z', 169),
     (u'x', u'out', u'z', 7, u'y', u'in', u'z', 119),
     (u'x', u'out', u'z', 9, u'y', u'in', u'z', 135),
     (u'x', u'out', u'z', 13, u'y', u'in', u'z', 145),
     (u'x', u'out', u'z', 14, u'y', u'in', u'z', 158),
     (u'x', u'out', u'z', 16, u'y', u'in', u'z', 122),
     (u'x', u'out', u'z', 17, u'y', u'in', u'z', 160),
     (u'x', u'out', u'z', 18, u'y', u'in', u'z', 180),
     (u'x', u'out', u'z', 20, u'y', u'in', u'z', 143),
     (u'x', u'out', u'z', 21, u'y', u'in', u'z', 128),
     (u'x', u'out', u'z', 22, u'y', u'in', u'z', 129),
     (u'x', u'out', u'z', 25, u'y', u'in', u'z', 111),
     (u'x', u'out', u'z', 28, u'y', u'in', u'z', 114),
     (u'x', u'out', u'z', 29, u'y', u'in', u'z', 121),
     (u'x', u'out', u'z', 31, u'y', u'in', u'z', 126),
     (u'x', u'out', u'z', 32, u'y', u'in', u'z', 155),
     (u'x', u'out', u'z', 33, u'y', u'in', u'z', 123),
     (u'x', u'out', u'z', 12, u'y', u'in', u'z', 144)]
idx = pd.MultiIndex.from_tuples(t)
print idx.has_duplicates

Reference on StackOverflow here.


jreback · 2014-12-15T00:08:54Z

This is a bug caused by the indexer overflowing the smaller dtypes. It is fixed in #9077. This only had one test.
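For readers unfamiliar with this kind of failure, here is a minimal sketch of how an overflow in a narrow integer dtype can make distinct tuples look identical. It is not pandas' actual implementation: the `flat_keys` helper, the two-level layout, and the choice of `int8` are assumptions made purely for illustration. The idea is that each row's per-level integer codes are folded into a single key, and if that key does not fit in the dtype it wraps around silently.

```python
import numpy as np


def flat_keys(codes, sizes, dtype):
    """Fold per-level integer codes into one key per row (hypothetical helper).

    codes: one sequence of integer codes per level, all of equal length.
    sizes: number of distinct values in each level.
    dtype: NumPy integer type used for the arithmetic.
    """
    keys = np.zeros(len(codes[0]), dtype=dtype)
    for level_codes, size in zip(codes, sizes):
        # Mixed-radix encoding: shift the running key, then add this level's code.
        keys = keys * dtype(size) + np.asarray(level_codes, dtype=dtype)
    return keys


# Two rows over two levels of 18 values each: (0, 0) and (14, 4).
codes = [[0, 14], [0, 4]]
sizes = [18, 18]

wide = flat_keys(codes, sizes, np.int64)   # keys 0 and 14*18 + 4 = 256
narrow = flat_keys(codes, sizes, np.int8)  # 256 does not fit in int8 and wraps to 0

print(wide.tolist(), narrow.tolist())
```

With the wide dtype the two rows get different keys. With `int8` both keys come out equal, so a duplicate check built on those keys would report a duplicate that is not there.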

That said, the drop_duplicates/duplicated methods exist on Index objects (also as of 0.15.0).
Another way to do this is idx.duplicated().any(), which in this case is faster, though it depends on the length of the index versus the level depth.
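A short sketch of the alternative check mentioned above, next to `has_duplicates`. It is written for Python 3 and a recent pandas rather than the 0.15.x versions discussed in this issue, and the shortened tuple list and the variable names are made up for the example.

```python
import pandas as pd

# A few tuples in the same shape as the ones in the report.
tuples = [
    ("x", "out", "z", 5, "y", "in", "z", 169),
    ("x", "out", "z", 7, "y", "in", "z", 119),
    ("x", "out", "z", 9, "y", "in", "z", 135),
]

idx = pd.MultiIndex.from_tuples(tuples)
# Same index with the first tuple repeated at the end.
idx_dup = pd.MultiIndex.from_tuples(tuples + [tuples[0]])

# Property-based check.
unique_flag = idx.has_duplicates
dup_flag = idx_dup.has_duplicates

# Mask-based check: duplicated() marks every repeat after the first occurrence.
unique_any = idx.duplicated().any()
dup_mask = idx_dup.duplicated()

# drop_duplicates() keeps the first occurrence of each tuple.
deduped = idx_dup.drop_duplicates()

print(unique_flag, dup_flag, unique_any, dup_mask.tolist(), len(deduped))
```

On a version without the bug both checks should agree. The mask from `duplicated()` additionally tells you which positions are the repeats.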

lebedov mentioned this issue Dec 14, 2014

Possible regression in Pandas 0.15.2 affects validation of connections in Pattern class neurokernel/neurokernel#28

Closed

jreback added Bug MultiIndex labels Dec 14, 2014

jreback added this to the 0.16.0 milestone Dec 14, 2014

jreback mentioned this issue Dec 14, 2014

BUG: Bug in MultiIndex.has_duplicates when having many levels causes an indexer overflow (GH9075) #9077

Closed

behzadnouri mentioned this issue Dec 18, 2014

overflow bug in multi-index when checking for duplicates #9101

Merged

jreback closed this as completed in #9101 Dec 18, 2014
