DOC: .get_slice_bound in MultiIndex needs documentation. #29967

Closed
proost opened this issue Dec 2, 2019 · 1 comment · Fixed by #30257

Comments

proost commented Dec 2, 2019

In [10]: i = Index(['c','a','d','b'])                                           
In [12]: i.get_slice_bound('a',side="left",kind="ix")                           
Out[12]: 1
In [13]: mi = MultiIndex.from_arrays([['c','a','d','b']])                       
In [15]: mi.get_slice_bound('a',side="left",kind="ix")                          
---------------------------------------------------------------------------
UnsortedIndexError                        Traceback (most recent call last)
<ipython-input-15-fbc28595630f> in <module>
----> 1 mi.get_slice_bound('a',side="left",kind="ix")
~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/multi.py in get_slice_bound(self, label, side, kind)
   2523         if not isinstance(label, tuple):
   2524             label = (label,)
-> 2525         return self._partial_tup_index(label, side=side)
   2526 
   2527     def slice_locs(self, start=None, end=None, step=None, kind=None):
~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/multi.py in _partial_tup_index(self, tup, side)
   2585             raise UnsortedIndexError(
   2586                 "Key length (%d) was greater than MultiIndex"
-> 2587                 " lexsort depth (%d)" % (len(tup), self.lexsort_depth)
   2588             )
   2589 
UnsortedIndexError: 'Key length (1) was greater than MultiIndex lexsort depth (0)'

But if the MultiIndex is sorted, the same call works:

In [17]: mi = MultiIndex.from_arrays([['a','b','c','d']])                       
In [18]: mi.get_slice_bound('a',side="left",kind="ix")                          
Out[18]: 0

If it is intended, and not a bug, that '.get_slice_bound' works only on a sorted 'MultiIndex', then I think 'MultiIndex' needs its own documentation. The documentation on 'Index.get_slice_bound' alone is not enough to tell the reader that '.get_slice_bound' only works with a sorted 'MultiIndex'.
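As a minimal sketch of the behaviour described above, the snippet below catches the error on the unsorted index and then retries on a sorted copy. It assumes a recent pandas release in which the `kind` argument is optional or no longer accepted (the session above passes `kind="ix"`, which pandas 0.25.1 required); the variable names `raised`, `left`, and `right` are only illustrative.

```python
import pandas as pd
from pandas.errors import UnsortedIndexError

# Unsorted single-level MultiIndex, as in the report above.
mi = pd.MultiIndex.from_arrays([["c", "a", "d", "b"]])

try:
    mi.get_slice_bound("a", side="left")
    raised = False
except UnsortedIndexError:
    # The key is longer than the lexsort depth of the index (0 here).
    raised = True

# Sorting first makes the lookup valid. Note that the returned
# positions refer to the sorted copy, not to the original order.
mi_sorted = mi.sort_values()
left = mi_sorted.get_slice_bound("a", side="left")
right = mi_sorted.get_slice_bound("a", side="right")

print(raised, left, right)
```

Because the bounds are positions in `mi_sorted`, they should only be used to slice the sorted index (or data that has been reordered the same way).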

INSTALLED VERSIONS

commit : None
python : 3.7.1.final.0
python-bits : 64
OS : Linux
OS-release : 5.0.0-36-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : ko_KR.UTF-8
LOCALE : ko_KR.UTF-8

pandas : 0.25.1
numpy : 1.16.1
pytz : 2018.7
dateutil : 2.7.5
pip : 19.3.1
setuptools : 40.6.3
Cython : 0.29.2
pytest : 5.1.0
hypothesis : None
sphinx : 1.8.2
blosc : None
feather : None
xlsxwriter : 1.1.2
lxml.etree : 4.2.5
html5lib : 1.0.1
pymysql : 0.9.3
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.2.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.2.5
matplotlib : 3.0.2
numexpr : 2.6.8
odfpy : None
openpyxl : 2.5.12
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.1.0
sqlalchemy : 1.2.15
tables : 3.4.4
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.2

@kyle-ward-data

@proost I agree there should probably be some documentation on MultiIndex.get_slice_bound(). I did some research on the matter in addition to your issue and found the following.

It appears that lexicographical (alphabetical) order does play a significant role in the output and was possibly intentional, but I'm not entirely sure. The method is_lexsorted() and the attribute lexsort_depth exist on MultiIndex, which suggests that sortedness matters.

Not sorted:

mi_not_lexi = pd.MultiIndex.from_arrays([list('zzyy'),list('abab')])
print(mi_not_lexi.is_lexsorted()) # returns False
print(mi_not_lexi.lexsort_depth) # returns 0

Sorted:

mi_lexi = mi_not_lexi.copy().sort_values()
print(mi_lexi.is_lexsorted()) # returns True
print(mi_lexi.lexsort_depth) # returns 2
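A related check can be sketched with `is_monotonic_increasing`, which may be useful on newer pandas releases where `is_lexsorted()` is deprecated or unavailable. This is an assumption about a workable substitute, not the check pandas performs internally: monotonicity is judged on the values, while lexsort depth is judged on the level codes.

```python
import pandas as pd

# Same index as above: level 0 is in descending order, so it is unsorted.
mi_not_lexi = pd.MultiIndex.from_arrays([list("zzyy"), list("abab")])
mi_lexi = mi_not_lexi.sort_values()

# Value-based sortedness check on the whole index.
unsorted_is_monotonic = mi_not_lexi.is_monotonic_increasing
sorted_is_monotonic = mi_lexi.is_monotonic_increasing

print(unsorted_is_monotonic, sorted_is_monotonic)
print(list(mi_lexi))
```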

This may be because get_slice_bound() on a plain Index object can't handle duplicate labels that are not adjacent to each other, whether or not the index is alphabetically sorted. A MultiIndex presumably has duplicate labels in its lower levels, so requiring the index to be sorted avoids the issue shown below.

Returns 2 as expected:

mi_not_lexi.get_level_values(0) \
            .get_slice_bound('y',side='left',kind='ix')

Returns KeyError:

mi_not_lexi.get_level_values(1) \
            .get_slice_bound('b',side='left',kind='ix')

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-655-cfe14bbbfb62> in <module>
      1 mi_not_lexi.get_level_values(1) \
----> 2             .get_slice_bound('b',side='left',kind='ix')

c:\users\kyle_ward\appdata\local\programs\python\python37\lib\site-packages\pandas\core\indexes\base.py in get_slice_bound(self, label, side, kind)
   5173                 raise KeyError(
   5174                     "Cannot get %s slice bound for non-unique "
-> 5175                     "label: %r" % (side, original_label)
   5176                 )
   5177 

KeyError: "Cannot get left slice bound for non-unique label: 'b'"
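The two cases above can be put side by side in a short sketch. It assumes a recent pandas release where `kind` can be omitted, and the names `bound_y`, `error`, and `bound_b` are illustrative only. Sorting the level values makes the duplicates adjacent, after which the bound is well defined.

```python
import pandas as pd

mi_not_lexi = pd.MultiIndex.from_arrays([list("zzyy"), list("abab")])
level0 = mi_not_lexi.get_level_values(0)  # z, z, y, y: duplicates adjacent
level1 = mi_not_lexi.get_level_values(1)  # a, b, a, b: duplicates interleaved

# Adjacent duplicates form one contiguous run, so a bound exists.
bound_y = level0.get_slice_bound("y", side="left")

# Interleaved duplicates have no single contiguous run.
try:
    level1.get_slice_bound("b", side="left")
    error = None
except KeyError as exc:
    error = exc

# After sorting (a, a, b, b) the run of "b" starts at position 2.
bound_b = level1.sort_values().get_slice_bound("b", side="left")

print(bound_y, type(error).__name__, bound_b)
```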

That said, get_locs(), which is specific to MultiIndex, appears to be another way of achieving this, and it is documented.

mi_not_lexi.get_locs([slice(None), ['b']])
mi_not_lexi.get_locs('z')
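For reference, a self-contained version of the two get_locs() calls might look like the sketch below. It passes the second key as a one-element list (`["z"]`) rather than the bare string used above, which is an assumption made here to keep the argument an explicit per-level sequence. The result is an array of integer positions rather than a single bound.

```python
import pandas as pd

# The unsorted two-level index from the earlier examples.
mi_not_lexi = pd.MultiIndex.from_arrays([list("zzyy"), list("abab")])

# Any value on level 0, label "b" on level 1.
b_positions = mi_not_lexi.get_locs([slice(None), ["b"]])

# Label "z" on level 0, no constraint on level 1.
z_positions = mi_not_lexi.get_locs(["z"])

print(list(b_positions), list(z_positions))
```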

@jreback jreback added this to the 1.0 milestone Jan 1, 2020