ERR: disallow non-hashables in Index/MultiIndex construction & rename #20548

Merged: 48 commits, Apr 23, 2018

arminv
Contributor

@arminv arminv commented Mar 30, 2018

Index & MultiIndex names need to be hashable. Constructing or renaming with a non-hashable name now raises a TypeError.

Examples:

  • Index:
In [2]: pd.Index([1, 2, 3], name=['foo'])
>>> Int64Index([1, 2, 3], dtype='int64', name=['foo'])
In [3]: pd.Index([1, 2, 3], name='foo').rename(['bar'])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-f3327eccf0fc> in <module>()
----> 1 pd.Index([1, 2, 3], name='foo').rename(['bar'])

~/Documents/GitHub/pandas/pandas/core/indexes/base.py in rename(self, name, inplace)
   1406         new index (of same type and class...etc) [if inplace, returns None]
   1407         """
-> 1408         return self.set_names([name], inplace=inplace)
   1409 
   1410     @property

~/Documents/GitHub/pandas/pandas/core/indexes/base.py in set_names(self, names, level, inplace)
   1387         else:
   1388             idx = self._shallow_copy()
-> 1389         idx._set_names(names, level=level)
   1390         if not inplace:
   1391             return idx

~/Documents/GitHub/pandas/pandas/core/indexes/base.py in _set_names(self, values, level)
   1323                 if not is_hashable(name):
   1324                     raise TypeError('{}.name must be a hashable type'
-> 1325                                     .format(self.__class__.__name__))
   1326         if len(values) != 1:
   1327             raise ValueError('Length of new names must be 1, got %d' %

TypeError: Int64Index.name must be a hashable type
  • MultiIndex:
In [4]: pd.MultiIndex(levels=[[1, 2], [u'one', u'two']],
                        labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
                        names=((['foo'], ['bar'])))
                        
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-d211526eaa3d> in <module>()
      1 pd.MultiIndex(levels=[[1, 2], [u'one', u'two']],
      2                     labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
----> 3                     names=((['foo'], ['bar'])))
      4 

~/Documents/GitHub/pandas/pandas/core/indexes/multi.py in __new__(cls, levels, labels, sortorder, names, dtype, copy, name, verify_integrity, _set_identity)
    230         if names is not None:
    231             # handles name validation
--> 232             result._set_names(names)
    233 
    234         if sortorder is not None:

~/Documents/GitHub/pandas/pandas/core/indexes/multi.py in _set_names(self, names, level, validate)
    646             if not is_hashable(name):
    647                 raise TypeError('{}.name must be a hashable type'
--> 648                                 .format(self.__class__.__name__))
    649 
    650         # GH 15110

TypeError: MultiIndex.name must be a hashable type
In [10]: pd.MultiIndex(levels=[[1, 2], [u'one', u'two']],
                       labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
                       names=('foo', 'bar')).rename(([1], [2]))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-ff74dfc48455> in <module>()
      1 pd.MultiIndex(levels=[[1, 2], [u'one', u'two']],
      2                      labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
----> 3                      names=('foo', 'bar')).rename(([1], [2]))
      4 

~/Documents/GitHub/pandas/pandas/core/indexes/base.py in set_names(self, names, level, inplace)
   1387         else:
   1388             idx = self._shallow_copy()
-> 1389         idx._set_names(names, level=level)
   1390         if not inplace:
   1391             return idx

~/Documents/GitHub/pandas/pandas/core/indexes/multi.py in _set_names(self, names, level, validate)
    646             if not is_hashable(name):
    647                 raise TypeError('{}.name must be a hashable type'
--> 648                                 .format(self.__class__.__name__))
    649 
    650         # GH 15110

TypeError: MultiIndex.name must be a hashable type

Checklist for other PRs (remove this part if you are doing a PR for the pandas documentation sprint):

  • closes #20527
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@jreback
Contributor

jreback commented Mar 30, 2018

this already works on Series

the issue is about Index

@arminv
Contributor Author

arminv commented Mar 30, 2018

Sorry I got confused. I will update it for Index.

@arminv arminv changed the title from "ERR: disallow non-hashables in Series construction & rename" to "ERR: disallow non-hashables in Index construction & rename" Mar 30, 2018
@codecov

codecov bot commented Mar 30, 2018

Codecov Report

Merging #20548 into master will decrease coverage by <.01%.
The diff coverage is 90%.

@@            Coverage Diff             @@
##           master   #20548      +/-   ##
==========================================
- Coverage   91.84%   91.84%   -0.01%     
==========================================
  Files         153      153              
  Lines       49305    49313       +8     
==========================================
+ Hits        45286    45293       +7     
- Misses       4019     4020       +1

Flag         Coverage Δ
#multiple    90.23% <90%> (-0.01%) ⬇️
#single      41.89% <60%> (ø) ⬆️

Impacted Files                  Coverage Δ
pandas/core/indexes/multi.py    95.07% <100%> (+0.01%) ⬆️
pandas/core/indexes/base.py     96.63% <80%> (-0.05%) ⬇️
pandas/core/generic.py          95.94% <0%> (ø) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8def649...97a2b06.

Contributor

@jreback jreback left a comment

can you add a few tests, and a whatsnew note (other API changes).

-        return self.set_names([name], inplace=inplace)
+        if name is not None and not is_hashable(name):
+            raise TypeError('Index.name must be a hashable type')
+        else:
Contributor

rather do this in set_names

Contributor Author

I did this in set_names and a lot of tests failed. Is there a particular reason we can’t keep it here?

Contributor

you are going to need it in set_names as that is the canonical way to do this. that's where it should validate. if we have tests that are clearly in error they should be changed.

Contributor

@jreback jreback left a comment

  • can you add a note in other api changes section
  • can you add tests on construction & for rename (these should use our current infrastructure to exercise all subclasses)

@@ -251,6 +252,9 @@ def __new__(cls, data=None, dtype=None, copy=False, name=None,
         if name is None and hasattr(data, 'name'):
             name = data.name

+        if name is not None and not is_hashable(name):
+            raise TypeError('Index.name must be a hashable type')

Contributor

this very likely also needs checking for MultiIndex (as that's a different path in some cases).

Contributor Author

Do we allow non-hashable names for MultiIndex?

Contributor

no

Contributor Author

@arminv arminv Apr 1, 2018

For a MultiIndex, it seems that names is converted into FrozenList after creation. I found this answer from you on StackOverflow about hashability of a FrozenList.

Right now, if names can’t be converted to a FrozenList (if not hashable), it throws an exception. For example:

In [1]: pd.MultiIndex(levels=[[1, 2], [u'one', u'two']],
   ...:                    labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
   ...:                    names=(['foo'], ['bar']))
   ...:           

TypeError: unhashable type: 'list'

while this passes:

In [2]: pd.MultiIndex(levels=[[1, 2], [u'one', u'two']],
    ...:                    labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
    ...:                    names=[('foo'), ('bar')])

Do we need to change anything here?
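
A side note on the second call, as a plain-Python sketch that does not touch pandas: parentheses without a trailing comma do not create a tuple, so `('foo')` is just the string `'foo'`. The names passed there are ordinary strings, which would explain why the call goes through.

```python
# Parentheses alone only group an expression; they do not build a tuple.
names = [('foo'), ('bar')]
print(names == ['foo', 'bar'])    # True

# A trailing comma is what makes a one-element tuple.
print(type(('foo',)).__name__)    # tuple
```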

Contributor

no you just need to check that each name is hashable, not the frozen list itself. that's why .set_names is the best place for this
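
A minimal sketch of that idea, in plain Python rather than the actual pandas code: the container holding the names may itself be unhashable (a list is fine), as long as every individual name is hashable. `check_names` and its `owner` parameter are hypothetical names used only for illustration, and `hash()` stands in for pandas' `is_hashable`.

```python
def check_names(names, owner='MultiIndex'):
    """Hypothetical stand-in for per-name validation in _set_names."""
    for name in names:
        try:
            hash(name)  # approximates is_hashable(name)
        except TypeError:
            raise TypeError('{}.name must be a hashable type'.format(owner))
    return tuple(names)


# The container is a list (unhashable), but each name is a str: accepted.
check_names(['foo', 'bar'])

# Each name is a list: rejected, whatever the container type is.
try:
    check_names((['foo'], ['bar']))
except TypeError as err:
    print(err)  # MultiIndex.name must be a hashable type
```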

@@ -251,6 +252,9 @@ def __new__(cls, data=None, dtype=None, copy=False, name=None,
         if name is None and hasattr(data, 'name'):
             name = data.name

+        if name is not None and not is_hashable(name):
+            raise TypeError('Index.name must be a hashable type')
Contributor

use self.__class__.__name__ rather than Index here

@jreback jreback added the Indexing (Related to indexing on series/frames, not to indexes themselves) and Error Reporting (Incorrect or improved errors from pandas) labels Mar 30, 2018
@pep8speaks

pep8speaks commented Apr 2, 2018

Hello @arminv! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on April 22, 2018 at 14:48 Hours UTC

@@ -473,7 +474,7 @@ def _simple_new(cls, values, name=None, dtype=None, **kwargs):

         result = object.__new__(cls)
         result._data = values
-        result.name = name
+        result._set_names([name])
Contributor Author

@jreback I wasn't sure if _set_names was getting called from _simple_new, so I made it explicit. Is this ok?

Also, we are not checking in __new__ anymore (as you suggested).

Contributor

you shouldn't need to do this, and just leave the original code

setting .name is enough: name is a property that calls _set_names
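
To illustrate the pattern described in this comment, here is a toy sketch. `TinyIndex` is a hypothetical class, not pandas' `Index`, and how `name` is actually wired up inside pandas is not shown in this thread. The point is only that once a property setter routes through `_set_names`, a plain attribute assignment is validated and no explicit `_set_names` call is needed in the constructor.

```python
class TinyIndex:
    """Toy class (not pandas): a name property backed by _set_names."""

    def __init__(self, values, name=None):
        self._values = list(values)
        self.name = name  # plain assignment; goes through the setter below

    def _set_names(self, values):
        if len(values) != 1:
            raise ValueError('Length of new names must be 1, got %d'
                             % len(values))
        for name in values:
            try:
                hash(name)  # approximates is_hashable(name)
            except TypeError:
                # type(self).__name__ lets subclasses report their own name
                raise TypeError('{}.name must be a hashable type'
                                .format(type(self).__name__))
        self._name = values[0]

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        self._set_names([value])


idx = TinyIndex([1, 2, 3], name='foo')
try:
    idx.name = ['bar']
except TypeError as err:
    print(err)  # TinyIndex.name must be a hashable type
```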


Notes
-----
Both `set_names` and `rename` call this function to set name.
Contributor Author

I'm not sure if this is needed?

@arminv arminv closed this Apr 17, 2018
@arminv arminv reopened this Apr 17, 2018
Contributor

@jreback jreback left a comment

lgtm. some doc-comments. ping on green.

Examples
--------
on an index with no names:
Contributor

you don't need the full doc string here (e.g. examples and such, leave Parameters and such), only on .set_names

@jreback jreback added this to the 0.23.0 milestone Apr 19, 2018
@jreback
Contributor

jreback commented Apr 21, 2018

can you update. ping on green.

If the index is a MultiIndex (hierarchical), level(s) to set (None
for all levels). Otherwise level must be None

Returns
Contributor

this should be Raises (and it's a TypeError)
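
For reference, a rough sketch of how a numpydoc-style docstring separates the Returns and Raises sections. `validate_name` is a hypothetical function written for this illustration, not the docstring that was merged.

```python
def validate_name(name):
    """
    Check that a proposed index name is usable.

    Parameters
    ----------
    name : object
        Candidate name.

    Returns
    -------
    object
        The name, unchanged.

    Raises
    ------
    TypeError
        If `name` is not hashable.
    """
    try:
        hash(name)  # approximates is_hashable(name)
    except TypeError:
        raise TypeError('name must be a hashable type')
    return name
```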

@@ -1311,6 +1312,28 @@ def _get_names(self):
return FrozenList((self.name, ))

def _set_names(self, values, level=None):
"""
Contributor

can you also add a mention on set_names itself that the names must be hashable (and examples if you want)

@jreback
Contributor

jreback commented Apr 22, 2018

moved the logic slightly. will merge on green.

@TomAugspurger TomAugspurger merged commit add3fbf into pandas-dev:master Apr 23, 2018
@TomAugspurger
Contributor

Thanks @arminv !

@arminv arminv deleted the non_hashable_err branch April 23, 2018 19:35
tm.assert_raises_regex(TypeError, message, mi.set_names, names=renamed)

@pytest.mark.parametrize('names', [['a', 'b', 'a'], ['1', '1', '2'],
['1', 'a', '1']])
Member

@arminv Is there a reason that you changed those parametrize values to all strings? (I suppose by accident?)
I am reworking the test in #21423, so will revert there if this was by accident

Contributor Author

@jorisvandenbossche IIRC I changed it (in this commit) because the test was failing, but implementation changed a lot after that commit so I'm not sure if reverting this would cause a problem now

Member

Seems to be passing there!

Labels
Error Reporting (Incorrect or improved errors from pandas), Indexing (Related to indexing on series/frames, not to indexes themselves)