-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ERR: disallow non-hashables in Index/MultiIndex construction & rename #20548
Conversation
this already works on Serie the issue is about Index |
Sorry I got confused. I will update it for Index. |
Codecov Report
@@ Coverage Diff @@
## master #20548 +/- ##
==========================================
- Coverage 91.84% 91.84% -0.01%
==========================================
Files 153 153
Lines 49305 49313 +8
==========================================
+ Hits 45286 45293 +7
- Misses 4019 4020 +1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a few tests. and a whatsnew new (other API changes).
pandas/core/indexes/base.py
Outdated
return self.set_names([name], inplace=inplace) | ||
if name is not None and not is_hashable(name): | ||
raise TypeError('Index.name must be a hashable type') | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather do this in set_names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this in set_names and a lot of tests failed. Is there a particular reason we can’t keep it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are going to need it in set_names as that is the canonical way to do this. that's where it should validate. if we have tests that are clearly in error they should be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- can you add a note in other api changes section
- can you add tests on construction & for rename (these should use our current infrastructure to exercise all subclasses)
pandas/core/indexes/base.py
Outdated
@@ -251,6 +252,9 @@ def __new__(cls, data=None, dtype=None, copy=False, name=None, | |||
if name is None and hasattr(data, 'name'): | |||
name = data.name | |||
|
|||
if name is not None and not is_hashable(name): | |||
raise TypeError('Index.name must be a hashable type') | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this very likely also needs checking for MultiIndex (as that's a different path in some cases).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we allow non-hashable names for MultiIndex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a MultiIndex, it seems that names
is converted into FrozenList after creation. I found this answer from you on StackOverflow about hashability of a FrozenList.
Right now, if names
can’t be converted to a FrozenList (if not hashable), it throws an exception. For example:
In [1]: pd.MultiIndex(levels=[[1, 2], [u'one', u'two']],
...: labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
...: names=(['foo'], ['bar']))
...:
TypeError: unhashable type: 'list'
while this passes:
In [2]: pd.MultiIndex(levels=[[1, 2], [u'one', u'two']],
...: labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
...: names=[('foo'), ('bar')])
Do we need to change anything here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no you just need to check that each name is hashable, not the frozen list itself. that's why .set_names is the best place for this
pandas/core/indexes/base.py
Outdated
@@ -251,6 +252,9 @@ def __new__(cls, data=None, dtype=None, copy=False, name=None, | |||
if name is None and hasattr(data, 'name'): | |||
name = data.name | |||
|
|||
if name is not None and not is_hashable(name): | |||
raise TypeError('Index.name must be a hashable type') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use self.__class__.__name__
rather than Index
here
Hello @arminv! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on April 22, 2018 at 14:48 Hours UTC |
pandas/core/indexes/base.py
Outdated
@@ -473,7 +474,7 @@ def _simple_new(cls, values, name=None, dtype=None, **kwargs): | |||
|
|||
result = object.__new__(cls) | |||
result._data = values | |||
result.name = name | |||
result._set_names([name]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback I wasn't sure if _set_names
was getting called from _simple_new
, so I made it explicit. Is this ok?
Also, we are not checking in __new__
anymore (as you suggested).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you shouldn't need to do this, and just leave the original code
setting .name
name is a property that calls _set_names
pandas/core/indexes/base.py
Outdated
|
||
Notes | ||
----- | ||
Both `set_names` and `rename` call this function to set name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. some doc-comments. ping on green.
pandas/core/indexes/base.py
Outdated
@@ -473,7 +474,7 @@ def _simple_new(cls, values, name=None, dtype=None, **kwargs): | |||
|
|||
result = object.__new__(cls) | |||
result._data = values | |||
result.name = name | |||
result._set_names([name]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you shouldn't need to do this, and just leave the original code
setting .name
name is a property that calls _set_names
pandas/core/indexes/base.py
Outdated
|
||
Examples | ||
-------- | ||
on an index with no names: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need the fulll doc string here (e.g. examples and such, leave Parameters and such), only on .set_names
can you update. ping on green. |
pandas/core/indexes/base.py
Outdated
If the index is a MultiIndex (hierarchical), level(s) to set (None | ||
for all levels). Otherwise level must be None | ||
|
||
Returns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be Raises (and its a TypeError)
@@ -1311,6 +1312,28 @@ def _get_names(self): | |||
return FrozenList((self.name, )) | |||
|
|||
def _set_names(self, values, level=None): | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also add a mention on set_names
itself that the names must be hashable (and examples if you want)
moved the logic slightly. will merge on green. |
Thanks @arminv ! |
tm.assert_raises_regex(TypeError, message, mi.set_names, names=renamed) | ||
|
||
@pytest.mark.parametrize('names', [['a', 'b', 'a'], ['1', '1', '2'], | ||
['1', 'a', '1']]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche IIRC I changed it (in this commit) because the test was failing, but implementation changed a lot after that commit so I'm not sure if reverting this would cause a problem now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to be passing there!
Index & MultiIndex names need to be hashable. Both constructing and renaming without a hashable name raise TypeError exceptions now.
Examples:
Checklist for other PRs (remove this part if you are doing a PR for the pandas documentation sprint):
git diff upstream/master -u -- "*.py" | flake8 --diff