ERR: disallow non-hashables in Index/MultiIndex construction & rename #20548

Merged
merged 48 commits into from
Apr 23, 2018
Changes from 4 commits
9047d60
Check non-hashability on series construction and renaming
arminv Mar 30, 2018
df7650d
Removed changes from pandas/core/series.py
arminv Mar 30, 2018
dd64219
Check non-hashability on Index construction and renaming
arminv Mar 30, 2018
89e92ab
modified test_getitem_list example to disallow non-hashable names
arminv Mar 30, 2018
cd3e53a
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Mar 30, 2018
cd070e3
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 1, 2018
351691f
Changed ErrorType message for hashability requirement
arminv Apr 1, 2018
3a7b0b2
Fixed how rename calls set_names to allow for MultiIndex hashable typ…
arminv Apr 1, 2018
70933d5
Moved type checking from set_names back to rename
arminv Apr 1, 2018
56fd617
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 1, 2018
d4ed636
Moved hashable checking to set_names. Changed exception messages.
arminv Apr 1, 2018
b554bb3
Modified test_duplicate_level_names to pass with new (hashable names)…
arminv Apr 2, 2018
6efd6cc
Added test_constructor_nonhashable_names for checking hashability on …
arminv Apr 2, 2018
4fb3a6b
Fixed a typo
arminv Apr 2, 2018
786f43f
Minor refactoring of test_constructor_nonhashable_names
arminv Apr 2, 2018
01b712e
Added test_constructor_nonhashable_name for checking hashability on name
arminv Apr 2, 2018
6f13cd0
Added note in Other API Changes on hashability of names
arminv Apr 2, 2018
26433c3
Improved wording of the note
arminv Apr 2, 2018
91ef466
Addressed PEP 8 issues
arminv Apr 2, 2018
85e35ea
Modified exception message of Index
arminv Apr 2, 2018
5c2e240
Changed exception message format
arminv Apr 2, 2018
4ca2a52
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 2, 2018
840cd88
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 3, 2018
18bcf2a
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 6, 2018
d98014f
Refactoring
arminv Apr 8, 2018
b8a1d7e
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 9, 2018
edfbd1d
Added internal comment
arminv Apr 9, 2018
2322346
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 10, 2018
fa52655
Refactoring
arminv Apr 10, 2018
c0f6936
Moved check from set_names to _set_names
arminv Apr 10, 2018
a9c14e6
test with fixture
jreback Apr 11, 2018
30da596
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 12, 2018
667d495
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 16, 2018
c4c1011
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 16, 2018
bd75433
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 17, 2018
74a9b54
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 17, 2018
b1cb7fd
Refactoring. Internal docstring. Minor typos
arminv Apr 17, 2018
863f7d3
PEP 8
arminv Apr 17, 2018
0723009
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 17, 2018
7092d49
Improved docstring wording
arminv Apr 17, 2018
1d8f67a
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 19, 2018
12488ff
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 20, 2018
4a500ba
Shorten docstring
arminv Apr 21, 2018
9ec64b0
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 21, 2018
47903ae
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 22, 2018
04f2eed
Added examples
arminv Apr 22, 2018
1a68188
remove examples from _set_names
jreback Apr 22, 2018
97a2b06
consolidate logic a bit
jreback Apr 22, 2018
9 changes: 8 additions & 1 deletion pandas/core/indexes/base.py
@@ -42,6 +42,7 @@
     is_datetime64_any_dtype,
     is_datetime64tz_dtype,
     is_timedelta64_dtype,
+    is_hashable,
     needs_i8_conversion,
     is_iterator, is_list_like,
     is_scalar)
@@ -251,6 +252,9 @@ def __new__(cls, data=None, dtype=None, copy=False, name=None,
         if name is None and hasattr(data, 'name'):
             name = data.name

+        if name is not None and not is_hashable(name):
Contributor

I don't think you actually need this here, as _set_name gets called from _simple_new (when .name is set). So it is more logical to put this in _set_name (like you do for MI).

Contributor Author

arminv Apr 12, 2018

For MI, the name validation is delegated to _set_names in this way:

result = object.__new__(MultiIndex)
# we've already validated levels and labels, so shortcut here
result._set_levels(levels, copy=copy, validate=False)
result._set_labels(labels, copy=copy, validate=False)
if names is not None:
    # handles name validation
    result._set_names(names)

Maybe add something like this in __new__():

_result = object.__new__(Index)

if name is not None:
    # handles name validation
    _result._set_names([name])

That way, pd.Index([1, 2, 3], name=['foo']) would still raise, but the check would live in a more logical place.
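The delegation proposed in this comment can be illustrated with a toy class. This is a minimal sketch under stated assumptions, not the pandas implementation: `ToyIndex`, `constructor_rejects`, and the local `is_hashable` helper (a stand-in that simply tries `hash()`) are hypothetical names introduced only for illustration.

```python
def is_hashable(obj):
    """Stand-in helper (assumption): hashable means hash() succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


class ToyIndex(object):
    """Hypothetical, simplified index used only to illustrate the idea."""

    def __init__(self, data, name=None):
        self._data = list(data)
        self._name = None
        if name is not None:
            # The constructor does no checking itself; it hands the name
            # to _set_names, the single place that validates.
            self._set_names([name])

    def _set_names(self, values):
        if len(values) != 1:
            raise ValueError('Length of new names must be 1, got %d'
                             % len(values))
        for name in values:
            if not is_hashable(name):
                raise TypeError('ToyIndex.name must be a hashable type')
        self._name = values[0]

    @property
    def name(self):
        return self._name


def constructor_rejects(name):
    """Return True if building a ToyIndex with this name raises TypeError."""
    try:
        ToyIndex([1, 2, 3], name=name)
    except TypeError:
        return True
    return False
```

In this sketch a list name such as `['foo']` is rejected at construction time, while a string or a tuple of strings is accepted, and the constructor itself contains no hashability logic.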

+            raise TypeError('Index.name must be a hashable type')
Contributor

use self.__class__.__name__ rather than Index here
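A small sketch of what this suggestion buys: when the message is built from `self.__class__.__name__`, a subclass that inherits the check reports its own name. The classes `BaseIndex` and `RangeLikeIndex` and the helper `error_message` are hypothetical and exist only for this illustration; they are not pandas classes.

```python
class BaseIndex(object):
    """Hypothetical base class; not the pandas Index."""

    def set_name(self, name):
        try:
            hash(name)
        except TypeError:
            # self.__class__.__name__ resolves to the concrete subclass, so
            # the message names the class the caller actually used.
            raise TypeError('{}.name must be a hashable type'
                            .format(self.__class__.__name__))
        self.name = name


class RangeLikeIndex(BaseIndex):
    """Hypothetical subclass that inherits the check unchanged."""


def error_message(index, name):
    """Return the TypeError message raised for this name, or None."""
    try:
        index.set_name(name)
    except TypeError as exc:
        return str(exc)
    return None
```

With a hard-coded `'Index.name ...'` string, both classes would report `Index`, which is misleading for subclasses.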


Contributor

this very likely also needs checking for MultiIndex (as that's a different path in some cases).

Contributor Author

Do we allow non-hashable names for MultiIndex?

Contributor

no

Contributor Author

arminv Apr 1, 2018

For a MultiIndex, it seems that names is converted into a FrozenList after creation. I found this answer from you on StackOverflow about the hashability of a FrozenList.

Right now, if names can’t be converted to a FrozenList (because it is not hashable), an exception is raised. For example:

In [1]: pd.MultiIndex(levels=[[1, 2], [u'one', u'two']],
   ...:               labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
   ...:               names=(['foo'], ['bar']))

TypeError: unhashable type: 'list'

while this passes:

In [2]: pd.MultiIndex(levels=[[1, 2], [u'one', u'two']],
   ...:               labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
   ...:               names=[('foo'), ('bar')])

Do we need to change anything here?

Contributor

No, you just need to check that each name is hashable, not the FrozenList itself. That's why .set_names is the best place for this.
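The distinction between checking the container and checking each name can be sketched in plain Python. `non_hashable_names` is a hypothetical helper written for this illustration only; it is not a pandas function.

```python
def non_hashable_names(names):
    """Return the names that cannot be hashed, checking each one on its own.

    The container itself is never hashed, so it may be a list or a tuple.
    """
    bad = []
    for name in names:
        try:
            hash(name)
        except TypeError:
            bad.append(name)
    return bad


# Both names are lists, so both are reported.
rejected = non_hashable_names((['foo'], ['bar']))

# ('foo') is just the string 'foo' (parentheses alone do not make a tuple),
# so nothing is reported even though the container is an unhashable list.
accepted = non_hashable_names([('foo'), ('bar')])
```

Note that a tuple is hashable only if all of its elements are, so a name like `(['a'],)` would also be reported by this check.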

         if fastpath:
             return cls._simple_new(data, name)

@@ -1392,7 +1396,10 @@ def rename(self, name, inplace=False):
         -------
         new index (of same type and class...etc) [if inplace, returns None]
         """
-        return self.set_names([name], inplace=inplace)
+        if name is not None and not is_hashable(name):
+            raise TypeError('Index.name must be a hashable type')
+        else:
Contributor

rather do this in set_names

Contributor Author

I did this in set_names and a lot of tests failed. Is there a particular reason we can’t keep it here?

Contributor

You are going to need it in set_names, as that is the canonical way to do this; that's where it should validate. If we have tests that are clearly in error, they should be changed.
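To make the "validate in set_names" argument concrete, here is a minimal sketch in which `rename` contains no check of its own and simply delegates. `ToyNamedIndex` and `raises_type_error` are hypothetical names used only for illustration, and the sketch omits the real method's `inplace` and `level` handling.

```python
class ToyNamedIndex(object):
    """Hypothetical class; only the naming logic is modelled."""

    def __init__(self):
        self.name = None

    def set_names(self, names):
        # Single validation point: every way of naming goes through here.
        # All names are checked before anything is assigned.
        for name in names:
            try:
                hash(name)
            except TypeError:
                raise TypeError('{}.name must be a hashable type'
                                .format(self.__class__.__name__))
        self.name = names[0]
        return self

    def rename(self, name):
        # No check here; rename wraps the name in a list and delegates.
        return self.set_names([name])


def raises_type_error(func, *args):
    """Return True if calling func(*args) raises TypeError."""
    try:
        func(*args)
    except TypeError:
        return True
    return False
```

Under this arrangement a call such as `rename(['sth', 'sth2'])` fails through `set_names` and leaves the previous name untouched, while `rename(('sth', 'sth2'))` succeeds. A test that passes a list as a single name would therefore need to switch to a tuple, which is the kind of test change the comment refers to.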

+            return self.set_names([name], inplace=inplace)

     @property
     def _has_complex_internals(self):
4 changes: 2 additions & 2 deletions pandas/tests/frame/test_indexing.py
@@ -125,12 +125,12 @@ def test_getitem_list(self):
         # tuples
         df = DataFrame(randn(8, 3),
                        columns=Index([('foo', 'bar'), ('baz', 'qux'),
-                                      ('peek', 'aboo')], name=['sth', 'sth2']))
+                                      ('peek', 'aboo')], name=('sth', 'sth2')))

         result = df[[('foo', 'bar'), ('baz', 'qux')]]
         expected = df.iloc[:, :2]
         assert_frame_equal(result, expected)
-        assert result.columns.names == ['sth', 'sth2']
+        assert result.columns.names == ('sth', 'sth2')

     def test_getitem_callable(self):
         # GH 12533