Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ERR: disallow non-hashables in Index/MultiIndex construction & rename #20548

Merged
merged 48 commits into from
Apr 23, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
48 commits
Select commit Hold shift + click to select a range
9047d60
Check non-hashability on series construction and renaming
arminv Mar 30, 2018
df7650d
Removed changes from pandas/core/series.py
arminv Mar 30, 2018
dd64219
Check non-hashability on Index construction and renaming
arminv Mar 30, 2018
89e92ab
modified test_getitem_list example to disallow non-hashable names
arminv Mar 30, 2018
cd3e53a
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Mar 30, 2018
cd070e3
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 1, 2018
351691f
Changed ErrorType message for hashability requirement
arminv Apr 1, 2018
3a7b0b2
Fixed how rename calls set_names to allow for MultiIndex hashable typ…
arminv Apr 1, 2018
70933d5
Moved type checking from set_names back to rename
arminv Apr 1, 2018
56fd617
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 1, 2018
d4ed636
Moved hashable checking to set_names. Changed exception messages.
arminv Apr 1, 2018
b554bb3
Modified test_duplicate_level_names to pass with new (hashable names)…
arminv Apr 2, 2018
6efd6cc
Added test_constructor_nonhashable_names for checking hashability on …
arminv Apr 2, 2018
4fb3a6b
Fixed a typo
arminv Apr 2, 2018
786f43f
Minor refactoring of test_constructor_nonhashable_names
arminv Apr 2, 2018
01b712e
Added test_constructor_nonhashable_name for checking hashability on name
arminv Apr 2, 2018
6f13cd0
Added note in Other API Changes on hashability of names
arminv Apr 2, 2018
26433c3
Improved wording of the note
arminv Apr 2, 2018
91ef466
Addressed PEP 8 issues
arminv Apr 2, 2018
85e35ea
Modified exception message of Index
arminv Apr 2, 2018
5c2e240
Changed exception message format
arminv Apr 2, 2018
4ca2a52
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 2, 2018
840cd88
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 3, 2018
18bcf2a
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 6, 2018
d98014f
Refactoring
arminv Apr 8, 2018
b8a1d7e
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 9, 2018
edfbd1d
Added internal comment
arminv Apr 9, 2018
2322346
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 10, 2018
fa52655
Refactoring
arminv Apr 10, 2018
c0f6936
Moved check from set_names to _set_names
arminv Apr 10, 2018
a9c14e6
test with fixture
jreback Apr 11, 2018
30da596
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 12, 2018
667d495
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 16, 2018
c4c1011
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 16, 2018
bd75433
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 17, 2018
74a9b54
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 17, 2018
b1cb7fd
Refactoring. Internal docstring. Minor typos
arminv Apr 17, 2018
863f7d3
PEP 8
arminv Apr 17, 2018
0723009
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 17, 2018
7092d49
Improved docstring wording
arminv Apr 17, 2018
1d8f67a
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 19, 2018
12488ff
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 20, 2018
4a500ba
Shorten docstring
arminv Apr 21, 2018
9ec64b0
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 21, 2018
47903ae
Merge remote-tracking branch 'upstream/master' into non_hashable_err
arminv Apr 22, 2018
04f2eed
Added examples
arminv Apr 22, 2018
1a68188
remove examples from _set_names
jreback Apr 22, 2018
97a2b06
consolidate logic a bit
jreback Apr 22, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.23.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -827,6 +827,7 @@ Other API Changes
- A :class:`Series` of ``dtype=category`` constructed from an empty ``dict`` will now have categories of ``dtype=object`` rather than ``dtype=float64``, consistently with the case in which an empty list is passed (:issue:`18515`)
- All-NaN levels in a ``MultiIndex`` are now assigned ``float`` rather than ``object`` dtype, promoting consistency with ``Index`` (:issue:`17929`).
- Levels names of a ``MultiIndex`` (when not None) are now required to be unique: trying to create a ``MultiIndex`` with repeated names will raise a ``ValueError`` (:issue:`18872`)
- Both construction and renaming of ``Index``/``MultiIndex`` with non-hashable ``name``/``names`` will now raise ``TypeError`` (:issue:`20527`)
- :func:`Index.map` can now accept ``Series`` and dictionary input objects (:issue:`12756`, :issue:`18482`, :issue:`18509`).
- :func:`DataFrame.unstack` will now default to filling with ``np.nan`` for ``object`` columns. (:issue:`12815`)
- :class:`IntervalIndex` constructor will raise if the ``closed`` parameter conflicts with how the input data is inferred to be closed (:issue:`18421`)
Expand Down
30 changes: 28 additions & 2 deletions pandas/core/indexes/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,7 @@
is_datetime64_any_dtype,
is_datetime64tz_dtype,
is_timedelta64_dtype,
is_hashable,
needs_i8_conversion,
is_iterator, is_list_like,
is_scalar)
Expand Down Expand Up @@ -1311,9 +1312,33 @@ def _get_names(self):
return FrozenList((self.name, ))

def _set_names(self, values, level=None):
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also add a mention on set_names itself that the names must be hashable (and examples if you want)

Set new names on index. Each name has to be a hashable type.

Parameters
----------
values : str or sequence
name(s) to set
level : int, level name, or sequence of int/level names (default None)
If the index is a MultiIndex (hierarchical), level(s) to set (None
for all levels). Otherwise level must be None

Raises
------
TypeError if each name is not hashable.
"""
if not is_list_like(values):
raise ValueError('Names must be a list-like')
if len(values) != 1:
raise ValueError('Length of new names must be 1, got %d' %
len(values))

# GH 20527
# All items in 'name' need to be hashable:
for name in values:
if not is_hashable(name):
raise TypeError('{}.name must be a hashable type'
.format(self.__class__.__name__))
self.name = values[0]

names = property(fset=_set_names, fget=_get_names)
Expand All @@ -1339,9 +1364,9 @@ def set_names(self, names, level=None, inplace=False):
Examples
--------
>>> Index([1, 2, 3, 4]).set_names('foo')
Int64Index([1, 2, 3, 4], dtype='int64')
Int64Index([1, 2, 3, 4], dtype='int64', name='foo')
>>> Index([1, 2, 3, 4]).set_names(['foo'])
Int64Index([1, 2, 3, 4], dtype='int64')
Int64Index([1, 2, 3, 4], dtype='int64', name='foo')
>>> idx = MultiIndex.from_tuples([(1, u'one'), (1, u'two'),
(2, u'one'), (2, u'two')],
names=['foo', 'bar'])
Expand All @@ -1354,6 +1379,7 @@ def set_names(self, names, level=None, inplace=False):
labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
names=[u'baz', u'bar'])
"""

if level is not None and self.nlevels == 1:
raise ValueError('Level must be None for non-MultiIndex')

Expand Down
38 changes: 33 additions & 5 deletions pandas/core/indexes/multi.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
_ensure_platform_int,
is_categorical_dtype,
is_object_dtype,
is_hashable,
is_iterator,
is_list_like,
pandas_dtype,
Expand Down Expand Up @@ -634,12 +635,29 @@ def _get_names(self):

def _set_names(self, names, level=None, validate=True):
"""
Set new names on index. Each name has to be a hashable type.

Parameters
----------
values : str or sequence
name(s) to set
level : int, level name, or sequence of int/level names (default None)
If the index is a MultiIndex (hierarchical), level(s) to set (None
for all levels). Otherwise level must be None
validate : boolean, default True
validate that the names match level lengths

Raises
------
TypeError if each name is not hashable.

Notes
-----
sets names on levels. WARNING: mutates!

Note that you generally want to set this *after* changing levels, so
that it only acts on copies
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good here. can you update the doc-string with the comment below

"""

# GH 15110
# Don't allow a single string for names in a MultiIndex
if names is not None and not is_list_like(names):
Expand All @@ -662,10 +680,20 @@ def _set_names(self, names, level=None, validate=True):

# set the name
for l, name in zip(level, names):
if name is not None and name in used:
raise ValueError('Duplicated level name: "{}", assigned to '
'level {}, is already used for level '
'{}.'.format(name, l, used[name]))
if name is not None:

# GH 20527
# All items in 'names' need to be hashable:
if not is_hashable(name):
raise TypeError('{}.name must be a hashable type'
.format(self.__class__.__name__))

if name in used:
raise ValueError(
'Duplicated level name: "{}", assigned to '
'level {}, is already used for level '
'{}.'.format(name, l, used[name]))

self.levels[l].rename(name, inplace=True)
used[name] = l

Expand Down
4 changes: 2 additions & 2 deletions pandas/tests/frame/test_indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -125,12 +125,12 @@ def test_getitem_list(self):
# tuples
df = DataFrame(randn(8, 3),
columns=Index([('foo', 'bar'), ('baz', 'qux'),
('peek', 'aboo')], name=['sth', 'sth2']))
('peek', 'aboo')], name=('sth', 'sth2')))

result = df[[('foo', 'bar'), ('baz', 'qux')]]
expected = df.iloc[:, :2]
assert_frame_equal(result, expected)
assert result.columns.names == ['sth', 'sth2']
assert result.columns.names == ('sth', 'sth2')

def test_getitem_callable(self):
# GH 12533
Expand Down
18 changes: 18 additions & 0 deletions pandas/tests/indexes/test_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -435,6 +435,24 @@ def test_constructor_empty(self):
assert isinstance(empty, MultiIndex)
assert not len(empty)

def test_constructor_nonhashable_name(self, indices):
# GH 20527

if isinstance(indices, MultiIndex):
pytest.skip("multiindex handled in test_multi.py")

name = ['0']
message = "Index.name must be a hashable type"
tm.assert_raises_regex(TypeError, message, name=name)

# With .rename()
renamed = [['1']]
tm.assert_raises_regex(TypeError, message,
indices.rename, name=renamed)
# With .set_names()
tm.assert_raises_regex(TypeError, message,
indices.set_names, names=renamed)

def test_view_with_args(self):

restricted = ['unicodeIndex', 'strIndex', 'catIndex', 'boolIndex',
Expand Down
23 changes: 21 additions & 2 deletions pandas/tests/indexes/test_multi.py
Original file line number Diff line number Diff line change
Expand Up @@ -615,8 +615,27 @@ def test_constructor_mismatched_label_levels(self):
with tm.assert_raises_regex(ValueError, label_error):
self.index.copy().set_labels([[0, 0, 0, 0], [0, 0]])

@pytest.mark.parametrize('names', [['a', 'b', 'a'], [1, 1, 2],
[1, 'a', 1]])
def test_constructor_nonhashable_names(self):
# GH 20527
levels = [[1, 2], [u'one', u'two']]
labels = [[0, 0, 1, 1], [0, 1, 0, 1]]
names = ((['foo'], ['bar']))
message = "MultiIndex.name must be a hashable type"
tm.assert_raises_regex(TypeError, message,
MultiIndex, levels=levels,
labels=labels, names=names)

# With .rename()
mi = MultiIndex(levels=[[1, 2], [u'one', u'two']],
labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
names=('foo', 'bar'))
renamed = [['foor'], ['barr']]
tm.assert_raises_regex(TypeError, message, mi.rename, names=renamed)
# With .set_names()
tm.assert_raises_regex(TypeError, message, mi.set_names, names=renamed)

@pytest.mark.parametrize('names', [['a', 'b', 'a'], ['1', '1', '2'],
['1', 'a', '1']])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arminv Is there a reason that you changed those parametrize values to all strings? (I suppose by accident?)
I am reworking the test in #21423, so will revert there if this was by accident

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche IIRC I changed it (in this commit) because the test was failing, but implementation changed a lot after that commit so I'm not sure if reverting this would cause a problem now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be passing there!

def test_duplicate_level_names(self, names):
# GH18872
pytest.raises(ValueError, pd.MultiIndex.from_product,
Expand Down