Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Creating Index with the names argument #19168

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
32 changes: 25 additions & 7 deletions pandas/core/indexes/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -113,14 +113,18 @@ class Index(IndexOpsMixin, PandasObject):
dtype : NumPy dtype (default: object)
copy : bool
Make a copy of input ndarray
name : object
name : object, optional
Name to be stored in the index
names : sequence of objects, optional
Names for the index levels
tupleize_cols : bool (default: True)
When True, attempt to create a MultiIndex if possible

Notes
-----
An Index instance can **only** contain hashable objects
An Index instance can **only** contain hashable objects.

Only one of `name` and `names` can be specified at the same time.

Examples
--------
Expand Down Expand Up @@ -176,10 +180,25 @@ class Index(IndexOpsMixin, PandasObject):
str = CachedAccessor("str", StringMethods)

def __new__(cls, data=None, dtype=None, copy=False, name=None,
fastpath=False, tupleize_cols=True, **kwargs):
fastpath=False, tupleize_cols=True, names=None,
**kwargs):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment that checking name / names

Copy link
Contributor Author

@spacesphere spacesphere Jan 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback, Sorry, I didn't get it: a comment for what? Could you, maybe, rephrase your request?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you didn't answer my comments. pls add some comments in the code for this section.

# The main purpose of `names` is to use it with a `MultiIndex`.
# Although for consistency it's also used to retrieve `name` for a
# one-level indices if `name` is not provided (see GH 19082).

if names is not None and name is not None:
raise TypeError("Can provide only one of names and name arguments")

if names is not None and not is_list_like(names):
raise TypeError("names must be list-like")

if name is None and hasattr(data, 'name'):
name = data.name
if name is None:
if hasattr(data, 'name'):
name = data.name
# extract `name` from `names` in case MultiIndex cannot be created
elif names:
name = names[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add an

elif names is not None:
     if name is not None:
           raise ValuError(....)
    name = names


if fastpath:
return cls._simple_new(data, name)
Expand Down Expand Up @@ -358,8 +377,7 @@ def __new__(cls, data=None, dtype=None, copy=False, name=None,
# 10697
if all(isinstance(e, tuple) for e in data):
from .multi import MultiIndex
return MultiIndex.from_tuples(
data, names=name or kwargs.get('names'))
return MultiIndex.from_tuples(data, names=names or name)
# other iterable of some kind
subarr = _asarray_tuplesafe(data, dtype=object)
return Index(subarr, dtype=dtype, copy=copy, name=name, **kwargs)
Expand Down
24 changes: 24 additions & 0 deletions pandas/tests/indexes/test_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -305,6 +305,30 @@ def test_constructor_simple_new(self):
result = idx._simple_new(idx, 'obj')
tm.assert_index_equal(result, idx)

def test_constructor_names(self):
# test both `name` and `names` provided
with pytest.raises(TypeError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you have these first two assertions in a separate test that uses pandas.util.testing.all_index_generator, and passes the name to exhaustively check all the index types (do this in a parameterization)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, correct me if I'm wrong, but don't you think it's a bit redundant? I mean I can't use name or names for all_index_generator to check constructors of different index types. All I can do is to create a new Index with each of the generated indexes passing names and/or name. In this case it doesn't really matter what type of index is going to be created as all the checks are made at the very beginning of Index constructor (before objects creation) and they should guarantee that name and names are valid for any case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this redundant? we need to generally test passing name & names to each of the individual constructors. I suspect this actually does break some of the subclasses. its simple to modify all_index_generator to pass thru kwargs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the Index subclasses (apart from MultiIndex) sets names for their objects. They don't even consider this argument. It shouldn't break anything. The only purpose for names is to create a MultiIndex instance (either via Index or MultiIndex constuctor).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@toobaz , could you, please, add any remarks about the tests too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My remarks here are not worth much as I'm not a core dev (I can only share your frustration :-) )

Theoretically, it could be that some day somebody modified some Index subclass so that it misbehaves on names. Your tests would capture that regression. Yeah, I know, that's paranoia - but also the quickest way to get your PR accepted.

This said: if we end up deprecating names (as I think we should, but I don't think there ever was any discussion, so not sure how it would end up), then all of this is makes even less sense, so you might try to open that discussion now.

Copy link
Contributor Author

@spacesphere spacesphere Jan 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not just paranoia, I think, it's something that can just mess things up. We're talking about error cases, but Index subclasses don't consider names at all, so they won't raise any errors when passing wrong names arguments as it's expected for Index and MultiIndex. So, to make these tests work correctly, I'll have to modify all the subclasses too (I don't know how to handle this case otherwise).
The more I think about it, the more I dislike the way things are going :\

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think it's time for #19295 ;-)

Copy link
Contributor

@jreback jreback Jan 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PoppyBagel here's the big pictures. We want to consider the consistency guarantees that pandas offers. We have a loose guaranteee that using the Index constructor or its sub-classes have the same signature, IM is an exception to this. But for example name is accepted.

We want to lockdown this guaranteee via tests in one place. It is somewhat tested in each subclass, but not centrally. Further this locks down the behavior of future subclasses, and prevent inadvertant adding it back. More to the point, I am considering removing names entirely (via deprecation). We need to check for error messages for this.

This is a little bit overkill, to check that we are not accepting an argument, but we have lots of code and inspection can easily fail.

I don't want you to add the names args anywhere, If its only accepted in MI and Index, then we can next consider the removal implications.

idx = Index([1, 2, 3], name='a', names=('a',))

# test non-list-like `names`
with pytest.raises(TypeError):
idx = Index([1, 2, 3], names='a')

# test using `name` for a flat `Index`
idx = Index([1, 2, 3], name='a')
assert idx.name == 'a'
assert idx.names == ('a',)

# test using `names` for a flat `Index`
idx = Index([1, 2, 3], names=('a',))
assert idx.name == 'a'
assert idx.names == ('a',)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would personally replace the two blocks with a loop (over kwargs, or over idx itself), but it's a matter of taste. You could also include the case name=('a',), which should still work.

# test using `names` for `MultiIndex` creation
idx = Index([('A', 1), ('A', 2)], names=('a', 'b'))
midx = MultiIndex.from_tuples([('A', 1), ('A', 2)], names=('a', 'b'))
tm.assert_index_equal(idx, midx, check_names=True)

def test_constructor_dtypes(self):

for idx in [Index(np.array([1, 2, 3], dtype=int)),
Expand Down