Add allow_sets-kwarg to is_list_like #23065

Merged
merged 21 commits into from
Oct 18, 2018
Changes from 10 commits
2 changes: 2 additions & 0 deletions pandas/compat/__init__.py
@@ -141,6 +141,7 @@ def lfilter(*args, **kwargs):
     Mapping = collections.abc.Mapping
     Sequence = collections.abc.Sequence
     Sized = collections.abc.Sized
+    Set = collections.abc.Set

 else:
     # Python 2
@@ -201,6 +202,7 @@ def get_range_parameters(data):
     Mapping = collections.Mapping
     Sequence = collections.Sequence
     Sized = collections.Sized
+    Set = collections.Set

 if PY2:
     def iteritems(obj, **kw):
8 changes: 4 additions & 4 deletions pandas/core/dtypes/common.py
@@ -17,10 +17,10 @@
     ABCSparseArray, ABCSparseSeries, ABCCategoricalIndex, ABCIndexClass,
     ABCDateOffset)
 from pandas.core.dtypes.inference import (  # noqa:F401
-    is_bool, is_integer, is_hashable, is_iterator, is_float,
-    is_dict_like, is_scalar, is_string_like, is_list_like, is_number,
-    is_file_like, is_re, is_re_compilable, is_sequence, is_nested_list_like,
-    is_named_tuple, is_array_like, is_decimal, is_complex, is_interval)
+    is_bool, is_integer, is_float, is_number, is_decimal, is_complex,
+    is_re, is_re_compilable, is_dict_like, is_string_like, is_file_like,
+    is_list_like, is_nested_list_like, is_sequence, is_named_tuple,
+    is_hashable, is_iterator, is_array_like, is_scalar, is_interval)
Contributor Author

This is somewhat of an artefact of the version with is_ordered_list_like, where I tried to group these methods by similarity (i.e. scalar dtypes, regexes, containers), but I decided to keep it because I think it helps. I can revert that part, of course.

Contributor

yes, on any change, pls try to do the minimal changeset. This will lessen reviewer burden and make things go faster.

Contributor Author

"yes, please try to do minimal changeset [next time]" or "yes please revert"?

Contributor

Fine as is for now.

Contributor

ok for now, but generally pls don't change unrelated things.



_POSSIBLY_CAST_DTYPES = {np.dtype(t).name
14 changes: 9 additions & 5 deletions pandas/core/dtypes/inference.py
@@ -5,7 +5,7 @@
 from numbers import Number
 from pandas import compat
 from pandas.compat import (PY2, string_types, text_type,
-                           string_and_binary_types, re_type)
+                           string_and_binary_types, re_type, Set)
 from pandas._libs import lib

 is_bool = lib.is_bool
@@ -247,7 +247,7 @@ def is_re_compilable(obj):
     return True


-def is_list_like(obj):
+def is_list_like(obj, strict=False):
     """
     Check if the object is list-like.
@@ -259,6 +259,8 @@ def is_list_like(obj):
     Parameters
     ----------
     obj : The object to check.
+    strict : boolean, default False
+        If this parameter is True, sets will not be considered list-like
Contributor

add a versionadded tag

     Returns
     -------
@@ -283,11 +285,13 @@ def is_list_like(obj):
     False
     """

-    return (isinstance(obj, compat.Iterable) and
+    return (isinstance(obj, compat.Iterable)
             # we do not count strings/unicode/bytes as list-like
-            not isinstance(obj, string_and_binary_types) and
+            and not isinstance(obj, string_and_binary_types)
Contributor

this is not correct, leave the 'and' where it was

Contributor Author

PEP8 is clear about this (https://www.python.org/dev/peps/pep-0008/#should-a-line-break-before-or-after-a-binary-operator)

Binary operators (and 'and' is one) should come after a line break. It's also more readable.

Member

Yes, changing this is fine in principle; we have been following that PEP8 rule recently. Typically we only want such changes on lines that are already touched by the PR, but since you are already touching this function a few lines below, I would say it is fine.

Note that this is a recent change in PEP8, so you will see many places in the code that do it differently.
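
For readers unfamiliar with the style question in this thread, here is a minimal sketch of the two layouts being compared. The function names and boolean arguments are hypothetical and exist only for illustration; the point is that both layouts evaluate identically, so the choice is purely about readability.

```python
def trailing_operator_style(is_iterable, is_string):
    # Older layout: each line ends with the operator
    return (is_iterable and
            not is_string)


def leading_operator_style(is_iterable, is_string):
    # Layout used in this PR: each continuation line starts with the operator
    return (is_iterable
            and not is_string)


# Both layouts produce the same result for every input combination
for a in (True, False):
    for b in (True, False):
        assert trailing_operator_style(a, b) == leading_operator_style(a, b)
```

With the operator leading each continuation line, every condition in a long chain can be read (and commented) on its own line, which is the readability argument made above.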

             # exclude zero-dimensional numpy arrays, effectively scalars
-            not (isinstance(obj, np.ndarray) and obj.ndim == 0))
Contributor Author

Aside from adding the kwarg everywhere, this is the only substantial change of this PR.

+            and not (isinstance(obj, np.ndarray) and obj.ndim == 0)
+            # exclude sets if ordered_only
+            and not (strict and isinstance(obj, Set)))


 def is_array_like(obj):
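
To make the effect of the new keyword concrete, here is a simplified standalone sketch of the check in the hunk above. It is an assumption-laden stand-in, not the pandas function: the name `is_list_like_sketch` is hypothetical, it uses only the standard library, and it omits the zero-dimensional NumPy array check. It uses the `strict` spelling from this revision of the diff; the PR title refers to the keyword as `allow_sets`.

```python
from collections import abc


def is_list_like_sketch(obj, strict=False):
    """Simplified stand-in for the patched check (no NumPy handling)."""
    return (isinstance(obj, abc.Iterable)
            # strings and bytes are iterable but not treated as list-like
            and not isinstance(obj, (str, bytes))
            # with strict=True, anything registered as a Set is excluded
            and not (strict and isinstance(obj, abc.Set)))


print(is_list_like_sketch([1, 2]))                         # True
print(is_list_like_sketch('abc'))                          # False
print(is_list_like_sketch({1, 'a'}))                       # True
print(is_list_like_sketch({1, 'a'}, strict=True))          # False
print(is_list_like_sketch(frozenset({1}), strict=True))    # False
print(is_list_like_sketch({'a': 1}, strict=True))          # True
print(is_list_like_sketch({'a': 1}.keys(), strict=True))   # False
```

Because the check is against the abstract `Set` class rather than the concrete `set` type, `frozenset` and dictionary key views are excluded as well, while a plain `dict` (a `Mapping`, not a `Set`) still passes.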
9 changes: 5 additions & 4 deletions pandas/core/strings.py
@@ -2083,12 +2083,12 @@ def _get_series_list(self, others, ignore_index=False):
         elif isinstance(others, np.ndarray) and others.ndim == 2:
             others = DataFrame(others, index=idx)
             return ([others[x] for x in others], False)
-        elif is_list_like(others):
+        elif is_list_like(others, strict=True):
             others = list(others)  # ensure iterators do not get read twice etc

             # in case of list-like `others`, all elements must be
             # either one-dimensional list-likes or scalars
-            if all(is_list_like(x) for x in others):
+            if all(is_list_like(x, strict=True) for x in others):
                 los = []
                 join_warn = False
                 depr_warn = False
@@ -2116,7 +2116,8 @@ def _get_series_list(self, others, ignore_index=False):
                     # nested list-likes are forbidden:
                     # -> elements of nxt must not be list-like
                     is_legal = ((no_deep and nxt.dtype == object)
-                                or all(not is_list_like(x) for x in nxt))
+                                or all(not is_list_like(x, strict=True)
+                                       for x in nxt))

                     # DataFrame is false positive of is_legal
                     # because "x in df" returns column names
@@ -2134,7 +2135,7 @@ def _get_series_list(self, others, ignore_index=False):
                                   'deprecated and will be removed in a future '
                                   'version.', FutureWarning, stacklevel=3)
                 return (los, join_warn)
-            elif all(not is_list_like(x) for x in others):
+            elif all(not is_list_like(x, strict=True) for x in others):
                 return ([Series(others, index=idx)], False)
         raise TypeError(err_msg)
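
The diff does not spell out why the string-concatenation path opts into the stricter check. One plausible reading, suggested by the earlier `is_ordered_list_like` name, is that concatenation pairs elements by position, and a set has no defined order. The sketch below illustrates that idea under those assumptions; `cat_sketch` is a hypothetical helper and not how pandas implements `str.cat`.

```python
from collections import abc


def cat_sketch(left, others, sep=''):
    """Position-wise string concatenation that rejects set-likes."""
    if isinstance(others, (str, bytes)) or not isinstance(others, abc.Iterable):
        raise TypeError('others must be list-like')
    if isinstance(others, abc.Set):
        # a set has no defined order, so pairing by position is meaningless
        raise TypeError('others must be ordered, got a set-like')
    others = list(others)  # materialize so iterators are read only once
    if len(others) != len(left):
        raise ValueError('lengths must match')
    return [a + sep + b for a, b in zip(left, others)]


print(cat_sketch(['a', 'b'], ['x', 'y'], sep='-'))  # ['a-x', 'b-y']

try:
    cat_sketch(['a', 'b'], {'x', 'y'})
except TypeError as err:
    print('rejected:', err)
```

Iterators and generators are still accepted here because they yield elements in a definite order once materialized; only set-likes are turned away.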

31 changes: 25 additions & 6 deletions pandas/tests/dtypes/test_inference.py
@@ -66,20 +66,39 @@ def __getitem__(self):
 @pytest.mark.parametrize(
     "ll",
     [
-        [], [1], (1, ), (1, 2), {'a': 1},
-        {1, 'a'}, Series([1]),
-        Series([]), Series(['a']).str,
-        np.array([2])])
+        [], [1], tuple(), (1, ), (1, 2), {'a': 1}, {1, 'a'}, np.array([2]),
+        Series([1]), Series([]), Series(['a']).str, Index([]), Index([1]),
+        DataFrame(), DataFrame([[1]]), iter([1, 2]), (x for x in [1, 2]),
+        np.ndarray((2,) * 2), np.ndarray((2,) * 3), np.ndarray((2,) * 4)
+    ])
 def test_is_list_like_passes(ll):
     assert inference.is_list_like(ll)


-@pytest.mark.parametrize(
-    "ll", [1, '2', object(), str, np.array(2)])
+@pytest.mark.parametrize("ll", [1, '2', object(), str, np.array(2)])
 def test_is_list_like_fails(ll):
     assert not inference.is_list_like(ll)


+@pytest.mark.parametrize(
Contributor

I would prefer to have 2 tests total to avoid the duplication of the args here (IOW 1 for allow_sets=True and 1 for False).

Contributor Author

not sure if my solution is what you had in mind, but I gave it a shot

Contributor

I didn't see the earlier version, but I don't think this is what Jeff had in mind. If we want to de-duplicate the arguments, you would need a fixture giving them

@pytest.fixture(params=...)
def maybe_list_like(request):
    return request.param

Each of the params would be a tuple like ([], True), ('2', False), and I guess something like ({}, None) or ({}, 'maybe') for set-likes.

Then we would have two tests. In the first we do

obj, expected = ...
if expected is None:
    expected = True

assert is_list_like(obj) is expected

and in the second

if expected is None:
    expected = False

assert is_list_like(obj, include_sets=False) is expected

Contributor

yeah, @TomAugspurger's suggestion is good here. The issue is that we can't list the args twice.
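
The fixture approach suggested in this thread could look roughly like the sketch below. Several things here are assumptions made for illustration: `is_list_like_sketch` is a standard-library stand-in for the pandas function, the case list is abbreviated, and the keyword is spelled `allow_sets` as in the PR title (the comment above writes `include_sets`). The `None` marker plays the role of the "maybe" value for set-likes.

```python
from collections import abc

import pytest


def is_list_like_sketch(obj, allow_sets=True):
    """Stand-in for the function under test (no NumPy or pandas types)."""
    return (isinstance(obj, abc.Iterable)
            and not isinstance(obj, (str, bytes))
            and not (not allow_sets and isinstance(obj, abc.Set)))


# (object, expected): True/False holds in both modes;
# None marks set-likes, whose result depends on allow_sets
CASES = [
    ([], True),
    ((1, 2), True),
    ({'a': 1}, True),
    (1, False),
    ('2', False),
    ({1, 'a'}, None),
    (frozenset({1, 'a'}), None),
]


@pytest.fixture(params=CASES)
def maybe_list_like(request):
    return request.param


def test_is_list_like(maybe_list_like):
    obj, expected = maybe_list_like
    if expected is None:
        expected = True  # sets count as list-like by default
    assert is_list_like_sketch(obj) is expected


def test_is_list_like_disallow_sets(maybe_list_like):
    obj, expected = maybe_list_like
    if expected is None:
        expected = False  # sets are rejected when allow_sets=False
    assert is_list_like_sketch(obj, allow_sets=False) is expected
```

Each object is listed exactly once, and the two tests differ only in how they resolve the `None` marker, which addresses the concern about duplicating the argument lists.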

+    "ll",
+    [
+        [], [1], tuple(), (1, ), (1, 2), {'a': 1}, np.array([2]),
+        Series([1]), Series([]), Series(['a']).str, Index([]), Index([1]),
+        DataFrame(), DataFrame([[1]]), iter([1, 2]), (x for x in [1, 2]),
+        np.ndarray((2,) * 2), np.ndarray((2,) * 3), np.ndarray((2,) * 4)
+    ])
+def test_is_list_like_strict_passes(ll):
+    assert inference.is_list_like(ll, strict=True)
+
+
+@pytest.mark.parametrize("ll", [1, '2', object(), str, np.array(2),
+                                {1, 'a'}, frozenset({1, 'a'})])
+def test_is_list_like_strict_fails(ll):
+    # GH 23061
+    assert not inference.is_list_like(ll, strict=True)
+
+
 def test_is_array_like():
     assert inference.is_array_like(Series([]))
     assert inference.is_array_like(Series([1, 2]))