-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add allow_sets-kwarg to is_list_like #23065
Conversation
# we do not count strings/unicode/bytes as list-like | ||
not isinstance(obj, string_and_binary_types) and | ||
# exclude zero-dimensional numpy arrays, effectively scalars | ||
not (isinstance(obj, np.ndarray) and obj.ndim == 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from adding the kwarg everywhere, this is the only substantial change of this PR.
pandas/core/dtypes/inference.py
Outdated
not isinstance(obj, string_and_binary_types) and | ||
# exclude zero-dimensional numpy arrays, effectively scalars | ||
not (isinstance(obj, np.ndarray) and obj.ndim == 0)) | ||
if strict is None and isinstance(obj, set): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the isinstance
checks should be against something like collections.abc.Set
to catch more exotic set
variations:
In [2]: x = list('aabbc')
In [3]: s1 = set(x)
In [4]: s2 = frozenset(x)
In [5]: isinstance(s2, set)
Out[5]: False
In [6]: isinstance(s2, collections.abc.Set)
Out[6]: True
In [7]: isinstance(s1, collections.abc.Set)
Out[7]: True
def test_is_list_like_strict(): | ||
ll = {1, 'a'} | ||
assert inference.is_list_like(ll, strict=False) | ||
assert not inference.is_list_like(ll, strict=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test should cover the strict=None
case: verify that the warning is raised, and the behavior is consistent with strict=False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also add issue number
as i believe tom commented this is not likely to be worth changing |
pandas/core/dtypes/inference.py
Outdated
not (isinstance(obj, np.ndarray) and obj.ndim == 0)) | ||
if strict is None and isinstance(obj, set): | ||
# only raise warning if necessary | ||
warnings.warn('is_list_like will in the future return False for sets. ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a consensus that we want this to be the default future behavior instead of a non-default option? I'm not convinced. Would be interesting to see how many of the existing uses of is_list_like
are valid vs. invalid for sets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this whole idea was just that - just an idea
i don’t think we actually want to add a kwarg at all
the idea was to change it within pandas only (if this is useful)
and not actually change the external api at all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since list
is ordered in python, the name is_list_like
suggests the same property. If that's not the case it's good ("explicit is better than implicit") to have to specify it.
Don't see why that shouldn't apply to external API as well (with deprecation cycle etc.)
Codecov Report
@@ Coverage Diff @@
## master #23065 +/- ##
==========================================
- Coverage 92.19% 92.18% -0.01%
==========================================
Files 169 169
Lines 50954 50956 +2
==========================================
+ Hits 46975 46976 +1
- Misses 3979 3980 +1
Continue to review full report at Codecov.
|
As I believe Tom commented that this is not worth breaking, but this is not a breaking change. |
@h-vetinari it's maybe not breaking, but you still plan to change the default behaviour (the warning). |
@jorisvandenbossche
|
@h-vetinari my point in bringing this up would be to see if this is worth it this was a good experiment - maybe i wasn’t clear - trying means reporting what happened, then we have a discussion about it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comments
Summarizing how I see things:
So we can either do things like if is_list_like(x) and not is_unodered(x):
... or do things like if is_list_like(x, require_ordered=True):
pass Between those to, the keyword to I don't think changing the default behavior of considering sets as list-like is worth it. |
OK, not changing the default. I see two options:
|
i think is_ordered_list_like is nicer here (but just make it internal for now, don’t add to api.types) |
Hello @h-vetinari! Thanks for updating the PR.
Comment last updated on October 17, 2018 at 16:18 Hours UTC |
@h-vetinari is there an actual use case of the checking for ordered dict / dict below 3.6 ? Or just to want to be strict in the checking for orderedness? |
Well, that's what CPython does. I thought I'd be thorough about the orderedness thing, but I don't care about the |
Then please remove it. If we don't have a use case, there is no point in trying to be strict for the sake of being it, IMO |
Another thing, I am personally not sure we should use the term "unordered", instead of just being explicit that it is about Sets (because you can eg still sort a set, or convert it to an orderable list, etc. And what if there then comes up another unordered data structure?) In that sense, I would find a |
Alright, removing dict. Can you agree with @jreback and @TomAugspurger on whether it's
I don't care about which way, but prefer shorter to read/type |
Agreed with @jorisvandenbossche that
I don't have a preference between adding a kwarg to |
@jreback: Adding |
i think allow_sets as a kwarg is ok (default is True) |
Thanks for the responses / inputs everyone. Changed back to kwarg, renamed it to |
is_bool, is_integer, is_float, is_number, is_decimal, is_complex, | ||
is_re, is_re_compilable, is_dict_like, is_string_like, is_file_like, | ||
is_list_like, is_nested_list_like, is_sequence, is_named_tuple, | ||
is_hashable, is_iterator, is_array_like, is_scalar, is_interval) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is somewhat of an artefact of the version with is_ordered_list_like
, where I tried to group these methods by similarity (i.e. scalar dtypes, regexes, containers), but I decided to keep it because I think it helps. Can revert that part of course
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, on any change, pls try to do the minimal changeset. This will lessen reviewer burden and make things go faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"yes, please try to do minimal changeset [next time]" or "yes please revert"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine as is for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok for now, but generally pls don't change unrelated things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs a whatsnew note
is_bool, is_integer, is_float, is_number, is_decimal, is_complex, | ||
is_re, is_re_compilable, is_dict_like, is_string_like, is_file_like, | ||
is_list_like, is_nested_list_like, is_sequence, is_named_tuple, | ||
is_hashable, is_iterator, is_array_like, is_scalar, is_interval) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, on any change, pls try to do the minimal changeset. This will lessen reviewer burden and make things go faster.
@@ -259,6 +259,8 @@ def is_list_like(obj): | |||
Parameters | |||
---------- | |||
obj : The object to check. | |||
allow_sets : boolean, default True | |||
If this parameter is False, sets will not be considered list-like | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a versionadded tag
# exclude zero-dimensional numpy arrays, effectively scalars | ||
not (isinstance(obj, np.ndarray) and obj.ndim == 0)) | ||
and not (isinstance(obj, np.ndarray) and obj.ndim == 0) | ||
# exclude sets if allow_sets is False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blank line before comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style nit: I don't like the blank lines inside an if condition.
But nothing that needs to be changed now.
def test_is_list_like_fails(ll): | ||
assert not inference.is_list_like(ll) | ||
|
||
|
||
@pytest.mark.parametrize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to have 2 tests total to avoid the duplication of the args here (IOW 1 for allow_sets=True and 1 for False).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if my solution is what you had in mind, but I gave it a shot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see the earlier version, but I don't think this is what Jeff had in mind. If we want to de-duplicate the arguments, you would need a fixture giving them
@pytest.fixture(params=...)
def maybe_list_like(request):
return request.param
Each of the params would be a tuple like ([], True)
, ('2', False)
, and I guess something like ({}, None)
or ({}, 'maybe'})
for set-likes.
Then we would have two tests. In the first we do
obj, expected = ...
if expected:
expected = True
assert is_list_like(obj) is expected
and in the second
if expected is None:
expected = False
assert is_list_like(obj, include_sets=False) is expected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah @TomAugspurger suggestion is good here. The issues is we can't list the args twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review
is_bool, is_integer, is_float, is_number, is_decimal, is_complex, | ||
is_re, is_re_compilable, is_dict_like, is_string_like, is_file_like, | ||
is_list_like, is_nested_list_like, is_sequence, is_named_tuple, | ||
is_hashable, is_iterator, is_array_like, is_scalar, is_interval) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"yes, please try to do minimal changeset [next time]" or "yes please revert"?
def test_is_list_like_fails(ll): | ||
assert not inference.is_list_like(ll) | ||
|
||
|
||
@pytest.mark.parametrize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if my solution is what you had in mind, but I gave it a shot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 overall, one comment on the test.
is_bool, is_integer, is_float, is_number, is_decimal, is_complex, | ||
is_re, is_re_compilable, is_dict_like, is_string_like, is_file_like, | ||
is_list_like, is_nested_list_like, is_sequence, is_named_tuple, | ||
is_hashable, is_iterator, is_array_like, is_scalar, is_interval) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine as is for now.
# exclude zero-dimensional numpy arrays, effectively scalars | ||
not (isinstance(obj, np.ndarray) and obj.ndim == 0)) | ||
and not (isinstance(obj, np.ndarray) and obj.ndim == 0) | ||
# exclude sets if allow_sets is False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style nit: I don't like the blank lines inside an if condition.
But nothing that needs to be changed now.
def test_is_list_like_fails(ll): | ||
assert not inference.is_list_like(ll) | ||
|
||
|
||
@pytest.mark.parametrize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see the earlier version, but I don't think this is what Jeff had in mind. If we want to de-duplicate the arguments, you would need a fixture giving them
@pytest.fixture(params=...)
def maybe_list_like(request):
return request.param
Each of the params would be a tuple like ([], True)
, ('2', False)
, and I guess something like ({}, None)
or ({}, 'maybe'})
for set-likes.
Then we would have two tests. In the first we do
obj, expected = ...
if expected:
expected = True
assert is_list_like(obj) is expected
and in the second
if expected is None:
expected = False
assert is_list_like(obj, include_sets=False) is expected
is_bool, is_integer, is_float, is_number, is_decimal, is_complex, | ||
is_re, is_re_compilable, is_dict_like, is_string_like, is_file_like, | ||
is_list_like, is_nested_list_like, is_sequence, is_named_tuple, | ||
is_hashable, is_iterator, is_array_like, is_scalar, is_interval) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok for now, but generally pls don't change unrelated things.
def test_is_list_like_fails(ll): | ||
assert not inference.is_list_like(ll) | ||
|
||
|
||
@pytest.mark.parametrize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah @TomAugspurger suggestion is good here. The issues is we can't list the args twice.
OK, hoping this last commit is more what you guys were after. I decided to add the
instead of
Of course it would be possible to have separate lists for the list-likes, the expected value, and the ids, but then its really instable that all three need to be edited simultaneously in case there's any change. That's why I opted for defining everything as tuples, and then doing some |
thanks @h-vetinari |
sets in str.cat?! #23009API: set should not be considered list_like #23061git diff upstream/master -u -- "*.py" | flake8 --diff
This is an attempt responding to #22486 (comment):
Following some initial discussion in #23061, I decided to go with a variant that does not break anything - i.e. adding a keyword which defaults to the current behaviour. I've added a warning that's only raised if necessary, to note that this behaviour will be changed in the future -- regardless of whether it is deprecated or not, I think that users as well as developers should have to actively choose to include unordered sets (reason i.a. for #23009, and probably some more).
The tedious part of this PR was hunting down all the internal uses of
is_list_like
and adding the kwarg there to avoid raising the warning. Hope I didn't miss any.