-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST: add method/dtype coverage to str-accessor; precursor to #23167 #23582
Conversation
Hello @h-vetinari! Thanks for updating the PR.
Comment last updated on November 20, 2018 at 06:48 Hours UTC |
Codecov Report
@@ Coverage Diff @@
## master #23582 +/- ##
=======================================
Coverage 92.29% 92.29%
=======================================
Files 161 161
Lines 51500 51500
=======================================
Hits 47533 47533
Misses 3967 3967
Continue to review full report at Codecov.
|
Not just him, I think almost any of us who reviews PR's would prefer that if possible. 😉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so its nice that you are exhaustively testing things and found some issues. pls pls avoid compound tests. These are very hard to understand. I have suggested what you should do here.
pandas/tests/test_strings.py
Outdated
@@ -26,6 +29,157 @@ def assert_series_or_index_equal(left, right): | |||
assert_index_equal(left, right) | |||
|
|||
|
|||
# method names plus minimal set of arguments to call | |||
_all_string_methods = [ | |||
('get', [0]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is somewhat duplicating: test_str_accessor_api_for_categorical
can you reconcile and put in a pandas/conftest.py
some of the fixtures can stay here if they are specific to what is being tested. but the top level general stuff should move.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot find the test you're talking about - where is it supposed to be?
Do I understand correctly that you'd like the _all_string_methods
in pandas/conftest.py
? I would think that fixture is very specific to test_strings
. The dtype
fixtures make more sense on a higher level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(pandas) bash-3.2$ grep -r test_str_accessor_api_for_categorical pandas/tests/
Binary file pandas/tests//series/__pycache__/test_api.cpython-36-PYTEST.pyc matches
pandas/tests//series/test_api.py: def test_str_accessor_api_for_categorical(self):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. How about moving this test to test_strings
? It's testing the API, but is clearly only related to the str
accessor.
pandas/tests/test_strings.py
Outdated
|
||
@pytest.fixture(params=_all_skipna_inferred_dtypes, ids=ids) | ||
def all_skipna_inferred_dtypes(request): | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so makes sense to also use these in the inferred_type tests then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean for the tests in tests/dtypes/test_inference.py
?
They're being used in this module already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure what you mean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm asking if you want me to move this particular fixture to pandas/conftest.py
and then test it within the dtype tests (because this is effectively a dtype thing).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
|
||
types_passing_constructor = ['string', 'unicode', 'empty', | ||
'bytes', 'mixed', 'mixed-integer'] | ||
if inferred_dtype in types_passing_constructor: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comment above
# one instance of parametrized fixture | ||
inferred_dtype, values = all_skipna_inferred_dtypes | ||
|
||
t = box(values, dtype=dtype) # explicit dtype to avoid casting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can actually make this 2 tests by putting the xfails in a function. then the test becomes single purpose and you don't have the if statement near the bottom.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what you're asking for here, sorry.
The test is already very single-purpose (except the xfails, which will be gone with #23167 and follow-up PRs), and the final if-switch makes it transparent which types are actually passing the constructor, with all other types raising. this would only get harder to understand if it's split into two, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no you have an if else branch. it makes reasoning about this impossible as its very fixture / data dependent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The if branch is transparently for the cases where the .str
-accessor raises or not. I do not understand how you want me to structure this test (resp. this function you mentioned).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if/else make the test very hard to understand. pls break in 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This particular test can be broken into _passes
and _raises
, but that last if condition is really not that hard. Don't get that objection, tbh.
if inferred_dtype in types_passing_constructor:
# pass
else:
# raise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Respectfully, I find it an unrealistic criterion to not be able to use one simple if-condition in a test (aside from xfails, which will be gone soon after). The whole point is that it's got an extensively parametrized fixture (any_skipna_inferred_dtype
), and I have to make a single distinction based on the content of that fixture.
Even if I were to split this test into _passes
and _raises
, I'd have to use the same kind of if-condition, unless I were to needlessly break up the parametrized fixtures into smaller subsets.
method_name, minimal_args = all_string_methods | ||
|
||
# TODO: get rid of these xfails | ||
if (method_name not in ['encode', 'decode', 'len'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These xfails will be gone, and then the test reads very clearly, IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test cannot be broken up as easily, because the allowed types depend on the method being checked!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review.
so its nice that you are exhaustively testing things and found some issues. pls pls avoid compound tests. These are very hard to understand. I have suggested what you should do here.
I don't understand what you mean by "compound" tests. I agree that the xfails are not pretty, but that's the only way I could split up #23167. The idea is that all these tests are free of xfails (of course), and then they read quite clearly IMO.
Regarding the fixtures, I would leave the string-methods here (couldn't find the test you mentioned), and would maybe move the dtype ones to pandas/conftest.py
and test them again in tests/dtypes/test_inference.py
pandas/tests/test_strings.py
Outdated
@@ -26,6 +29,157 @@ def assert_series_or_index_equal(left, right): | |||
assert_index_equal(left, right) | |||
|
|||
|
|||
# method names plus minimal set of arguments to call | |||
_all_string_methods = [ | |||
('get', [0]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot find the test you're talking about - where is it supposed to be?
Do I understand correctly that you'd like the _all_string_methods
in pandas/conftest.py
? I would think that fixture is very specific to test_strings
. The dtype
fixtures make more sense on a higher level.
pandas/tests/test_strings.py
Outdated
|
||
@pytest.fixture(params=_all_skipna_inferred_dtypes, ids=ids) | ||
def all_skipna_inferred_dtypes(request): | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean for the tests in tests/dtypes/test_inference.py
?
They're being used in this module already.
# one instance of parametrized fixture | ||
inferred_dtype, values = all_skipna_inferred_dtypes | ||
|
||
t = box(values, dtype=dtype) # explicit dtype to avoid casting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what you're asking for here, sorry.
The test is already very single-purpose (except the xfails, which will be gone with #23167 and follow-up PRs), and the final if-switch makes it transparent which types are actually passing the constructor, with all other types raising. this would only get harder to understand if it's split into two, no?
method_name, minimal_args = all_string_methods | ||
|
||
# TODO: get rid of these xfails | ||
if (method_name not in ['encode', 'decode', 'len'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These xfails will be gone, and then the test reads very clearly, IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review.
pandas/tests/test_strings.py
Outdated
|
||
@pytest.fixture(params=_all_skipna_inferred_dtypes, ids=ids) | ||
def all_skipna_inferred_dtypes(request): | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm asking if you want me to move this particular fixture to pandas/conftest.py
and then test it within the dtype tests (because this is effectively a dtype thing).
# one instance of parametrized fixture | ||
inferred_dtype, values = all_skipna_inferred_dtypes | ||
|
||
t = box(values, dtype=dtype) # explicit dtype to avoid casting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The if branch is transparently for the cases where the .str
-accessor raises or not. I do not understand how you want me to structure this test (resp. this function you mentioned).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposing to move the api test into this module. It's all about the .str
accessor
pandas/tests/test_strings.py
Outdated
@@ -26,6 +29,157 @@ def assert_series_or_index_equal(left, right): | |||
assert_index_equal(left, right) | |||
|
|||
|
|||
# method names plus minimal set of arguments to call | |||
_all_string_methods = [ | |||
('get', [0]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. How about moving this test to test_strings
? It's testing the API, but is clearly only related to the str
accessor.
# one instance of parametrized fixture | ||
inferred_dtype, values = all_skipna_inferred_dtypes | ||
|
||
t = box(values, dtype=dtype) # explicit dtype to avoid casting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This particular test can be broken into _passes
and _raises
, but that last if condition is really not that hard. Don't get that objection, tbh.
if inferred_dtype in types_passing_constructor:
# pass
else:
# raise
method_name, minimal_args = all_string_methods | ||
|
||
# TODO: get rid of these xfails | ||
if (method_name not in ['encode', 'decode', 'len'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test cannot be broken up as easily, because the allowed types depend on the method being checked!
Can I get a go-ahead on that? TBH, I don't think the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback
Incorporated your feedback, as much as possible. I've played through a couple of scenarios of how to break up the tests like you wanted, but none of them lead to easier / more readable code. Ultimately, these if-conditions have an extremely simple/transparent structure:
if inferred_dtype in allowed_types:
# pass (1 LOC)
else:
# raise (4-5 LOC)
and I strongly believe these should not be broken up further.
# one instance of parametrized fixture | ||
inferred_dtype, values = all_skipna_inferred_dtypes | ||
|
||
t = box(values, dtype=dtype) # explicit dtype to avoid casting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Respectfully, I find it an unrealistic criterion to not be able to use one simple if-condition in a test (aside from xfails, which will be gone soon after). The whole point is that it's got an extensively parametrized fixture (any_skipna_inferred_dtype
), and I have to make a single distinction based on the content of that fixture.
Even if I were to split this test into _passes
and _raises
, I'd have to use the same kind of if-condition, unless I were to needlessly break up the parametrized fixtures into smaller subsets.
pandas/tests/series/test_api.py
Outdated
assert isinstance(c.str, StringMethods) | ||
|
||
# str functions, which need special arguments | ||
special_func_defs = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The full list of these functions & arg-combinations is reflected in the any_string_method
-fixture of test_strings.py
now.
pandas/tests/series/test_api.py
Outdated
# we can't make a categorical with lists as individual categories. | ||
# -> `s.str.split(" ").astype("category")` will error! | ||
# * `translate` has different interfaces for py2 vs. py3 | ||
_ignore_names = ["get", "join", "translate"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I also got rid of these previously "ignored" methods - meaning they're being tested now as well.
pandas/tests/series/test_api.py
Outdated
func_defs = [(f, (), {}) for f in str_func_names] | ||
func_defs.extend(special_func_defs) | ||
|
||
for func, args, kwargs in func_defs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This loop is now explicitly parametrized.
pandas/tests/series/test_api.py
Outdated
('startswith', ("a",), {}), | ||
('wrap', (2,), {}), | ||
('zfill', (10,), {}) | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also added a handful more combinations to test
pandas/tests/series/test_api.py
Outdated
|
||
with pytest.raises(AttributeError, match=msg): | ||
invalid.str | ||
assert not hasattr(invalid, 'str') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is fully tested ~100x as thoroughly by test_strings.py::TestStringMethods::test_api_per_dtype
pandas/tests/test_strings.py
Outdated
# test that the above list captures all methods of StringMethods | ||
missing_methods = {f for f in dir(strings.StringMethods) | ||
if not f.startswith('_')} - set(ids) | ||
assert not missing_methods |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sure that:
- fixture is complete (i.e. no methods missing; as previously in
test_str_accessor_api_for_categorical
) - the reader can easily see a list of methods
pandas/tests/test_strings.py
Outdated
(dtype, values) for dtype, values | ||
in top_level_conftest._any_skipna_inferred_dtype | ||
if dtype in {'series', 'unicode', 'empty', | ||
'bytes', 'mixed', 'mixed-integer'}] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decided that the cleanest way to be DRY about this is to import the top-level conftest. This should be quite transparent to read, I hope
with pytest.raises(TypeError, match=msg): | ||
method(*args, **kwargs) | ||
|
||
def test_api_for_categorical(self, any_string_method): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the remnants of test_str_accessor_api_for_categorical
after parametrization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Attempt to get rid of the ResourceWarning
(#23680)
pandas/core/frame.py
Outdated
Examples | ||
-------- | ||
>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]}) | ||
>>> df.to_parquet('df.parquet.gzip', compression='gzip') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback The warnings are clearly unrelated as this PR is not even changing any code! See also #23680. Would like to get this PR moving so that the problems discovered by #23167 (and that PR itself) can be solved. PTAL |
@gfyoung |
@jreback Sorry for pinging again... This is blocking PRs for about 5-8 issues/PRs, so would very much like to get this moving. :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry had multiple changes open and approved the wrong one - I haven't looked at this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is adding too much code and is overly complicated. i understand that you want to test things, but this need to be broken up by a significant amount into several PR's.
@@ -495,6 +495,13 @@ class TestTypeInference(object): | |||
class Dummy(): | |||
pass | |||
|
|||
def test_inferred_dtype_fixture(self, any_skipna_inferred_dtype): | |||
# see pandas/conftest.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its nice that you added this, but you didn't remove any code. if you are not going to do that , then not much point of putting the fixture in conftest.py in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback
This is coming out of your review above:
I'm asking if you want me to move this particular fixture to
pandas/conftest.py
and then test it within the dtype tests (because this is effectively a dtype thing).
Yes
And I don't get how adding this fixture is tied to code removal? I'm testing the .str
-accessor on all the inferred dtypes to make sure it raises correctly, that's what I mainly need this fixture for.
That I'm testing the validity of the fixture in test_inference.py
is for consistency, because it belongs there thematically (but could otherwise test that directly in the fixture constructor).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback
This fixture would be a perfect candidate for splitting off a PR, but then, I'm afraid you're gonna say it doesn't do anything interesting (yet).
Do you want me to:
- split it up, and have the fixture being unusued until this PR is merged,
- or do want me to keep things logically together (i.e. in this PR)?
I split off a good chunk with #23777. Could split off the dtype fixture as well, see #23582 (comment) |
…fixed * upstream/master: DOC: more consistent flake8-commands in contributing.rst (pandas-dev#23724) DOC: Fixed the doctsring for _set_axis_name (GH 22895) (pandas-dev#22969) DOC: Improve GL03 message re: blank lines at end of docstrings. (pandas-dev#23649) TST: add tests for keeping dtype in Series.update (pandas-dev#23604) TST: For GH4861, Period and datetime in multiindex (pandas-dev#23776) TST: move .str-test to strings.py & parametrize it; precursor to pandas-dev#23582 (pandas-dev#23777) STY: isort tests/scalar, tests/tslibs, import libwindow instead of _window (pandas-dev#23787) BUG: fixed .str.contains(..., na=False) for categorical series (pandas-dev#22170) BUG: Maintain column order with groupby.nth (pandas-dev#22811) API/DEPR: replace kwarg "pat" with "sep" in str.[r]partition (pandas-dev#23767) CLN: Finish isort core (pandas-dev#23765) TST: Mark test_pct_max_many_rows with pytest.mark.single (pandas-dev#23799)
@jreback PTAL :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@h-vetinari i know we have gone back and forth a few times. but the current state is just not great. let's duplicate the fixtures, the full set in test_inference and then pick the ones you need in test_strings.
i will have followup comments after.
pandas/tests/test_strings.py
Outdated
@@ -16,6 +16,7 @@ | |||
from pandas.util.testing import assert_series_equal, assert_index_equal | |||
import pandas.util.testing as tm | |||
|
|||
import pandas.conftest as top_level_conftest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
never import conftest
It's OK, some things take time. This has had one round of review and one split-off precursor so far, so not too bad either.
Not sure I understand - the fixtures are in |
@jreback PTAL |
thanks @h-vetinari would like to remove the xfails (e.g. fixing) as soon as can be done. |
git diff upstream/master -u -- "*.py" | flake8 --diff
The effort to test all methods of the
.str
-accessor against all inferred data types in #23167 unearthed a bunch of bugs. However, that PR is getting quite unwieldy, and since @jreback much prefers single-purpose PRs (and in the spirit of test-driven development), I'm splitting off just the addition of the parametrized fixtures / tests, with lots ofxfails
that will be removed by #23167 and subsequent PRs.