TST: add method/dtype coverage to str-accessor; precursor to #23167 #23582

h-vetinari · 2018-11-08T21:57:24Z

PRECURSOR to API: Series.str-accessor infers dtype (and Index.str does not raise on all-NA) #23167
tests added / passed / xfailed
passes git diff upstream/master -u -- "*.py" | flake8 --diff

The effort to test all methods of the .str-accessor against all inferred data types in #23167 unearthed a bunch of bugs. However, that PR is getting quite unwieldy, and since @jreback much prefers single-purpose PRs (and in the spirit of test-driven development), I'm splitting off just the addition of the parametrized fixtures / tests, with lots of xfails that will be removed by #23167 and subsequent PRs.

pep8speaks · 2018-11-08T21:57:28Z

Hello @h-vetinari! Thanks for updating the PR.

There are no PEP8 issues in the file pandas/conftest.py !
There are no PEP8 issues in the file pandas/tests/dtypes/test_inference.py !
There are no PEP8 issues in the file pandas/tests/test_strings.py !

Comment last updated on November 20, 2018 at 06:48 Hours UTC

codecov · 2018-11-08T22:35:41Z

Codecov Report

Merging #23582 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #23582   +/-   ##
=======================================
  Coverage   92.29%   92.29%           
=======================================
  Files         161      161           
  Lines       51500    51500           
=======================================
  Hits        47533    47533           
  Misses       3967     3967

Flag	Coverage Δ
#multiple	`90.69% <ø> (ø)`	⬆️
#single	`42.42% <ø> (ø)`	⬆️


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1f02bf2...a53a28e.

h-vetinari · 2018-11-09T00:08:06Z

@jreback
This is green, PTAL.
(note: strongly related to #23167, see OP)

gfyoung · 2018-11-11T00:50:22Z

> @jreback much prefers single-purpose PRs

Not just him, I think almost any of us who reviews PRs would prefer that if possible. 😉

h-vetinari · 2018-11-11T09:28:05Z

@gfyoung

> Not just him, I think almost any of us who reviews PRs would prefer that if possible. 😉

That's fair. To me, #23167 is single purpose, but since the diff is large (and changing tests/code at the same time makes reviewing harder), I tried to find a way to slice it somehow.

jreback

so it's nice that you are exhaustively testing things and found some issues. pls pls avoid compound tests. These are very hard to understand. I have suggested what you should do here.

jreback · 2018-11-11T15:12:02Z

pandas/tests/test_strings.py

@@ -26,6 +29,157 @@ def assert_series_or_index_equal(left, right):
        assert_index_equal(left, right)


+# method names plus minimal set of arguments to call
+_all_string_methods = [
+    ('get', [0]),


this is somewhat duplicating: test_str_accessor_api_for_categorical

can you reconcile and put it in pandas/conftest.py

some of the fixtures can stay here if they are specific to what is being tested. but the top level general stuff should move.

I cannot find the test you're talking about - where is it supposed to be?

Do I understand correctly that you'd like the _all_string_methods in pandas/conftest.py? I would think that fixture is very specific to test_strings. The dtype fixtures make more sense on a higher level.

(pandas) bash-3.2$ grep -r test_str_accessor_api_for_categorical pandas/tests/
Binary file pandas/tests//series/__pycache__/test_api.cpython-36-PYTEST.pyc matches
pandas/tests//series/test_api.py:    def test_str_accessor_api_for_categorical(self):

Thanks. How about moving this test to test_strings? It's testing the API, but is clearly only related to the str accessor.
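A minimal sketch of how a list like `_all_string_methods` (method name plus a minimal set of positional arguments, as in the `('get', [0])` entry quoted above) can drive calls through `getattr` on the accessor. Only the `get` entry comes from the diff; the other entries and the helper name `call_all_string_methods` are illustrative assumptions, not the PR's actual list.

```python
import pandas as pd

# (method name, minimal positional args); only ('get', [0]) is taken from the diff,
# the remaining entries are illustrative
STRING_METHODS = [
    ("get", [0]),
    ("len", []),
    ("upper", []),
    ("startswith", ["a"]),
    ("pad", [5]),
    ("repeat", [2]),
]


def call_all_string_methods(series):
    """Hypothetical helper: call every listed .str method and collect the results."""
    results = {}
    for method_name, minimal_args in STRING_METHODS:
        method = getattr(series.str, method_name)  # bound method of the accessor
        results[method_name] = method(*minimal_args)
    return results


# explicit object dtype, so no string dtype inference gets in the way
s = pd.Series(["a", "bb", "ccc"], dtype=object)
results = call_all_string_methods(s)
```

Keeping the arguments next to the method name is what lets a single parametrized test call every method without per-method code.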

jreback · 2018-11-11T15:14:09Z

pandas/tests/test_strings.py

+
+@pytest.fixture(params=_all_skipna_inferred_dtypes, ids=ids)
+def all_skipna_inferred_dtypes(request):
+    """


so it makes sense to also use these in the inferred_type tests then.

You mean for the tests in tests/dtypes/test_inference.py?

They're being used in this module already.

not sure what you mean

I'm asking if you want me to move this particular fixture to pandas/conftest.py and then test it within the dtype tests (because this is effectively a dtype thing).

jreback · 2018-11-11T15:15:23Z

pandas/tests/test_strings.py

+
+        types_passing_constructor = ['string', 'unicode', 'empty',
+                                     'bytes', 'mixed', 'mixed-integer']
+        if inferred_dtype in types_passing_constructor:


see my comment above

jreback · 2018-11-11T15:15:58Z

pandas/tests/test_strings.py

+        # one instance of parametrized fixture
+        inferred_dtype, values = all_skipna_inferred_dtypes
+
+        t = box(values, dtype=dtype)  # explicit dtype to avoid casting


you can actually make this 2 tests by putting the xfails in a function. then the test becomes single purpose and you don't have the if statement near the bottom.

I don't understand what you're asking for here, sorry.

The test is already very single-purpose (except the xfails, which will be gone with #23167 and follow-up PRs), and the final if-switch makes it transparent which types are actually passing the constructor, with all other types raising. This would only get harder to understand if it's split into two, no?

no you have an if else branch. it makes reasoning about this impossible as it's very fixture / data dependent.

The if-branch simply separates the cases where the .str-accessor raises from those where it doesn't. I do not understand how you want me to structure this test (or rather, the function you mentioned).

if/else make the test very hard to understand. pls break in 2

This particular test can be broken into _passes and _raises, but that last if-condition is really not that hard to follow. I don't get that objection, tbh.

if inferred_dtype in types_passing_constructor:
    # pass
else:
    # raise

Respectfully, I find it an unrealistic criterion that a test cannot use even one simple if-condition (aside from the xfails, which will be gone soon after). The whole point is that it has an extensively parametrized fixture (any_skipna_inferred_dtype), and I have to make a single distinction based on the content of that fixture.

Even if I were to split this test into _passes and _raises, I'd have to use the same kind of if-condition, unless I were to needlessly break up the parametrized fixtures into smaller subsets.
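For comparison, a sketch of the split into `_passes` and `_raises` discussed above. It illustrates the point from the last reply: without an `if` in the test body, the parametrized cases have to be partitioned up front by the same membership check. The cases and test names are illustrative; `types_passing_constructor` is the list from the diff, and only `pd.Series` with `dtype=object` is covered here.

```python
import numpy as np
import pandas as pd
import pytest

# illustrative (inferred dtype, values) pairs
CASES = [
    ("string", ["a", np.nan, "c"]),
    ("bytes", [b"a", np.nan, b"c"]),
    ("empty", [np.nan, np.nan, np.nan]),
    ("mixed-integer", ["a", np.nan, 2]),
    ("mixed", ["a", np.nan, 2.0]),
    ("floating", [1.0, np.nan, 2.0]),
    ("integer", [1, np.nan, 2]),
]
types_passing_constructor = ["string", "unicode", "empty",
                             "bytes", "mixed", "mixed-integer"]

# the if-condition does not disappear, it moves into the parametrization
PASSING = [case for case in CASES if case[0] in types_passing_constructor]
RAISING = [case for case in CASES if case[0] not in types_passing_constructor]


@pytest.mark.parametrize("inferred_dtype, values", PASSING,
                         ids=[case[0] for case in PASSING])
def test_api_per_dtype_passes(inferred_dtype, values):
    t = pd.Series(np.array(values, dtype=object), dtype=object)
    assert hasattr(t, "str")  # accessor can be constructed


@pytest.mark.parametrize("inferred_dtype, values", RAISING,
                         ids=[case[0] for case in RAISING])
def test_api_per_dtype_raises(inferred_dtype, values):
    t = pd.Series(np.array(values, dtype=object), dtype=object)
    with pytest.raises(AttributeError, match="Can only use .str accessor"):
        t.str
```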


jreback · 2018-11-11T15:17:06Z

pandas/tests/test_strings.py

+        method_name, minimal_args = all_string_methods
+
+        # TODO: get rid of these xfails
+        if (method_name not in ['encode', 'decode', 'len']


same comment as above

These xfails will be gone, and then the test reads very clearly, IMO

this test cannot be broken up as easily, because the allowed types depend on the method being checked!
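To illustrate why this test resists the same partitioning: once the set of accepted inferred dtypes is a function of the method, a split would need one subset per method, while the in-test branch stays the same for all of them. The per-method rules below (which methods tolerate bytes or mixed values) and the helper names are hypothetical stand-ins, not the behaviour established in the PR.

```python
# Hypothetical rule sets, for illustration only
BYTES_METHODS = {"decode", "len"}  # hypothetical: methods that also accept bytes
STRICT_METHODS = {"cat"}  # hypothetical: methods that reject mixed values


def allowed_types(method_name):
    """Inferred dtypes on which ``method_name`` is expected to work (hypothetical)."""
    allowed = ["string", "unicode", "empty"]
    if method_name in BYTES_METHODS:
        allowed.append("bytes")
    if method_name not in STRICT_METHODS:
        allowed += ["mixed", "mixed-integer"]
    return allowed


def expected_outcome(method_name, inferred_dtype):
    # the single remaining branch, identical for every method
    return "pass" if inferred_dtype in allowed_types(method_name) else "raise"
```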

h-vetinari

Thanks for the review.

> so it's nice that you are exhaustively testing things and found some issues. pls pls avoid compound tests. These are very hard to understand. I have suggested what you should do here.

I don't understand what you mean by "compound" tests. I agree that the xfails are not pretty, but they were the only way I could split up #23167. The idea is that all these tests end up free of xfails, and then they read quite clearly IMO.

Regarding the fixtures, I would leave the string methods here (I couldn't find the test you mentioned), and would maybe move the dtype ones to pandas/conftest.py and also test them in tests/dtypes/test_inference.py.

h-vetinari · 2018-11-11T17:29:59Z

pandas/tests/test_strings.py

@@ -26,6 +29,157 @@ def assert_series_or_index_equal(left, right):
        assert_index_equal(left, right)


+# method names plus minimal set of arguments to call
+_all_string_methods = [
+    ('get', [0]),


I cannot find the test you're talking about - where is it supposed to be?

Do I understand correctly that you'd like the _all_string_methods in pandas/conftest.py? I would think that fixture is very specific to test_strings. The dtype fixtures make more sense on a higher level.

h-vetinari · 2018-11-11T17:31:21Z

pandas/tests/test_strings.py

+
+@pytest.fixture(params=_all_skipna_inferred_dtypes, ids=ids)
+def all_skipna_inferred_dtypes(request):
+    """


You mean for the tests in tests/dtypes/test_inference.py?

They're being used in this module already.

h-vetinari · 2018-11-11T17:34:35Z

pandas/tests/test_strings.py

+        # one instance of parametrized fixture
+        inferred_dtype, values = all_skipna_inferred_dtypes
+
+        t = box(values, dtype=dtype)  # explicit dtype to avoid casting


I don't understand what you're asking for here, sorry.

The test is already very single-purpose (except the xfails, which will be gone with #23167 and follow-up PRs), and the final if-switch makes it transparent which types are actually passing the constructor, with all other types raising. This would only get harder to understand if it's split into two, no?


h-vetinari · 2018-11-11T17:36:49Z

pandas/tests/test_strings.py

+        method_name, minimal_args = all_string_methods
+
+        # TODO: get rid of these xfails
+        if (method_name not in ['encode', 'decode', 'len']


These xfails will be gone, and then the test reads very clearly, IMO


h-vetinari · 2018-11-11T22:15:28Z

pandas/tests/test_strings.py

+
+@pytest.fixture(params=_all_skipna_inferred_dtypes, ids=ids)
+def all_skipna_inferred_dtypes(request):
+    """


I'm asking if you want me to move this particular fixture to pandas/conftest.py and then test it within the dtype tests (because this is effectively a dtype thing).

h-vetinari · 2018-11-11T22:16:34Z

pandas/tests/test_strings.py

+        # one instance of parametrized fixture
+        inferred_dtype, values = all_skipna_inferred_dtypes
+
+        t = box(values, dtype=dtype)  # explicit dtype to avoid casting


The if-branch simply separates the cases where the .str-accessor raises from those where it doesn't. I do not understand how you want me to structure this test (or rather, the function you mentioned).


h-vetinari

Proposing to move the API test into this module. It's all about the .str accessor.

h-vetinari · 2018-11-12T07:35:22Z

pandas/tests/test_strings.py

@@ -26,6 +29,157 @@ def assert_series_or_index_equal(left, right):
        assert_index_equal(left, right)


+# method names plus minimal set of arguments to call
+_all_string_methods = [
+    ('get', [0]),


Thanks. How about moving this test to test_strings? It's testing the API, but is clearly only related to the str accessor.

h-vetinari · 2018-11-12T07:36:46Z

pandas/tests/test_strings.py

+        # one instance of parametrized fixture
+        inferred_dtype, values = all_skipna_inferred_dtypes
+
+        t = box(values, dtype=dtype)  # explicit dtype to avoid casting


This particular test can be broken into _passes and _raises, but that last if-condition is really not that hard to follow. I don't get that objection, tbh.

if inferred_dtype in types_passing_constructor:
    # pass
else:
    # raise

h-vetinari · 2018-11-12T07:37:17Z

pandas/tests/test_strings.py

+        method_name, minimal_args = all_string_methods
+
+        # TODO: get rid of these xfails
+        if (method_name not in ['encode', 'decode', 'len']


this test cannot be broken up as easily, because the allowed types depend on the method being checked!

h-vetinari · 2018-11-12T18:25:24Z

@jreback

> Thanks. How about moving this test [test_str_accessor_api_for_categorical] to test_strings.py? It's testing the API, but is clearly only related to the .str accessor.

Can I get a go-ahead on that? TBH, I don't think pandas/conftest.py should be polluted with the very .str-specific methods. Moving the test would also make the unification much easier. I can move it in a separate PR, if you want.

h-vetinari

@jreback
Incorporated your feedback as much as possible. I've played through a couple of scenarios of how to break up the tests like you wanted, but none of them led to easier / more readable code. Ultimately, these if-conditions have an extremely simple/transparent structure:

if inferred_dtype in allowed_types:
    # pass (1 LOC)
else:
    # raise (4-5 LOC)

and I strongly believe these should not be broken up further.
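A runnable sketch of that structure as a single parametrized test, assuming `pd.Series` with object dtype and three illustrative cases (the test name is hypothetical). The pass branch is a single line and the raise branch only a few, as described above.

```python
import numpy as np
import pandas as pd
import pytest

allowed_types = ["string", "unicode", "empty", "bytes", "mixed", "mixed-integer"]


@pytest.mark.parametrize("inferred_dtype, values", [
    ("string", ["a", np.nan, "c"]),
    ("mixed-integer", ["a", np.nan, 2]),
    ("integer", [1, np.nan, 2]),
], ids=["string", "mixed-integer", "integer"])
def test_str_accessor_per_dtype(inferred_dtype, values):
    t = pd.Series(np.array(values, dtype=object), dtype=object)
    if inferred_dtype in allowed_types:
        # pass: the accessor can be constructed
        assert hasattr(t, "str")
    else:
        # raise: constructing the accessor fails with a clear message
        with pytest.raises(AttributeError, match="Can only use .str accessor"):
            t.str
        assert not hasattr(t, "str")
```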

h-vetinari · 2018-11-13T22:33:03Z

pandas/tests/test_strings.py

+        # one instance of parametrized fixture
+        inferred_dtype, values = all_skipna_inferred_dtypes
+
+        t = box(values, dtype=dtype)  # explicit dtype to avoid casting


Respectfully, I find it an unrealistic criterion that a test cannot use even one simple if-condition (aside from the xfails, which will be gone soon after). The whole point is that it has an extensively parametrized fixture (any_skipna_inferred_dtype), and I have to make a single distinction based on the content of that fixture.

Even if I were to split this test into _passes and _raises, I'd have to use the same kind of if-condition, unless I were to needlessly break up the parametrized fixtures into smaller subsets.

h-vetinari · 2018-11-13T22:34:12Z