
@pytest.mark.xfail is being applied too broadly #4425

Closed
ricardoV94 opened this issue Jan 20, 2021 · 5 comments · Fixed by #4497
@ricardoV94
Member

ricardoV94 commented Jan 20, 2021

When working in https://github.com/pymc-devs/pymc3/blob/master/pymc3/tests/test_distributions.py in #4421, I noticed that several of the @pytest.mark.xfail marks are being applied to more than one check function at a time, usually to both pymc3_matches_scipy and check_logcdf (and, entirely my fault in #4393, also to the new check_selfconsistency_discrete_logcdf).

I don't know whether xfail is clever enough to evaluate each subfunction separately, or whether a single failing subfunction is enough to mark the whole test as xfailed. Even if pytest counts each (surprisingly) successful subfunction as an additional xpass, I don't think people pay much attention to these, and we may not notice if their behavior changes in the future.

Here is one concrete example:

https://github.com/pymc-devs/pymc3/blob/6360b005fc610d0505f84885743215a3e09f046e/pymc3/tests/test_distributions.py#L1258-L1278

From this PR https://github.com/pymc-devs/pymc3/pull/3944/files

It seems that xfail was added to test_inverse_gamma when the check_logcdf call was added, and it was not necessary when only pymc3_matches_scipy was being tested. If the behavior of the logp of the Gamma changes in the future (albeit in a contrived scenario that only fails in float32), we would miss it.

@ricardoV94 ricardoV94 changed the title from "@pytest.mark.xfail is being applied to broadly" to "@pytest.mark.xfail is being applied too broadly" Jan 20, 2021
@michaelosthege michaelosthege added this to the vNext (3.11.1) milestone Jan 22, 2021
@michaelosthege michaelosthege modified the milestones: vNext (3.11.1), 4.0.0 Jan 28, 2021
@matteo-pallini
Contributor

I would love to work on this one, if it's still free.

@ricardoV94
Member Author

Go ahead :) Let us know if you need anything.

@matteo-pallini
Contributor

matteo-pallini commented Feb 27, 2021

It seems that xfail treats the decorated test function as a single unit, i.e. it is not clever enough to consider each subfunction individually and add its result to the final summary. So, for the example you shared, we would get 1 xfail in both cases: when only one of check_logcdf and check_logp fails, and when both fail. In general, a single failing subfunction is enough to mark the whole test as xfailed, and that is all the summary shows.

Below is a toy example I used to reach these conclusions, together with the corresponding pytest summary:

import pytest


class TestTwoFailures:
    CONDITION = True

    def foo(self):
        assert 1 == 2

    def bar(self):
        assert False

    @pytest.mark.xfail(
        condition=CONDITION, reason="test"
    )
    def test_main(self):
        self.foo()
        self.bar()


class TestOneFailure:
    CONDITION = True

    def foo(self):
        assert 1 == 1

    def bar(self):
        assert False

    @pytest.mark.xfail(
        condition=CONDITION, reason="test"
    )
    def test_main(self):
        self.foo()
        self.bar()


class TestNoFailure:
    CONDITION = True

    def foo(self):
        assert 1 == 1

    def bar(self):
        assert True

    @pytest.mark.xfail(
        condition=CONDITION, reason="test"
    )
    def test_main(self):
        self.foo()
        self.bar()
(pymc3-dev-py37) ➜  pymc3 git:(master) ✗ pytest -rxXs pymc3/tests/foo.py
==================================== test session starts =====================================
platform linux -- Python 3.7.10, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /home/matteo/personal/pymc3, configfile: pytest.ini
plugins: cov-2.11.1
collected 3 items                                                                            

pymc3/tests/foo.py xxX                                                                 [100%]

================================== short test summary info ===================================
XFAIL pymc3/tests/foo.py::TestTwoFailures::test_main
  test
XFAIL pymc3/tests/foo.py::TestOneFailure::test_main
  test
XPASS pymc3/tests/foo.py::TestNoFailure::test_main test
=============================== 2 xfailed, 1 xpassed in 0.14s ================================

Proposed solution

I personally think that splitting the test into one test per subfunction is the cleanest solution. Specifically in this case, where as you mentioned only check_logcdf requires the xfail while check_logp doesn't, this piece of information should be available directly by reading the code. This is just my point of view, though. With that approach, the example you shared would become:

def test_wald_scipy_logp(self):
    self.check_logp(
        Wald,
        Rplus,
        {"mu": Rplus, "alpha": Rplus},
        lambda value, mu, alpha: sp.invgauss.logpdf(value, mu=mu, loc=alpha),
        decimal=select_by_precision(float64=6, float32=1),
    )

@pytest.mark.xfail(
    condition=(theano.config.floatX == "float32"),
    reason="Poor CDF in SciPy. See scipy/scipy#869 for details.",
)
def test_wald_scipy_logcdf(self):
    self.check_logcdf(
        Wald,
        Rplus,
        {"mu": Rplus, "alpha": Rplus},
        lambda value, mu, alpha: sp.invgauss.logcdf(value, mu=mu, loc=alpha),
    )

An alternative approach could be wrapping each subfunction in a try/except, catching the errors and appending them to a list, and then raising if the list is non-empty, showing its contents in the assertion message. As per the accepted SO answer:
https://stackoverflow.com/questions/39896716/can-i-perform-multiple-assertions-in-pytest

Let me know what seems more sensible to you (and whether anything is unclear).

@ricardoV94
Member Author

Thanks for the analysis. I think splitting each function currently under a common xfail into its own test is the best solution. We can also check the git history to figure out which ones might not have needed the xfail in the first place, as in my example above (we can always put them back under a new xfail if they also fail).

@matteo-pallini
Contributor

@ricardoV94 The PR should be ready for review. Would you like to review it?
#4497
