
@pytest.mark.xfail is being applied too broadly #4425

Closed
ricardoV94 opened this issue Jan 20, 2021 · 5 comments · Fixed by #4497
@ricardoV94
Member

ricardoV94 commented Jan 20, 2021

When working in https://github.com/pymc-devs/pymc3/blob/master/pymc3/tests/test_distributions.py in #4421, I noticed that several of the @pytest.mark.xfail marks are being applied to more than one check function at a time, usually to both pymc3_matches_scipy and check_logcdf (and, entirely my fault in #4393, also to the new check_selfconsistency_discrete_logcdf).

I don't know whether xfail is clever enough to evaluate each subfunction separately, or whether a single failing subfunction is enough to mark the whole test as xfailed. Even if pytest counts each (surprisingly) successful subfunction as an additional xpass, I don't think people pay much attention to these, and we may not notice if their behavior changes in the future.

Here is one concrete example:

https://github.com/pymc-devs/pymc3/blob/6360b005fc610d0505f84885743215a3e09f046e/pymc3/tests/test_distributions.py#L1258-L1278

From this PR https://github.com/pymc-devs/pymc3/pull/3944/files

It seems that xfail was added to test_inverse_gamma when the check_logcdf call was added, and it was not necessary when only pymc3_matches_scipy was being tested. If the behavior of the logp of the Gamma changes in the future (albeit in a contrived scenario that only fails in float32), we would miss it.

@ricardoV94 ricardoV94 changed the title from "@pytest.mark.xfail is being applied to broadly" to "@pytest.mark.xfail is being applied too broadly" Jan 20, 2021
@michaelosthege michaelosthege added this to the vNext (3.11.1) milestone Jan 22, 2021
@michaelosthege michaelosthege modified the milestones: vNext (3.11.1), 4.0.0 Jan 28, 2021
@matteo-pallini
Contributor

I would love to work on this one, if it's still free.

@ricardoV94
Member Author

Go ahead :) Let us know if you need anything.

@matteo-pallini
Contributor

matteo-pallini commented Feb 27, 2021

It seems that xfail treats the decorated test function as a single unit, i.e. it is not clever enough to consider each subfunction individually and add its result to the final summary. So, for the example you shared, we would get 1 xfail in both cases: when only one of check_logcdf and check_logp fails, and when both fail. In general, a single failing subfunction is enough to mark the whole test as xfailed, and that is all the summary shows.

Below is a toy example I used to reach these conclusions, together with the corresponding pytest summary:

import pytest


class TestTwoFailures:
    CONDITION = True

    def foo(self):
        assert 1 == 2

    def bar(self):
        assert False

    @pytest.mark.xfail(
        condition=CONDITION, reason="test"
    )
    def test_main(self):
        self.foo()
        self.bar()


class TestOneFailure:
    CONDITION = True

    def foo(self):
        assert 1 == 1

    def bar(self):
        assert False

    @pytest.mark.xfail(
        condition=CONDITION, reason="test"
    )
    def test_main(self):
        self.foo()
        self.bar()


class TestNoFailure:
    CONDITION = True

    def foo(self):
        assert 1 == 1

    def bar(self):
        assert True

    @pytest.mark.xfail(
        condition=CONDITION, reason="test"
    )
    def test_main(self):
        self.foo()
        self.bar()
(pymc3-dev-py37) ➜  pymc3 git:(master) ✗ pytest -rxXs pymc3/tests/foo.py
==================================== test session starts =====================================
platform linux -- Python 3.7.10, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /home/matteo/personal/pymc3, configfile: pytest.ini
plugins: cov-2.11.1
collected 3 items                                                                            

pymc3/tests/foo.py xxX                                                                 [100%]

================================== short test summary info ===================================
XFAIL pymc3/tests/foo.py::TestTwoFailures::test_main
  test
XFAIL pymc3/tests/foo.py::TestOneFailure::test_main
  test
XPASS pymc3/tests/foo.py::TestNoFailure::test_main test
=============================== 2 xfailed, 1 xpassed in 0.14s ================================

Proposed solution

I personally think that splitting the test into one test per subfunction is the cleanest solution. Specifically in this case, where as you mentioned only check_logcdf requires the xfail while check_logp doesn't, this piece of information should be available directly by reading the code. This is just my point of view, though. With that approach, the example you shared would become:

def test_wald_scipy_logp(self):
    self.check_logp(
        Wald,
        Rplus,
        {"mu": Rplus, "alpha": Rplus},
        lambda value, mu, alpha: sp.invgauss.logpdf(value, mu=mu, loc=alpha),
        decimal=select_by_precision(float64=6, float32=1),
    )

@pytest.mark.xfail(
    condition=(theano.config.floatX == "float32"),
    reason="Poor CDF in SciPy. See scipy/scipy#869 for details.",
)
def test_wald_scipy_logcdf(self):
    self.check_logcdf(
        Wald,
        Rplus,
        {"mu": Rplus, "alpha": Rplus},
        lambda value, mu, alpha: sp.invgauss.logcdf(value, mu=mu, loc=alpha),
    )

An alternative approach could be wrapping each subfunction in a try/except, catching the errors and appending them to a list, and then raising if the list is non-empty, showing its contents in the assertion message. As per the accepted SO answer:
https://stackoverflow.com/questions/39896716/can-i-perform-multiple-assertions-in-pytest

Let me know what seems more sensible to you (and whether anything is unclear).

@ricardoV94
Member Author

Thanks for the analysis. I think splitting each function currently under a common xfail into its own test is the best solution. We can also check the git history to figure out which ones might not have needed the xfail in the first place, as in my example above (we can always put them back under a new xfail if they also fail).

@matteo-pallini
Contributor

@ricardoV94 The PR should be ready for review. Would you like to review it?
#4497
