@pytest.mark.xfail is being applied too broadly #4425
Comments
I would love to work on this one, if still free.
Go ahead :) Let us know if you need anything.
It seems like the xfail mark applies to the test as a whole: the test is reported as xfailed as soon as one of the subfunctions fails, and as xpassed only if every subfunction passes. Below is a toy example I used to reach these conclusions and its relative pytest summary:
import pytest


class TestTwoFailures:
    CONDITION = True

    def foo(self):
        assert (1 == 2)

    def bar(self):
        assert False

    @pytest.mark.xfail(
        condition=CONDITION, reason="test"
    )
    def test_main(self):
        self.foo()
        self.bar()


class TestOneFailure:
    CONDITION = True

    def foo(self):
        assert (1 == 1)

    def bar(self):
        assert False

    @pytest.mark.xfail(
        condition=CONDITION, reason="test"
    )
    def test_main(self):
        self.foo()
        self.bar()


class TestNoFailure:
    CONDITION = True

    def foo(self):
        assert (1 == 1)

    def bar(self):
        assert True

    @pytest.mark.xfail(
        condition=CONDITION, reason="test"
    )
    def test_main(self):
        self.foo()
        self.bar()
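For reference, this is roughly the summary one would expect from running the toy example with pytest -v under the default (non-strict) xfail handling; the file name test_xfail_toy.py is hypothetical and the exact formatting varies with the pytest version:

    test_xfail_toy.py::TestTwoFailures::test_main XFAIL
    test_xfail_toy.py::TestOneFailure::test_main XFAIL
    test_xfail_toy.py::TestNoFailure::test_main XPASS

    2 xfailed, 1 xpassed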
Proposed solution

I personally think that splitting the test into one test per subfunction is the cleanest solution. Specifically, in this case, in which (as you mentioned) only the check_logcdf needs the xfail:

def test_wald_scipy_logp(self):
    self.check_logp(
        Wald,
        Rplus,
        {"mu": Rplus, "alpha": Rplus},
        lambda value, mu, alpha: sp.invgauss.logpdf(value, mu=mu, loc=alpha),
        decimal=select_by_precision(float64=6, float32=1),
    )

@pytest.mark.xfail(
    condition=(theano.config.floatX == "float32"),
    reason="Poor CDF in SciPy. See scipy/scipy#869 for details.",
)
def test_wald_scipy_logcdf(self):
    self.check_logcdf(
        Wald,
        Rplus,
        {"mu": Rplus, "alpha": Rplus},
        lambda value, mu, alpha: sp.invgauss.logcdf(value, mu=mu, loc=alpha),
    )

An alternative approach could be to wrap each subfunction in a try/except, catch the errors and append them to a list, and then raise if the list is non-empty, showing its content with the assert, as per the accepted SO answer (see the sketch below). Let me know what seems more sensible to you (and whether anything is not clear).
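A minimal sketch of that try/except alternative, assuming a plain test function and two hypothetical stand-in sub-checks rather than the actual pymc3 helpers:

import pytest


def check_logp_like():
    # hypothetical stand-in for the first sub-check
    assert 1 == 2


def check_logcdf_like():
    # hypothetical stand-in for the second sub-check
    assert False


@pytest.mark.xfail(condition=True, reason="collect every sub-check failure")
def test_main():
    errors = []
    # run every sub-check even if an earlier one fails
    for check in (check_logp_like, check_logcdf_like):
        try:
            check()
        except AssertionError as err:
            errors.append(err)
    # a single assert that reports all collected failures at once
    assert not errors, f"{len(errors)} sub-check(s) failed: {errors}"

This keeps a single test (and a single xfail) while still exercising every sub-check, at the cost of a slightly noisier test body.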
Thanks for the analysis. I think splitting each function that currently shares a common xfail into its own test is the best solution. We can also check the git history to figure out which ones might not have needed it, as in my example above (we can always put them under a new xfail if they also fail).
@ricardoV94 The PR should be ready for review. Would you like to review it?
When working in https://github.com/pymc-devs/pymc3/blob/master/pymc3/tests/test_distributions.py in #4421, I noticed that several of the @pytest.mark.xfail marks are being applied to more than one function at a time, usually to the pymc3_matches_scipy and the check_logcdf (and also, entirely due to my fault in #4393, to the new check_selfconsistency_discrete_logcdf). I don't know if the xfail is clever enough to evaluate each subfunction separately, or if it is enough for one of them to fail. Even if pytest counts each (surprisingly) successful subfunction as an additional xpass, I don't think people pay much attention to these, and we may not notice if their behavior changes in the future.

Here is one concrete example:
https://github.com/pymc-devs/pymc3/blob/6360b005fc610d0505f84885743215a3e09f046e/pymc3/tests/test_distributions.py#L1258-L1278
From this PR https://github.com/pymc-devs/pymc3/pull/3944/files
It seems that the xfail was added to test_inverse_gamma when the check_logcdf method was added, and it was not necessary when only the pymc3_matches_scipy was being tested. If the behavior of the logp of the Gamma changes in the future (albeit in a contrived scenario that only fails in float32), we would miss it.