Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase logcdf coverage for invalid parameter values #4421

Merged
merged 6 commits into from
Jan 20, 2021

Conversation

ricardoV94
Copy link
Member

@ricardoV94 ricardoV94 commented Jan 17, 2021

This PR extends the work started in the #4393 PR, and addresses the issue that most logcdf methods for continuous distributions were not checking for invalid parameteres (closes #4399 issue).

The main change is that test_distributions.py::check_logcdf method now tests that distributions return -inf when given invalid parameters (i.e., one unit below or above a finite domain edge). This is done for one invalid parameter at a time.

This extended test detected failures in all the continuous distribution logcdf methods mentioned in #4399 (and some more), as well as an issue in the discrete BetaBinomial distribution that I had missed before. The solution to most cases was to simply wrap the returned result in a Bound.

For the Uniform, Triangular and DiscreteUniform distributions it does not make sense to test values outside of the "Domain edges" since these are set arbitrarily (so that the upper and lower parameters do not clash). I added an optional argument skip_paramdomain_outside_edge_test that skips this step for these distributions.

The ExGaussian distribution was not being tested with check_logcdf probably because there is no Scipy implementation. To have it covered in the new tests I added the optional argument skip_paramdomain_inside_edge_test which skips the pymc3 vs scipy comparison for valid parameters / values (i.e., the default behavior of check_logcdf).

Since this PR and #4393 added considerable complexity to the check_logcdf method, I went ahead and documented it as best as I could. This will hopefully make it easier to maintain in the future.

Two issues remain:

Gamma / InverseGamma traceback
test_distributions.py::TestMatchesScipy::test_gamma Fatal Python error: Aborted

Current thread 0x00007f54df052740 (most recent call first):
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/theano/link/c/basic.py", line 1829 in __call__
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/theano/gof/op.py", line 883 in rval
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/theano/gof/op.py", line 109 in compute_test_value
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/theano/gof/op.py", line 640 in __call__
  File "/home/ricardo/Documents/Projects/pymc3/pymc3/distributions/continuous.py", line 2625 in logcdf
  File "/home/ricardo/Documents/Projects/pymc3/pymc3/tests/test_distributions.py", line 663 in check_logcdf
  File "/home/ricardo/Documents/Projects/pymc3/pymc3/tests/test_distributions.py", line 1155 in test_gamma
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/_pytest/python.py", line 184 in pytest_pyfunc_call
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/_pytest/python.py", line 1627 in runtest
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/_pytest/runner.py", line 163 in pytest_runtest_call
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/_pytest/runner.py", line 256 in <lambda>
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/_pytest/runner.py", line 310 in from_call
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/_pytest/runner.py", line 255 in call_runtest_hook
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/_pytest/runner.py", line 216 in call_and_report
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/_pytest/runner.py", line 127 in runtestprotocol
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/_pytest/runner.py", line 110 in pytest_runtest_protocol
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/_pytest/main.py", line 338 in pytest_runtestloop
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/_pytest/main.py", line 313 in _main
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/_pytest/main.py", line 257 in wrap_session
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/_pytest/main.py", line 306 in pytest_cmdline_main
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in __call__
  File "/home/ricardo/Documents/Projects/pymc3-venv/lib/python3.8/site-packages/_pytest/config/__init__.py", line 164 in main
  File "/opt/pycharm-2020.2.3/plugins/python/helpers/pycharm/_jb_pytest_runner.py", line 43 in <module>

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

Follow-up question: Do you think it would be helpful to add these new checks to pymc3_matches_scipy (the equivalent test for logp methods)?

@ricardoV94 ricardoV94 marked this pull request as draft January 17, 2021 15:06
@ricardoV94 ricardoV94 marked this pull request as ready for review January 17, 2021 15:06
@ricardoV94 ricardoV94 force-pushed the increase_logcdf_coverage_2 branch from 7cd42b5 to 83014e8 Compare January 20, 2021 08:05
Copy link
Member

@michaelosthege michaelosthege left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great docstring for the helper test function!

But I can't review the parts in continuous.py they touch some very important distributions and I'm not into the details enough.

@@ -807,7 +885,8 @@ def test_chi_squared(self):
lambda value, nu: sp.chi2.logpdf(value, df=nu),
)

@pytest.mark.xfail(reason="Poor CDF in SciPy. See scipy/scipy#869 for details.")
# TODO: Is this still needed? It passes locally.
# @pytest.mark.xfail(reason="Poor CDF in SciPy. See scipy/scipy#869 for details.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually, an XFAIL should error if the test does NOT fail.
In any case, these commented lines must be dealt with before merging.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably misinterpreted something somewhere, but this test was not failing (nor failing because it was not failing) on my machine. Anyway, it should soon be cleared up when the tests conclude in here.

@ricardoV94
Copy link
Member Author

ricardoV94 commented Jan 20, 2021

But I can't review the parts in continuous.py they touch some very important distributions and I'm not into the details enough.

Fair enough! In most cases I simply had to borrow the same bounds as in the logp methods, except for the bound on the input value, since for cdfs the values beyond the supported limit should evaluate to log(1)=0 and not -inf, provided the other distribution parameters are valid.

@ricardoV94 ricardoV94 force-pushed the increase_logcdf_coverage_2 branch from 2596855 to 9c5aa05 Compare January 20, 2021 11:37
@ricardoV94
Copy link
Member Author

Unrelated failing test.

I will wait for reviews before forcing the tests to rerun. The last commit was just a change in the docstrings, and the tests before it were fine.

Copy link
Member

@michaelosthege michaelosthege left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is all good.
The bound checks should improve the user experience.

@twiecki
Copy link
Member

twiecki commented Jan 20, 2021

=================================== FAILURES ===================================
__________________ TestMvNormal.test_with_cov_rv[1-(3,)-(1,)] __________________

self = <pymc3.tests.test_distributions_random.TestMvNormal object at 0x7fd3cdccd7d0>
sample_shape = 1, dist_shape = (3,), mu_shape = (1,)

    @pytest.mark.parametrize(
        ["sample_shape", "dist_shape", "mu_shape"],
        generate_shapes(include_params=False),
        ids=str,
    )
    def test_with_cov_rv(self, sample_shape, dist_shape, mu_shape):
        with pm.Model() as model:
            mu = pm.Normal("mu", 0.0, 1.0, shape=mu_shape)
            sd_dist = pm.Exponential.dist(1.0, shape=3)
            chol, corr, stds = pm.LKJCholeskyCov(
                "chol_cov", n=3, eta=2, sd_dist=sd_dist, compute_corr=True
            )
            mv = pm.MvNormal("mv", mu, cov=pm.math.dot(chol, chol.T), shape=dist_shape)
>           prior = pm.sample_prior_predictive(samples=sample_shape)

pymc3/tests/test_distributions_random.py:1760: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pymc3/sampling.py:1942: in sample_prior_predictive
    values = draw_values([model[name] for name in names], size=samples)
pymc3/distributions/distribution.py:791: in draw_values
    value = _draw_value(next_, point=point, givens=temp_givens, size=size)
pymc3/distributions/distribution.py:955: in _draw_value
    return param.random(point=point, size=size)
pymc3/model.py:106: in __call__
    return getattr(self.obj, self.method_name)(*args, **kwargs)
pymc3/distributions/multivariate.py:265: in random
    mu, param = draw_values([self.mu, param_attribute], point=point, size=size)
pymc3/distributions/distribution.py:834: in draw_values
    value = _draw_value(param, point=point, givens=givens.values(), size=size)
pymc3/distributions/distribution.py:1006: in _draw_value
    output = func(*input_vals)
/usr/share/miniconda/envs/pymc3-dev-py37/lib/python3.7/site-packages/numpy/lib/function_base.py:2108: in __call__
    return self._vectorize_call(func=func, args=vargs)
/usr/share/miniconda/envs/pymc3-dev-py37/lib/python3.7/site-packages/numpy/lib/function_base.py:2182: in _vectorize_call
    res = self._vectorize_call_with_signature(func, args)
/usr/share/miniconda/envs/pymc3-dev-py37/lib/python3.7/site-packages/numpy/lib/function_base.py:2212: in _vectorize_call_with_signature
    args, input_core_dims)
/usr/share/miniconda/envs/pymc3-dev-py37/lib/python3.7/site-packages/numpy/lib/function_base.py:1873: in _parse_input_dimensions
    _update_dim_sizes(dim_sizes, arg, core_dims)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

dim_sizes = {}, arg = array(0., dtype=float32), core_dims = ('i_0_0',)

    def _update_dim_sizes(dim_sizes, arg, core_dims):
        """
        Incrementally check and update core dimension sizes for a single argument.
    
        Arguments
        ---------
        dim_sizes : Dict[str, int]
            Sizes of existing core dimensions. Will be updated in-place.
        arg : ndarray
            Argument to examine.
        core_dims : Tuple[str, ...]
            Core dimensions for this argument.
        """
        if not core_dims:
            return
    
        num_core_dims = len(core_dims)
        if arg.ndim < num_core_dims:
            raise ValueError(
                '%d-dimensional argument does not have enough '
                'dimensions for all core dimensions %r'
>               % (arg.ndim, core_dims))
E           ValueError: 0-dimensional argument does not have enough dimensions for all core dimensions ('i_0_0',)

/usr/share/miniconda/envs/pymc3-dev-py37/lib/python3.7/site-packages/numpy/lib/function_base.py:1839: ValueError
=============================== warnings summary =============================

@twiecki
Copy link
Member

twiecki commented Jan 20, 2021

let's open an issue about that then.

@ricardoV94 what about the unchecked boxes above?

@ricardoV94
Copy link
Member Author

@ricardoV94 what about the unchecked boxes above?

Basically it seems that in the pytest context we are still triggering those annoying C-level assections in the Theano Gamma functions mentioned in aesara-devs/aesara#224

# In pytest context

pymc3_dist = pm.Gamma
valid_value = array(0.01)
test_params = {'alpha': -1.0, 'beta': array(0.5)}
invalid_dist = pymc3_dist.dist(**test_params)

assert_equal(
    invalid_dist.logcdf(valid_value).tag.test_value,  # This triggers the C level assertion. Is it evaluating greedily or using cached results?
    -np.inf,
    err_msg=str(test_params),
)
 double GammaP(double, double): Assertion `(n > 0) && (x >= 0)' failed.

Actually it seems that simply calling logcdf is enough to trigger the exception

invalid_dist.logcdf(valid_value)

This previous PR #4393 was supposed to avoid this by using tt.switch before trying to call the gamma function, but somehow it is not working in the pytest context. As expected it is still avoiding the C level assertion in a normal user context:

# in normal iPython context
invalid_dist.logcdf(valid_value).eval()
array(-inf)

Unless someone has a good clue at what is going on, I would suggest to just keep those tests disabled, as nothing was changed about those functions. I cannot use pytest.mark.xfail, because the entire python process exits when it hits that condition. If people agree I will just leave a more informative comment for why those tests are being skipped. Once the Theano gamma.c code is fixed to return nan instead of asserting invalid value this test issue should go away.

@michaelosthege
Copy link
Member

@ricardoV94 helpful comment & waiting for aesara-devs/aesara#224 sounds good.

@ricardoV94 ricardoV94 force-pushed the increase_logcdf_coverage_2 branch from 723d8d4 to 8358a97 Compare January 20, 2021 16:01
Copy link
Member

@michaelosthege michaelosthege left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ricardoV94 merge?

@ricardoV94
Copy link
Member Author

Yes (I don't know how to)

@AlexAndorra
Copy link
Contributor

Just click on the big green button below 👇 Stay on the "Squash and merge" option, and add that it "Closes #4399" in the commit message -- and then celebrate 🍾

@ricardoV94
Copy link
Member Author

ricardoV94 commented Jan 20, 2021

Haha maybe next time!

Edit: I thought someone had done it already! Done

@ricardoV94 ricardoV94 merged commit 89916ad into pymc-devs:master Jan 20, 2021
@ricardoV94 ricardoV94 deleted the increase_logcdf_coverage_2 branch January 21, 2021 07:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Logcdf methods of several distributions do not check for invalid parameters
5 participants