Replace slow dist_math::incomplete_beta with aesara op #4519
Conversation
Force-pushed from df52381 to d15d752
Not sure if this should be merged into master or into v4 🤔 cc @brandonwillard @twiecki @michaelosthege |
It shouldn't be difficult to rebase |
Strange failure on the 32-bit run, for the logcdf of the Beta:

    E AssertionError:
    E Arrays are not almost equal to 3 decimals
    E {'alpha': array(100., dtype=float32), 'beta': array(0.1, dtype=float32), 'value': array(0.001, dtype=float32)}
    E x and y -inf location mismatch:
    E x: array(-inf, dtype=float32)
    E y: array(-697.172)

I don't understand why the aesara op would underflow but not the scipy one. It seems scipy is basically calling betainc and then taking the logarithm, which is exactly what we are doing: https://github.com/scipy/scipy/blob/master/scipy/stats/_continuous_distns.py#L610-L611 AFAIK the scipy version should also underflow to 0, and then to -inf, in a float32 environment:

    np.log(st.beta(100.0, 0.1).cdf(0.001).astype('float32'))
    <ipython-input-32-b9435d6250d7>:1: RuntimeWarning: divide by zero encountered in log
      np.log(st.beta(100.0, 0.1).cdf(0.001).astype('float32'))
    Out[32]: -inf
|
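[Editor's note] A minimal standalone reproduction of the underflow being discussed (a sketch, not code from the PR): in float64 the CDF value is tiny but representable, so its log is finite; cast to float32 it flushes to zero and the log becomes `-inf`.

```python
# Illustrative reproduction of the float32 underflow discussed above.
import numpy as np
import scipy.special as sp

a, b, x = 100.0, 0.1, 0.001

cdf64 = sp.betainc(a, b, x)   # tiny (~1e-303) but representable in float64
print(np.log(cdf64))          # finite, around -697.172

cdf32 = np.float32(cdf64)     # underflows to 0.0 in float32
print(np.log(cdf32))          # -inf, with a divide-by-zero RuntimeWarning
```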
@ricardoV94 this error could indicate that … |
I was probably unclear. The error appeared in the 32-bit GH run: https://github.com/pymc-devs/pymc3/runs/2075254035?check_suite_focus=true AFAIK both should have underflowed. |
In terms of V3 / V4, I think it's fine if this only appears in the V4 branch. I can toggle this as a draft until we are ready for it. |
Force-pushed from 6ea14a8 to 780bd6e
Waiting a bit on this one following some discussion on the Stan side to make sure these gradient approximations are good enough: stan-dev/math#2417 |
This value was not representative of its name.
…d basic dists

These changes can be summarized as follows:

- `Model` objects now track fully functional Theano graphs that represent all relationships between random and "deterministic" variables. These graphs are called "sample-space" graphs. `Model.unobserved_RVs`, `Model.basic_RVs`, `Model.free_RVs`, and `Model.observed_RVs` contain these graphs (i.e. `TensorVariable`s), which are generated by `RandomVariable` `Op`s.
- For each random variable, there is now a corresponding "measure-space" variable (i.e. a `TensorVariable` that corresponds to said variable in a log-likelihood graph). These variables are available as `rv_var.tag.value_var`, for each random variable `rv_var`, or via `Model.vars`.
- Log-likelihood (i.e. measure-space) graphs are now created for individual random variables by way of the generic functions `logpt`, `logcdf`, `logp_nojac`, and `logpt_sum` in `pymc3.distributions`.
- Numerous uses of concrete shape information stemming from `Model` objects (e.g. `Model.size`) have been removed/refactored.
- Use of `FreeRV`, `ObservedRV`, `MultiObservedRV`, and `TransformedRV` has been deprecated. The information previously stored in these classes is now tracked using `TensorVariable.tag`, and log-likelihoods are generated using the aforementioned `log*` generic functions.
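[Editor's note] A hypothetical illustration of the API described in this commit message, using only the names it mentions (`logpt`, `rv_var.tag.value_var`); exact signatures in the v4 branch may differ.

```python
# Hypothetical sketch based on the names above; not verified against v4.
import pymc3 as pm
from pymc3.distributions import logpt

with pm.Model() as model:
    mu = pm.Normal("mu", 0.0, 1.0)

mu_value = mu.tag.value_var    # the "measure-space" (value) variable for mu
mu_logp = logpt(mu, mu_value)  # symbolic log-likelihood graph for mu
```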
This commit changes `DictToArrayBijection` so that it returns a `RaveledVars` datatype that contains the original raveled and concatenated vector, along with the information needed to revert it back to dictionary/variables form.

Simply put, the variables-to-single-vector mapping steps have been pushed away from the model object and its symbolic terms, and closer to the (sampling) processes that produce and work with `ndarray` values for said terms. In doing so, we can operate under fewer unnecessarily strong assumptions (e.g. that the shapes of each term are static and equal to the initial test points), and let the sampling processes that require vector-only steps deal with any changes in the mappings.
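[Editor's note] A conceptual sketch of that ravel/unravel round trip (illustrative only; the real `RaveledVars` datatype may store different fields):

```python
# Editor's sketch of the dict-to-vector mapping and its inverse.
import numpy as np

point = {"mu": np.zeros((2, 3)), "sigma": np.ones(4)}

# Ravel and concatenate, remembering each variable's name, shape, and dtype.
info = [(name, v.shape, v.dtype) for name, v in point.items()]
vector = np.concatenate([v.ravel() for v in point.values()])

# Revert the single vector back to dictionary/variables form.
restored, pos = {}, 0
for name, shape, dtype in info:
    size = int(np.prod(shape))
    restored[name] = vector[pos:pos + size].reshape(shape).astype(dtype)
    pos += size
```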
The approach currently being used is rather inefficient. Instead, we should change the `size` parameters for `RandomVariable` terms in the sample-space graph(s) so that they match the arrays of inputs in the trace and the desired number of output samples. This would allow the compiled graph to vectorize operations (when it can) and sample variables more efficiently in large batches; see the sketch below.
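[Editor's note] A hedged illustration of that batching idea in plain NumPy, with hypothetical shapes (not the PR's code):

```python
# Editor's sketch of the proposed batching, using plain NumPy.
import numpy as np

rng = np.random.default_rng(0)
mu_trace = rng.normal(size=1000)  # pretend: 1000 posterior draws of `mu`

# Instead of looping over the trace draw-by-draw, let the parameter array
# broadcast against the requested output shape so sampling is vectorized:
ppc = rng.normal(loc=mu_trace, scale=1.0, size=(10, 1000))  # 10 per draw
print(ppc.shape)  # (10, 1000)
```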
Classes and functions removed:

- `PyMC3Variable`
- `ObservedRV`
- `FreeRV`
- `MultiObservedRV`
- `TransformedRV`
- `ArrayOrdering`
- `VarMap`
- `DataMap`
- `_DrawValuesContext`
- `_DrawValuesContextBlocker`
- `is_fast_drawable`
- `_compile_theano_function`
- `vectorize_theano_function`
- `get_vectorize_signature`
- `_draw_value`
- `draw_values`
- `generate_samples`
- `fast_sample_posterior_predictive`

Modules removed:

- `pymc3.distributions.posterior_predictive`
- `pymc3.tests.test_random`
* refactor Moyal distribution
* fixed ndim_params on moyal

Co-authored-by: Farhan Reynaldo <[email protected]>
* refactor kumaraswamy
* add logcdf method

Co-authored-by: Farhan Reynaldo <[email protected]>
Co-authored-by: Ricardo <[email protected]>
* refactor Skewnormal and Rice distribution
* add test for rice b and skewnorm tau params
* change default parameter to b
* Add float32 xfail to `test_rice`

Co-authored-by: Farhan Reynaldo <[email protected]>
Co-authored-by: Ricardo <[email protected]>
* Refactor Flat and HalfFlat distributions
* Re-enable Gumbel logp test
* Remove redundant test
…nverseGamma`

Closes pymc-devs#4467
This PR replaces the `incomplete_beta` method in `dist_math` with an aesara op `betainc` that wraps the `scipy.special.betainc` method. This provides a large increase in speed (and in graph compilation) even without providing the `c_code` implementation directly (this can always be added later). It also allows the evaluation of the logcdf for multiple values in `Beta`, `StudentT`, `Binomial`, and `NegativeBinomial` (and zero-inflated variants). I removed the previous limitation warning when trying to evaluate these methods for multiple values. These changes are critical for any practical implementation of truncated versions of these distributions.

Gradients were adapted from the Stan implementation, and require the use of two extra "pseudo aesara ops" that simply wrap the equivalent Python functions. This does not allow for further graph optimizations, but I doubt aesara could do much with the previous scan implementation either. Let me know if you think there is a better solution.
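[Editor's note] For readers unfamiliar with the pattern, here is a minimal sketch (not the PR's actual code) of wrapping `scipy.special.betainc` as an aesara `Op` via its Python `perform` method; the PR's version additionally defines gradients adapted from Stan through two further wrapper ops.

```python
# Minimal sketch of a scipy-backed aesara Op; not the PR's actual code.
import numpy as np
import scipy.special
import aesara.tensor as at
from aesara.graph.basic import Apply
from aesara.graph.op import Op


class BetaInc(Op):
    """Regularized incomplete beta function I_x(a, b), computed in Python."""

    def make_node(self, a, b, x):
        a, b, x = map(at.as_tensor_variable, (a, b, x))
        # A full implementation would handle broadcasting/dtypes properly.
        return Apply(self, [a, b, x], [x.type()])

    def perform(self, node, inputs, output_storage):
        a, b, x = inputs
        output_storage[0][0] = np.asarray(
            scipy.special.betainc(a, b, x), dtype=node.outputs[0].dtype
        )


betainc = BetaInc()
```

Because the computation happens in `perform`, aesara cannot fuse or rewrite it, which matches the trade-off described above.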
Closes #4420