Refactor Mixture distribution for V4 #5438
Conversation
Codecov Report
@@ Coverage Diff @@
## main #5438 +/- ##
==========================================
+ Coverage 87.29% 88.15% +0.86%
==========================================
Files 81 81
Lines 14247 14238 -9
==========================================
+ Hits 12437 12552 +115
+ Misses 1810 1686 -124
Force-pushed from ce37fd4 to a349d35
@Sayam753 could you give some context on what the idea was here? Now that we have meta information about the components (`ndim_supp`, `ndims_params`), it is perhaps no longer needed?
The legacy
@lucianopaz is the magician behind this distribution.
If I understand correctly, I think we can infer the correct mixture axis from the meta information we have now. That's my TODO point above. So far, I have just been hacking it by trial and error. Do you think there is still inherent ambiguity, even considering `ndim_supp`, `ndims_params`, `shape_from_params`... and whatever other methods we have to reason about shapes of RandomVariables?
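The idea of inferring the mixture axis from component meta information can be sketched as follows. This is a hypothetical illustration, not the PR's actual implementation: `infer_mixture_axis` is an invented helper, and the convention assumed here is that components are stacked immediately to the left of their support dimensions.

```python
# Hypothetical sketch: `infer_mixture_axis` is an invented name, assuming
# component draws are stacked on the first axis left of the support dims.
def infer_mixture_axis(ndim_supp: int) -> int:
    """Return the component axis in the stacked draws, counting from the end.

    A univariate component (ndim_supp=0) mixes over the last axis, a
    multivariate one (ndim_supp=1) over the second-to-last, and so on.
    """
    return -(ndim_supp + 1)


assert infer_mixture_axis(0) == -1  # e.g. Normal components
assert infer_mixture_axis(1) == -2  # e.g. MvNormal components
```

Under this convention no trial and error is needed: the axis is fully determined by `ndim_supp`.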
Figured out the weights shape padding/broadcasting, thanks to @lucianopaz!
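The weights padding mentioned above can be illustrated with a small NumPy sketch. This is an assumed shape convention for illustration (`pad_weights` is an invented helper, not the PR's code): the weights carry a trailing component axis, and length-1 axes are appended so they broadcast against component draws that end in support dimensions.

```python
import numpy as np


# Hypothetical sketch: pad mixture weights with trailing length-1 axes so
# their component axis lines up with stacked component draws that end in
# `ndim_supp` support dimensions. Not the PR's actual implementation.
def pad_weights(w: np.ndarray, ndim_supp: int) -> np.ndarray:
    return np.reshape(w, w.shape + (1,) * ndim_supp)


w = np.full((3, 2), 0.5)                 # batch of 3, two components
assert pad_weights(w, 0).shape == (3, 2)     # univariate: nothing to pad
assert pad_weights(w, 1).shape == (3, 2, 1)  # multivariate: pad past support dim
```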
Force-pushed from fac451f to 3a39b82
pymc/tests/test_mixture.py (Outdated)
# Expected to fail if comp_shape is not provided,
# nd is multidim and it does not broadcast with ncomp. If by chance
# it does broadcast, an error is raised if the mixture is given
# observed data.
# Furthermore, the Mixture will also raise errors when the observed
# data is multidimensional but it does not broadcast well with
# comp_dists.
This no longer seems to be a problem. @lucianopaz can you confirm that if the current tests pass, this is indeed fine?
Force-pushed from b2d19a9 to 8d665fa
@OriolAbril Any idea why the
There should be spaces both before and after the colon that separates parameter name and type.
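The spacing rule above refers to numpydoc-style parameter listings. A minimal example (hypothetical function, for illustration only):

```python
# Illustration of the numpydoc spacing rule discussed above:
# "name : type" with a space on both sides of the colon.
def scale(x, factor=2.0):
    """Scale a value by a constant factor.

    Parameters
    ----------
    x : float
        The value to scale.
    factor : float
        The multiplier applied to ``x``.
    """
    return x * factor
```

Writing `x: float` or `x :float` instead would trip the docstring style check.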
Force-pushed from 7e77630 to dfdce06
Moments are also ready thanks to @larryshamalama. This is ready for review and merge 🚀
Looks great 🙂 Thanks for leading the effort @ricardoV94
Mixtures now use an `OpFromGraph` that encapsulates the Aesara random method. This is used so that logp can be easily dispatched to the distribution without requiring involved pattern matching.

The Mixture random and logp methods now fully respect the support dimensionality of its components, whereas previously only the logp method did, leading to inconsistencies between the two methods. In the case where the weights (or size) indicate the need for more draws than what is given by the component distributions, the latter are resized to ensure there are no repeated draws.

This refactoring forces Mixture components to be basic RandomVariables, meaning that nested Mixtures or Mixtures of symbolic distributions (like Censored) are not currently possible.

Co-authored-by: Larry Dong <[email protected]>
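The random method described above can be pictured with a plain NumPy sketch: draw from every component, draw a categorical index from the weights, and select. This is a conceptual illustration of mixture sampling only, not the `OpFromGraph` internals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Conceptual sketch of a mixture draw (illustration only, not the
# OpFromGraph implementation): sample every component, then pick one
# per draw via a categorical index sampled from the weights.
w = np.array([0.3, 0.7])                     # mixture weights
n = 1000
comp_draws = np.stack(
    [
        rng.normal(-5.0, 1.0, size=n),       # component 0
        rng.normal(5.0, 1.0, size=n),        # component 1
    ],
    axis=-1,                                  # component axis last: (n, 2)
)
idx = rng.choice(2, p=w, size=n)              # which component per draw
draws = np.take_along_axis(comp_draws, idx[:, None], axis=-1)[..., 0]

assert draws.shape == (n,)
# Mixture mean is 0.3 * (-5) + 0.7 * 5 = 2.0; the sample mean is close.
assert abs(draws.mean() - 2.0) < 1.0
```

For multivariate components the categorical index would instead select along the axis left of the support dimensions, which is what "respecting the support dimensionality" amounts to.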
* Emphasize equivalency between iterable of components and single batched component
* Add example with mixture of two distinct distributions
* Add example with multivariate components
The two tests relied on implicit behavior of V3, where the dimensionality of the weights implied the support dimension of the mixture distribution. This, however, led to inconsistent behavior between the random method and the logp, as the latter did not enforce this assumption and did not distinguish whether values were mixed across the implied support dimension.

In this refactoring, the support dimensionality of the component variables determines the dimensionality of the mixture distribution, regardless of the weights. This leads to consistent behavior between the random and logp methods, as asserted by the new checks.

Future work will explore allowing the user to specify an artificial support dimensionality that is higher than the one implied by the component distributions, but this is for now not possible.
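The logp side of this consistency can be sketched with SciPy: a mixture logp is a `logsumexp` of the weighted component log-densities over the component axis. This is a generic illustration of the mixture density, not the PR's Aesara graph.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

# Generic mixture-logp sketch (illustration, not the PR's Aesara code):
# logp(x) = logsumexp_k( log(w_k) + logp_k(x) ) over the component axis.
w = np.array([0.3, 0.7])
x = np.array([0.0, 1.0, -2.0])
comp_logps = np.stack(
    [norm(-5.0, 1.0).logpdf(x), norm(5.0, 1.0).logpdf(x)],
    axis=-1,                                  # component axis last
)
logp = logsumexp(np.log(w) + comp_logps, axis=-1)

# Same density computed directly, for a sanity check.
direct = 0.3 * norm(-5.0, 1.0).pdf(x) + 0.7 * norm(5.0, 1.0).pdf(x)
assert logp.shape == x.shape
assert np.allclose(np.exp(logp), direct)
```

The consistency requirement is that this reduction happens over the same axis the random method mixes across, which is now fixed by the components' support dimensionality rather than by the shape of `w`.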
Behavior is now implemented in Mixture
This PR is an attempt to refactor (Marginalized) Mixture distributions for V4
Changes
There are two big changes in how Mixture works compared to V3:
I am exploring adding a keyword argument to override the support dimensionality in a follow-up PR. I managed to do this for the random method, but haven't had time yet to figure out what needs to be changed in the logp method.
Closes #4781