Refactor pm.Simulator (WIP) #4802

ricardoV94 · 2021-06-24T16:36:36Z

This is a proposal for how pm.Simulator could be refactored for V4. It provides a helper method to create a SimulatorRV, which tries to behave like a typical RV for prior and posterior predictive sampling, and whose logp is distance(epsilon, sum_stat(value), sim_stat(sim_rv)), where sim_rv is just another re-instantiation of the original SimulatorRV.
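To make the logp formula concrete, here is a NumPy-only sketch of the pseudo-likelihood. It assumes the same summary statistic is applied to the observed value and to a fresh simulation drawn with the same parameters, and it uses a Gaussian kernel as one possible distance. The names `simulator_logp`, `gaussian_kernel` and `normal_sim` are hypothetical illustrations, not the code in this PR.

```python
import numpy as np


def gaussian_kernel(epsilon, obs_stat, sim_stat):
    # Log of an unnormalized Gaussian kernel on the discrepancy between summaries
    return -0.5 * ((obs_stat - sim_stat) / epsilon) ** 2


def simulator_logp(value, simulate, params, sum_stat, distance, epsilon, rng):
    """ABC pseudo-log-likelihood: distance(epsilon, sum_stat(value), sum_stat(sim)).

    `sim` plays the role of a fresh draw from a re-instantiated SimulatorRV
    that shares the parameters of the original one.
    """
    sim = simulate(rng, *params, size=np.shape(value))
    return distance(epsilon, sum_stat(value), sum_stat(sim))


def normal_sim(rng, mu, sigma, size=None):
    # Hypothetical user simulator: any function of (rng, *params, size) works here
    return rng.normal(mu, sigma, size=size)


rng = np.random.default_rng(42)
observed = np.array([0.9, 1.1, 1.0, 1.2, 0.8])
lp = simulator_logp(observed, normal_sim, (1.0, 1.0), np.mean, gaussian_kernel, 0.5, rng)
print(lp)  # a scalar, because np.mean reduces both sides before the distance is taken
```

Because a new simulation is drawn on every call, the logp is stochastic; that is what lets the same object serve both forward sampling and likelihood evaluation.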

The advantage is that we don't need special logic for the ABC kernel in sample_smc, as the Simulator works just like a normal variable. It can also be used with some conventional samplers (at least it seems to work with Metropolis MCMC). It also removes the limitation of having only a single pm.Simulator per model, among other things.
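Why a conventional sampler can work with such a variable can be illustrated outside PyMC with a hand-written random-walk Metropolis loop whose likelihood is the noisy ABC logp. This is a sketch of ABC-MCMC under assumed choices (a Normal simulator with known unit scale, a mean summary, a Gaussian kernel with `epsilon=0.1`, a Normal(0, 5) prior); it is not PyMC's Metropolis implementation. Storing the current state's noisy logp instead of recomputing it is a design choice of this sketch.

```python
import numpy as np


def abc_logp(mu, obs_mean, n, epsilon, rng):
    # One fresh simulation per evaluation, summarised by its mean (hypothetical model)
    sim_mean = rng.normal(mu, 1.0, size=n).mean()
    return -0.5 * ((obs_mean - sim_mean) / epsilon) ** 2


def log_prior(mu):
    # Normal(0, 5) prior, up to an additive constant
    return -0.5 * (mu / 5.0) ** 2


def abc_metropolis(obs, n_steps=3000, step=0.3, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    obs_mean, n = obs.mean(), obs.size
    mu = 0.0
    # The noisy logp of the current state is stored and reused (pseudo-marginal style)
    current = log_prior(mu) + abc_logp(mu, obs_mean, n, epsilon, rng)
    chain = np.empty(n_steps)
    for i in range(n_steps):
        proposal = mu + step * rng.normal()
        proposed = log_prior(proposal) + abc_logp(proposal, obs_mean, n, epsilon, rng)
        # Standard Metropolis acceptance test on the (noisy) log densities
        if np.log(rng.uniform()) < proposed - current:
            mu, current = proposal, proposed
        chain[i] = mu
    return chain


obs = np.random.default_rng(1).normal(1.0, 1.0, size=50)
chain = abc_metropolis(obs)
# Compare the post-burn-in chain mean with the observed mean
print(chain[1000:].mean(), obs.mean())
```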

Two other options would be to:

1. Separate the Simulator object from a pm.Distance likelihood term, which may be more transparent and easier for users to reason about. Mixing the two doesn't make a lot of sense if we think of Simulator as a true RV (for example, if we use the mean summary statistic, the logp shape is completely different from the samples shape). Separating the two may also make it easier for the sampler to manipulate parameters like epsilon. Perhaps the distance should even be entirely the responsibility of the sampler kernel.
2. Keep most of the unique logic from v3, which keeps (for better or worse) pm.Simulator and ABC in their own very specific corner of the library.
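The shape mismatch mentioned in the first option can be seen in a few lines of NumPy. This sketch assumes a hypothetical Normal simulator, a mean summary statistic and a Gaussian kernel; it only illustrates shapes, not PyMC internals.

```python
import numpy as np

rng = np.random.default_rng(0)
observed = rng.normal(0.0, 1.0, size=100)


def simulate(rng, mu, size):
    # Hypothetical simulator: forward samples have the shape of the data
    return rng.normal(mu, 1.0, size=size)


def logp(value, mu, epsilon=1.0):
    # Pseudo-likelihood with a mean summary statistic and a Gaussian kernel
    sim = simulate(rng, mu, size=value.shape)
    return -0.5 * ((value.mean() - sim.mean()) / epsilon) ** 2


draws = simulate(rng, 0.0, size=observed.shape)
print(draws.shape)                    # (100,)
print(np.shape(logp(observed, 0.0)))  # (): one term for the whole dataset
```

Prior and posterior predictive draws look like the data, while the logp collapses to a single term, which is why treating the pair as one RV is awkward.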

CC: @aloctavodia @junpenglao

michaelosthege · 2021-06-27T11:32:29Z

pymc3/distributions/simulator.py

-            if sum_stat != "identity":
-                _log.info(f"Automatically setting sum_stat to identity as expected by {distance}")
-                sum_stat = "identity"
+            raise NotImplementedError("KL not refactored yet")


Would be great to link a GitHub issue right here.

michaelosthege · 2021-06-27T11:39:36Z

pymc3/distributions/simulator.py

+        rv_type = type(sim_op)
+
+        @_logp.register(rv_type)
+        def logp(op, sim_rv, rvs_to_values, *sim_params, **kwargs):


Locally defined functions often cause problems with pickling.

Also, it looks like this logp uses variables from the __new__ scope. Doesn't this, in combination with the registration, lead to problems when there is more than one Simulator?

I am not sure about pickling. It depends on the point at which the function is used (once the logp graph has been obtained, the function is no longer needed).

The registration part should be fine because I am creating a new type with a unique name. I also have a test for multiple Simulators with different methods, and it seems to work.

I could also add a layer of indirection when creating the simulator so that what are now local variables are attached to the tag, in which case a single logp function / dispatcher would be enough.
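The pickling concern and the proposed indirection can be illustrated in plain Python. The sketch below uses `functools.singledispatch` as a stand-in for the `_logp` dispatcher (an assumption; the real dispatcher is not shown in this thread), and `SimulatorOp`, `make_closure_logp` and the other names are hypothetical. A closure created inside another function cannot be pickled, while a single module-level logp that reads its settings from the instance round-trips fine.

```python
import pickle
from functools import singledispatch


class SimulatorOp:
    # Hypothetical stand-in for a configured Simulator op
    def __init__(self, epsilon, sum_stat_name):
        # Per-simulator settings live on the instance instead of in a closure
        self.epsilon = epsilon
        self.sum_stat_name = sum_stat_name


@singledispatch
def logp(op, value):
    raise NotImplementedError(type(op).__name__)


@logp.register
def _(op: SimulatorOp, value):
    # One module-level function serves every SimulatorOp instance
    return f"logp with epsilon={op.epsilon}, sum_stat={op.sum_stat_name}"


def make_closure_logp(epsilon):
    # The pattern under review: a function defined inside another function
    def closure_logp(value):
        return epsilon
    return closure_logp


def is_picklable(obj):
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


print(is_picklable(make_closure_logp(1.0)))    # False
print(is_picklable(SimulatorOp(1.0, "mean")))  # True
restored = pickle.loads(pickle.dumps(SimulatorOp(0.5, "sort")))
print(logp(restored, None))  # logp with epsilon=0.5, sum_stat=sort
```

Note that this only covers the logp side; any user-supplied simulation function stored on the op would still need to be picklable itself.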

michaelosthege · 2021-06-27T11:43:22Z

pymc3/tests/test_smc.py

@@ -144,7 +147,7 @@ def abs_diff(eps, obs_data, sim_data):
            )

        with pm.Model() as self.SMABC_potential:
-            a = pm.Normal("a", mu=0, sigma=1)
+            a = pm.Normal("a", mu=0, sigma=1, initval=0.5)


Why was the initval added? Maybe this indicates an underlying problem?

ricardoV94 · 2021-06-27T15:45:29Z

@michaelosthege thanks for the pointers. This was not meant for implementation review yet, just as a proof of concept, but some of the points you raise are already useful.

aloctavodia

I really like where this is going!

I am not sure about separating the Simulator from the Distance; together they define a pseudo-likelihood, so it seems reasonable to keep them as parts of the same object. For the same reason, making the distance part of the sampling method is weird. Actually, I think the first version of SMC-ABC was like that. We can of course discuss the benefits of trying to tune epsilon automatically, but I think having to define a single value of epsilon is actually a good feature in practice, as it allows controlling the level of accuracy and the cost of the simulation with minimal hand tuning.
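The trade-off described here, where a single epsilon sets both accuracy and simulation cost, already shows up in the simplest ABC scheme. The sketch below deliberately swaps in plain rejection ABC instead of SMC-ABC, with an assumed Normal simulator of known unit scale, a Normal(0, 5) prior and a mean summary statistic.

```python
import numpy as np


def rejection_abc(obs_mean, epsilon, n_sims=20_000, n=50, seed=0):
    # Plain rejection ABC (swapped in for illustration): keep prior draws whose
    # simulated mean lands within epsilon of the observed mean
    rng = np.random.default_rng(seed)
    mu = rng.normal(0.0, 5.0, size=n_sims)  # prior draws
    sim_means = rng.normal(mu[:, None], 1.0, size=(n_sims, n)).mean(axis=1)
    accepted = mu[np.abs(sim_means - obs_mean) < epsilon]
    return accepted, accepted.size / n_sims


results = {}
for eps in (1.0, 0.3, 0.1):
    accepted, rate = rejection_abc(obs_mean=1.0, epsilon=eps)
    results[eps] = (accepted, rate)
    print(f"epsilon={eps}: acceptance rate={rate:.4f}, posterior sd={accepted.std():.3f}")
```

A smaller epsilon gives a tighter approximate posterior but accepts fewer draws, so more simulations are needed per accepted sample; fixing epsilon up front makes that cost explicit.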

codecov · 2021-07-05T17:00:41Z

Codecov Report

Merging #4802 (729daea) into main (125256f) will increase coverage by 0.99%.
The diff coverage is 89.56%.

@@            Coverage Diff             @@
##             main    #4802      +/-   ##
==========================================
+ Coverage   71.97%   72.97%   +0.99%     
==========================================
  Files          85       85              
  Lines       13839    13838       -1     
==========================================
+ Hits         9961    10098     +137     
+ Misses       3878     3740     -138

Impacted Files	Coverage Δ
pymc3/distributions/simulator.py	`79.13% <86.36%> (+55.94%)`	⬆️
pymc3/aesaraf.py	`91.34% <100.00%> (+0.07%)`	⬆️
pymc3/distributions/distribution.py	`66.46% <100.00%> (+1.66%)`	⬆️
pymc3/smc/sample_smc.py	`96.72% <100.00%> (+4.72%)`	⬆️
pymc3/smc/smc.py	`99.31% <100.00%> (+26.71%)`	⬆️
pymc3/distributions/__init__.py	`100.00% <0.00%> (ø)`
pymc3/distributions/discrete.py	`99.00% <0.00%> (+0.05%)`	⬆️
pymc3/parallel_sampling.py	`86.70% <0.00%> (+0.94%)`	⬆️
... and 1 more

ricardoV94 · 2021-07-05T17:08:16Z

@michaelosthege was totally right about the pickling issues. I didn't spot them in the first iteration because the default in sample_smc was parallel=False.

In contrast to SMC, the vanilla Metropolis stepper works fine with multiprocessing (only when processes are forked) because the logp graph is compiled before the forking step, whereas SMC tries to pickle the entire model to send it to each process.

I temporarily set all the ABC tests to run in a single process.

aloctavodia · 2021-07-06T06:22:09Z

pymc3/smc/sample_smc.py

@@ -156,6 +146,29 @@ def sample_smc(
        %282007%29133:7%28816%29>`__
    """

+    if kernel is not None:


How are we going to select whether the kernel is Metropolis, independent Metropolis-Hastings, HMC, etc.?

I haven't thought about that yet. We might use the kernel argument for that, in which case I'll remove the DeprecationWarning. This is just temporary.

michaelosthege · 2021-07-06T06:25:36Z

Might be a good idea to include test_smc.py in the Windows tests then.

ricardoV94 · 2021-07-06T10:48:00Z

The pickling issue is proving non-trivial to overcome.

It is caused by the dynamic RV class creation (as well as user Ops created at runtime with the new helpers).
The logp dispatching does not seem to be an issue, but in any case I standardized it in my last commit.

I tried another approach that would require users to explicitly subclass SimulatorRV instead of creating one dynamically; these subclasses can be pickled and used with multiprocessing. However, as soon as the classes are defined inside a function (such as setup_class in TestSMCABC), pickling fails again. This approach also requires more effort from users.
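The behaviour described above can be reproduced with the standard `pickle` module. Pickle stores classes by reference (module plus qualified name), so an instance can only be pickled if its class can be looked up again under that name. The classes and helpers below are hypothetical stand-ins, not the PR's code.

```python
import pickle


class SimulatorRV:
    # Hypothetical base class standing in for the PR's SimulatorRV
    pass


class MySimulatorRV(SimulatorRV):
    # Explicit module-level subclass: pickle can find it by module + name
    pass


def make_simulator_class(name):
    # Dynamic creation, as with the helper; the class is not a module attribute
    return type(name, (SimulatorRV,), {})


def setup_class():
    # A subclass defined inside a function, like TestSMCABC.setup_class
    class LocalSimulatorRV(SimulatorRV):
        pass
    return LocalSimulatorRV


def can_pickle(obj):
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (pickle.PicklingError, AttributeError):
        return False


print(can_pickle(MySimulatorRV()))                      # True
print(can_pickle(make_simulator_class("DynamicRV")()))  # False
print(can_pickle(setup_class()()))                      # False
```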

The old pm.DensityDist faced similar issues and required some workarounds to deal with the logp method. I am not sure how, or whether, that approach can be adapted to V4 for use with RandomVariables (DensityDist hasn't been refactored and might be deprecated altogether). I fiddled with it a bit but did not succeed.

Any suggestions?

ricardoV94 added request discussion v4 labels Jun 24, 2021

michaelosthege reviewed Jun 27, 2021

ricardoV94 marked this pull request as ready for review June 27, 2021 15:45

ricardoV94 marked this pull request as draft June 27, 2021 15:46

aloctavodia reviewed Jun 28, 2021

ricardoV94 added the SMC Sequential Monte Carlo label Jun 28, 2021

ricardoV94 mentioned this pull request Jun 28, 2021

SMC return inferencedata and perform convergence checks #4814

Merged

kc611 mentioned this pull request Jul 1, 2021

Refactoring Mixture RV for v4 #4825

Closed

ricardoV94 added 3 commits July 5, 2021 17:52

Start refactoring Simulator

2865440

Add example test with more than 1 Simulator

0e5d6f7

Deprecate ABC-specific code and arguments and compile rv_inplace

58e2972

ricardoV94 force-pushed the restore_abc branch from 093e030 to 11cd326 Compare July 5, 2021 16:28

Automatically wrap Simulator non-symbolic functions in Aesara Ops

a8c041f

ricardoV94 force-pushed the restore_abc branch from 11cd326 to 28e1a73 Compare July 5, 2021 17:03

Run all ABC tests in single process

7e01fd2

ricardoV94 force-pushed the restore_abc branch from 28e1a73 to 7e01fd2 Compare July 5, 2021 18:52

aloctavodia reviewed Jul 6, 2021

Standardize Simulator logp

729daea

ricardoV94 mentioned this pull request Jul 23, 2021

Refactor pm.Simulator (2nd attempt) #4877

Closed

ricardoV94 closed this Aug 4, 2021

ricardoV94 mentioned this pull request Aug 4, 2021

Refactor pm.Simulator #4903

Merged

ricardoV94 deleted the restore_abc branch January 31, 2022 09:20
