Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SMC return inferencedata and perform convergence checks #4814

Merged
merged 3 commits into from
Jun 30, 2021

Conversation

ricardoV94
Copy link
Member

@ricardoV94 ricardoV94 commented Jun 28, 2021

This PR makes sample_smc return InferenceData and run convergence checks by default.

@aloctavodia I am not familiar at all with InferenceData objects so let me know if I am doing something wrong or not returning as much information as we could / should.

You mentioned in #4802 (comment) that we could store the betas and the log_likelihood in sample_stats. However, the log_likelihood appears magically in the returned idata, is this one correct? Should we manually add ours instead, since it is already pre-computed. Or did you mean the log_marginal_likelihood?

These are the variables stored previously in trace.report:

trace.report._n_draws = draws
trace.report._n_tune = 0
trace.report.log_marginal_likelihood = np.array(log_marginal_likelihoods)
trace.report.log_pseudolikelihood = log_pseudolikelihood
trace.report.betas = betas
trace.report.accept_ratios = accept_ratios
trace.report.nsteps = nsteps
trace.report._t_sampling = time.time() - t1

Also I am running the same convergence checks as in the normal pm.sample, should we do this or instead, implement specific checks for SMC (or do nothing at all)?

@ricardoV94 ricardoV94 added request discussion SMC Sequential Monte Carlo labels Jun 28, 2021
@ricardoV94 ricardoV94 changed the title SMC inferencedata and convergence checks SMC return inferencedata and perform convergence checks Jun 28, 2021
@ricardoV94
Copy link
Member Author

Apologies for the reviewers spam, just thought that all of you might have a better understanding of the InferenceData than me :D

pymc3/smc/sample_smc.py Show resolved Hide resolved
pymc3/smc/sample_smc.py Show resolved Hide resolved
@aloctavodia
Copy link
Member

aloctavodia commented Jun 28, 2021

Just for the record and summarizing our previous conversation.

  • We should keep diagnostics as we have preliminary simulations showing they are useful (even when they were not designed with SMC in mind).

  • For ABC (but not "plain SMC") we should overwrite the log-likelihood with the values from "log_pseudolikelihood" (I wonder if with the changes introduced in Refactor pm.Simulator (WIP) #4802, this is still necessary... maybe not!)

  • We should store in sample_stats. the log_marginal_likelihood, betas, accept_ratios and nsteps. The _t_sampling should be automatically added to the attributes, the same goes for _n_tune (even when this does not make sense for SMC).

@ricardoV94
Copy link
Member Author

ricardoV94 commented Jun 28, 2021

  • We should store in sample_stats. the log_marginal_likelihood, betas, accept_ratios and nsteps. The _t_sampling should be automatically added to the attributes, the same goes for _n_tune (even when this does not make sense for SMC).

I am reading here https://arviz-devs.github.io/arviz/schema/schema.html?highlight=sample_stats#sample-stats, and it seems sample_stats expects one measure per posterior sample, but most of those SMC measures do not work like this. Should I just ignore the text and save them there anyway?

@ricardoV94
Copy link
Member Author

ricardoV94 commented Jun 28, 2021

Pushed changes to include the sampler stats as well. Feels pretty hacky, because I have to deal with the case where chains have different numbers of draws. Let me know if you have a better suggestion

@ricardoV94 ricardoV94 marked this pull request as ready for review June 28, 2021 16:42
@aloctavodia
Copy link
Member

I think that sample_stats should be more general, but that's on the ArviZ side.

Different chains should have the same number of draws.

@ricardoV94
Copy link
Member Author

I think that sample_stats should be more general, but that's on the ArviZ side.

Different chains should have the same number of draws.

What I was calling draws here are the "stages". Sorry for the confusion. Those can vary between chains

Copy link
Member

@michaelosthege michaelosthege left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ricardoV94
Copy link
Member Author

ricardoV94 commented Jun 29, 2021

The current sample_stats dimensions are "chain" x "draw" by default. Should I rename them to "chain" x "stage", or it's not worth the trouble?

@aloctavodia
Copy link
Member

I think is not worth the trouble at this point.

@ricardoV94 ricardoV94 merged commit 4505f14 into pymc-devs:main Jun 30, 2021
@ricardoV94 ricardoV94 deleted the smc_inferencedata branch June 30, 2021 12:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
SMC Sequential Monte Carlo
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants