Prior/posterior predictive check plots #319

Open
wants to merge 7 commits into
base: master

@PaulinaMartin96 (Contributor) commented Jul 19, 2021

This PR adds a "ppcplot" function for plotting prior/posterior predictive checks for one or more dependent variables. As positional arguments, the function takes yobs_data, the observed data for the dependent variables (a vector or matrix), and ypred_data, the prior/posterior predictive results (a Chains object). It plots the observed data, a sample of the predictions, and the mean of the predictions.

As kwargs, this function receives:

  • yvar_name (vector of Symbol): the names of the dependent variables to be plotted,
  • plot_type: one of :density, :cumulative, or :histogram,
  • predictive_check: used for the plot titles; either :prior or :posterior (default value is :posterior),
  • n_samples: the number of samples to be plotted (default value is 50, but when plotting it is redefined as the minimum of 50 and the sample size in ypred_data).
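
The n_samples cap described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `select_draws` is a hypothetical helper, the predictions are assumed to be available as a draws × observations matrix, and the cap is assumed to apply to any requested n_samples rather than only to the default of 50.

```julia
using Random, Statistics

# Hypothetical helper (not part of the PR): choose which predictive draws to
# overlay and compute the per-observation mean of all predictions.
# `ypred` is assumed to be a draws × observations matrix.
function select_draws(ypred::AbstractMatrix, n_samples::Integer = 50;
                      rng = Random.default_rng())
    n_draws = size(ypred, 1)
    n = min(n_samples, n_draws)           # never request more draws than exist
    idx = randperm(rng, n_draws)[1:n]     # random subset without replacement
    ymean = vec(mean(ypred, dims = 1))    # mean prediction per observation
    return ypred[idx, :], ymean
end

ypred = randn(20, 100)                # 20 draws, 100 observations
draws, ymean = select_draws(ypred)    # default of 50 is capped at 20
size(draws)                           # (20, 100)
length(ymean)                         # 100
```

Sampling without replacement keeps each overlaid curve distinct; whether the PR samples draws at random or takes the first n is not stated in the description.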

For more than one dependent variable in a single model, yvar_name must be provided, and the variable names must appear in the same order as the columns of the observed data matrix. This makes it possible to separate the predictions for each dependent variable, because predict does not return predictions ordered by variable.
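
One way to separate interleaved predictions by variable is to group the prediction names by their variable prefix. The sketch below is an illustration under assumptions, not the PR's implementation: `columns_for` is a hypothetical helper, and the names are assumed to follow the `y[1]`, `z[1]` pattern.

```julia
# Hypothetical sketch: group prediction names by dependent variable.
# The interleaved order mimics predictions that are not ordered by variable.
names_pred = [Symbol("y[1]"), Symbol("z[1]"), Symbol("y[2]"), Symbol("z[2]")]
yvar_name = [:y, :z]

# Collect every name that starts with "<var>[", preserving index order.
function columns_for(var::Symbol, names)
    prefix = string(var, "[")
    return [n for n in names if startswith(string(n), prefix)]
end

grouped = [columns_for(v, names_pred) for v in yvar_name]
grouped[1]    # names belonging to :y
grouped[2]    # names belonging to :z
```

Because the groups are built in the order of yvar_name, the i-th group lines up with the i-th column of the observed data matrix, which is why the two orders must agree.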

The following is a working example for a model with one dependent variable:

using Turing, StatsBase, Statistics, MCMCChains, StatsPlots

@model function linear_reg(x, y, σ = 0.1)
    β ~ Normal(1, 0.5)

    for i in eachindex(y)
        y[i] ~ Normal(β * x[i], σ)
    end
end;
  
σ = 0.1; f(x) = 2 * x + 0.1 * randn();   
Δ = 0.01; xs_train = 0:Δ:10; ys_train = f.(xs_train);   
xs_test = [10 + i*Δ for i in 1:100]; ys_test = f.(xs_test); 
m_train = linear_reg(xs_train, ys_train, σ);

#Prior predictive check
chain_lin_reg = sample(m_train, Prior(), 200);   
m_test_prior = linear_reg(xs_test, Vector{Union{Missing, Float64}}(undef, length(ys_test)), σ);   
predictions_prior = predict(m_test_prior, chain_lin_reg) 
ppcplot(ys_test, predictions_prior, yvar_name = [:y_var], predictive_check = :prior, plot_type = :density )

image

And for the posterior predictive check:

#Posterior predictive check  
chain_lin_reg = sample(m_train, NUTS(100, 0.65), 200);   
m_test = linear_reg(xs_test, Vector{Union{Missing, Float64}}(undef, length(ys_test)), σ);   
predictions_posterior = predict(m_test, chain_lin_reg) 
ppcplot(ys_test, predictions_posterior)

Plot_type = :density
image

Plot_type = :cumulative

ppcplot(ys_test, predictions_posterior, n_samples = 20, predictive_check = :posterior, plot_type = :cumulative, size = (900, 600))

image

Plot_type = :histogram
image

Additionally, this is a working example for a model with two dependent variables:

@model function linear_reg(x, y, z, σ = 0.1)
    β ~ Normal(0, 1)
    γ ~ Normal(0, 1)

    for i in eachindex(y)
        y[i] ~ Normal(β * x[i], σ)
        z[i] ~ Normal(γ * x[i], σ)
    end
end;
  
σ = 0.1; f(x) = 2 * x + 0.1 * randn(); g(x) = 4 * x + 0.4 * randn();  
Δ = 0.01; xs_train = 0:Δ:10; ys_train = f.(xs_train); zs_train = g.(xs_train); 
xs_test = [10 + i*Δ for i in 1:100]; ys_test = f.(xs_test); zs_test = g.(xs_test);  
m_train = linear_reg(xs_train, ys_train, zs_train, σ); 
  
chain_lin_reg = sample(m_train, NUTS(100, 0.65), 200); 
  
m_test = linear_reg(xs_test, Vector{Union{Missing, Float64}}(undef, length(ys_test)), Vector{Union{Missing, Float64}}(undef, length(zs_test)), σ); 
  
predictions = predict(m_test, chain_lin_reg)

var_test = hcat(ys_test, zs_test)
ppcplot(var_test, predictions, n_samples = 100, yvar_name = [:y, :z], predictive_check = :posterior, plot_type = :density, size = (900, 400))

image

ppcplot(var_test, predictions, n_samples = 30, yvar_name = [:y, :z], predictive_check = :posterior, plot_type = :cumulative, size = (900, 400))

image

var_name = [:y, :z]
ppcplot(var_test, predictions, yvar_name = var_name, n_samples = 10, predictive_check = :posterior, plot_type = :histogram, size = (900, 600))

image

@PaulinaMartin96
Contributor Author

For this PR, should the version be 4.16.0 (after #316 ) or 5.1.0 (after #310 )?

@PaulinaMartin96 PaulinaMartin96 marked this pull request as ready for review July 23, 2021 19:32
@cpfiffer
Member

Probably 4.16.0 since #310 is a bigger thing and probably won't have too much effect here.

@delete-merged-branch delete-merged-branch bot deleted the branch TuringLang:master December 24, 2021 10:30