Skip to content

Geometric MCMC algorithms and zero-variance (ZV) Monte Carlo Bayesian routines

License

Notifications You must be signed in to change notification settings

papamarkou/GeometricMCMC.jl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

64 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

The GeometricMCMC package has been deprecated since it has been merged into the MCMC package.

Overview of package's scope

This package implements the following 6 MCMC algorithms:

Julia functionMCMC algorithm
1mhMetropolis-Hastings (MH)
2malaMetropolis adjusted Langevin algorithm (MALA)
3smmalaSimplified Manifold MALA (SMMALA)
4mmalaManifold MALA (MMALA)
5hmcHamiltonian Monte Carlo (HMC)
6rmhmcRiemannian manifold HMC (RMHMC)

More details for the geometric MCMC methods of the package can be found in this article.

Furthermore, the package provides the linearZV() and quadraticZV() functions for the computation of the zero-variance (ZV) Monte Carlo Bayesian estimators, see this publication.

Installation and usage

To install the GeometricMCMC package, the usual commands of the Julia package manager are called:

Pkg.update()
Pkg.add("GeometricMCMC")

After installation, the following typical statement is typed in the Julia command prompt so as to use the GeometricMCMC package:

using GeometricMCMC

The Distributions package may be needed, mostly for sampling from distributions, so it may be useful to make both packages available:

using Distributions, GeometricMCMC

Tutorial

This file serves as a tutorial explaining how to use the MCMC routines of the package, as well as the linearZV() and quadraticZV() functions. Code for the Bayesian logit model with multivariate uncorrelated zero-mean Normal prior is provided to demonstrate usage and to follow through the tutorial.

To invoke each of the MCMC methods of the package, it is required to provide two input arguments. The first argument is an instance of the Model type, defined in the package, and is common across all 6 MCMC routines. The second argument is an instance of the algorithm's options type and is specific to the algorithm.

The Model type

The Model type provides the statistical model to the MCMC routines. This includes the functions defining the model, the number of the model's parameters and the data.

More specifically, the functions required for defining the model are the log-prior, the log-likelihood, the gradient of the log-posterior, the position-specific metric tensor of the Riemannian manifold of the parameter space and the tensor's partial derivatives with respect to each of the parameters. These functions need to be known in closed form as the package stands so far. The log-posterior is also one of the model's functions and it does not need to be specified by the user, since the Model type sets it to be the sum of the log-likelihood with the log-prior.

It is apparent that the Model type represents a Bayesian model. However, it is also possible to accommodate simpler statistical models, such a non-Bayesian log-target. This can be achieved, for instance, by setting the log-likelihood to be equal to the log-target and the improper log-prior to be zero.

For ease of use, all the user-defined functions in the Model type (apart from the ones starting with "rand" as explained later in the tutorial) conventionally share the same signature

function myFunction(pars::Vector{Float64}, nPars::Int, data::Union(Array{Any}, Dict{Any, Any}))

where pars are the model's parameters simulated by the MCMC algorithm and thus not needed to be numerically specified by the user, nPars is the number of parameters and data is an Array Array{Any} or a dictionary Dict{Any, Any} holding the data.

The Model can be instantiated with fewer arguments. For instance, the Metropolis-Hastings function mh() requires only the log-prior, log-likelihood and the gradient of the log-posterior. In fact, the gradient of the log-posterior is not necessary for running a Metropolis-Hastings MCMC simulation. Nevertheless, it has been set as a required argument so that mh() returns the zero-variance control variates along with the MCMC output. Similarly, the log-prior, log-likelihood and the gradient of the log-posterior suffice as arguments in the instantiation of Model in order to run MALA or HMC. SMMALA requires additionally the metric tensor. The partial derivatives of the metric tensor with respect to the parameters are needed in the Model instantiation only for MMALA or RMHMC simulations.

An example: the Bayesian logit model with Normal prior

Creating the data member of the Model type

To illustrate how to create a Bayesian Model, the logit regression with a multivariate uncorrelated zero-mean Normal prior is considered. The data are then the design matrix X, with nData rows and nPars columns, and the binary response variable y, which is a vector with nData elements. nData andnPars are the number of data points and the number of parameters, respectively. Assuming a Normal prior N(0, priorVar*I), the prior's variance priorVar is also part of the data, in the broad sense of the word.

It is up to the user's preference whether to define data as Array{Any} or Dict{Any, Any}, as long as data is manipulated accordingly in the subsequent definition of the Model functions. In what follows, data is introduced as a dictionary, holding the design matrix X, the response variable y, the prior variance priorVar and the number of data points nData. Strictly speaking, nData is not a requirement, since it can be deduced by size(X, 1). It is however more efficient to pass nData to the data dictionary, given that a typical MCMC simulation invokes the Model functions thousands of times.

As a concrete example, test/swiss.txt contains the Swiss banknotes dataset. The dataset holds the measurements of 4 covariates on 200 Swiss banknotes, of which 100 genuine and 100 counterfeit, representing the length of the bill, the width of the left and the right edge, and the bottom margin width. Therefore, the design matrix X has 200 rows and 4 columns, and the 4 regression coefficients constitute the model parameters to be estimated. The binary response variable y is the type of banknote, 0 being genuine and 1 counterfeit. To create the data dictionary for the Swiss banknotes, change into the test directory and run the test/swiss.jl script:

# Create design matrix X and response variable y from swiss data array
swiss = readdlm("swiss.txt", ' ');
covariates = swiss[:, 1:end-1];
nData, nPars = size(covariates);

covariates = (bsxfun(-, covariates, mean(covariates, 1))
./repmat(std(covariates, 1), nData, 1));

polynomialOrder = 1;
X = zeros(nData, nPars*polynomialOrder);
for i = 1:polynomialOrder
  X[:, ((i-1)*nPars+1):i*nPars] = covariates.^i;
end

y = swiss[:, end];

# Create data dictionary
data = {"X"=>X, "y"=>y, "priorVar"=>100., "nData"=>nData};

Defining the functions of the Model

Having defined the data dictionary as above, test/logitNormalPrior.jl provides the functions of the Bayesian logit model:

# Functions for Bayesian logit model with a Normal prior N(0, priorVar*I)

function logPrior(pars::Vector{Float64}, nPars::Int, data::Dict{Any, Any})
  return (-dot(pars,pars)/data["priorVar"]
    -nPars*log(2*pi*data["priorVar"]))/2
end

function logLikelihood(pars::Vector{Float64}, nPars::Int, data::Dict{Any, Any})
  XPars = data["X"]*pars
  return (XPars'*data["y"]-sum(log(1+exp(XPars))))[1]
end

function gradLogPosterior(pars::Vector{Float64}, nPars::Int,
  data::Dict{Any, Any})
  return (data["X"]'*(data["y"]-1./(1+exp(-data["X"]*pars)))
    -pars/data["priorVar"])
end

function tensor(pars::Vector{Float64}, nPars::Int, data::Dict{Any, Any})
  p = 1./(1+exp(-data["X"]*pars))
  return ((data["X"]'.*repmat((p.*(1-p))', nPars, 1))*data["X"]
    +(eye(nPars)/data["priorVar"]))
end

function derivTensor(pars::Vector{Float64}, nPars::Int, data::Dict{Any, Any})
  matrix = Array(Float64, data["nData"], nPars)
  output = Array(Float64, nPars, nPars, nPars)

  p = 1./(1+exp(-data["X"]*pars))

  for i = 1:nPars
    for j =1:nPars
      matrix[:, j] = data["X"][:, j].*((p.*(1-p)).*((1-2*p).*data["X"][:, i]))
    end

    output[:, :, i] = matrix'*data["X"]
  end
  
  return output
end

function randPrior(nPars::Int, data::Dict{Any, Any})
  return rand(Normal(0.0, sqrt(data["priorVar"])), nPars)
end

There are two points to bare in mind when defining the functions of the model. Firstly, it is practical, though not obligatory, to use the reserved names logPrior(), logLikelihood(), gradLogPosterior(), tensor(), derivTensor() and randPrior() in the relevant function definitions. Secondly, it is mandatory to adhere to the signature

function myFunction(pars::Vector{Float64}, nPars::Int, data::Union(Array{Any}, Dict{Any, Any}))

for the deterministic functions of the model, and to the signature

function myFunction(nPars::Int, data::Union(Array{Any}, Dict{Any, Any}))

for the stochastic function randPrior().

The second argument nPars to the model's functions seems redundant, since it can be derived from the first argument pars via the size(pars, 1) command. Nevertheless, it is more efficient practice to pass nPars as an argument given the number of times the functions are invoked in a single MCMC run. Besides, once the functions are passed to Model, then they are invoked with fewer arguments as members of the instantiated Model, so the interface remains simple.

Creating an instance of Model

Once the data and the functions of Model are defined, it is straightforward to instantiate Model with a single command. For example, test/logitNormalPriorSwissRmhmc.jl demosntrates how to create an instance of Model so as to run RMHMC:

model = Model(nPars, data, logPrior, logLikelihood, gradLogPosterior, tensor, derivTensor, randPrior);

As shown in test/logitNormalPriorSwissMmala.jl, the same Model instantiation is used for MMALA. In fact, the same command invocation suffices to create an instance which can be then utilized by any of the 6 MCMC algorithms of the GeometricMCMC package.

In order to make Model more user friendly, it is possible to shorten the Model invocation in some cases. For example, the partial derivatives of the metric tensor are not needed when running SMMALA. This is why derivTensor() is omitted in test/logitNormalPriorSwissSmmala.jl in the model definition:

model = Model(nPars, data, logPrior, logLikelihood, gradLogPosterior, tensor, randPrior);

As a second example, the tensor() function is not needed when running Metropolis-Hastings, MALA or HMC. For this reason, both tensor() and derivTensor() can be left out at Model instantiation. So, the model definition in test/logitNormalPriorSwissMh.jl, test/logitNormalPriorSwissMala.jl andtest/logitNormalPriorSwissHmc.jl takes the more succinct form

model = Model(nPars, data, logPrior, logLikelihood, gradLogPosterior, randPrior);

Apparently, the corresponding function definitions tensor() and derivTensor() are not required if there is no intention to run RMHMC or MMALA.

Although the functions in the model instance are meant to be invoked internally by the MCMC functions, they are also available at the disposal of the user. For this reason, it is useful to demonstrate how they can be called. Having defined model, model.randPrior(), without any input arguments, samples from the prior. Any of the deterministic functions of the model are called by passing to them a vector holding the values of the model's paramater. For example

parsSample = model.randPrior()
model.logPrior(parsSample)
model.logLikelihood(parsSample)
model.logPosterior(parsSample)
model.gradLogPosterior(parsSample)
model.tensor(parsSample)
model.derivTensor(parsSample)

The MCMC option types

The common Model type shared by all the MCMC routines is handy as it has become clear from the tutorial. On the other hand, the options of each MCMC algorithm are obviously specific to the algorithm. The only generic set of common MCMC options are the total number of MCMC iterations n, the number of burnin iterations nBurnin and the monitor rate monitorRate, which is the number of successive iterations for which the acceptance ratio is calculated. It is therefore natural to gather n, nBurnin and monitorRate in the "base" MCMC type McmcOpts. nPostBurnin is also a member of McmcOpts and it is implicity set by the constructor of McmcOpts to be the difference n-nBurnin.

McmcOpts is in turn member of the other option types, which are specific to each MCMC routine. Therefore, the Metropolis-Hastings MhOpts type consists of McmcOpts and of the additional widthCorrection member, which helps adjusting the proposal's standard deviation when the acceptance ratio is outside the [20%, 60%] acceptance ratio band. An empirically reasonable value for widthCorrection is 0.1 for example.

The user does not need to initialize McmcOpts separately, since the constructor of the options type of each MCMC routine invokes the constructor of McmcOpts. As it can be seen in test/logitNormalPriorSwissMh.jl for example,

mhOpts = MhOpts(55000, 5000, 0.1);

creates an instance of the Metropolis-Hastings MhOpts type. This means that the following values are accessible as members of MhOpts:

mhOpts's memberValue
1mhOpts.mcmc.n55000
2mhOpts.mcmc.Burnin5000
3mhOpts.mcmc.nPostBurnin50000
4mhOpts.mcmc.monitorRate100
5mhOpts.widthCorrection0.1

If a monitorRate other than the default of 100 is desired, say 200, then it can be passed as the third argument to the constructor of MhOpts:

mhOpts = MhOpts(55000, 5000, 200, 0.1);

In a similar way, the type MalaOpts holds the options of the MALA algorithm, which include McmcOpts and the drift step. MALA's drift step can be either a constant real number or a function with signature

myDriftStep(currentIter::Int, acceptanceRatio::Float64, nMcmc::Int, nBurnin::Int, currentStep::Float64)

which adjusts and returns the drift step during burnin on the basis of the current value of the acceptance ratio. src/setStep.jl contains auxiliary functions to assist the user setting up a function for adjusting the drift step during burnin. Without elaborating in full detail, if the user wants to use the available auxiliary setMalaDriftStep function, he needs to define only a vector of 10 drift steps. The first value of the vector is the one empirically considered to be reasonable. If the resulting acceptance ratio is too low or too high for MALA, then the second drift step of the vector is used, which is expected to be rather small. The successive steps of the vector should be increasing at a pace that brings the acceptance ratio to the desired levels. Apparently, defining appropriate numerical values for the 10 drift steps of the vector is a trial-and-error process. test/logitNormalPriorSwissMala.jl gives an example of how to define the drift steps and subsequently how to invoke the setMalaDriftStep function:

driftSteps = [1, 1e-8, 1e-7, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 0.25];

setDriftStep(i::Int, acceptanceRatio::Float64, nMcmc::Int, 
  nBurnin::Int, currentStep::Float64) =
  setMalaDriftStep(i, acceptanceRatio, nMcmc, nBurnin, currentStep, driftSteps)

Then the MalaOpts type, holding the MALA options, is instantiated as

malaOpts = MalaOpts(55000, 5000, setDriftStep);

The SmmalaOpts and MmalaOpts types for SMMALA and MMALA, respectively, are aliases of the MalaOpts type.

In short, MhOpts holds the options for HMC. These include McmcOpts, the number of leapfrog steps nLeaps, the mass matrix mass and the leapfrog step, which can be specified either as a constant real number or as a function that adjusts the leapfrog step, similarly to the MALA mechanism for adjusting the drift step. An example of setting up the HMC options is available in test/logitNormalPriorSwissHmc.jl:

leapSteps = [0.4, 1e-3/2, 1e-3, 1e-2, 1e-1, 0.15, 0.2, 0.25, 0.3, 0.35];

setLeapStep(i::Int, acceptanceRatio::Float64, nMcmc::Int, 
  nBurnin::Int, currentStep::Float64) =
  setHmcLeapStep(i, acceptanceRatio, nMcmc, nBurnin, currentStep, leapSteps)

hmcOpts = HmcOpts(55000, 5000, 10, setLeapStep, eye(nPars));

where 10 corresponds to nLeaps and eye(nPars) to mass.

The RmhmcOpts options type of RMHMC includes McmcOpts, the number of leapfrog steps nLeaps, the number of Newton steps nNewton and the leapfrog step, which is either a constant real or it is tuned similarly to HMC's leapfrog step. test/logitNormalPriorSwissRmhmc.jl provides an example for setting up the RMHMC options:

leapSteps = [0.9, 0.01, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8];

setLeapStep(i::Int, acceptanceRatio::Float64, nMcmc::Int, 
  nBurnin::Int, currentStep::Float64) =
  setRmhmcLeapStep(i, acceptanceRatio, nMcmc, nBurnin, currentStep, leapSteps)

rmhmcOpts = RmhmcOpts(55000, 5000, 6, setLeapStep, 4);

where 6 corresponds to nLeaps and 4 to nNewton.

Calling the MCMC routines

With the Model and MCMC option types instantiated, the MCMC routines can be called with a single command. Here are the examples for running all 6 MCMC algorithms on the Bayesian logit model with multivariate uncorrelated zero-mean Normal prior:

mcmc, z = mh(model, mhOpts);
mcmc, z = mala(model, malaOpts);
mcmc, z = smmala(model, smmalaOpts);
mcmc, z = mmala(model, mmalaOpts);
mcmc, z = hmc(model, hmcOpts);
mcmc, z = rmhmc(model, rmhmcOpts);

Each of the 6 MCMC routines return 2 Float64 arrays with opts.mcmc.nPostBurnin rows and model.nPars columns each. The mcmc array holds the simulated MCMC chains, while the z array holds the zero-variance control variates, i.e. minus half the gradient of the log-target.

Calling the zero-variance routines

The mcmc and z arrays can be passed as arguments to the linearZv() or quadraticZv() functions in order to compute the linear or the quadratic zero-variance Bayesian estimators:

linearZvMcmc, linearCoef = linearZv(mcmc, z);
quadraticZvMcmc, quadraticCoef = quadraticZv(mcmc, z);

linearZv() returns 2 Float64 arrays. The first one, saved in linearZvMcmc above, has opts.mcmc.nPostBurnin rows and model.nPars columns and holds the linear ZV-MCMC estimators. The second array, saved in linearCoef, has model.nPars rows and model.nPars columns. linearCoef contains the coefficients of the linear polynomial upon which the zero-variance approach relies.

In a similar fashion, quadraticZv() returns 2 Float64 arrays, the first of which has opts.mcmc.nPostBurnin rows and model.nPars columns and holds the quadratic ZV-MCMC estimators. The second array has model.nPars*(model.nPars+3)/2 rows and model.nPars columns, containing the coefficients of the underlying quadratic polynomial of the zero-variance method.

The Bayesian probit model with multivariate uncorrelated zero-mean Normal is also available in the test directory, accompanied by the scripts for running all 6 MCMC algorithms on the vasoconstriction dataset. The data come from an experiment conducted on human physiology to study the effect of taking a single deep breath on the occurrence of a reflex vasoconstriction in the skin of the digits. 39 samples from three individuals are available, each of them contributing 9, 8 and 22 samples. Although the data represent repeated measurements, the observations can be assumed to be independent, therefore the Bayesian probit model can be applied. Two explanatory variables are included in the study, namely the rate of inhalation and the inhaled volume of air per individual. An intercept is also added, so 3 regression coefficients comprise the parameters of the model. The occurrence or non-occurrence of vasoconstriction in the skin of the digits of each subject, corresponding to 1 and 0, plays the role of the binary response.

Future features

The GeometricMCMC package is being extended to include

  • models of ordinary differential equations (ODEs),
  • usage of the MCMC routines when the model's functions are not known in closed form.

About

Geometric MCMC algorithms and zero-variance (ZV) Monte Carlo Bayesian routines

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages