JoaoHenriqueOliveira/bayesmix

bayesmix is a C++ library for running MCMC simulation in Bayesian mixture models.

Current state of the software:

  • bayesmix performs inference for mixture models of the kind

y_1, \ldots, y_n \sim \int k(\cdot \mid \theta) \, P(d\theta)

where k is the mixture kernel and P is either the Dirichlet process or the Pitman-Yor process.

  • We currently support univariate and multivariate location-scale mixtures of Gaussian densities

  • Inference is carried out using either Algorithm 2 or Algorithm 8 in Neal (2000).

  • MCMC chains can be serialized using Google's Protocol Buffers

Installation:

We heavily depend on Google's Protocol Buffers, so make sure to install it beforehand!

On a Linux machine, the following commands install the library:

sudo apt-get install autoconf automake libtool curl make g++ unzip
wget https://github.com/protocolbuffers/protobuf/releases/download/v3.14.0/protobuf-python-3.14.0.zip
unzip protobuf-python-3.14.0.zip
cd protobuf-3.14.0/
./configure --prefix=/usr
make check
sudo make install
sudo ldconfig # refresh shared library cache.

On Mac and Windows machines, follow the official Protocol Buffers install guide (link).

Finally, to work with bayesmix, clone the repository with

git clone --recurse-submodules git@github.com:bayesmix-dev/bayesmix.git

To run the executable:

mkdir build
cd build
cmake ..
make run
cd ..
./build/run

To run unit tests:

cd build
cmake ..
make test_bayesmix
./test/test_bayesmix

For Developers

Please install the pre-commit hooks before committing anything: they clear the output of Jupyter notebooks. Just type

./bash/setup_pre_commit.sh

Future steps (contributors are welcome!)

A Python package is already under development

  • Extension to normalized random measures
  • Using HMC / MALA MCMC algorithms to sample from the cluster-specific full conditionals when the likelihood is not conjugate to the base measure
  • R package

Cluster estimate

This library can compute a cluster estimate from an MCMC chain. The estimate minimises the posterior expected loss for a given loss function, using a greedy algorithm. Source files are in the folder src/clustering.
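To make the idea of posterior expected loss concrete, here is a minimal sketch, not the code in src/clustering. It assumes the chain is held in memory as a vector of label vectors, uses the Binder loss with unit costs, and replaces the greedy search with a simpler stand-in: it only looks among the partitions visited by the chain. All names (Partition, binder_loss, expected_loss, best_sampled_partition) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical representation: labels[i] is the cluster of observation i.
using Partition = std::vector<int>;

// Binder loss with unit costs: the number of pairs (i, j), i < j, on which
// the two partitions disagree about whether i and j share a cluster.
double binder_loss(const Partition& a, const Partition& b) {
  assert(a.size() == b.size());
  double loss = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    for (std::size_t j = i + 1; j < a.size(); ++j) {
      if ((a[i] == a[j]) != (b[i] == b[j])) loss += 1.0;
    }
  }
  return loss;
}

// Monte Carlo estimate of the posterior expected loss of `candidate`:
// the average loss against every partition sampled by the chain.
double expected_loss(const Partition& candidate,
                     const std::vector<Partition>& chain) {
  assert(!chain.empty());
  double total = 0.0;
  for (const Partition& sample : chain) total += binder_loss(candidate, sample);
  return total / static_cast<double>(chain.size());
}

// Stand-in for the greedy search: return the index of the sampled partition
// with the smallest estimated expected loss (first one wins on ties).
std::size_t best_sampled_partition(const std::vector<Partition>& chain) {
  assert(!chain.empty());
  std::size_t best = 0;
  double best_loss = std::numeric_limits<double>::infinity();
  for (std::size_t k = 0; k < chain.size(); ++k) {
    const double loss = expected_loss(chain[k], chain);
    if (loss < best_loss) {
      best_loss = loss;
      best = k;
    }
  }
  return best;
}
```

Calling best_sampled_partition(chain) on a chain of label vectors returns the position of the candidate estimate. Restricting the search to visited partitions costs O(M^2 N^2) for M samples of N labels, so it only suits small examples.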

To compute a cluster estimate, run:

cd build
cmake ..
make run_pe
./run_pe filename_in filename_out loss Kup

where:

  • filename_in is the input file containing the MCMC chain (a file in which values are separated by spaces)
  • filename_out is the output file to which the cluster estimate will be written
  • loss specifies the loss function: 0 for Binder loss, 1 for variation of information, 2 for normalized variation of information
  • Kup is the maximum number of clusters (Kup=N is usually a good choice for a dataset of length N)
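As an illustration of the second loss option, the sketch below computes the variation of information between two partitions using its standard definition, VI(a, b) = 2 H(a, b) - H(a) - H(b), with natural logarithms. It is not taken from src/clustering; the function names are hypothetical, and the normalized variant is left out because the normalization used by run_pe is not stated here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Shannon entropy (natural log) of a table of counts that sum to n.
template <typename Counts>
double entropy_from_counts(const Counts& counts, double n) {
  double h = 0.0;
  for (const auto& entry : counts) {
    const double p = entry.second / n;
    h -= p * std::log(p);
  }
  return h;
}

// Variation of information between two label vectors of equal length.
// Only co-membership matters, so the result is invariant to relabelling.
double variation_of_information(const std::vector<int>& a,
                                const std::vector<int>& b) {
  assert(a.size() == b.size() && !a.empty());
  const double n = static_cast<double>(a.size());
  std::map<int, double> count_a, count_b;
  std::map<std::pair<int, int>, double> count_ab;  // contingency table
  for (std::size_t i = 0; i < a.size(); ++i) {
    count_a[a[i]] += 1.0;
    count_b[b[i]] += 1.0;
    count_ab[std::make_pair(a[i], b[i])] += 1.0;
  }
  return 2.0 * entropy_from_counts(count_ab, n) -
         entropy_from_counts(count_a, n) - entropy_from_counts(count_b, n);
}
```

Two partitions that differ only in their labels, such as {0, 0, 1, 1} and {5, 5, 9, 9}, are at distance zero, while {0, 0, 1, 1} and {0, 1, 0, 1} are at distance 2 log 2.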

Credible ball computation is also available; it quantifies the uncertainty of a cluster estimate. To run the credible ball code:

cd build
cmake ..
make run_cb
./run_cb filename_mcmc filename_pe filename_out loss rate

where:

  • filename_mcmc is the file containing the MCMC chain
  • filename_pe is the file containing the cluster estimate
  • filename_out is the file to which the result will be written
  • loss specifies the loss function: 0 for Binder loss, 1 for variation of information, 2 for normalized variation of information
  • rate must be > 0; the smaller it is, the longer the program runs
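One common way to build a credible ball around an estimate is to find the smallest radius that captures a chosen share of the sampled partitions. The sketch below follows that construction under stated assumptions: the chain is a vector of label vectors, the distance is the pairwise-disagreement (Binder-type) count, and the level is passed explicitly. It does not reproduce run_cb, and it says nothing about how run_cb uses rate; all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical representation: labels[i] is the cluster of observation i.
using Partition = std::vector<int>;

// Number of pairs (i, j), i < j, that are together in one partition and
// apart in the other.
double pair_disagreement(const Partition& a, const Partition& b) {
  assert(a.size() == b.size());
  double d = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    for (std::size_t j = i + 1; j < a.size(); ++j) {
      if ((a[i] == a[j]) != (b[i] == b[j])) d += 1.0;
    }
  }
  return d;
}

// Smallest radius r such that at least a fraction `level` of the sampled
// partitions lie within distance r of `estimate`.
double credible_ball_radius(const Partition& estimate,
                            const std::vector<Partition>& chain,
                            double level) {
  assert(!chain.empty() && level > 0.0 && level <= 1.0);
  std::vector<double> distances;
  distances.reserve(chain.size());
  for (const Partition& sample : chain) {
    distances.push_back(pair_disagreement(estimate, sample));
  }
  std::sort(distances.begin(), distances.end());
  // Number of samples that must fall inside the ball.
  std::size_t needed = static_cast<std::size_t>(
      std::ceil(level * static_cast<double>(chain.size())));
  if (needed == 0) needed = 1;
  return distances[needed - 1];
}
```

A small radius means the chain concentrates near the estimate; a large one signals substantial uncertainty about the clustering.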

The directory src/clustering/R scripts contains some scripts to generate MCMC chains for univariate and multivariate datasets.
