From 68564fbd7a162fd44b947940428eed06f7920ed6 Mon Sep 17 00:00:00 2001
From: David van Leeuwen
Date: Thu, 20 Nov 2014 09:56:45 +0100
Subject: [PATCH] Update README.md

---
 README.md | 114 ++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 71 insertions(+), 43 deletions(-)

diff --git a/README.md b/README.md
index c442b05..5b322eb 100644
--- a/README.md
+++ b/README.md
@@ -6,9 +6,9 @@ A Julia package for Gaussian Mixture Models (GMMs).
 This package contains support for Gaussian Mixture Models.  Basic training, likelihood calculation, model adaptation, and i/o are implemented.

-This Julia type is more specific than Dahua Lin's [MixtureModels](https://github.com/lindahua/MixtureModels.jl), in that it deals only with normal (multivariate) distributions (a.k.a Gaussians), but it does so more efficiently, hopefully.
+This Julia type is more specific than Dahua Lin's [MixtureModels](http://distributionsjl.readthedocs.org/en/latest/mixture.html), in that it deals only with normal (multivariate) distributions (a.k.a. Gaussians), but it hopefully does so more efficiently.  We have support for switching between GMM and MixtureModel types.
-At this moment, we have implemented both diagonal covariance and full covariance GMMs.
+At this moment, we have implemented both diagonal covariance and full covariance GMMs, and full covariance variational Bayes GMMs.

 In training the parameters of a GMM using the Expectation Maximization (EM) algorithm, the inner loop (computing the Baum-Welch statistics) can be executed efficiently using Julia's standard parallelization infrastructure, e.g., by using SGE.  We further support very large data (larger than will fit in the combined memory of the computing cluster) through [BigData](https://github.com/davidavdav/BigData.jl), which has now been incorporated in this package.
@@ -37,9 +37,12 @@ My approach is to have:
 The consequence is that "data points run down" in a matrix, just like records do in a DataFrame.  Hence, statistics per feature dimension occur consecutively in memory, which may be advantageous for caching efficiency.  On the other hand, features belonging to the same data point are separated in memory, which is probably not the way they are generated, and does not extend easily to a streaming implementation.  The choice of the direction in which the data must run is an almost philosophical problem that I haven't come to a final conclusion about.

+Please note that the choice for "data points run down" is the opposite of the convention used in [Distributions.jl](https://github.com/JuliaStats/Distributions.jl), so if you convert `GMM`s to `MixtureModel`s in order to benefit from the methods provided for these distributions, you need to transpose the data.
+
 Type
 ----
+A simplified version of the type definition for the Gaussian Mixture Model is
 ```julia
 type GMM
     n::Int                # number of Gaussians
@@ -50,19 +53,14 @@ type GMM
     hist::Array{History}  # history of this GMM
 end
 ```
+Currently, the variable `Σ` is heterogeneous both in form and interpretation, depending on whether it represents full covariance or diagonal covariance matrices.
+
+ - full covariance: `Σ` is represented by a `Vector` of `chol(inv(Σ), :U)`
+ - diagonal covariance: `Σ` is formed by vertically stacking row-vectors of `diag(Σ)`
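To make the storage convention concrete, here is a small illustrative session.  It is not part of the original README; the sizes follow from the `GMM(n::Int, d::Int)` constructor and the field descriptions documented under Constructors below.

```julia
using GaussianMixtures

gmm = GMM(3, 2)   # 3 Gaussians in 2 dimensions; kind=:diag is the default
size(gmm.μ)       # (3, 2): the means form an n x d matrix, one Gaussian per row
size(gmm.Σ)       # (3, 2): for the diagonal kind, row i holds diag(Σ_i)
gmm.w             # weight vector of length 3
```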

 Constructors
 ------------
-```julia
-GMM(n::Int, d::Int; kind=:diag)
-```
-Initialize a GMM with `n` multivariate Gaussians of dimension `d`.
-The means are all set to **0** (the origin) and the variances to
-**I**, which is silly by itself. If `diag=:full` is specified, the
-covariances are full rather than diagonal. One should replace the
-values of the weights, means and covariances afterwards.
-
 ```julia
 GMM(weights::Vector, μ::Array, Σ::Array, hist::Vector)
 ```
@@ -70,8 +68,8 @@ This is the basic outer constructor for GMM.  Here we have
 - `weights` a Vector of length `n` with the weights of the mixtures
 - `μ` a matrix of `n` by `d` means of the Gaussians
 - `Σ` either a matrix of `n` by `d` variances of diagonal Gaussians,
-  or a vector of `n` matrices of `d` by `d` covariances of full
-  covariance Gaussians
+  or a vector of `n` `Triangular` matrices of `d` by `d`,
+  representing the Cholesky decomposition of the full covariance
 - `hist` a vector of `History` objects describing how the GMM was obtained.  (The type `History` simply contains a time `t` and a comment string `s`)
@@ -83,17 +81,21 @@ GMM(x::Vector)
 ```
 Create a GMM with 1 mixture, i.e., a multivariate Gaussian, and initialize it with the mean and variance of the data in `x`.  The data in `x` must be an `nx` x `d` Matrix, where `nx` is the number of data points, or a Vector of length `nx`.

 ```julia
-GMM(x::Matrix, n::Int; method=:kmeans, kind=:diag, nInit=50, nIter=10, nFinal=nIter)
+GMM(n::Int, x::Matrix; method=:kmeans, kind=:diag, nInit=50, nIter=10, nFinal=nIter)
 ```
 Create a GMM with `n` mixtures, given the training data `x`, using the Expectation Maximization algorithm.  There are two ways of arriving at `n` Gaussians: `method=:kmeans` uses K-means clustering from the Clustering package to initialize `n` centers; `nInit` is the number of iterations for the K-means algorithm and `nIter` the number of iterations in EM.  The method `:split` works by initializing a single Gaussian with the data `x` and subsequently splitting the Gaussians, followed by retraining using the EM algorithm, until `n` Gaussians are obtained.  `n` must be a power of 2 for `method=:split`.  `nIter` is the number of iterations in the EM algorithm, and `nFinal` the number of iterations in the final step.

-Other functions
--------
-
-```julia
-split(gmm::GMM; minweight=1e-5, covfactor=0.2)
-```
-Double the number of Gaussians by splitting each Gaussian into two Gaussians. `minweight` is used for pruning Gaussians with too little weight, these are replaced by an extra split of the Gaussian with the highest weight. `covfactor` controls how far apart the means of the split Gaussian are positioned.
+```julia
+GMM(n::Int, d::Int; kind=:diag)
+```
+Initialize a GMM with `n` multivariate Gaussians of dimension `d`.
+The means are all set to **0** (the origin) and the variances to
+**I**, which is silly by itself.  If `kind=:full` is specified, the
+covariances are full rather than diagonal.  One should replace the
+values of the weights, means and covariances afterwards.
+
+Training functions
+-------

 ```julia
 em!(gmm::GMM, x::Matrix; nIter::Int = 10, varfloor=1e-3)
@@ -124,10 +126,32 @@ history(gmm::GMM)
 ```
 Shows the history of the GMM, i.e., how it was initialized and split, how the parameters were trained, etc.  A history item contains a time of completion and an event string.

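As an illustration of how these constructors and training functions combine, here is a sketch of a complete training run.  The data are random and the parameter values arbitrary; only the calls documented above are used.

```julia
using GaussianMixtures

x = randn(10000, 13)       # hypothetical training data: 10000 points of dimension 13
g = GMM(8, x; method=:kmeans, kind=:diag, nInit=50, nIter=10)  # k-means init, then EM
em!(g, x; nIter=5)         # a few extra EM iterations, updating g in place
history(g)                 # show how g was initialized and trained
```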
+Other functions
+---------------
+
+ - `split(gmm; minweight=1e-5, covfactor=0.2)`: Doubles the number of Gaussians by splitting each Gaussian into two Gaussians.  `minweight` is used for pruning Gaussians with too little weight; these are replaced by an extra split of the Gaussian with the highest weight.  `covfactor` controls how far apart the means of the split Gaussian are positioned.
+ - `kind(gmm)`: returns `:diag` or `:full`, depending on the type of covariance matrix
+ - `eltype(gmm)`: returns the datatype of `w`, `μ` and `Σ` in the GMM
+ - `weights(gmm)`: returns the weights vector `w`
+ - `means(gmm)`: returns the means `μ` as an `n` by `d` matrix
+ - `covars(gmm)`: returns the covariances `Σ`
+ - `copy(gmm)`: returns a deep copy of the GMM
+ - `full(gmm)`: returns a full covariance GMM based on `gmm`
+ - `diag(gmm)`: returns a diagonal GMM, by ignoring off-diagonal elements in `gmm`
+
+Converting the GMMs
+-------------------
+The element type in the GMM can be changed as you would expect.  We also have import and export functions for `MixtureModel`, which currently only has a `Float64` element type.
+ - `convert(::Type{GMM{datatype}}, gmm)`: convert the data type of the GMM
+ - `float16(gmm)`, `float32(gmm)`, `float64(gmm)`: convenience functions for `convert()`
+ - `MixtureModel(gmm)`: construct an instance of type `MixtureModel` from the GMM.  Please note that for functions like `pdf(m::MixtureModel, x::Matrix)` the data `x` run "sideways" rather than "down" as in this package.
+ - `GMM(m::MixtureModel{Multivariate,Continuous,MvNormal})`: construct a GMM from the right kind of MixtureModel.
+
 Parallelization
 ---------------
+Training a large GMM with huge quantities of data can take a significant amount of time.  We have built-in support for the parallelization infrastructure in Julia.

-The method `stats()`, which is at the heart of EM, can detect multiple processors available (through `nprocs()`). If there is more than 1 processor available, the data is split into chunks, each chunk is mapped to a separate processor, and afterwards an aggregating operation collects all the statistics from the sub-processes. In an SGE environment you can obtain more cores (in the example below 20) by issuing
+The method `stats()`, which is at the heart of the EM algorithm, can detect multiple available processors (through `nprocs()`).  If there is more than 1 processor available, the data is split into chunks, each chunk is mapped to a separate processor, and afterwards all the statistics from the sub-processes are aggregated.  In an SGE environment you can obtain more cores (in the example below 20) by issuing

 ```julia
 using ClusterManagers
@@ -135,6 +159,8 @@
 ClusterManagers.addprocs_sge(20)
 @everywhere using GaussianMixtures
 ```

+The size of the GMM and the data determine whether or not it is advantageous to do this.
+
 Memory
 ------
 The `stats()` method (see below) needs to be very efficient because for many algorithms it is in the inner loop of the calculation.  We have a highly optimized, BLAS-friendly and parallelizable implementation, but this requires a fair bit of memory.  Therefore the input data is processed in blocks, in such a way that only a limited amount of memory is used.  By default this is set at 2 GB, but it can be specified through a global setting:

 ```julia
@@ -144,6 +170,13 @@
 setmem(gig)
 ```
 Set the memory approximately used in `stats()`, in gigabytes.

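Since `stats()` simply looks at `nprocs()`, the same mechanism presumably also works with ordinary local workers instead of an SGE cluster.  A hypothetical session combining this with the memory setting (the worker count, data size and memory limit are arbitrary):

```julia
addprocs(4)                    # 4 local worker processes
@everywhere using GaussianMixtures
setmem(1.0)                    # let stats() work in blocks of roughly 1 GB
x = randn(100000, 13)          # hypothetical large training set
g = GMM(16, x; kind=:full)     # the Baum-Welch statistics inside EM are now
                               # computed in chunks on the available workers
```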
+Baum-Welch statistics
+---------------------
+
+At the heart of EM training, and of many other operations with GMMs, lies the computation of the Baum-Welch statistics of the data when they are aligned to the GMM.
+We have optimized implementations of the basic calculation of these statistics:
+
+ - `stats(gmm::GMM, x::Matrix; order=2, parallel=true)` Computes the Baum-Welch statistics up to order `order` for the alignment of the data `x` to the Universal Background GMM `gmm`.  The 1st and 2nd order statistics are returned as an `n` x `d` matrix, so for obtaining statistics in supervector format, flattening needs to be carried out in the right direction.  These statistics are _uncentered_.
+
 Random GMMs
 -----------
 Sometimes it is interesting to generate random GMMs, and to use these to generate random points.
@@ -162,11 +195,6 @@ Speaker recognition methods
 The following methods are used in speaker- and language recognition; they may eventually move to another module.

-```julia
-stats(gmm::GMM, x::Matrix; order=2, parallel=true)
-```
-Computes the Baum-Welch statistics up to order `order` for the alignment of the data `x` to the Universal Background GMM `gmm`. The 1st and 2nd order statistics are retuned as an `n` x `d` matrix, so for obtaining a supervector flattening needs to be carried out in the right direction. Theses statistics are _uncentered_.
-
 ```julia
 csstats(gmm::GMM, x::Matrix, order=2)
 ```
@@ -224,32 +252,32 @@ In our implementation, we follow the approach from section 10.2 of [Christopher
 The variational Bayes training uses two new types, a prior and a variational GMM:

 ```julia
-type GMMprior{T<:FloatingPoint}
-    α0::T                  # effective prior number of observations
-    β0::T
-    m0::Vector{T}          # prior on μ
-    W0::Matrix{T}          # prior precision
-    ν0::T                  # scale precision
+type GMMprior
+    α0                     # effective prior number of observations
+    β0
+    m0::Vector             # prior on μ
+    ν0                     # scale precision
+    W0::Matrix             # prior precision
 end

-type VGMM{T<:FloatingPoint} <: GaussianMixture{T}
-    n::Int                 # number of Gaussians
-    d::Int                 # dimension of Gaussian
-    π::GMMprior            # The prior used in this VGMM
-    α::Vector{T}           # Dirichlet, n
-    β::Vector{T}           # scale of precision, n
-    m::Matrix{T}           # means of means, n * d
-    ν::Vector{T}           # no. degrees of freedom, n
-    W::Vector{Matrix{T}}   # scale matrix for precision? n * d * d
-    hist::Vector{History}  # history
+type VGMM
+    n::Int                 # number of Gaussians
+    d::Int                 # dimension of Gaussian
+    π::GMMprior            # The prior used in this VGMM
+    α::Vector              # Dirichlet, n
+    β::Vector              # scale of precision, n
+    m::Matrix              # means of means, n * d
+    ν::Vector              # no. degrees of freedom, n
+    W::Vector{Matrix}      # scale matrix for precision? n * d * d
+    hist::Vector{History}  # history
 end
 ```

-Note that we currently only have full covariance VGMMs.
+Please note that we currently only have full covariance VGMMs.

 Training using observed data `x` needs some initial GMM and a prior.
 ```julia
 g = GMM(8, x, kind=:full, nIter=0)   ## only do k-means initialization of GMM g
-p = GMMprior(g.d, 1.0, 1.0)          ## set α0 and β0 to 1, and other values to a default
+p = GMMprior(g.d, 0.1, 1.0)          ## set α0=0.1 and β0=1, and other values to a default
 v = VGMM(g, p)                       ## initialize variational GMM v with g
 ```
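Training then proceeds by iteratively updating the variational posterior.  The last call in the sketch below assumes an `em!` method for `VGMM`, analogous to `em!(gmm::GMM, x)` documented above; that method is not described in this section, so treat it as an assumption.  The data are random and only serve to make the example self-contained.

```julia
using GaussianMixtures

x = randn(1000, 2)                   # hypothetical 2-dimensional observations
g = GMM(8, x, kind=:full, nIter=0)   # k-means initialization only, as in the block above
p = GMMprior(g.d, 0.1, 1.0)          # α0 = 0.1, β0 = 1.0, other values at their defaults
v = VGMM(g, p)                       # wrap g and the prior into a variational GMM
em!(v, x)                            # assumed: variational EM update of v in place
```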