Exercise 02 finished
wrongu committed Jan 22, 2018
1 parent a9e94f4 commit 78a3735
Showing 7 changed files with 240 additions and 0 deletions.
47 changes: 47 additions & 0 deletions 02 - ELBO with analytic KL/
@@ -0,0 +1,47 @@
Exercise 02 - Extending the ELBO with an analytic KL term

A Generic Latent Base-Class

In the first exercise, we saw one latent class called `DiagonalGaussianLatent` in ``. In this exercise, a generic base class called `Latent` has been created. It takes care of the constructor and of computing the output shape, since these operations are common to all `Latent`s. Each `Latent` subclass must still implement the following methods of the Keras `Layer` interface:

* `build()` - creates layer parameters
* `call(x)` - implements the layer's actual transformation of inputs to outputs

In the previous exercise, we added functions specific to our definition of a `Latent`, namely:

* `log_prob(x)` - uses the current/most recent parameters to compute the log probability of a batch of inputs
* `sample_kl()` - uses the current/most recent _sample_ of the latent to compute a Monte Carlo estimate of the KL divergence from its posterior to the prior.
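
To make the idea behind `sample_kl()` concrete, here is a minimal sketch in plain Python rather than Keras. The helper names (`log_normal`, `sample_kl_estimate`) and the 1-D setting are assumptions made for illustration and are not part of the exercise code. It averages `log q(z) - log p(z)` over samples `z` drawn from `q`:

```python
import math
import random

def log_normal(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def sample_kl_estimate(q_mean, q_var, p_mean, p_var, n_samples, rng):
    """Average of log q(z) - log p(z) over n_samples draws z ~ q."""
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(q_mean, math.sqrt(q_var))
        total += log_normal(z, q_mean, q_var) - log_normal(z, p_mean, p_var)
    return total / n_samples

rng = random.Random(0)

# Five single-sample estimates: each one is a noisy guess at the same KL value.
single = [sample_kl_estimate(1.0, 0.5, 0.0, 1.0, 1, rng) for _ in range(5)]

# Averaging many samples gives a much more stable estimate.
averaged = sample_kl_estimate(1.0, 0.5, 0.0, 1.0, 100000, rng)

print(single)
print(averaged)
```

The single-sample estimates scatter around the averaged value, which is the kind of noise a one-sample-per-input estimate carries during training.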

Implementing an analytic KL method for Gaussians

For certain distributions, the KL term in the ELBO objective can be computed analytically. In exercise 01, we used the fact that KL is an _expectation_ under `Q` to estimate it from samples of `Q` (the distribution defined by `self.mean` and `self.log_var` in the `DiagonalGaussianLatent` class). Like any Monte Carlo estimate, this one is noisy, and since the latent draws only one sample per input, its variance can be high. Because our current prior is also Gaussian, we can instead use the following closed-form expression for the KL between two Gaussians:

kl(p1||p2) = [log(det(C2)/det(C1)) - dim + Tr(C2^-1*C1) + (m2-m1).T*C2^-1*(m2-m1)]/2

where `C1` and `C2` are covariances, `m1` and `m2` are means, `dim` is the dimensionality of the distributions, `det` is the determinant, and `Tr` is the trace.
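
As a sanity check on the formula above, here is a small sketch in plain Python for the special case where both covariances are diagonal, so the determinant is a product of variances and the trace is a sum of ratios. The function name `diag_gaussian_kl` and the list-based representation are assumptions for illustration. This is not the Keras implementation the exercise asks for.

```python
import math

def diag_gaussian_kl(m1, var1, m2, var2):
    """KL(p1 || p2) for two Gaussians with diagonal covariances.

    m1, m2 are lists of means; var1, var2 are lists of per-dimension variances.
    """
    dim = len(m1)
    # log(det(C2) / det(C1)) for diagonal matrices is a sum of log variance ratios
    log_det_ratio = sum(math.log(v2 / v1) for v1, v2 in zip(var1, var2))
    # Tr(C2^-1 * C1) for diagonal matrices is a sum of variance ratios
    trace_term = sum(v1 / v2 for v1, v2 in zip(var1, var2))
    # (m2 - m1).T * C2^-1 * (m2 - m1)
    mahalanobis = sum((b - a) ** 2 / v2 for a, b, v2 in zip(m1, m2, var2))
    return 0.5 * (log_det_ratio - dim + trace_term + mahalanobis)

# KL between identical distributions
print(diag_gaussian_kl([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]))

# KL from a diagonal Gaussian to a zero-mean, unit-variance Gaussian
print(diag_gaussian_kl([1.0, -2.0], [0.5, 2.0], [0.0, 0.0], [1.0, 1.0]))
```

The first call compares a distribution with itself, so every term cancels and the result is zero. The second has the same shape as the posterior-to-prior case in this exercise.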

**The goal of this exercise is to implement an interface where the ELBO objective uses the analytic form of KL when it is available and automatically falls back to the Monte Carlo estimate when it is not.** For example, if we later choose to replace the Gaussian prior with some complicated nonparametric form, the model would fall back to the Monte Carlo estimate under the hood, with no extra work in designing the model.

**Implement `DiagonalGaussianLatent.analytic_kl()`.** Just like `sample_kl()`, it takes no inputs but instead uses the current values in `self.mean`, `self.log_var`, and `self.prior`. Your function should return a Keras tensor with shape `(batch,)`. Hint: the `IsoGaussianPrior` class has mean `0` and covariance equal to the identity matrix. Using this, you should be able to compute KL using only `K.exp` and `K.sum`.

If `self.prior` is not an instance of a class for which the analytic form is known, your `analytic_kl` method should raise a `TypeError` (in Python you can check this with `isinstance(self.prior, IsoGaussianPrior)`).

Flexibly choosing between analytic and Monte Carlo KL

**Implement the "fallback" logic in `VAE.elbo`.** Since `analytic_kl` raises a `TypeError` when no analytic form is known, you can implement the fallback logic using `try: ... except TypeError: ...`.
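
The pattern itself can be sketched without Keras. Everything in the block below is a hypothetical stand-in (`IsoGaussianPriorStub`, `OtherPriorStub`, `LatentStub`, `choose_kl`), and the KL methods return labels instead of tensors. It only shows the control flow, not a solution to the exercise:

```python
class IsoGaussianPriorStub(object):
    """Stand-in for a prior whose analytic KL is known."""

class OtherPriorStub(object):
    """Stand-in for a prior with no known analytic KL."""

class LatentStub(object):
    def __init__(self, prior):
        self.prior = prior

    def sample_kl(self):
        # Always available: the Monte Carlo estimate
        return "monte-carlo KL"

    def analytic_kl(self):
        # Only available for priors we know how to handle
        if not isinstance(self.prior, IsoGaussianPriorStub):
            raise TypeError("no analytic KL for prior of type %s"
                            % type(self.prior).__name__)
        return "analytic KL"

def choose_kl(latent):
    """Prefer the analytic KL and fall back to the sampled one."""
    try:
        return latent.analytic_kl()
    except TypeError:
        return latent.sample_kl()

print(choose_kl(LatentStub(IsoGaussianPriorStub())))
print(choose_kl(LatentStub(OtherPriorStub())))
```

Catching only `TypeError`, rather than using a bare `except`, keeps unrelated bugs inside `analytic_kl` visible instead of silently switching to the sampled estimate.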

You may be worried that `try ... except ...` is either inelegant or slow. Regarding elegance, it is a common Python idiom. Regarding speed, remember that _the `elbo` function is only ever called once._ This is the key difference between working with computation graphs in TensorFlow or Theano and working directly with data in NumPy. `VAE.elbo()` simply builds a series of operations that are not _executed_ until later, so the cost of the `try ... except ...` is paid once while building the graph, not on every training step.
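
The "build once, execute many times" point can be illustrated with ordinary closures. This is only an analogy under stated assumptions: `build_elbo`, `counts`, and the arithmetic inside are hypothetical and stand in for graph construction, not for anything Keras actually does internally.

```python
counts = {"build": 0}

def build_elbo():
    """Hypothetical stand-in for VAE.elbo(): runs the fallback logic once and
    returns a function representing the deferred computation."""
    counts["build"] += 1
    try:
        raise TypeError("pretend no analytic KL is available")
    except TypeError:
        def kl(z):
            return 0.5 * z * z  # stand-in for the Monte Carlo branch

    def elbo(z):
        return 1.0 - kl(z)  # stand-in for LL - KL

    return elbo

# The try/except runs here, exactly once...
elbo_fn = build_elbo()

# ...while the returned computation is executed many times without it.
values = [elbo_fn(0.1 * step) for step in range(1000)]

print(counts["build"], len(values))
```

The exception handling sits entirely in the build step, so its cost does not grow with the number of times the resulting computation is run.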

Train a model

To train the model, `cd` into `02 - ELBO with analytic KL` and run `python`.

Bonus exercise(s)

Compare the training time to reach a certain loss using the different KL methods.
72 changes: 72 additions & 0 deletions 02 - ELBO with analytic KL/
@@ -0,0 +1,72 @@
from keras.engine.topology import Layer
from priors import IsoGaussianPrior
import keras.backend as K

class Latent(Layer):
    """Base class for VAE latents."""

    def __init__(self, dim, prior, **kwargs):
        # Call Layer constructor
        super(Latent, self).__init__(**kwargs)

        # Record instance variables
        self.dim = dim
        self.prior = prior

    def compute_output_shape(self, input_shape):
        return tuple(input_shape[:-1]) + (self.dim,)

class DiagonalGaussianLatent(Latent):
    """DiagonalGaussianLatent expects flattened input with shape (batch, dim). Internally stores
    2*d parameters: 'mean' and 'log_var' of each dimension of the posterior distribution that
    are themselves each constructed as a dense connection from inputs. Output is (batch, d)
    *sampled value* of each latent, where d is the dimensionality passed to the constructor.
    """

    def build(self, input_shape):
        # Create trainable weights of this layer for the two dense connections to 'mean' and to
        # 'log_var'.
        input_dim = input_shape[-1]
        # NOTE: the arguments after 'shape' were cut off in the source. The 'name' and
        # 'initializer' values below are assumptions.
        self.dense_mean = self.add_weight(shape=(input_dim, self.dim),
                                          name='dense_mean',
                                          initializer='glorot_uniform')
        self.dense_log_var = self.add_weight(shape=(input_dim, self.dim),
                                             name='dense_log_var',
                                             initializer='glorot_uniform')
        self.built = True

    def call(self, x):
        # Apply matrix multiplication of inputs and the weights created in build() to get 'mean'
        # and 'log_var' parameters.
        self.mean = K.dot(x, self.dense_mean)
        self.log_var = K.dot(x, self.dense_log_var)

        # exp(log_var / 2) is standard deviation
        std = K.exp(self.log_var / 2)

        # Create (reparameterized) sample from the latent distribution
        sample_shape = (K.shape(self.mean)[0], self.dim)
        eps = K.random_normal(shape=sample_shape, mean=0., stddev=1.0)

        # Shape of self.sample is (batch, dim)
        self.sample = self.mean + eps * std

        return self.sample

    def log_prob(self, x):
        # Log Gaussian density, up to the additive constant -dim/2 * log(2*pi):
        #   -1/2 * (sum[(x-mean)^2/variance] + log det(covariance))
        # IsoGaussianPrior.log_prob drops the same constant, so it cancels in sample_kl().
        variance = K.exp(self.log_var)
        log_det = K.sum(self.log_var, axis=-1)
        x_diff = x - self.mean
        return -(K.sum((x_diff / variance) * x_diff, axis=-1) + log_det) / 2

    def sample_kl(self):
        # Monte Carlo KL estimate is simply self.log_prob - prior.log_prob
        return self.log_prob(self.sample) - self.prior.log_prob(self.sample)

    def analytic_kl(self):
        # You should check that self.prior is of the right type first and raise a TypeError if
        # it is not.
        # Placeholder (not in the source) so the module imports before the exercise is solved.
        raise NotImplementedError('Exercise: implement analytic_kl')
17 changes: 17 additions & 0 deletions 02 - ELBO with analytic KL/
@@ -0,0 +1,17 @@
from keras.engine.topology import Layer
import keras.backend as K

class DiagonalGaussianLikelihood(Layer):

    def __init__(self, mean, std):
        # Mean is a tensor whose last axis has size dim: a vector of shape (dim,), or a batch of
        # shape (batch, dim) as passed by gaussian_mnist(). std may be a tensor of the same shape
        # or a scalar.
        self.mean = mean
        # If std is a scalar, this creates an array of [var, var, var, ...]. If it is already a
        # vector, this does nothing.
        self.var = K.ones_like(mean) * (std ** 2)

    def log_prob(self, x):
        # Determinant of the diagonal covariance matrix is the product of variances. Summing over
        # the last axis only keeps one log-determinant per data point when mean is batched.
        log_det = K.sum(K.log(self.var), axis=-1)
        return -K.sum(K.square(x - self.mean) / (2 * self.var), axis=-1) - log_det / 2
55 changes: 55 additions & 0 deletions 02 - ELBO with analytic KL/
@@ -0,0 +1,55 @@
from vae import VAE
from latents import DiagonalGaussianLatent
from priors import IsoGaussianPrior
from likelihoods import DiagonalGaussianLikelihood
from keras.layers import Input, Dense
from data.my_mnist import img_pixels as mnist_pixels
import os

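def fit_vae(vae, x_train, x_test=None, epochs=100, batch=100, weights_file=None, recompute=False, optimizer='adam'):
    """Fit a vae object to the given dataset (for datasets that fit in memory). Both x_train and
    x_test must have a number of data points divisible by the batch size.
    """
    # Load existing weights if they exist
    if weights_file is not None:
        if os.path.exists(weights_file) and not recompute:
            # Assumption: this call was lost in the source. Model.load_weights restores them.
            vae.model.load_weights(weights_file)
            return vae

    # Train the model
    vae.model.compile(loss=None, optimizer=optimizer)
    if x_test is not None:
        kwargs = {'validation_data': (x_test, None)}
    else:
        kwargs = {}
    # Assumption: the start of this call was lost in the source. It is taken to be
    # vae.model.fit on x_train.
    vae.model.fit(x_train, shuffle=True, epochs=epochs, batch_size=batch, **kwargs)

    # Save trained model to a file if given
    if weights_file is not None:
        # Assumption: this call was lost in the source.
        vae.model.save_weights(weights_file)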
def gaussian_mnist(latent_dim=2, pixel_std=.05):
    inpt = Input(shape=(mnist_pixels,))
    q_hidden_1 = Dense(64, activation='relu')(inpt)
    q_hidden_2 = Dense(64, activation='relu')(q_hidden_1)

    latent = DiagonalGaussianLatent(dim=latent_dim, prior=IsoGaussianPrior(latent_dim))
    latent_sample = latent(q_hidden_2)

    gen_hidden_1 = Dense(64, activation='relu')(latent_sample)
    gen_hidden_2 = Dense(64, activation='relu')(gen_hidden_1)
    reconstruction = Dense(mnist_pixels, activation='sigmoid')(gen_hidden_2)

    # Note: in some models, pixel_std is not constant but is also an output of the model so that it
    # can indicate its own uncertainty.
    likelihood = DiagonalGaussianLikelihood(reconstruction, pixel_std)

    # Combine the above parts into a single model
    return VAE(inpt=inpt, latent=latent, reconstruction=reconstruction, likelihood=likelihood)
14 changes: 14 additions & 0 deletions 02 - ELBO with analytic KL/
@@ -0,0 +1,14 @@
import keras.backend as K

class Prior(object):
    def __init__(self, dim):
        self.d = dim

class IsoGaussianPrior(Prior):
    def log_prob(self, x):
        return -K.sum(x * x, axis=-1) / 2

    def sample(self, n):
        return K.random_normal(shape=(n, self.d))
10 changes: 10 additions & 0 deletions 02 - ELBO with analytic KL/
@@ -0,0 +1,10 @@
from models import gaussian_mnist, fit_vae
from data.my_mnist import x_train, x_test
from visualize import render_grid

# Create and train the model
vae = gaussian_mnist(latent_dim=2, pixel_std=.05)
fit_vae(vae, x_train, x_test, epochs=100, weights_file='weights.h5')

# Visualize results
render_grid(vae.latent.sample, vae.reconstruction)
25 changes: 25 additions & 0 deletions 02 - ELBO with analytic KL/
@@ -0,0 +1,25 @@
import keras.backend as K
from keras.engine import Model

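class VAE(object):
    def __init__(self, inpt, latent, reconstruction, likelihood):
        # Create self.inpt, self.latent, self.reconstruction, and self.likelihood
        self.inpt = inpt
        self.latent = latent
        self.reconstruction = reconstruction
        self.likelihood = likelihood

        # 'Model' is a trainable keras object.
        self.model = Model(inpt, reconstruction)
        # To maximize ELBO, keras will minimize "loss" of -ELBO
        # Assumption: the call that registered this loss was lost in the source. Adding the
        # batch-averaged negative ELBO with add_loss is one way to do it.
        self.model.add_loss(-K.mean(self.elbo()))

    def elbo(self):
        flat_input = K.batch_flatten(self.inpt)

        # LL term is E_q(z|x) [ log p(x|z) ] and has shape (batch,)
        self.ll = self.likelihood.log_prob(flat_input)

        # Exercise: set self.kl here, using the analytic KL when it is available and the
        # Monte Carlo estimate otherwise.
        # self.kl = ...

        # ELBO is simply (LL - KL) and has shape (batch,)
        return self.ll - self.kl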
