Exercise 02 finished

wrongu · Jan 22, 2018 · 78a3735
1 parent a9e94f4
commit 78a3735

Showing 7 changed files with 240 additions and 0 deletions.
diff --git a/02 - ELBO with analytic KL/instructions.md b/02 - ELBO with analytic KL/instructions.md
@@ -0,0 +1,47 @@
+Exercise 02 - Extending the ELBO with an analytic KL term
+=========================================================
+
+A Generic Latent Base-Class
+---------------------------
+
+In the first exercise, we saw one latent class called `DiagonalGaussianLatent` in `latents.py`. In this exercise, a generic base class called `Latent` has been added; it takes care of the constructor and of computing the output shape, since these operations are common to all `Latent`s. Each `Latent` subclass must still implement the following as part of the Keras `Layer` interface:
+
+* `build()` - creates layer parameters
+* `call(x)` - implements the layer's actual transformation of inputs to outputs
+
+In the previous exercise we added functions specific to our definition of a `Latent`, namely:
+
+* `log_prob(x)` - uses the current/most recent parameters to compute the log probability of a batch of inputs
+* `sample_kl()` - uses the current/most recent _sample_ of the latent to compute a Monte Carlo estimate of KL from its posterior to the prior.
+
+Implementing an analytic KL method for gaussians
+------------------------------------------------
+
+For certain distributions, the KL term in the ELBO objective can be computed analytically. In exercise 01, we used the fact that KL is an _expectation_ to estimate KL using samples of `Q` (the distribution defined by `self.mean` and `self.log_var` in the `DiagonalGaussianLatent` class). Being a Monte Carlo estimate, this value is noisy: it changes from sample to sample and adds variance to the training objective. Because our current prior is also Gaussian, we can instead use the following formula for KL between two Gaussians:
+
+    kl(p1||p2) = [log(det(C2)/det(C1)) - dim + Tr(C2^-1*C1) + (m2-m1).T*C2^-1*(m2-m1)]/2
+
+where `C1` and `C2` are covariances, `m1` and `m2` are means, `dim` is the dimensionality of the latent, `det` is the determinant, and `Tr` is the trace. Here `p1` is the posterior `Q` and `p2` is the prior.
+
+**The goal of this exercise is to implement an interface where the ELBO objective uses the analytic form of KL when it is available and automatically falls back to the Monte Carlo estimate when it is not.** For example, if we later choose to replace the Gaussian prior with some complicated nonparametric form, the model would fall back to the Monte Carlo estimate under the hood with no extra work in designing the model.
+
+**Implement `DiagonalGaussianLatent.analytic_kl()`.** Just like `sample_kl()`, it takes no inputs but instead uses the current values in `self.mean`, `self.log_var`, and `self.prior`. Your function should return a keras tensor with shape `(batch,)`. Hint: the `IsoGaussianPrior` class has mean `0` and covariance equal to the identity matrix. Using this, you should be able to compute KL using only `K.exp` and `K.sum`.
+
+If `self.prior` is not an instance of a class for which the analytic form is known, your `analytic_kl` method should raise a `TypeError` (in python you can check if `isinstance(self.prior, IsoGaussianPrior)`).
+
+Flexibly choosing between analytic and monte-carlo KL
+-----------------------------------------------------
+
+**Implement the fallback logic in `VAE.elbo`.** Since `analytic_kl` raises a `TypeError` when no analytic form is available, you can implement the fallback using `try: ... except TypeError: ...`.
+
+You may be worried that `try ... except ...` is either inelegant or slow. Regarding elegance, it is a surprisingly common pattern in Python. Regarding speed, remember that _the `elbo` function is only ever called once._ This is the key difference between working with computation graphs like TensorFlow or Theano and working directly with data in NumPy. Since this is a computation graph, `VAE.elbo()` simply builds a series of operations that are not _executed_ until later, so the cost of the `try`/`except` is paid once at graph-construction time rather than on every training step.
+
+Train a model
+-------------
+
+To train the model, `cd` into `02 - ELBO with analytic KL` and run `python run.py`.
+
+Bonus exercise(s)
+-----------------
+
+Compare the training time to reach a certain loss using the different KL methods.
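
Note on the KL formula in `instructions.md`: the sketch below is an illustration in plain Python, not the exercise solution in Keras. It assumes both covariances are diagonal, so each term of the general formula becomes a sum over dimensions, and then checks that the result matches the simplified expression obtained when the second Gaussian has mean `0` and identity covariance. The function names `kl_diag_gaussians` and `kl_to_standard_normal` are hypothetical and do not appear in the repository.

```python
import math


def kl_diag_gaussians(m1, var1, m2, var2):
    """KL(p1 || p2) for two Gaussians with diagonal covariances (lists of floats).

    With diagonal C1 and C2:
      log(det(C2)/det(C1))      -> sum(log(v2 / v1))
      Tr(C2^-1 * C1)            -> sum(v1 / v2)
      (m2-m1).T*C2^-1*(m2-m1)   -> sum((m2 - m1)^2 / v2)
    """
    dim = len(m1)
    log_det_ratio = sum(math.log(v2 / v1) for v1, v2 in zip(var1, var2))
    trace = sum(v1 / v2 for v1, v2 in zip(var1, var2))
    quad = sum((b - a) ** 2 / v2 for a, b, v2 in zip(m1, m2, var2))
    return (log_det_ratio - dim + trace + quad) / 2


def kl_to_standard_normal(mean, log_var):
    """Special case p2 = N(0, I): only exp and sums of the parameters remain."""
    return sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mean, log_var)) / 2


# Arbitrary example parameters for a 2-dimensional latent (illustrative values only)
mean, log_var = [0.5, -1.0], [0.2, -0.3]
var = [math.exp(lv) for lv in log_var]
general = kl_diag_gaussians(mean, var, [0.0, 0.0], [1.0, 1.0])
special = kl_to_standard_normal(mean, log_var)
```

In the Keras version, the per-dimension sum would be taken over the last axis so that the result has shape `(batch,)`.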
diff --git a/02 - ELBO with analytic KL/latents.py b/02 - ELBO with analytic KL/latents.py
@@ -0,0 +1,72 @@
+from keras.engine.topology import Layer
+from priors import IsoGaussianPrior
+import keras.backend as K
+
+
+class Latent(Layer):
+    """Base class for VAE latents.
+    """
+
+    def __init__(self, dim, prior, **kwargs):
+        # Call Layer constructor
+        super(Latent, self).__init__(**kwargs)
+
+        # Record instance variables
+        self.dim = dim
+        self.prior = prior
+
+    def compute_output_shape(self, input_shape):
+        return tuple(input_shape[:-1]) + (self.dim,)
+
+
+class DiagonalGaussianLatent(Latent):
+    """DiagonalGaussianLatent expects flattened input with shape (batch, dim). Internally stores
+       2*d parameters: 'mean' and 'log_var' of each dimension of the posterior distribution that
+       are themselves each constructed as a dense connection from inputs. Output is (batch, d)
+       *sampled value* of each latent, where d is the dimensionality passed to the constructor.
+    """
+
+    def build(self, input_shape):
+        # Create trainable weights of this layer for the two dense connections to 'mean' and to
+        # 'log_var'.
+        input_dim = input_shape[-1]
+        self.dense_mean = self.add_weight(shape=(input_dim, self.dim),
+                                          name='latent_mean_kernel',
+                                          initializer='glorot_uniform')
+        self.dense_log_var = self.add_weight(shape=(input_dim, self.dim),
+                                             name='latent_log_var_kernel',
+                                             initializer='glorot_uniform')
+        self.built = True
+
+    def call(self, x):
+        # Apply matrix multiplication of inputs and the weights created in build() to get 'mean'
+        # and 'log_var' parameters.
+        self.mean = K.dot(x, self.dense_mean)
+        self.log_var = K.dot(x, self.dense_log_var)
+
+        # exp(log_var / 2) is standard deviation
+        std = K.exp(self.log_var / 2)
+
+        # Create (reparameterized) sample from the latent distribution
+        sample_shape = (K.shape(self.mean)[0], self.dim)
+        eps = K.random_normal(shape=sample_shape, mean=0., stddev=1.0)
+
+        # Shape of self.sample is (batch, dim)
+        self.sample = self.mean + eps * std
+
+        return self.sample
+
+    def log_prob(self, x):
+        # log gaussian probability (up to a constant) = -1/2 (sum[(x-mean)^2/variance] + log_det)
+        variance = K.exp(self.log_var)
+        log_det = K.sum(self.log_var, axis=-1)
+        x_diff = x - self.mean
+        return -(K.sum((x_diff / variance) * x_diff, axis=-1) + log_det) / 2
+
+    def sample_kl(self):
+        # Monte carlo KL estimate is simply self.log_prob - prior.log_prob
+        return self.log_prob(self.sample) - self.prior.log_prob(self.sample)
+
+    def analytic_kl(self):
+        # YOUR CODE HERE: check that self.prior is of the right type first; raise TypeError if not.
+        raise NotImplementedError('analytic_kl is left as an exercise')
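
Note on `sample_kl()` versus an analytic KL: the following one-dimensional sketch, in plain Python rather than Keras, mirrors the two `log_prob` definitions in this commit. Both omit the `log(2*pi)` constant, which cancels in the difference. The sketch compares single-sample estimates against the closed form for a standard normal prior. The parameter values and the seed are arbitrary assumptions chosen for illustration.

```python
import math
import random


def log_q(z, mean, log_var):
    # Mirrors DiagonalGaussianLatent.log_prob in one dimension (no 2*pi constant)
    return -((z - mean) ** 2 / math.exp(log_var) + log_var) / 2


def log_prior(z):
    # Mirrors IsoGaussianPrior.log_prob in one dimension (no 2*pi constant)
    return -z * z / 2


def sample_kl(mean, log_var, rng):
    # One reparameterized sample z = mean + eps * std, then log q(z) - log p(z)
    z = mean + rng.gauss(0.0, 1.0) * math.exp(log_var / 2)
    return log_q(z, mean, log_var) - log_prior(z)


def analytic_kl(mean, log_var):
    # Closed form for KL(N(mean, exp(log_var)) || N(0, 1))
    return (math.exp(log_var) + mean ** 2 - 1.0 - log_var) / 2


rng = random.Random(0)
mean, log_var = 1.0, math.log(0.25)
estimates = [sample_kl(mean, log_var, rng) for _ in range(20000)]
mc_mean = sum(estimates) / len(estimates)
mc_std = math.sqrt(sum((e - mc_mean) ** 2 for e in estimates) / len(estimates))
exact = analytic_kl(mean, log_var)
```

The average of many single-sample estimates approaches `exact`, while `mc_std` measures how far an individual estimate, which is what one training step sees, can stray from it.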
diff --git a/02 - ELBO with analytic KL/likelihoods.py b/02 - ELBO with analytic KL/likelihoods.py
@@ -0,0 +1,17 @@
+from keras.engine.topology import Layer
+import keras.backend as K
+
+
+class DiagonalGaussianLikelihood(Layer):
+
+    def __init__(self, mean, std):
+        # Mean has shape (dim,) or (batch, dim). std may be a scalar or have the same shape as mean.
+        self.mean = mean
+        # If std is a scalar, this creates an array of [var, var, var, ...]. If it is already a
+        # vector, this does nothing.
+        self.var = K.ones_like(mean) * (std ** 2)
+
+    def log_prob(self, x):
+        # log det of a diagonal covariance is the sum of log variances over the last (dim) axis.
+        log_det = K.sum(K.log(self.var), axis=-1)
+        return -K.sum(K.square(x - self.mean) / (2 * self.var), axis=-1) - log_det / 2
diff --git a/02 - ELBO with analytic KL/models.py b/02 - ELBO with analytic KL/models.py
@@ -0,0 +1,55 @@
+from vae import VAE
+from latents import DiagonalGaussianLatent
+from priors import IsoGaussianPrior
+from likelihoods import DiagonalGaussianLikelihood
+from keras.layers import Input, Dense
+from data.my_mnist import img_pixels as mnist_pixels
+import os
+
+
+def fit_vae(vae, x_train, x_test=None, epochs=100, batch=100, weights_file=None, recompute=False, optimizer='adam'):
+    """Fit a vae object to the given dataset (for datasets that fit in memory). Both x_train and
+       x_test must have a number of data points divisible by the batch size.
+    """
+
+    # Load existing weights if they exist
+    if weights_file is not None:
+        if os.path.exists(weights_file) and not recompute:
+            vae.model.load_weights(weights_file)
+            return vae
+
+    # Train the model
+    vae.model.compile(loss=None, optimizer=optimizer)
+    if x_test is not None:
+        kwargs = {'validation_data': (x_test, None)}
+    else:
+        kwargs = {}
+    vae.model.fit(x_train, shuffle=True, epochs=epochs, batch_size=batch, **kwargs)
+
+    # Save trained model to a file if given
+    if weights_file is not None:
+        vae.model.save_weights(weights_file)
+
+
+def gaussian_mnist(latent_dim=2, pixel_std=.05):
+    # RECOGNITION MODEL
+    inpt = Input(shape=(mnist_pixels,))
+    q_hidden_1 = Dense(64, activation='relu')(inpt)
+    q_hidden_2 = Dense(64, activation='relu')(q_hidden_1)
+
+    # LATENT -- PRIOR
+    latent = DiagonalGaussianLatent(dim=latent_dim, prior=IsoGaussianPrior(latent_dim))
+    latent_sample = latent(q_hidden_2)
+
+    # GENERATIVE MODEL
+    gen_hidden_1 = Dense(64, activation='relu')(latent_sample)
+    gen_hidden_2 = Dense(64, activation='relu')(gen_hidden_1)
+    reconstruction = Dense(mnist_pixels, activation='sigmoid')(gen_hidden_2)
+
+    # LIKELIHOOD
+    # Note: in some models, pixel_std is not constant but is also an output of the model so that it
+    # can indicate its own uncertainty.
+    likelihood = DiagonalGaussianLikelihood(reconstruction, pixel_std)
+
+    # Combine the above parts into a single model
+    return VAE(inpt=inpt, latent=latent, reconstruction=reconstruction, likelihood=likelihood)
diff --git a/02 - ELBO with analytic KL/priors.py b/02 - ELBO with analytic KL/priors.py
@@ -0,0 +1,14 @@
+import keras.backend as K
+
+
+class Prior(object):
+    def __init__(self, dim):
+        self.d = dim
+
+
+class IsoGaussianPrior(Prior):
+    def log_prob(self, x):
+        return -K.sum(x * x, axis=-1) / 2
+
+    def sample(self, n):
+        return K.random_normal(shape=(n, self.d))
diff --git a/02 - ELBO with analytic KL/run.py b/02 - ELBO with analytic KL/run.py
@@ -0,0 +1,10 @@
+from models import gaussian_mnist, fit_vae
+from data.my_mnist import x_train, x_test
+from visualize import render_grid
+
+# Create and train the model
+vae = gaussian_mnist(latent_dim=2, pixel_std=.05)
+fit_vae(vae, x_train, x_test, epochs=100, weights_file='weights.h5')
+
+# Visualize results
+render_grid(vae.latent.sample, vae.reconstruction)
diff --git a/02 - ELBO with analytic KL/vae.py b/02 - ELBO with analytic KL/vae.py
@@ -0,0 +1,25 @@
+import keras.backend as K
+from keras.engine import Model
+
+
+class VAE(object):
+    def __init__(self, inpt, latent, reconstruction, likelihood):
+        # Create self.inpt, self.latent, self.reconstruction, and self.likelihood
+        self.__dict__.update(locals())
+
+        # 'Model' is a trainable keras object.
+        self.model = Model(inpt, reconstruction)
+        # To maximize ELBO, keras will minimize "loss" of -ELBO
+        self.model.add_loss(-self.elbo())
+
+    def elbo(self):
+        flat_input = K.batch_flatten(self.inpt)
+
+        # LL term is E_q(z|x) [ log p(x|z) ] and has shape (batch,)
+        self.ll = self.likelihood.log_prob(flat_input)
+
+        # YOUR CODE HERE
+        # self.kl = ...
+
+        # ELBO is simply (LL - KL) and has shape (batch,)
+        return self.ll - self.kl
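
Note on the fallback described in `instructions.md`: the pattern can be sketched without Keras as below. `ToyLatent` and `choose_kl` are hypothetical stand-ins, with plain numbers in place of tensors, so this shows only the control flow and not the code that belongs in `VAE.elbo`.

```python
class ToyLatent(object):
    """Hypothetical stand-in for a Latent; floats replace Keras tensors."""

    def __init__(self, has_analytic_form):
        self.has_analytic_form = has_analytic_form

    def analytic_kl(self):
        # Signal "no closed form for this prior" with a TypeError
        if not self.has_analytic_form:
            raise TypeError('no analytic KL for this prior')
        return 0.80

    def sample_kl(self):
        # Placeholder value standing in for a Monte Carlo estimate
        return 0.83


def choose_kl(latent):
    # Prefer the analytic form; fall back to the sampled estimate on TypeError
    try:
        return latent.analytic_kl(), 'analytic'
    except TypeError:
        return latent.sample_kl(), 'sample'


with_closed_form = choose_kl(ToyLatent(True))
without_closed_form = choose_kl(ToyLatent(False))
```

One caveat of this design: any unrelated `TypeError` raised by a bug inside `analytic_kl` would also trigger the fallback silently, so it helps to keep the body of the `try` as small as possible.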