added ldg_v3 files with new models and params
sungsujaing committed Aug 30, 2019
1 parent 727f5c8 commit 3a121b1
Showing 29 changed files with 1,922 additions and 10 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -3,4 +3,5 @@
emnist_data
(archive)vae_test_standford_dog_breed_dataset/best_weight_DCP_v1
crafted_images.pptx
__pycache__
__pycache__
ldg_v2.py
56 changes: 47 additions & 9 deletions README.md
@@ -1,24 +1,62 @@
# letter_digit_generator_VAE
This project aims to build a variational autoencoder (VAE) to generate arbitrary handwritten letters/digits based on the input. Based on the [EMNIST dataset](https://www.nist.gov/node/1298471/emnist-dataset), the VAE model is trained to encode the handwritten letters and digits into a latent vector space. With the random sampling or interpolation technique, imaginary letters and digits are obtained.
This project aims to build a conditional variational autoencoder (CVAE) to generate arbitrary handwritten letters/digits based on the keyboard input. Based on the [EMNIST dataset](https://www.nist.gov/node/1298471/emnist-dataset), the CVAE model is trained to encode the handwritten letters/digits into a latent vector space. With a random sampling or interpolation technique, imaginary letters and digits are generated.

### EMNIST data examples
<p align="center">
<img src="images/EMNIST_samples.png" width="55%"></p>
</p>

## LDG Version 2
### [ldg_v2_gen](https://github.com/sungsujaing/Artificial_Intelligence_Data_Science_Portfolio/blob/master/Letter_digit_generator/ldg_v2_gen.py):
A command-line letters/digits generator based on ldg_v2 Conv-CVAE model (details below). It simply loads the Conv-CVAE model and the corresponding best weights to produce results.
## LDG Version 3
* Loss: binary crossentropy
* Optimizer: Adam
* Latent dimension: 6
* Image normalization: [0, 1]
* Last activation function of the decoder: sigmoid
* Convolutional CVAE layers: [784,62]-[784]-[(28,28,1)]-[(14,14,16)]-[(7,7,32)]-[1568]-[64]-[6] // [6,62]-[64]-[1568]-[(7,7,32)]-[(14,14,32)]-[(28,28,16)]-[(28,28,1)]-[784]
* Multi-layer CVAE layers: [784,62]-[256]-[128]-[6] // [6,62]-[128]-[256]-[784]
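
The convolutional layer sizes listed above follow from the stride-2, `padding='same'` convolutions used in `ldg_v3_gen.py`. The snippet below is a minimal sketch that re-derives them with plain arithmetic; the helper name `same_conv_out` is hypothetical and not part of the repository.

```python
import math


def same_conv_out(size, stride):
    """Spatial output size of a convolution with 'same' padding."""
    return math.ceil(size / stride)


# Encoder: 28x28x1 -> 14x14x16 -> 7x7x32 -> flatten
side = 28
encoder_shapes = []
for filters in (16, 32):
    side = same_conv_out(side, stride=2)
    encoder_shapes.append((side, side, filters))
flat = side * side * 32  # units entering the Dense(64) layer

# Decoder: a 'same'-padded transposed convolution multiplies the
# spatial size by its stride, so 7x7x32 -> 14x14x32 -> 28x28x16
decoder_shapes = []
for filters in (32, 16):
    side = side * 2
    decoder_shapes.append((side, side, filters))

print(encoder_shapes, flat, decoder_shapes)
```

The flattened size of 1568 matches the `[1568]` entry in the layer list, which is also what `before_flatten_shape` evaluates to in the script.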

### [ldg_v3_gen](https://github.com/sungsujaing/Artificial_Intelligence_Data_Science_Portfolio/blob/master/Letter_digit_generator/ldg_v3_gen.py)
A command-line letters/digits generator based on the ldg_v3 Conv-CVAE model (details below). It simply loads the Conv-CVAE model and the corresponding best weights to produce results.

<p align="center">
<img src="images/ldg_v2_gen_help.png" width="50%">
<img src="images/summary_ldg_v3_gen.png" width="100%">
</p>

### [letter_digit_generator_convolutional-CVAE](https://github.com/sungsujaing/letter_digit_generator_VAE/blob/master/letter_digit_generator_v3_conv-CVAE.ipynb) and [letter_digit_generator_multi-layer-CVAE](https://github.com/sungsujaing/letter_digit_generator_VAE/blob/master/letter_digit_generator_v3_CVAE.ipynb)
* label inputs to both encoder and decoder
### Training

<p align="center">
<img src="images/ldg_v2_gen_summary.png" width="100%">
<img src="images/summary_ldg_v3_training.png" width="65%"></p>
</p>

### [letter_digit_generator_convolutional-CVAE](https://github.com/sungsujaing/letter_digit_generator_VAE/blob/master/letter_digit_generator_v2_conv-CVAE.ipynb) and [letter_digit_generator_vanilla-CVAE](https://github.com/sungsujaing/letter_digit_generator_VAE/blob/master/letter_digit_generator_v2_CVAE.ipynb)
### Dataset reconstruction

<p align="center">
<img src="images/summary_ldg_v3_reconstruction.png" width="85%"></p>
</p>

### Generating new letters/digits (with/without arbitrary binary threshold filter)

<p align="center">
<img src="images/summary_ldg_v3_testing.png" width="100%"></p>
</p>

## LDG Version 2
* Loss: MSE
* Optimizer: Adam
* Latent dimension: 10
* Image normalization: [-1, 1]
* Last activation function of the decoder: tanh
* Convolutional CVAE layers: [784,62]-[784]-[(28,28,1)]-[(28,28,16)]-[(28,28,32)]-[(28,28,64)]-[12544]-[128]-[10] // [10,62]-[128]-[12544]-[(14,14,64)]-[(28,28,32)]-[(28,28,16)]-[(28,28,1)]-[784]
* Multi-layer CVAE layers: [784,62]-[512]-[256]-[10] // [10,62]-[256]-[512]-[784]
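
The normalization range and the decoder's last activation go together: a `tanh` output covers [-1, 1] (Version 2), while a `sigmoid` output covers [0, 1] (Version 3). The sketch below shows one way the two scalings could be written. It assumes the raw images are `uint8` values in 0..255, which this commit does not confirm, and the function names are hypothetical.

```python
import numpy as np


def to_unit_range(pixels):
    """Scale uint8 pixels to [0, 1]; pairs with a sigmoid output."""
    return pixels.astype(np.float32) / 255.0


def to_symmetric_range(pixels):
    """Scale uint8 pixels to [-1, 1]; pairs with a tanh output."""
    return pixels.astype(np.float32) / 127.5 - 1.0


def from_symmetric_range(x):
    """Map [-1, 1] values back to [0, 1], e.g. before plotting."""
    return (x + 1.0) / 2.0


pixels = np.array([0, 255], dtype=np.uint8)  # assumed raw pixel extremes
print(to_unit_range(pixels), to_symmetric_range(pixels))
```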

### [ldg_v2_gen](https://github.com/sungsujaing/Artificial_Intelligence_Data_Science_Portfolio/blob/master/Letter_digit_generator/ldg_v2_gen.py)
A command-line letters/digits generator based on ldg_v2 Conv-CVAE model (details below). It simply loads the Conv-CVAE model and the corresponding best weights to produce results.

### [letter_digit_generator_convolutional-CVAE](https://github.com/sungsujaing/letter_digit_generator_VAE/blob/master/letter_digit_generator_v2_conv-CVAE.ipynb) and [letter_digit_generator_multi-layer-CVAE](https://github.com/sungsujaing/letter_digit_generator_VAE/blob/master/letter_digit_generator_v2_CVAE.ipynb)
* label inputs to both encoder and decoder
### Training
### Training (direct comparison is difficult due to the difference in epochs)

<p align="center">
<img src="images/summary_ldg_v2_training.png" width="65%"></p>
@@ -30,7 +68,7 @@ A command-line letters/digits generator based on ldg_v2 Conv-CVAE model (details
<img src="images/summary_ldg_v2_reconstruction.png" width="85%"></p>
</p>

### Generating new letters/digits
### Generating new letters/digits (with/without arbitrary binary threshold filter)

<p align="center">
<img src="images/summary_ldg_v2_testing.png" width="100%"></p>
Binary file not shown.
Binary file not shown.
Binary file added generated_images/test_v3 with th.png
Binary file added generated_images/test_v3.png
Binary file removed images/ldg_v2_conv-cvae_output_results.PNG
Binary file not shown.
Binary file removed images/ldg_v2_conv-cvae_test_reconstruction.PNG
Binary file not shown.
Binary file removed images/ldg_v2_conv-cvae_training_plot.PNG
Binary file not shown.
Binary file removed images/ldg_v2_conv-cvae_training_reconstruction.PNG
Binary file not shown.
Binary file removed images/ldg_v2_cvae_output_results.PNG
Binary file not shown.
Binary file removed images/ldg_v2_cvae_test_reconstruction.PNG
Binary file not shown.
Binary file removed images/ldg_v2_cvae_training_plot.PNG
Binary file not shown.
Binary file removed images/ldg_v2_cvae_training_reconstruction.PNG
Binary file not shown.
Binary file removed images/ldg_v2_gen_help.png
Binary file not shown.
Binary file removed images/ldg_v2_gen_input_example_1.png
Binary file not shown.
Binary file removed images/ldg_v2_gen_input_example_2.png
Binary file not shown.
Binary file removed images/ldg_v2_gen_summary.png
Binary file not shown.
Binary file removed images/ldg_v2_output_input.PNG
Binary file not shown.
Binary file modified images/summary_ldg_v2_reconstruction.png
Binary file modified images/summary_ldg_v2_testing.png
Binary file modified images/summary_ldg_v2_training.png
Binary file added images/summary_ldg_v3_gen.png
Binary file added images/summary_ldg_v3_reconstruction.png
Binary file added images/summary_ldg_v3_testing.png
Binary file added images/summary_ldg_v3_training.png
158 changes: 158 additions & 0 deletions ldg_v3_gen.py
@@ -0,0 +1,158 @@
from keras.layers import Lambda, Input, Dense, Flatten, Conv2D, Conv2DTranspose
from keras.layers import Activation, BatchNormalization, Reshape, Concatenate
from keras.models import Model
from keras.utils import to_categorical
from keras import backend as K
from keras import initializers
import numpy as np
import matplotlib.pyplot as plt
import os
import argparse

label_str = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
lebel_dict = {i: v for i, v in enumerate(label_str)}
n_label = len(label_str)

img_rows = 28
img_cols = 28
img_channel = 1
orig_dimension = img_rows * img_cols
image_shape = (img_rows, img_cols, img_channel)
latent_dim = 6


def sampling(arg):
    # unpack the encoder outputs passed in by the Lambda layer
    z_mean, z_log_var = arg
    dim = K.int_shape(z_mean)[1]
    # reparameterization trick
    epsilon = K.random_normal(
        shape=(K.shape(z_mean)[0], dim), mean=0.0, stddev=1.0)
    return z_mean + K.exp(0.5 * z_log_var) * epsilon


w_init = initializers.random_normal(stddev=0.02)
gamma_init = initializers.random_normal(mean=1.0, stddev=0.02)

# encoder
img_inputs = Input(shape=(orig_dimension,),name='image_input')
label_inputs = Input(shape=(n_label,),name='label_input')
encoder_inputs = Concatenate()([img_inputs,label_inputs])

x = Dense(orig_dimension,kernel_initializer=w_init,activation='relu')(encoder_inputs)

x = Reshape(image_shape)(x)
x = Conv2D(16,3,strides=2,padding='same',kernel_initializer=w_init)(x)
x = BatchNormalization(gamma_initializer=gamma_init)(x)
x = Activation('relu')(x)
x = Conv2D(32,3,strides=2,padding='same',kernel_initializer=w_init)(x)
x = BatchNormalization(gamma_initializer=gamma_init)(x)
x = Activation('relu')(x)

before_flatten_shape = K.int_shape(x)
x = Flatten()(x)
x = Dense(64,kernel_initializer=w_init,activation='relu')(x)

z_mean = Dense(latent_dim, name='z_mean')(x)
z_log_var = Dense(latent_dim, name='z_log_var')(x)

z = Lambda(sampling, output_shape=(latent_dim,))([z_mean, z_log_var])

encoder = Model([img_inputs,label_inputs],[z_mean,z_log_var,z],name='encoder')

# decoder
latent_inputs = Input(shape=(latent_dim,),name='latent_inputs')
decoder_inputs = Concatenate()([latent_inputs,label_inputs])

x = Dense(64,kernel_initializer=w_init,activation='relu')(decoder_inputs)
x = Dense(before_flatten_shape[1]*before_flatten_shape[2]*before_flatten_shape[3],activation='relu',kernel_initializer=w_init)(x)
x = Reshape((before_flatten_shape[1],before_flatten_shape[2],before_flatten_shape[3]))(x)

x = Conv2DTranspose(32,3,strides=2,padding='same',kernel_initializer=w_init)(x)
x = BatchNormalization(gamma_initializer=gamma_init)(x)
x = Activation('relu')(x)

x = Conv2DTranspose(16,3,strides=2,padding='same',kernel_initializer=w_init)(x)
x = BatchNormalization(gamma_initializer=gamma_init)(x)
x = Activation('relu')(x)

x = Conv2DTranspose(img_channel,3,activation='sigmoid',padding='same',kernel_initializer=w_init)(x)
outputs = Flatten()(x)

# instantiate decoder model
decoder = Model([latent_inputs,label_inputs],outputs,name='decoder')

# VAE
outputs = decoder([encoder([img_inputs, label_inputs])[2], label_inputs])
vae = Model([img_inputs, label_inputs], outputs)


def load_weights_vae(weight_name=None):
    # load all the weights for encoder and decoder when loading for vae
    vae.load_weights(os.path.join(saving_folder, weight_name))


def find_idx_from_label_dict(search_str):
    for i, val in lebel_dict.items():
        if val == search_str:
            return i


def letter_digit_gen(input_str, th=None):
    gap = 20  # horizontal step between characters; neighbours overlap by 8 px
    img = np.zeros((img_cols, gap * len(input_str) + (img_cols - gap)))
    for idx, l in enumerate(input_str):
        if l == ' ' or l not in label_str:
            # spaces and unsupported characters leave their slot blank
            continue
        cls_idx = find_idx_from_label_dict(l)
        # sample a latent vector from the standard normal prior
        latent = np.expand_dims(np.random.randn(latent_dim), 0)
        generated = decoder.predict(
            [latent, np.expand_dims(to_categorical(cls_idx, n_label), 0)])
        generated = generated.reshape(img_rows, img_cols)
        generated = np.transpose(generated)
        img[:, gap * idx:gap * idx + img_cols] += generated
    if th is not None:
        # binarize the canvas: 1 where the pixel is at or above the threshold
        img = np.where(img >= th, 1.0, 0.0)
    return img


def restricted_float(x):
    x = float(x)
    if x < 0.0 or x > 1.0:
        raise argparse.ArgumentTypeError(
            '{} is not in range [0.0, 1.0]'.format(x))
    return x


parser = argparse.ArgumentParser()
parser.add_argument('input_str', help='input string to be converted')
parser.add_argument('-t', '--threshold', type=restricted_float,
                    help='binary threshold: float[0-1]')
parser.add_argument('-s', '--save',
                    help='save the image in .png')
args = parser.parse_args()

input_str = args.input_str

saving_folder = 'best_weight_ldg_v3_conv-cvae'
load_weights_vae(
    weight_name='ldg_v3_conv-cvae-best-wiehgts-099-148.874-149.955.h5')

img = letter_digit_gen(input_str, th=args.threshold)
plt.figure(figsize=(5, 5))
plt.axis('off')
plt.imshow(1 - img, cmap='gray')

if args.save:
    os.makedirs('generated_images', exist_ok=True)
    plt.savefig('generated_images/{}.png'.format(args.save))

plt.show()
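
The canvas layout used by `letter_digit_gen` can be tried without Keras or trained weights. The sketch below is a simplified illustration, not the script itself: `fake_decoder` is a hypothetical stand-in for `decoder.predict` that returns random values in [0, 1), and unsupported characters are not filtered. Each 28-pixel-wide glyph is added at a 20-pixel step, so neighbouring glyphs overlap by 8 pixels and the canvas width is `20 * len(text) + 8`.

```python
import numpy as np

IMG_SIZE = 28  # glyph width and height, as in ldg_v3_gen.py
GAP = 20       # horizontal step between glyphs, as in letter_digit_gen


def fake_decoder(rng):
    """Hypothetical stand-in for decoder.predict: random values in [0, 1)."""
    return rng.random((IMG_SIZE, IMG_SIZE))


def compose(text, th=None, seed=0):
    rng = np.random.default_rng(seed)
    canvas = np.zeros((IMG_SIZE, GAP * len(text) + (IMG_SIZE - GAP)))
    for idx, ch in enumerate(text):
        if ch == ' ':
            continue  # a space leaves its slot blank
        start = GAP * idx
        canvas[:, start:start + IMG_SIZE] += fake_decoder(rng)
    if th is not None:
        # optional binary threshold filter
        canvas = (canvas >= th).astype(float)
    return canvas


canvas = compose('ab c', th=0.5)
print(canvas.shape)  # (28, 88)
```

Because overlapping glyphs are summed, pixel values in the overlap region can exceed 1 before thresholding, which is one reason the optional threshold filter gives cleaner output.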
831 changes: 831 additions & 0 deletions letter_digit_generator_v3_CVAE.ipynb

Large diffs are not rendered by default.

884 changes: 884 additions & 0 deletions letter_digit_generator_v3_conv-CVAE.ipynb

Large diffs are not rendered by default.
