This program performs image and bounding box recognition on sequences of 1 to 5 digits using a Convolutional Neural Network (CNN). The MNIST and SVHN datasets are preprocessed and normalized, then used to train a CNN consisting of convolutional, max pooling, dropout, fully connected, and output layers.
- Python 3.6
- I recommend installing Anaconda, as it is already set up with standard machine learning libraries
- If you are unfamiliar with the command line, there are graphical installers for macOS, Windows, and Linux
- PIL
pip install pillow (for Python 3)
- six
- TensorFlow
In this study, the MNIST and SVHN datasets were used to create a combined dataset of hand-drawn digits and house numbers in groupings of 1-5 digits. There are roughly 320k images in total: 280k training images, 15k validation images, and 23k testing images.
The images are in 32x32x1 grayscale format, with 32 representing the pixel width and height and 1 representing the single gray color channel. Each image has a corresponding label which lists the number of digits in the image and the digits themselves, including a label representing the absence of a digit in cases where there are fewer than 5 digits (the maximum number of digits in an image). The SVHN dataset also includes bounding box information, which is used in the second half of the project to determine digit location.
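As a rough illustration of the label layout described above, here is a minimal sketch of how a digit sequence could be padded to a fixed length. The function name `encode_label` and the blank class index `10` are assumptions made for this example; the dataset scripts may use a different layout or blank value.

```python
MAX_DIGITS = 5  # maximum number of digits per image (from the description above)
BLANK = 10      # assumed class index meaning "no digit"; the real scripts may differ


def encode_label(digits, max_digits=MAX_DIGITS, blank=BLANK):
    """Return [length, d1, ..., d5], padding missing positions with `blank`."""
    if not 1 <= len(digits) <= max_digits:
        raise ValueError("expected between 1 and %d digits" % max_digits)
    if any(not 0 <= d <= 9 for d in digits):
        raise ValueError("digits must be in the range 0-9")
    padding = [blank] * (max_digits - len(digits))
    return [len(digits)] + list(digits) + padding


# A two-digit house number "42" leaves three positions blank.
print(encode_label([4, 2]))  # [2, 4, 2, 10, 10, 10]
```

A fixed-length label like this lets every image share the same output shape, no matter how many digits it contains.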
depth
- Alter the depths of the CNN layers using common memory sizes
epochs
- Number of training iterations
batch_size
- Set to the highest number your machine has memory for, using common memory sizes
keep_probability
- Probability of keeping an activation node in the dropout layer
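To make keep_probability more concrete, here is a minimal pure-Python sketch of "inverted" dropout, where each activation is kept with probability keep_probability and the kept values are scaled up so the expected sum stays the same. The `dropout` helper is hypothetical and only illustrates the idea; the training script relies on TensorFlow's own dropout layer, and I am assuming it follows this common convention.

```python
import random


def dropout(activations, keep_probability, rng=random):
    """Keep each activation with probability `keep_probability`.

    Kept values are divided by `keep_probability` (inverted dropout),
    dropped values become 0.0.
    """
    if not 0.0 < keep_probability <= 1.0:
        raise ValueError("keep_probability must be in the range (0, 1]")
    return [
        a / keep_probability if rng.random() < keep_probability else 0.0
        for a in activations
    ]


# With keep_probability = 0.5, roughly half the nodes are zeroed
# and the survivors are doubled.
rng = random.Random(0)  # fixed seed so the example is repeatable
print(dropout([1.0, 1.0, 1.0, 1.0], 0.5, rng))
```

Setting keep_probability to 1.0 disables dropout entirely, which is the usual choice at evaluation time.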
Run the files in the order specified below.
Command Line
python create MNIST_multi-digit-dataset.py
- Creates multi-digit MNIST 32x32 dataset
python create_bbox_SVHN_dataset.py
- Creates SVHN 32x32 dataset with bounding boxes
python create_combined_dataset.py
- Combines and randomizes the previous two datasets
python create_real_world_dataset.py
- Creates grayscale images from real-world pictures
python train_digit_recognition_CNN.py
- Trains the network on the combined dataset and outputs loss and accuracy data to TensorBoard files
To view the TensorBoard loss and accuracy outputs, follow the instructions on the TensorFlow website.
python train_bounding_box_CNN.py
- Trains the network on the SVHN bounding box dataset and outputs predicted bounding box examples on the real world dataset
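One detail worth illustrating from the create_combined_dataset.py step is that combining and randomizing must keep each image paired with its label. The sketch below shows one way to do that with a single shared shuffle order. The function name `combine_and_shuffle` and the plain-list inputs are assumptions for this example, not the script's actual code.

```python
import random


def combine_and_shuffle(images_a, labels_a, images_b, labels_b, seed=0):
    """Concatenate two datasets and shuffle images and labels in unison."""
    images = list(images_a) + list(images_b)
    labels = list(labels_a) + list(labels_b)
    if len(images) != len(labels):
        raise ValueError("each image needs exactly one label")
    # Shuffle one list of indices and apply it to both lists,
    # so every image keeps its own label.
    order = list(range(len(images)))
    random.Random(seed).shuffle(order)
    return [images[i] for i in order], [labels[i] for i in order]


# Placeholder strings stand in for the 32x32 image arrays.
mnist_images, mnist_labels = ["mnist_0", "mnist_1"], [[1, 3], [2, 7, 5]]
svhn_images, svhn_labels = ["svhn_0", "svhn_1"], [[4], [9, 0]]
images, labels = combine_and_shuffle(
    mnist_images, mnist_labels, svhn_images, svhn_labels
)
for image, label in zip(images, labels):
    print(image, label)
```

Shuffling the two lists independently would silently break the pairing, which is why a single index order is used for both.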
The image_classification program is a public domain work, dedicated using CC0 1.0. I encourage you to use it, and enhance your understanding of CNNs and the deep learning concepts therein. :)