
TensorFlow implementation of Recurrent Models of Visual Attention


slundqui/tfRAM


TensorFlow implementation of Recurrent Models of Visual Attention (Mnih et al. 2014), with additional research. The code is based on https://github.com/zhongwen/RAM.

Results

60 by 60 Translated MNIST

| Model | Error |
| --- | --- |
| FC, 2 layers (64 hidden units each) | 6.78% |
| FC, 2 layers (256 hidden units each) | 2.65% |
| Convolutional, 2 layers | 1.57% |
| RAM, 4 glimpses, 12 x 12, 3 scales | 1.54% |
| RAM, 6 glimpses, 12 x 12, 3 scales | 1.08% |
| RAM, 8 glimpses, 12 x 12, 3 scales | 0.94% |
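
For readers unfamiliar with the task, here is a minimal sketch of how a 60 by 60 translated MNIST input can be built: a 28 x 28 digit is pasted at a uniformly random position on a blank canvas, as described in Mnih et al. 2014. Whether this repository builds its data exactly this way is an assumption, and `make_translated` and its parameters are hypothetical names used only for illustration.

```python
import numpy as np

def make_translated(digit, canvas_size=60, rng=None):
    """Paste `digit` at a uniformly random position on a blank square canvas.

    Hypothetical helper for illustration; not taken from this repository.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = digit.shape
    canvas = np.zeros((canvas_size, canvas_size), dtype=digit.dtype)
    # Choose the top-left corner so the whole digit stays inside the canvas.
    top = int(rng.integers(0, canvas_size - h + 1))
    left = int(rng.integers(0, canvas_size - w + 1))
    canvas[top:top + h, left:left + w] = digit
    return canvas, (top, left)

# Stand-in for a real 28 x 28 MNIST digit.
digit = np.ones((28, 28), dtype=np.float32)
image, (top, left) = make_translated(digit, rng=np.random.default_rng(0))
```

The returned offset makes it easy to check where the digit landed when visualizing attention paths.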

60 by 60 Cluttered Translated MNIST

| Model | Error |
| --- | --- |
| FC, 2 layers (64 hidden units each) | 29.13% |
| FC, 2 layers (256 hidden units each) | 11.36% |
| Convolutional, 2 layers | 8.37% |
| RAM, 4 glimpses, 12 x 12, 3 scales | 5.15% |
| RAM, 6 glimpses, 12 x 12, 3 scales | 3.33% |
| RAM, 8 glimpses, 12 x 12, 3 scales | 2.63% |
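
The cluttered variant adds distractors to the translated image. The sketch below follows the recipe in Mnih et al. 2014, which pastes 8 x 8 crops taken from other digits at random positions. The number of patches, the use of an element-wise maximum to combine clutter with the image, and the name `add_clutter` are all assumptions for illustration, not settings confirmed by this repository.

```python
import numpy as np

def add_clutter(image, distractors, n_patches=4, patch=8, rng=None):
    """Return a copy of `image` with random crops of distractor digits pasted in.

    Hypothetical helper for illustration; not taken from this repository.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = image.copy()
    height, width = out.shape
    for _ in range(n_patches):
        src = distractors[int(rng.integers(len(distractors)))]
        # Random crop from the distractor digit.
        sy = int(rng.integers(0, src.shape[0] - patch + 1))
        sx = int(rng.integers(0, src.shape[1] - patch + 1))
        crop = src[sy:sy + patch, sx:sx + patch]
        # Random destination that keeps the crop inside the canvas.
        dy = int(rng.integers(0, height - patch + 1))
        dx = int(rng.integers(0, width - patch + 1))
        # Element-wise maximum, so clutter never erases brighter pixels.
        region = out[dy:dy + patch, dx:dx + patch]
        out[dy:dy + patch, dx:dx + patch] = np.maximum(region, crop)
    return out

# Random arrays stand in for real MNIST digits.
rng = np.random.default_rng(1)
distractors = rng.random((10, 28, 28)).astype(np.float32)
blank = np.zeros((60, 60), dtype=np.float32)
cluttered = add_clutter(blank, distractors, rng=rng)
```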

100 by 100 Cluttered Translated MNIST

| Model | Error |
| --- | --- |
| Convolutional, 2 layers | 16.22% |
| RAM, 4 glimpses, 12 x 12, 3 scales | 14.86% |
| RAM, 6 glimpses, 12 x 12, 3 scales | 8.3% |
| RAM, 8 glimpses, 12 x 12, 3 scales | 5.9% |
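
The "12 x 12, 3 scale" setting in the tables refers to the glimpse sensor. In Mnih et al. 2014 the sensor extracts square patches centered on the attended location, each twice as wide as the previous one, and resizes them all to the smallest patch size. The sketch below illustrates that idea with pixel coordinates, zero padding at the borders, and block averaging for the resize. Those three choices and the name `extract_glimpse` are assumptions, not a description of this repository's code.

```python
import numpy as np

def extract_glimpse(image, center, size=12, n_scales=3):
    """Return an (n_scales, size, size) stack of patches centered at `center`.

    `center` is a (row, col) pixel coordinate. Hypothetical helper for
    illustration; not taken from this repository.
    """
    max_half = size * 2 ** (n_scales - 1) // 2
    # Zero-pad so that patches near the border stay in bounds.
    padded = np.pad(image, max_half)
    cy, cx = center[0] + max_half, center[1] + max_half
    patches = []
    for s in range(n_scales):
        factor = 2 ** s
        half = size * factor // 2
        patch = padded[cy - half:cy + half, cx - half:cx + half]
        # Downsample to size x size by averaging factor x factor blocks.
        patch = patch.reshape(size, factor, size, factor).mean(axis=(1, 3))
        patches.append(patch)
    return np.stack(patches)

image = np.arange(60 * 60, dtype=np.float64).reshape(60, 60)
glimpse = extract_glimpse(image, center=(30, 30))
```

With these defaults the three patches cover 12 x 12, 24 x 24, and 48 x 48 pixels, so the coarsest scale sees most of a 60 by 60 image at low resolution.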

60 by 60 Cluttered MNIST 6 Glimpses Examples

The solid square marks the first glimpse, the line traces the path of attention, and the circle marks the last glimpse.
[Figure: five example pairs, mean0/samp0 through mean4/samp4, in two columns: mean output (left) and sampled output (right).]
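
One plausible reading of the two columns is that "mean output" uses the location network's predicted mean directly, while "sampled output" draws each location from a Gaussian centered on that mean, as the stochastic policy in Mnih et al. 2014 does during training. The sketch below illustrates the difference under that assumption. The function name `next_location`, the standard deviation of 0.1, and the clipping to [-1, 1] are hypothetical choices, not values taken from this repository.

```python
import numpy as np

def next_location(mean, std=0.1, sample=True, rng=None):
    """Return a 2-D glimpse location, clipped to the range [-1, 1].

    With sample=False the policy mean is used as is; with sample=True the
    location is drawn from a Gaussian around the mean.
    Hypothetical helper for illustration; not taken from this repository.
    """
    mean = np.asarray(mean, dtype=np.float64)
    if sample:
        rng = np.random.default_rng() if rng is None else rng
        loc = rng.normal(mean, std)
    else:
        loc = mean
    return np.clip(loc, -1.0, 1.0)

mean_loc = next_location([0.2, -0.5], sample=False)
sampled_loc = next_location([0.2, -0.5], rng=np.random.default_rng(0))
```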
