
Enhancing Vision with Convolutional Neural Networks

1.3.1 Convolution and pooling

1.3.1.1 Convolution

How convolution works:

For every pixel, take its value and look at the values of its neighbors. If our filter is 3x3, we look at the immediate neighbors, so that we have a corresponding 3x3 grid. To get the new value for the pixel, we multiply each neighbor by the corresponding value in the filter and sum the results.

For example, in this case our pixel has the value 192, and its upper-left neighbor has the value zero. The upper-left value in the filter is negative one, so we multiply zero by negative one. Then we do the same for the upper neighbor: its value is 64 and the corresponding filter value is zero, so we multiply those. Repeating this for each neighbor and each corresponding filter value, the new pixel value is the sum of each neighbor multiplied by its corresponding filter value. That's a convolution: a technique to isolate features in images.
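
A minimal NumPy sketch of this arithmetic for a single pixel (the neighborhood and filter values here are illustrative, chosen to match the 192/0/64 example above where possible):

import numpy as np

# A 3x3 neighborhood around the current pixel (illustrative values:
# 192 is the center pixel, 0 its upper-left neighbor, 64 its upper neighbor).
neighborhood = np.array([[  0,  64, 128],
                         [ 48, 192, 144],
                         [ 42, 226, 168]])

# A 3x3 filter; its upper-left value is -1, so 0 * -1 contributes nothing.
filter_ = np.array([[-1.0,  0.0, -2.0],
                    [ 0.5,  4.5, -1.5],
                    [ 1.5,  2.0, -3.0]])

# New pixel value: multiply each neighbor by the corresponding filter
# value, then sum everything up.
new_pixel = np.sum(neighborhood * filter_)
print(new_pixel)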

Convolution is used for emphasizing features

The idea here is that some convolutions will change the image in such a way that certain features in the image get emphasized.

So, for example, if you look at this filter, then the vertical lines in the image really pop out:

With this filter, the horizontal lines pop out:
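
A sketch of what such filters might look like, using classic Sobel-style kernels (these particular matrices are an assumption, not necessarily the exact filters shown in the lesson's images, but they have the same effect):

import numpy as np
from scipy import ndimage

vertical_filter = np.array([[-1, 0, 1],
                            [-2, 0, 2],
                            [-1, 0, 1]])     # vertical lines pop out
horizontal_filter = np.array([[-1, -2, -1],
                              [ 0,  0,  0],
                              [ 1,  2,  1]])  # horizontal lines pop out

image = np.random.rand(28, 28)  # stand-in for a grayscale image
vertical_edges = ndimage.convolve(image, vertical_filter)
horizontal_edges = ndimage.convolve(image, horizontal_filter)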


Combined with pooling, convolutions become really powerful. Empirically, adding convolution and pooling layers usually makes the model more accurate.


1.3.1.2 Pooling

Simply put, pooling is a way of compressing an image: a technique that reduces the information in an image while maintaining its features.

For example, with 2x2 max pooling: of these four pixels, pick the biggest value and keep just that.

Purpose of pooling

This preserves the features that were highlighted by the convolution while simultaneously quartering the size of the image: we halve both the horizontal and vertical axes.
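
A minimal NumPy sketch of 2x2 max pooling (the pixel values are illustrative):

import numpy as np

def max_pool_2x2(image):
    """2x2 max pooling: of every four pixels, keep only the biggest,
    halving the image on both axes (odd edges are dropped)."""
    h, w = image.shape
    h, w = h - h % 2, w - w % 2  # round down to even dimensions
    return image[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.array([[  0,  64, 128, 128],
                  [ 48, 192, 144, 144],
                  [ 42, 226, 168,   0],
                  [255,   0,   0,  64]])
print(max_pool_2x2(image))
# [[192 144]
#  [255 168]]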


1.3.2 Convolution & pooling in code

For convolution and pooling in code, we just need to add a few layers before the Flatten layer.

  • Conv2D -- First convolution layer, asking Keras to generate 64 filters for us. Those filters are not random: they start from a set of known good filters, in a similar way to the pattern fitting you saw earlier, and the ones that work are learned over time. These filters are 3 by 3, their activation is relu, which means negative values will be thrown away, and finally the input shape is as before, 28 by 28. The extra 1 just means we are tallying using a single byte for color depth. As we saw before, our images are grayscale, so we just use one byte.

  • MaxPooling2D -- First pooling layer. Max pooling takes the maximum value: it's a two-by-two pool, so for every four pixels the biggest one survives, as shown earlier.

  • Conv2D -- Second convolution layer.

  • MaxPooling2D -- Second pooling layer.

So, by the time the image reaches the Flatten layer on its way into the dense layers, it's already much smaller: it has been quartered, and then quartered again, so its content has been greatly simplified.

The last three layers are the same as in 1.2; the full model is sketched below.
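
Putting the layers above together, the model looks roughly like this (a sketch assuming two Conv2D/MaxPooling2D pairs with 64 filters each, followed by the same dense layers as 1.2):

import tensorflow as tf

model = tf.keras.models.Sequential([
  # First convolution: 64 filters of 3x3, relu throws away negative values;
  # input is a 28x28 grayscale image with a single color channel.
  tf.keras.layers.Conv2D(64, (3,3), activation='relu', input_shape=(28, 28, 1)),
  # First pooling: 2x2, so of every four pixels the biggest survives.
  tf.keras.layers.MaxPooling2D(2, 2),
  # Second convolution and pooling layers.
  tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
  tf.keras.layers.MaxPooling2D(2, 2),
  # The last three layers are the same as in 1.2.
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])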


1.3.3 model.summary()

model.summary() allows you to inspect the layers of the model and see the journey of the image through the convolutions. Here is the output.

  • First line: the output shape isn't 28 by 28 like the data; it's 26 by 26. Why? Because logically, the first pixel you can do calculations on is one that has all eight neighbors that a three-by-three filter needs, so the one-pixel border around the image can't be used.

So the output of the convolution will be two pixels smaller on x, and two pixels smaller on y.

If your filter is five-by-five, for similar reasons your output will be four smaller on x and four smaller on y. So, that's why with a three-by-three filter, our output from the 28 by 28 image is now 26 by 26: we've removed one pixel on x and y at each of the borders.

  • Second line (the first pooling layer): remember we specified it to be two-by-two, turning four pixels into one and halving our x and y. So, our output is reduced from 26 by 26 to 13 by 13.

  • Fourth line (the second pooling layer): the input is 11 by 11 (the second convolution on the third line reduced 13 by 13 to 11 by 11); another two-by-two max pooling rounds down, taking us to five-by-five images (note it is 5x5, not 6x6, because 11/2 rounds down).

  • Fifth line: the input is not just one compressed five-by-five image instead of the original 28 by 28; there are a number of convolutions per image, as we specified, in this case 64. So, 64 new five-by-five images are fed in.

Flatten that out and you have 5x5 = 25 pixels times 64, which is 1600, as opposed to the 784 (= 28x28) that you had previously.
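
A tiny sketch of this shape arithmetic (assuming 'valid' padding, i.e. no padding, as in the layers above):

def conv_out(size, filter_size=3):
    # A filter of size k removes (k - 1) pixels: (k - 1)/2 from each border.
    return size - (filter_size - 1)

def pool_out(size, pool_size=2):
    # Pooling divides the size, rounding down (hence 11 -> 5, not 6).
    return size // pool_size

size = 28
size = conv_out(size)    # 26 after the first convolution
size = pool_out(size)    # 13 after the first pooling
size = conv_out(size)    # 11 after the second convolution
size = pool_out(size)    # 5 after the second pooling
print(size * size * 64)  # 1600 values reach the Flatten layer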

1.3.4 Try it yourself (Fashion MNIST)

Official code

My code

Fashion MNIST Exercise 1: try changing 32 to 64 filters in the first convolution layer. Training takes longer, but the accuracy is better.

model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(64, (3,3), activation='relu', input_shape=(28, 28, 1)),
  tf.keras.layers.MaxPooling2D(2, 2),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])

Fashion MNIST Exercise 3: add more convolution layers (three in total). Training takes longer, and the accuracy is worse.

model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(28, 28, 1)),
  tf.keras.layers.MaxPooling2D(2, 2),
  tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
  tf.keras.layers.MaxPooling2D(2, 2),
  tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
  tf.keras.layers.MaxPooling2D(2, 2),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])

1.3.5 More exercise about convolution and pooling

Official code

More introduction and explanation about image processing (classic versions of several of these kernels are sketched after the list):

  • Introduction
  • Convolution
  • Blur
  • Gaussian Blur
  • Motion Blur
  • Find Edges
  • Sharpen
  • Emboss
  • Mean and Median Filter
  • Conclusion
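
For reference, classic kernels for some of the filters named above look roughly like the following (these are common textbook versions; whether they match the linked pages exactly is an assumption):

import numpy as np

# Common textbook kernels; the linked pages may use different variants.
blur = np.full((3, 3), 1/9)          # box blur: average of the neighborhood
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])
emboss = np.array([[-2, -1,  0],
                   [-1,  1,  1],
                   [ 0,  1,  2]])
find_edges = np.array([[-1, -1, -1],
                       [-1,  8, -1],
                       [-1, -1, -1]])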

My code


1.3.6 Exercise 3 - Handwriting recognition with CNN (MNIST)

Official code

My code. We can see that after adding convolution and pooling layers, the accuracy is much better than in the 1.2 exercise.
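
The logs below show training stopping early at 99% / 99.8% accuracy. That is done with a Keras callback, roughly like this sketch (the class name and thresholds are assumptions read off the log messages; the logs use the old 'acc' metric key):

import tensorflow as tf

class AccuracyThresholdCallback(tf.keras.callbacks.Callback):
    """Stops training once accuracy crosses a threshold; a sketch of the
    kind of callback that produced the 'cancelling training' messages."""
    def __init__(self, threshold=0.99):
        super().__init__()
        self.threshold = threshold

    def on_epoch_end(self, epoch, logs=None):
        if logs and logs.get('acc', 0) >= self.threshold:
            print(f"\nReached {self.threshold:.0%} accuracy so cancelling training!")
            self.model.stop_training = True

# Usage: model.fit(x_train, y_train, epochs=20,
#                  callbacks=[AccuracyThresholdCallback(0.99)])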

Output of the 1.2 simple neural network vs. 1.3 with convolution and pooling:

Epoch 1/20
60000/60000 [==============================] - 27s 444us/sample - loss: 0.1793 - acc: 0.9447
Epoch 2/20
60000/60000 [==============================] - 27s 443us/sample - loss: 0.0798 - acc: 0.9761
Epoch 3/20
60000/60000 [==============================] - 27s 445us/sample - loss: 0.0570 - acc: 0.9814
Epoch 4/20
60000/60000 [==============================] - 27s 443us/sample - loss: 0.0421 - acc: 0.9866
Epoch 5/20
60000/60000 [==============================] - 27s 456us/sample - loss: 0.0346 - acc: 0.9890
Epoch 6/20
59968/60000 [============================>.] - ETA: 0s - loss: 0.0293 - acc: 0.9908
Reached 99% accuracy so cancelling training!
60000/60000 [==============================] - 28s 463us/sample - loss: 0.0295 - acc: 0.9908
10000/10000 [==============================] - 1s 118us/sample - loss: 0.0937 - acc: 0.9766
Out[0]:
[0.09371699871601905, 0.9766]
_________________________________________________________________
Epoch 1/20
60000/60000 [==============================] - 141s 2ms/sample - loss: 0.1091 - acc: 0.9659
Epoch 2/20
60000/60000 [==============================] - 141s 2ms/sample - loss: 0.0421 - acc: 0.9870
Epoch 3/20
60000/60000 [==============================] - 140s 2ms/sample - loss: 0.0282 - acc: 0.9918
Epoch 4/20
60000/60000 [==============================] - 140s 2ms/sample - loss: 0.0236 - acc: 0.9929
Epoch 5/20
60000/60000 [==============================] - 141s 2ms/sample - loss: 0.0177 - acc: 0.9951
Epoch 6/20
60000/60000 [==============================] - 140s 2ms/sample - loss: 0.0142 - acc: 0.9958
Epoch 7/20
60000/60000 [==============================] - 140s 2ms/sample - loss: 0.0134 - acc: 0.9956
Epoch 8/20
60000/60000 [==============================] - 141s 2ms/sample - loss: 0.0116 - acc: 0.9965
Epoch 9/20
60000/60000 [==============================] - 140s 2ms/sample - loss: 0.0109 - acc: 0.9968
Epoch 10/20
59968/60000 [============================>.] - ETA: 0s - loss: 0.0070 - acc: 0.9980
Reached 99.8% accuracy so cancelling training!
60000/60000 [==============================] - 141s 2ms/sample - loss: 0.0070 - acc: 0.9980
10000/10000 [==============================] - 6s 607us/sample - loss: 0.0430 - acc: 0.9898
[0.04297752619532357, 0.9898]