Multi-Class Image Segmentation of Aerial Images:

Using U-Net

  • Training notebook: [badge]
  • Testing notebook: [badge]

This notebook generates multi-class segmentation masks for aerial/satellite images using transfer learning, with U-Net as the architecture of the deep-learning model.

Make sure you use a GPU runtime for this notebook. In Google Colab, go to Runtime -> Change runtime type and select GPU as the hardware accelerator; training on a CPU-only runtime will be very slow.
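
To confirm that the GPU is actually visible before training, a quick check with TensorFlow (which the notebook's Keras model builds on):

```python
import tensorflow as tf

# An empty list means the runtime is still CPU-only.
print(tf.config.list_physical_devices("GPU"))
```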


U-Net

U-Net was developed by Olaf Ronneberger et al. for biomedical image segmentation. The architecture contains two paths. The first path is the contracting path (also called the encoder), which captures the context in the image. The encoder is a traditional stack of convolutional and max-pooling layers. The second path is the symmetric expanding path (also called the decoder), which enables precise localization using transposed convolutions. U-Net is thus an end-to-end fully convolutional network (FCN): it contains only convolutional layers and no dense layers, which is why it can accept images of any size.

In the original paper, the U-Net is described as follows:

[Figure: U-Net architecture diagram]

U-Net architecture (example for 32x32 pixels in the lowest resolution). Each blue box corresponds to a multi-channel feature map. The number of channels is denoted on top of the box. The x-y-size is provided at the lower left edge of the box. White boxes represent copied feature maps. The arrows denote the different operations.

U-Net: Network Architecture

U-Net is a U-shaped encoder-decoder network architecture consisting of four encoder blocks and four decoder blocks connected via a bridge. The encoder network (contracting path) halves the spatial dimensions and doubles the number of filters (feature channels) at each encoder block. Likewise, the decoder network doubles the spatial dimensions and halves the number of feature channels at each decoder block.

  • Encoder Network The encoder network acts as the feature extractor and learns an abstract representation of the input image through a sequence of encoder blocks. Each encoder block consists of two 3x3 convolutions, each followed by a ReLU (Rectified Linear Unit) activation function. The ReLU activation introduces non-linearity into the network, which helps it generalize better. The output of the ReLU acts as the skip connection for the corresponding decoder block. Next follows a 2x2 max-pooling, which reduces the spatial dimensions (height and width) of the feature maps by half and thereby reduces the computational cost of the subsequent layers.
  • Skip Connections These skip connections provide additional information that helps the decoder generate better semantic features. They also act as shortcut paths that let gradients flow to the earlier layers without degradation. In simple terms, skip connections improve gradient flow during backpropagation, which in turn helps the network learn better representations.
  • Bridge The bridge connects the encoder and decoder networks and completes the flow of information. It consists of two 3x3 convolutions, each followed by a ReLU activation function.
  • Decoder Network The decoder network takes the abstract representation and generates the semantic segmentation mask. Each decoder block starts with a 2x2 transpose convolution, whose output is concatenated with the corresponding skip-connection feature map from the encoder. These skip connections provide features from earlier layers that are sometimes lost due to the depth of the network. After that, two 3x3 convolutions are applied, each followed by a ReLU activation function. The output of the last decoder block passes through a 1x1 convolution; a sigmoid activation gives the pixel-wise mask for binary segmentation, while a softmax over the class channels is used for multi-class segmentation as in this notebook. A minimal implementation sketch follows this list.
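
To make the block structure above concrete, here is a minimal U-Net sketch in Keras. The 256x256 RGB input shape and the 6 output classes are illustrative assumptions, not values taken from this repository:

```python
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two 3x3 convolutions, each followed by ReLU.
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def encoder_block(x, filters):
    # Conv block whose output is kept as the skip connection,
    # then 2x2 max-pooling halves the spatial dimensions.
    skip = conv_block(x, filters)
    down = layers.MaxPooling2D(2)(skip)
    return skip, down

def decoder_block(x, skip, filters):
    # 2x2 transpose convolution doubles the spatial dimensions,
    # then the result is concatenated with the encoder skip feature map.
    x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
    x = layers.Concatenate()([x, skip])
    return conv_block(x, filters)

def build_unet(input_shape=(256, 256, 3), n_classes=6):
    inputs = layers.Input(input_shape)

    # Encoder: four blocks, doubling the filters at each step.
    s1, d1 = encoder_block(inputs, 64)
    s2, d2 = encoder_block(d1, 128)
    s3, d3 = encoder_block(d2, 256)
    s4, d4 = encoder_block(d3, 512)

    # Bridge: two 3x3 convolutions.
    b = conv_block(d4, 1024)

    # Decoder: four blocks, halving the filters at each step.
    u1 = decoder_block(b, s4, 512)
    u2 = decoder_block(u1, s3, 256)
    u3 = decoder_block(u2, s2, 128)
    u4 = decoder_block(u3, s1, 64)

    # 1x1 convolution; softmax yields a per-pixel class distribution
    # for multi-class segmentation (sigmoid would be used for binary).
    outputs = layers.Conv2D(n_classes, 1, activation="softmax")(u4)
    return Model(inputs, outputs, name="unet")

model = build_unet()
model.summary()
```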

Errors

If the following error occurs:

AttributeError: module 'keras.utils.generic_utils' has no attribute 'get_custom_objects'

Open the file /usr/local/lib/python3.10/dist-packages/efficientnet/keras.py and change the following lines:

  • from . import inject_keras_modules, init_keras_custom_objects ---> from . import inject_keras_modules, init_tfkeras_custom_objects
  • init_keras_custom_objects() ---> init_tfkeras_custom_objects()
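
In Colab, the same edit can be applied from a notebook cell instead of editing the file by hand. A minimal sketch, assuming the package is installed at the path above (adjust the Python version in the path if yours differs):

```python
from pathlib import Path

path = Path("/usr/local/lib/python3.10/dist-packages/efficientnet/keras.py")
src = path.read_text()
# A global substitution rewrites both the import and the call in one pass.
path.write_text(src.replace("init_keras_custom_objects",
                            "init_tfkeras_custom_objects"))
```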

Prediction with Smooth Blending

  • Make smooth predictions on large images by blending overlapping patch predictions, rather than stitching patches together with visible seams.
  • One challenge of using a U-Net for image segmentation is getting smooth predictions across patch boundaries, especially when the receptive field of the neural network spans only a small number of pixels. A usage sketch follows below.
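
A sketch of how such blended prediction is typically wired up, assuming the smooth_tiled_predictions module from Vooban's Smoothly-Blend-Image-Patches repository is available on the path; the patch size, class count, input image, and model are illustrative placeholders:

```python
import numpy as np
from smooth_tiled_predictions import predict_img_with_smooth_windowing

patch_size = 256   # assumed size of the patches the model was trained on
n_classes = 6      # assumed number of mask classes

# Placeholder input; in practice, load a large aerial image scaled to [0, 1].
input_img = np.random.rand(1024, 1024, 3)

# `model` is the trained Keras model (e.g. the U-Net sketched earlier).
predictions_smooth = predict_img_with_smooth_windowing(
    input_img,
    window_size=patch_size,
    subdivisions=2,                # overlap factor; 2 = half-patch overlap
    nb_classes=n_classes,
    pred_func=lambda batch: model.predict(batch),
)

# Collapse the blended class probabilities into a per-pixel label mask.
final_mask = np.argmax(predictions_smooth, axis=-1)
```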

Example

Training Curves

Training and Validation Loss

[Figure: training and validation loss curves]

Training and Validation IOU

[Figure: training and validation IoU curves]

Output:

| Original Input Image | Ground Truth Multi-Class Segmented Mask |
| --- | --- |
| [input image] | [ground truth mask] |
| No. of Epochs | Ground Truth Mask | Predicted Mask without Smooth Blending | Predicted Mask with Smooth Blending |
| --- | --- | --- | --- |
| 10 | [ground truth mask] | [prediction without smooth blending] | [prediction with smooth blending] |
| 20 | [ground truth mask] | [prediction without smooth blending] | [prediction with smooth blending] |
| 30 | [ground truth mask] | [prediction without smooth blending] | [prediction with smooth blending] |
| 40 | [ground truth mask] | [prediction without smooth blending] | [prediction with smooth blending] |
| 50 | [ground truth mask] | [prediction without smooth blending] | [prediction with smooth blending] |
| 60 | [ground truth mask] | [prediction without smooth blending] | [prediction with smooth blending] |
| 70 | [ground truth mask] | [prediction without smooth blending] | [prediction with smooth blending] |
| 80 | [ground truth mask] | [prediction without smooth blending] | [prediction with smooth blending] |
| 90 | [ground truth mask] | [prediction without smooth blending] | [prediction with smooth blending] |
| 100 | [ground truth mask] | [prediction without smooth blending] | [prediction with smooth blending] |
| 110 | [ground truth mask] | [prediction without smooth blending] | [prediction with smooth blending] |
| 120 | [ground truth mask] | [prediction without smooth blending] | [prediction with smooth blending] |
| 130 | [ground truth mask] | [prediction without smooth blending] | [prediction with smooth blending] |
| 140 | [ground truth mask] | [prediction without smooth blending] | [prediction with smooth blending] |
| 150 | [ground truth mask] | [prediction without smooth blending] | [prediction with smooth blending] |
| 160 | [ground truth mask] | [prediction without smooth blending] | [prediction with smooth blending] |
| 170 | [ground truth mask] | [prediction without smooth blending] | [prediction with smooth blending] |
| 180 | [ground truth mask] | [prediction without smooth blending] | [prediction with smooth blending] |
| 190 | [ground truth mask] | [prediction without smooth blending] | [prediction with smooth blending] |
| 200 | [ground truth mask] | [prediction without smooth blending] | [prediction with smooth blending] |

Training Metrics

| No. of Epochs | Loss | Accuracy | Jaccard Coefficient | Validation Loss | Validation Accuracy | Validation Jaccard Coefficient |
| --- | --- | --- | --- | --- | --- | --- |
| 10 | 0.9504 | 0.7307 | 0.4767 | 0.9749 | 0.6644 | 0.3965 |
| 20 | 0.9263 | 0.8019 | 0.5960 | 0.9802 | 0.6807 | 0.4974 |
| 30 | 0.9152 | 0.8331 | 0.6464 | 0.9395 | 0.7677 | 0.6057 |
| 40 | 0.9061 | 0.8577 | 0.6873 | 0.9351 | 0.7916 | 0.6408 |
| 50 | 0.9039 | 0.8566 | 0.6968 | 0.9266 | 0.8124 | 0.6594 |
| 60 | 0.8964 | 0.8607 | 0.7068 | 0.9298 | 0.7972 | 0.6464 |
| 70 | 0.8849 | 0.8824 | 0.7476 | 0.9177 | 0.8356 | 0.6940 |
| 80 | 0.8769 | 0.8972 | 0.7743 | 0.9253 | 0.8187 | 0.6889 |
| 90 | 0.8751 | 0.9001 | 0.7794 | 0.9205 | 0.8237 | 0.6930 |
| 100 | 0.8748 | 0.9012 | 0.7831 | 0.9456 | 0.7970 | 0.6562 |
| 110 | 0.8692 | 0.9142 | 0.8067 | 0.9162 | 0.8466 | 0.7352 |
| 120 | 0.8650 | 0.9247 | 0.8320 | 0.9204 | 0.8391 | 0.7280 |
| 130 | 0.8644 | 0.9257 | 0.8338 | 0.9185 | 0.8419 | 0.7284 |
| 140 | 0.8625 | 0.9302 | 0.8442 | 0.9133 | 0.8522 | 0.7407 |
| 150 | 0.8608 | 0.9318 | 0.8472 | 0.9180 | 0.8515 | 0.7443 |
| 160 | 0.8588 | 0.9352 | 0.8537 | 0.9184 | 0.8467 | 0.7413 |
| 170 | 0.8564 | 0.9400 | 0.8643 | 0.9165 | 0.8532 | 0.7505 |
| 180 | 0.8561 | 0.9404 | 0.8650 | 0.9163 | 0.8544 | 0.7467 |
| 190 | 0.8546 | 0.9426 | 0.8707 | 0.9154 | 0.8577 | 0.7560 |
| 200 | 0.8554 | 0.9407 | 0.8656 | 0.9174 | 0.8531 | 0.7465 |
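
For reference, the Jaccard coefficient reported above measures intersection over union between the predicted and ground-truth masks. A common Keras formulation (not necessarily the exact code used to produce this table) looks like this:

```python
import tensorflow.keras.backend as K

def jaccard_coef(y_true, y_pred, smooth=1.0):
    # Intersection over union on the flattened one-hot masks:
    # |A ∩ B| / (|A| + |B| - |A ∩ B|), with a smoothing constant
    # to avoid division by zero on empty masks.
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    union = K.sum(y_true_f) + K.sum(y_pred_f) - intersection
    return (intersection + smooth) / (union + smooth)

# Example compile call (optimizer and loss are illustrative):
# model.compile(optimizer="adam",
#               loss="categorical_crossentropy",
#               metrics=["accuracy", jaccard_coef])
```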

Video of Progress Over Epochs

References

  • Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI 2015. arXiv:1505.04597.