Vehicle Detection Project
This project makes use of an SVM classifier to detect vehicles in an image. Instead of feeding raw pixel data directly to the classifier, an image processing pipeline extracts important features from the images and feeds those to the classifier. Selecting the right features is key: superfluous features slow down the pipeline without aiding detection, while training the classifier with too few features leaves its accuracy too low. The features considered for extraction were:
- Color histogram
A histogram analysis is performed on each of the three color components and the bin counts are then flattened into a feature array. The choice of color format plays an important role in deciding which of the color components is useful for classification. The number of bins determines how much information is captured, i.e., how big the feature array is. Here's the histogram for a test image.
- Spatially binned pixel data
This is essentially a resizing operation that downsamples the image and flattens the raw pixel values, providing some invariance to image size for the purpose of classification.
- Histogram of Oriented Gradients (HOG)
This appears to be by far the most important feature for classifying cars: HOG captures the shape information of objects very well. This pipeline computes HOG vectors from all three color channels. A sketch of all three extraction steps follows below.
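For reference, here is a minimal sketch of the three extractors, assuming the input image has already been converted to the chosen color space. The function names and the bin/size defaults (32 bins, 32x32 spatial size) are illustrative assumptions, not the project's exact code; the HOG parameters mirror the values swept below.

import cv2
import numpy as np
from skimage.feature import hog

def color_hist(img, nbins=32):
    # Histogram each of the three channels and flatten the bin counts.
    # nbins=32 is an assumed default, not taken from the project.
    hists = [np.histogram(img[:, :, ch], bins=nbins, range=(0, 256))[0]
             for ch in range(3)]
    return np.concatenate(hists)

def bin_spatial(img, size=(32, 32)):
    # Downsample the image and flatten the raw pixel values.
    return cv2.resize(img, size).ravel()

def hog_features(img, orient=9, pix_per_cell=8, cell_per_block=2):
    # One HOG vector per channel, concatenated (the 'ALL' channels case).
    return np.concatenate([
        hog(img[:, :, ch], orientations=orient,
            pixels_per_cell=(pix_per_cell, pix_per_cell),
            cells_per_block=(cell_per_block, cell_per_block),
            feature_vector=True)
        for ch in range(3)])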
The approach to selecting the features was to train the SVM classifier on every combination of features and choose the one with the best test accuracy.
import itertools

def get_configs():
    # Enumerate every feature-extraction configuration to evaluate.
    for (color_space, spatial_feat, hist_feat, hog_feat, hog_channel,
         cell_per_block, pix_per_cell, orient) in itertools.product(
            ['RGB', 'HSV', 'LUV', 'HLS', 'YUV', 'YCrCb'],
            [True, False], [True, False], [True, False],
            ['ALL', 0, 1, 2], [2, 4], [8, 6, 10], [9, 8, 10]):
        # Skip configurations that select no features at all.
        if not (spatial_feat or hist_feat or hog_feat):
            continue
        yield [color_space, spatial_feat, hist_feat, hog_feat,
               hog_channel, cell_per_block, pix_per_cell, orient]
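For context, a minimal sketch of how the generator might drive the sweep; train_and_evaluate is a hypothetical placeholder for the training routine (sketched further below) that returns the test accuracy for one configuration.

# Hypothetical sweep driver; train_and_evaluate is a placeholder, not a
# function from the project, and is assumed to return test accuracy.
results = [(train_and_evaluate(*config), config) for config in get_configs()]
best_accuracy, best_config = max(results, key=lambda r: r[0])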
By gradually narrowing down the configurations, it became apparent that:
- YUV, HSV, and LUV performed the best
- The color histogram had very little effect
- Accuracy was better with all HOG channels ('ALL')
- The remaining parameters did not matter much
The selected features are concatenated into a single array and normalized with a StandardScaler. A LinearSVC classifier forms the end of the pipeline, deciding whether an image (or a window) contains a car based on the feature array. Training is performed on a data set of 64x64-pixel, 3-channel car and non-car images, containing about 8792 cars and 8968 non-cars. Training time was also one of the primary considerations in selecting features.
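A minimal sketch of this training step, assuming car_features and notcar_features are lists of concatenated feature arrays produced by the extractors above (the 80/20 split and random seed are assumptions):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# car_features / notcar_features: one feature array per 64x64 training image.
X = np.vstack((car_features, notcar_features)).astype(np.float64)
y = np.hstack((np.ones(len(car_features)), np.zeros(len(notcar_features))))

scaler = StandardScaler().fit(X)       # per-column zero mean, unit variance
X_train, X_test, y_train, y_test = train_test_split(
    scaler.transform(X), y, test_size=0.2, random_state=42)

clf = LinearSVC()
clf.fit(X_train, y_train)
print('Test accuracy:', round(clf.score(X_test, y_test), 4))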
Feature extraction takes 74.32s
Training takes 24.77s and results in a test accuracy of 0.991
The feature extraction and classification pipeline operates on 64x64x3 images, but the frames from the video are 1280x720x3. Moreover, the classifier is trained to detect cropped single-car images, while a video frame can contain several cars of different sizes at different locations. To solve this, a sliding window extracts a portion of the image and feeds it to the pipeline. The windows that result in a positive detection for a car are then processed further.
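A simplified single-scale sketch of the window generation; in practice the search runs at multiple scales, and the overlap and vertical search band used here are assumptions:

def slide_window(img_width, y_start, y_stop, window=64, overlap=0.5):
    # Generate (x1, y1, x2, y2) coordinates for overlapping search windows.
    step = int(window * (1 - overlap))
    return [(x, y, x + window, y + window)
            for y in range(y_start, y_stop - window + 1, step)
            for x in range(0, img_width - window + 1, step)]

# y_start/y_stop restrict the search to the road region (values assumed).
# Each window crop is resized to 64x64 and run through the classifier.
windows = slide_window(img_width=1280, y_start=400, y_stop=656)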
The classifier identifies sections (windows) of an image that contain a car and outputs the coordinates of those windows. However, the detection has two major issues:
- False positives, where a non-car part of the image is identified as a car
- Multiple overlapping windows reported for a single car
False positives are handled by creating a heatmap of the image that accumulates the detections, and keeping only the regions where the accumulated heat exceeds a threshold. The idea is that a false positive is a weak detection backed by fewer overlapping windows, so its heat stays below the threshold.
Duplicate detections are handled by using the scipy.ndimage.measurements.label() function that groups connected pixels in the heatmap as belonging to one label.
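A minimal sketch of both post-processing steps together; the heat threshold of 2 is an assumption, and newer SciPy exposes the same function as scipy.ndimage.label:

import numpy as np
from scipy.ndimage import label  # scipy.ndimage.measurements.label in older SciPy

def windows_to_boxes(frame_shape, hot_windows, threshold=2):
    # Accumulate one unit of heat per positive window.
    heat = np.zeros(frame_shape[:2], dtype=np.float32)
    for x1, y1, x2, y2 in hot_windows:
        heat[y1:y2, x1:x2] += 1
    # Reject weakly supported (likely false positive) regions.
    heat[heat <= threshold] = 0
    # Merge the surviving connected regions: one label per car.
    labels, n_cars = label(heat)
    boxes = []
    for car in range(1, n_cars + 1):
        ys, xs = np.nonzero(labels == car)
        boxes.append((xs.min(), ys.min(), xs.max(), ys.max()))
    return boxes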
I also have a running-average filter that tries to smooth the detections between frames, but it does not seem to be working well at this point.
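One plausible form of such a filter, averaging the per-frame heatmaps over a short history before thresholding (the frame count is an assumption):

from collections import deque
import numpy as np

class HeatSmoother:
    # Average the last few per-frame heatmaps so that a detection must
    # persist across frames before it survives thresholding.
    def __init__(self, n_frames=5):
        self.history = deque(maxlen=n_frames)

    def update(self, heatmap):
        self.history.append(heatmap)
        return np.mean(self.history, axis=0)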
Here's a link to my video result
This project is based on the code in the Udacity lesson exercises. Selecting the features and optimizing the parameters for feature extraction required several rounds of iteration. Post-processing the detections to reject false positives and combine duplicates turned out to be the hardest part. Overall, the approach seems very sensitive to the choice of features and parameters, and requires a fair amount of post-processing after detection.