This repo contains four projects for COMP5421 - Computer Vision @ HKUST. Each project covers a different area of computer vision.
- You may need a UST CSE account to access some of the webpages.
The general idea of this project: after loading the image, we compute the link costs by implementing the cost function provided on the course webpage*. Each time the user clicks a seed, we update the path tree using Dijkstra's algorithm.
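The path-tree update above can be sketched as a plain Dijkstra expansion over the pixel grid. This is a minimal illustration, not the project's actual code: `link_cost` stands in for the course's cost function, and the grid here is 4-connected rather than 8-connected.

```python
import heapq

def shortest_path_tree(link_cost, rows, cols, seed):
    # Dijkstra over a 4-connected pixel grid.
    # link_cost(u, v) returns the cost of stepping from pixel u to v;
    # the real project uses the cost function from the course webpage.
    dist = {seed: 0.0}
    prev = {}                      # prev[v] = parent of v in the path tree
    pq = [(0.0, seed)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if d > dist.get((r, c), float("inf")):
            continue               # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            v = (r + dr, c + dc)
            if 0 <= v[0] < rows and 0 <= v[1] < cols:
                nd = d + link_cost((r, c), v)
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    prev[v] = (r, c)
                    heapq.heappush(pq, (nd, v))
    return dist, prev

def trace(prev, seed, target):
    # Walk parent pointers back to the seed to recover the live wire.
    path = [target]
    while path[-1] != seed:
        path.append(prev[path[-1]])
    return path[::-1]
```

Once the tree is built from a seed, tracing any pixel back through `prev` yields the minimum-cost contour segment instantly, which is what makes the live-wire interaction feel responsive.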
We provide three images, one background image and two foreground images.
Foreground Image One and its mask:
Foreground Image Two and its mask:
Background image:
The result is cool!!!
We implemented two whistles for this project.
- SnapSeed
To snap the first seed to a nearby edge, we find its nearest neighbor on the edge map after the user clicks. We use `cv::Canny` to extract the edges and the L2 distance to measure spatial proximity, updating the position of the first seed whenever a closer edge point is found.
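The snapping step can be sketched as a nearest-neighbor search over the edge pixels. A minimal NumPy version, assuming the edge map has already been produced (e.g. by `cv2.Canny`); the function name is ours:

```python
import numpy as np

def snap_to_edge(edge_map, seed):
    """Snap a clicked seed to the nearest edge pixel by L2 distance.

    edge_map: 2-D array, nonzero where an edge was detected
              (e.g. the output of cv2.Canny).
    seed:     (row, col) of the user's click.
    """
    ys, xs = np.nonzero(edge_map)
    if ys.size == 0:
        return seed                          # no edges: keep the click
    d2 = (ys - seed[0]) ** 2 + (xs - seed[1]) ** 2
    i = int(np.argmin(d2))                   # index of the closest edge pixel
    return (int(ys[i]), int(xs[i]))
```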
- Blurring
We include two Gaussian blurring filters in the project so that the user can apply either of them and compare the difference before and after blurring. For the blurring effect we use OpenCV's Gaussian filter, adjusting the kernel size and standard deviation to achieve different levels of blurring. After opening an image, the blurring effect can be applied from the IScissor interface:
Tool -> Gaussian 3*3
Tool -> Gaussian 5*5
After clicking one of these buttons, the image is blurred. We provide a sample with our school, HKUST: different levels of blurring are achieved, and comparing the two filters shows that a larger kernel size induces a stronger blurring effect (see the pictures below). We also notice that the stronger the blurring, the lower the cost.
- Original pic and its cost graph:
- Blurring with the 3*3 Gaussian filter
- Blurring with the 5*5 Gaussian filter
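The project itself calls OpenCV's Gaussian filter; as a dependency-free sketch of what that filter computes, here is a separable Gaussian blur in NumPy (function names are ours, not OpenCV's):

```python
import numpy as np

def gaussian_kernel(ksize, sigma):
    # 1-D Gaussian kernel, normalized to sum to 1.
    ax = np.arange(ksize) - (ksize - 1) / 2.0
    k = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, ksize, sigma):
    # A 2-D Gaussian is separable: blur rows, then columns.
    # Edge padding keeps the output the same size as the input.
    k = gaussian_kernel(ksize, sigma)
    pad = ksize // 2
    out = np.pad(img.astype(float), pad, mode="edge")
    out = np.apply_along_axis(lambda r: np.convolve(r, k, "valid"), 1, out)
    out = np.apply_along_axis(lambda c: np.convolve(c, k, "valid"), 0, out)
    return out
```

A larger `ksize` (with a matching `sigma`) spreads each pixel's weight over more neighbors, which is why the 5*5 filter blurs more strongly than the 3*3 one and flattens the gradient-based cost.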
The course project of COMP5421 Computer Vision
The task given by this assignment is to implement a multi-scale sliding window face detector based on concepts presented in Dalal-Triggs 2005 and Viola-Jones 2001. The algorithm will be evaluated using a common benchmark for face detection (Caltech).
- Use the HOG descriptor to generate positive face samples with cell size 3
- Horizontally flip the face images and darken the originals for contrast variation (img * 0.8); generate HOG features from these images and add them to the positive face samples
- Use the HOG descriptor to generate features from negative cropped images (50,000 images) from the database
- Train the linear SVM Classifier
- For each test image, for each position at each scale in the image, create a window and run the classifier to determine whether or not there is a face at that location
- Step 5 will result in many overlapping bounding boxes for the same face, which must then be combined or suppressed into one final bounding box (non-maximum suppression).
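The suppression in the last step can be sketched as greedy non-maximum suppression: keep the highest-scoring box, discard every box that overlaps it too much, and repeat. A minimal illustration (the IoU threshold and names are ours):

```python
def nms(boxes, scores, iou_thresh=0.3):
    """Greedy non-maximum suppression.

    boxes:  list of (x1, y1, x2, y2) detections
    scores: classifier confidence for each box
    Returns the indices of the boxes that survive.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)               # best remaining box wins
        keep.append(i)
        x1a, y1a, x2a, y2a = boxes[i]
        area_a = (x2a - x1a) * (y2a - y1a)
        rest = []
        for j in order:
            x1b, y1b, x2b, y2b = boxes[j]
            iw = max(0, min(x2a, x2b) - max(x1a, x1b))
            ih = max(0, min(y2a, y2b) - max(y1a, y1b))
            inter = iw * ih
            area_b = (x2b - x1b) * (y2b - y1b)
            iou = inter / float(area_a + area_b - inter)
            if iou <= iou_thresh:      # low overlap: keep as a candidate
                rest.append(j)
        order = rest
    return keep
```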
We conducted several experiments to identify the best approach to detecting faces.
- Cell Size 6, flip and contrast the faces, without hard negative mining
++The average accuracy is 0.840, and it is very quick.++
- Cell Size 3, flip and contrast the faces, without hard negative mining
++The average accuracy is 0.916, and it takes about 20 minutes.++
- Cell Size 3, without hard negative mining
++The average accuracy is 0.823, and it takes about 30-40 minutes.++
- Cell Size 3, flip and contrast the faces, with hard negative mining
++The average accuracy is 0.901, and it takes about 90 minutes.++
- Notice that when the cell size gets smaller, the HOG descriptor can represent more detail about the gradient and edge direction at each pixel, which leads to higher accuracy.
- With more negative samples, the accuracy improves a bit, but the detection speed becomes slower.
- When we horizontally flip the face images and darken them (i.e. add more positive training data), the accuracy improves greatly: we found that some faces against very dark backgrounds, as well as some side faces, could not previously be detected.
Train on the original dataset, collect the images that are falsely detected as faces, add them to the negative samples, and retrain.
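That mining loop can be sketched as follows. This is an illustrative skeleton, not the project's actual code: `clf` stands for any classifier with sklearn-style `fit`/`predict`, and the feature lists are placeholders for HOG vectors.

```python
def hard_negative_mining(clf, pos, neg, candidates, rounds=2):
    """Train, mine false positives from face-free windows, retrain.

    clf:        classifier exposing fit(X, y) and predict(X)
    pos, neg:   initial positive / negative feature vectors
    candidates: features from windows known to contain no face
    """
    X = list(pos) + list(neg)
    y = [1] * len(pos) + [0] * len(neg)
    clf.fit(X, y)
    for _ in range(rounds):
        # Windows with no face that the current model calls "face"
        # become extra negatives for the next round.
        hard = [f for f, p in zip(candidates, clf.predict(candidates)) if p == 1]
        if not hard:
            break                      # no false positives left to mine
        X += hard
        y += [0] * len(hard)
        clf.fit(X, y)                  # retrain on the enlarged negative set
    return clf
```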
- Horizontally flip the face images
- Generate darkened contrast images (image * 0.8)
Implement Local Binary Patterns (LBP): instead of the HOG descriptor, we use LBP to extract features from the images.
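A minimal sketch of the basic 8-neighbor LBP code, to show what the descriptor computes per pixel; this is an illustration in NumPy, not the project's implementation (which would typically histogram these codes over cells):

```python
import numpy as np

def lbp_image(img):
    """Basic LBP: one 8-bit code per interior pixel.

    Each of the 8 neighbors contributes one bit (1 if neighbor >= center),
    read clockwise starting from the top-left neighbor.
    """
    img = np.asarray(img)
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    out = np.zeros((h - 2, w - 2), dtype=int)
    # neighbor offsets, clockwise from top-left
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dr, dc) in enumerate(offs):
        nb = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        out += (nb >= center).astype(int) << (7 - bit)
    return out
```

Because each bit only records whether a neighbor is brighter than the center, the code is invariant to monotonic lighting changes, which is the usual motivation for trying LBP alongside HOG.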
- Calculate the vanishing points using Bob Collins's algorithm.
- Calculate the projection matrix, where the scales are computed from reference points and vanishing points
- Use the homography matrix to get the texture map (for simplicity, we directly "hardcode" the points this time)
- Mark interesting points and get their 3D coordinates
- Generate the VRML model
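The texture-map step above relies on a homography between the marked image quadrilateral and the rectified texture rectangle. A minimal Direct Linear Transform sketch in NumPy (equivalent in spirit to `cv2.getPerspectiveTransform`; function names are ours):

```python
import numpy as np

def homography(src, dst):
    """DLT: 3x3 homography from four (x, y) point correspondences."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # each correspondence contributes two rows of the linear system
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # the null-space direction (last right singular vector) holds H
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]            # fix the scale (assumes H[2,2] != 0)

def apply_h(H, pt):
    # map a point through H with the homogeneous divide
    p = H @ np.array([pt[0], pt[1], 1.0])
    return (p[0] / p[2], p[1] / p[2])
```

Sampling every pixel of the output rectangle through the inverse homography then produces the rectified texture map.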
Source Image
Texture Maps
Results
Source Image
Texture Maps
Results
Source Image
Texture Maps
Results
Source Image
Texture Maps
Results
In this part, we include the following examples to demonstrate multiple-view modeling.
example 02
example 03
example 04
example 05
example 06
example 07
example 08
example 09
example 10