ML models for facial recognition, object detection, and emotion detection.
In this repository, we demonstrate some computer vision applications of deep learning models. We will show how these models can perform facial recognition, object detection, and emotion detection, all in real time. But first, let's see what happens under the hood in these kinds of systems.
For facial recognition, there are many models out there that do a great job of generating embeddings for faces. Let's see how the whole system works:
- The model is trained to generate a face embedding vector for each face it is fed during training. We end up with a model that produces an embedding vector for any face image we feed it. In the picture above you can see the DeepFace model, which can do this for us.
- Next, we generate face embeddings for each known user and store them, along with the user's name, in our system's database.
- Whenever an unknown user appears in front of our camera, the model generates a face embedding for that unknown face.
- This unknown embedding is compared against the embeddings in the database using Euclidean distance to measure whether the faces belong to the same person.
- If the face is close enough to someone in the database, we label the unknown face with that user's name.
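The matching step above can be sketched in a few lines. This is a minimal illustration, not a production system: the 3-d vectors and the names are made up for the example, standing in for the high-dimensional embeddings (e.g. 128-d) a real model like DeepFace would produce, and the threshold value is an assumption you would tune on real data.

```python
import numpy as np

# Toy 3-d "embeddings" stand in for the vectors a real face model
# would produce; the names and values here are purely illustrative.
database = {
    "alice": np.array([0.0, 0.0, 1.0]),
    "bob": np.array([1.0, 0.0, 0.0]),
}

def identify(unknown_embedding, database, threshold=0.6):
    """Return the closest enrolled name, or None if no stored
    embedding is within `threshold` Euclidean distance."""
    best_name, best_dist = None, float("inf")
    for name, embedding in database.items():
        dist = np.linalg.norm(embedding - unknown_embedding)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist < threshold else None

# A vector close to "alice" gets labeled; a far-away one is rejected.
print(identify(np.array([0.05, 0.0, 0.95]), database))  # -> alice
print(identify(np.array([5.0, 5.0, 5.0]), database))    # -> None
```

The threshold is the key design choice: too loose and strangers get matched, too strict and enrolled users get rejected.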
This is just a simple overview of what happens under the hood. Systems like the iPhone's Face ID add a few more steps, such as liveness detection to check whether the input is a real human face or a spoof like a photo or mask. That is why it is so hard to fool such a system.
- Use cases
- Face unlock for phones, like the iPhone's Face ID.
- Facial recognition security systems, like the one Baidu uses to control entry to its HQ.
- Surveillance systems, which governments can use to protect the public and flag suspects.
For emotion detection, we can use the same idea we saw in face recognition. The only change is that instead of generating a face embedding vector, we detect facial landmarks (key points) on the face and then compute the distances between those points to infer the person's emotion. Below is a picture of a woman with 68 landmarks detected on her face.
Once we have the distances corresponding to each emotion, we can train a deep learning model to predict the emotion from them. The key steps are:
- Create a dataset of landmark distances and the corresponding emotion labels.
- Train a deep learning model on this data.
- At inference time, take a face image, detect the landmarks, compute the distances, and feed them into the model to get the final emotion label.
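The distance features in the steps above can be computed like this. This is a sketch under one assumption: we use all pairwise distances between landmarks as the feature vector, which for 68 landmarks gives 68·67/2 = 2278 features; a real system might select a smaller, hand-picked subset. The 4-point example is a toy stand-in for a real landmark detector's output.

```python
import numpy as np
from itertools import combinations

def landmark_distances(landmarks):
    """Turn an array of (x, y) facial landmarks into a flat feature
    vector of all pairwise Euclidean distances, which can then be
    fed to an emotion classifier."""
    landmarks = np.asarray(landmarks, dtype=float)
    return np.array([np.linalg.norm(landmarks[i] - landmarks[j])
                     for i, j in combinations(range(len(landmarks)), 2)])

# Toy example with 4 landmarks instead of the usual 68.
pts = [(0, 0), (3, 4), (0, 4), (3, 0)]
features = landmark_distances(pts)
print(len(features))   # -> 6 pairwise distances
print(features[0])     # -> 5.0, the distance from (0, 0) to (3, 4)
```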
- Use cases
- Customer behaviour analysis.
- Public behaviour analysis.
- Patient monitoring in mental health facilities.
- Employee mood analysis in the workplace.
Object detection is one of the hardest problems in computer vision, but it is now much easier thanks to deep learning. Models like YOLO and SSDLite are fast enough that object detection can be done in real time without any problem. Below is an image from the official YOLO paper showing how it works.
YOLO divides the whole image into a grid of smaller cells. For each cell, it predicts whether an object is present and which class it belongs to. Once we have many candidate boxes, we use non-max suppression to discard the boxes that have a low probability of containing a meaningful object or that heavily overlap a higher-scoring box. The image above demonstrates this scenario quite well.
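Non-max suppression itself is simple to sketch. This is a minimal, generic version, not YOLO's exact implementation: boxes are assumed to be in (x1, y1, x2, y2) form, and the IoU threshold of 0.5 is a common default, not a value from the paper.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_thresh=0.5):
    """Greedily keep the highest-scoring box, drop any remaining box
    that overlaps it by more than `iou_thresh`, and repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep

# Two heavily overlapping boxes and one separate box: the lower-scoring
# duplicate (index 1) is suppressed.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # -> [0, 2]
```

In a full detector, boxes below a confidence threshold would be filtered out before this step.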
There are other models like R-CNN, Fast R-CNN, and their variants, but they are not as fast and are harder to use for real-time detection.
- Use cases
- Harmful object detection in public spaces.
- Weapon detection in prisons.
- Customer behaviour analysis.