Merge pull request #14 from iitmcvg/bugfix
Minor fixes before openhouse + Updating current projects
Showing 24 changed files with 183 additions and 5 deletions.
12 files renamed without changes.
---
layout: single
title: "Image Based Census for Animals"
description: "Detecting animals in pictures and then identifying separate individuals of the same species"
---

**Goals:**

- Create software to detect animals in pictures and then identify separate individuals of the same species.
- Focusing mainly on Chital (Spotted Deer), create a dataset for the animal and then run the software on it.

**Objective:**

The aim is to help forest officials take a census using images from motion-trap cameras.

**Method:**

- Break the task into four steps:
  - Creating a dataset
  - Detecting animals in an image
  - Identifying the individual
  - Logging the entry into a database
- Study relevant courses and topics such as CS231n and PyTorch.
- Explore previous work done in this field and look for resources that might aid the process.
- Compare current detection algorithms and adapt them to Chital.
- Build identification software that distinguishes individuals by their spot patterns using neural networks (a sketch follows below).
- Finally, package the software in a presentable format.
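As an illustration of the detection and identification steps, here is a minimal sketch, not the project's actual pipeline: a pretrained torchvision detector keeps high-confidence boxes, and the spot-pattern matcher is stubbed as a cosine-similarity lookup. The model choice, thresholds and helper names are all assumptions.

```python
# Hypothetical sketch: detect animals, then compare spot-pattern embeddings.
# The pretrained COCO detector and the cosine-similarity matcher are
# illustrative assumptions, not the project's actual models.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
detector.eval()

def detect_animals(path, score_thresh=0.7):
    """Return bounding boxes of confident detections in one image."""
    img = to_tensor(Image.open(path).convert("RGB"))
    with torch.no_grad():
        out = detector([img])[0]
    keep = out["scores"] > score_thresh
    return out["boxes"][keep]

def match_individual(embedding, database, sim_thresh=0.8):
    """Match a spot-pattern embedding against known individuals by cosine
    similarity; return the best id, or None for a new animal."""
    best_id, best_sim = None, sim_thresh
    for animal_id, ref in database.items():
        sim = torch.nn.functional.cosine_similarity(embedding, ref, dim=0)
        if sim > best_sim:
            best_id, best_sim = animal_id, sim
    return best_id
```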
---
layout: single
title: "Deep nets analysis, exploration and visualization"
description: "Using different methods and techniques to understand what exactly dictates the decisions a neural network makes"
---

**Goal:**

Deep nets have become extremely effective at solving the problems we want them to, and as we apply them to ever more complex problems, we also use more and more complex architectures. With the increase in complexity, even the architects of a net stop truly understanding what it does. They know how the net trains and what data it is trained on, but the network itself remains a black box. The aim of our project is to use different methods and techniques to understand what exactly dictates the decisions a neural network makes.

**Objective:**

- The first way we plan on increasing our understanding of neural nets is to develop methods to visualize what a neural network sees.
- The second part of the project uses more human-like methods in image classification to implement a top-down approach.

**Method:**

- For the first part, the network we are trying to visualize is a ResNet trained to classify flowers. We are implementing three techniques:
  - **Activation Maximization**: We create an image in which all pixel values are variables, pass it forward through the trained network, and optimize the image to maximally activate a single neuron in the final layer. The image thus produced is a representation of what that neuron encodes and shows the network's abstraction of the class. This is done by maximizing the inner product of a one-hot vector and the output of the final layer when the variable image is passed through the net (a sketch follows the figure below).

{% include figure image_path="/assets/images/projects/Deep_nets_2020/deep_1.png" text-align=center %}
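A minimal sketch of activation maximization in PyTorch: the input image itself is the trainable parameter, and we ascend the gradient of one class logit. The stand-in classifier, class index and hyperparameters are placeholders, not the project's actual values.

```python
# Hypothetical sketch of activation maximization; the model, target class,
# and hyperparameters are placeholders, not the project's actual values.
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True)  # stand-in classifier
model.eval()

image = torch.randn(1, 3, 224, 224, requires_grad=True)  # the "variable image"
optimizer = torch.optim.Adam([image], lr=0.05)
target_class = 42  # placeholder class index

for step in range(200):
    optimizer.zero_grad()
    logits = model(image)
    # Maximizing the target logit equals maximizing the inner product with a
    # one-hot vector; we minimize its negative.
    loss = -logits[0, target_class]
    loss.backward()
    optimizer.step()
```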
  - **Caricaturization:** This method is like activation maximization, but instead of being restricted to the final layer, it allows the maximization of any neuron in the network. This gives more flexibility and lets us see which individual characteristics make up the class we are trying to reproduce.

{% include figure image_path="/assets/images/projects/Deep_nets_2020/deep_2.png" caption="An image created by the DeepDream algorithm by maximizing features" text-align=center %}

  - **Inversion:** The output class is represented as a function f(X0) of a standard image X0, where the function is the trained neural net. We then try to recreate X0 from the function by inverting it and passing the class as the input. Since most neural nets are not invertible, this gives a result different from the initial image. The new representation helps us understand what information was lost during classification, as some of it must be lost to generalize over a class of images. For example, if all roses are to give the same output, some details of each rose image must be lost when it is passed through the function.

Of course, some of the images obtained using these methods are nonsensical. We restrict the space using regularizers so that the outputs are only those that are "natural" and understandable to humans (a sketch follows below).
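Inversion can be sketched the same way: optimize a fresh image so its feature representation matches that of a reference image, with a total-variation regularizer keeping the result natural. The model, truncation point and weights below are assumptions for illustration.

```python
# Hypothetical sketch of inversion with a total-variation regularizer.
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True).eval()
features = torch.nn.Sequential(*list(model.children())[:-2])  # keep conv trunk

def total_variation(img):
    """Penalize high-frequency noise so outputs stay 'natural'."""
    return ((img[..., 1:, :] - img[..., :-1, :]).abs().mean()
            + (img[..., :, 1:] - img[..., :, :-1]).abs().mean())

x0 = torch.randn(1, 3, 224, 224)   # stands in for the reference image X0
with torch.no_grad():
    target = features(x0)

x = torch.randn(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.05)
for step in range(300):
    opt.zero_grad()
    # Match the target features while keeping the image smooth.
    loss = (features(x) - target).pow(2).mean() + 1e-2 * total_variation(x)
    loss.backward()
    opt.step()
```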
- The second part is motivated by the human vision system, which works in a hierarchy: we recognize overarching large patterns before observing the finer details. Traditional ConvNets do not use this technique. This kind of architecture is much easier to understand, as it is closer to how human brains work, and it is more robust against adversarial attacks.

{% include figure image_path="/assets/images/projects/Deep_nets_2020/deep_3.png" caption="The top-down network involves passing a downscaled version of the initial image to the first layer, then feeding in higher-resolution versions of the image in the subsequent layers to imitate the sequential gathering of finer details" text-align=center %}
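One way to realize the idea in the caption is to process a coarse thumbnail first and inject a higher-resolution view at a later stage. The module below is our own illustrative construction, not the project's architecture; depths and channel sizes are arbitrary.

```python
# Hypothetical top-down sketch: a coarse image enters first, and a higher-
# resolution version is injected at the next stage. All sizes are arbitrary.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.stage1 = nn.Conv2d(3, 16, 3, padding=1)        # sees the coarse view
        self.stage2 = nn.Conv2d(16 + 3, 32, 3, padding=1)   # coarse features + fine view
        self.head = nn.Linear(32, num_classes)

    def forward(self, x):
        coarse = F.interpolate(x, scale_factor=0.25)        # downscaled first pass
        h = F.relu(self.stage1(coarse))
        h = F.interpolate(h, size=x.shape[-2:])             # back to full resolution
        h = F.relu(self.stage2(torch.cat([h, x], dim=1)))   # add finer detail
        return self.head(h.mean(dim=(2, 3)))                # global pool + classify
```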
---
layout: single
title: "Document Recovery with OCR and NLP"
description: "Using OCR to extract text from documents, then filling in missing words and correcting OCR mistakes using Natural Language Processing"
---

**Goal:**

To digitize and recover text from documents that are not in good shape.

**Objective:**

To extract text from documents using OCR, then fill in missing words and correct mistakes in the OCR output using Natural Language Processing.

**Method:**

- Do document layout segmentation, i.e. understand which parts of the document are headings, images, body text, etc.
- Try out handwriting recognition.
- Use OCR or handwriting recognition to extract the text from the image.
- Fill in the missing details using an NLP model that understands the context (a sketch follows below).
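A minimal sketch of the two stages, assuming pytesseract for OCR and a masked language model for gap-filling; both libraries and the example sentence are our assumptions, not necessarily what the project uses.

```python
# Hypothetical sketch: pytesseract for OCR, then a masked language model
# proposes a filler for an unreadable word. Both tools are assumed choices.
import pytesseract
from PIL import Image
from transformers import pipeline

# Step 1: OCR the scanned page.
text = pytesseract.image_to_string(Image.open("scanned_page.png"))

# Step 2: replace an unreadable word with the model's mask token and let a
# masked language model propose the most likely filler.
fill = pipeline("fill-mask", model="bert-base-uncased")
damaged = "The committee will meet on [MASK] to review the report."
damaged = damaged.replace("[MASK]", fill.tokenizer.mask_token)
print(fill(damaged)[0]["sequence"])  # top-ranked completed sentence
```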
---
layout: single
title: "Exploration into RL algorithms through games"
description: "Analyzing the performance of various RL algorithms on different games such as Minesweeper, Slither.io and Reconnaissance Blind Chess"
---

**Goal:**

To analyze the performance of various RL algorithms on different games such as Minesweeper, [Slither.io](http://slither.io/) and Reconnaissance Blind Chess.

**Objective:**

By working on multiplayer environments and incomplete-information problems, we intend to find improvements over the current state-of-the-art methods that can be applied to sophisticated problems such as robotics and autonomous driving.

**Method:**

- The agent interacts with an emulated game environment. For Slither, the OpenAI Universe package is used to create a container image of the online version of the game, while for Minesweeper a pygame environment is used.
- We stack 4-6 frames as one training input to add a sense of direction to the game and pass this stack to a CNN followed by a dense (fully connected) network (a sketch follows this list). The output is the value of the different states or a policy, depending on the algorithm. (The value of a state tells how good or bad that state/snapshot/frame of the game is, and the policy is the strategy based on which the bot takes actions.)
- The agent is trained using Q-value-based and policy-based methods such as Deep Q-learning, policy gradients and Actor-Critic methods (to get the best of both worlds).
- The effects of reward shaping, prioritized experience replay, recurrent and LSTM memory layers (for Partially Observable MDPs), etc. on the performance of the agent are analyzed.
- Document the training and compare the success of the different algorithms.
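A sketch of the network shape described above: four stacked grayscale frames pass through a small CNN and a dense head that outputs one Q-value per action. The layer sizes and action count are illustrative assumptions.

```python
# Hypothetical Q-network over a stack of 4 frames; all sizes illustrative.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, n_frames=4, n_actions=6):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(n_frames, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(), nn.LazyLinear(256), nn.ReLU(), nn.Linear(256, n_actions)
        )

    def forward(self, frames):            # frames: (batch, n_frames, H, W)
        return self.head(self.conv(frames))

q = QNetwork()
print(q(torch.zeros(1, 4, 84, 84)).shape)  # torch.Size([1, 6]), one Q per action
```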
---
layout: single
title: "Smartcopter"
description: "Achieving completely autonomous navigation in a GPS-denied environment using completely vision-based systems"
---

**Goal:**

The Smartcopter project aims to achieve completely autonomous navigation in a GPS-denied environment by using completely vision-based systems such as normal cameras, depth cameras and tracking cameras, and analyzing the depth information and other relevant data thus obtained.

**Objective:**

We intend to work on obstacle avoidance and experiment with path-planning algorithms to achieve completely autonomous navigation in a GPS-denied environment.

**Method:**

- Firstly, we shall implement basic obstacle avoidance to make sure that the drone is capable of safe flight (a sketch follows this list).
- We shall then implement traditional global path planning using the Open Motion Planning Library (OMPL), MoveIt and PX4 Avoidance.
- We then go a step further, for this is a "smart" copter: we plan to use deep learning and reinforcement learning algorithms to implement the path planning that would previously have been done the traditional way.
- This project has various applications, including avoidance, surveying, swarm missions and delivery.
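As a flavor of the basic obstacle-avoidance step, here is a minimal sketch that checks the central window of a depth image and signals a stop when anything is too close. The window size and threshold are illustrative, not flight-tested values from the project.

```python
# Hypothetical depth-camera obstacle check: brake if anything in the central
# window of the depth image is too close. Threshold values are illustrative.
import numpy as np

def obstacle_ahead(depth_m: np.ndarray, threshold_m: float = 1.5) -> bool:
    """depth_m: HxW depth image in meters (0 = invalid pixel)."""
    h, w = depth_m.shape
    window = depth_m[h // 3 : 2 * h // 3, w // 3 : 2 * w // 3]
    valid = window[window > 0]            # ignore invalid (zero) readings
    return valid.size > 0 and float(valid.min()) < threshold_m

# Example: a synthetic frame with an object 1.2 m in front of the camera.
frame = np.full((480, 640), 5.0)
frame[200:280, 300:340] = 1.2
print(obstacle_ahead(frame))  # True -> command the drone to brake or hover
```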
---
layout: single
title: "Traffic Analysis using CV"
description: "Developing a driver-assistance system that assists the driver in real time about pedestrians, traffic signs, etc."
---

**Goal:**

We are motivated to develop a supportive eye that can assist the driver in mitigating an accident before it occurs, improving road safety on Indian roads with the help of computer vision.

**Objectives:**

We are currently working on a Pedestrian Protection System. In future, we intend to work on several subdivisions of Driver Assistance Systems, such as:

- Traffic-sign recognition
- Driver drowsiness detection
- Lane-departure warning
- Lane-change assistance

**Method:**

- End-to-end object detection models are capable of detecting objects like pedestrians in the road scene in real time. Currently we are using the YOLOv5 network for this purpose. Training YOLOv5 on custom datasets of road users or existing pedestrian datasets like Caltech, KITTI and IDD (Indian Driving Dataset) can improve accuracy (a usage sketch follows the figure below).

{% include figure image_path="/assets/images/projects/Traffic_Analysis_2020/TA_1.jpg" caption="**Image generated using custom-trained YOLOv5**" text-align=center %}
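A minimal usage sketch of YOLOv5 via torch.hub; the pretrained weights stand in for the project's custom-trained model, and the image path and confidence threshold are placeholders.

```python
# Hypothetical YOLOv5 inference sketch; the weights, image path and
# threshold are placeholders for the custom-trained setup.
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s")  # pretrained stand-in
model.conf = 0.4                      # confidence threshold (assumed value)

results = model("road_scene.jpg")     # accepts a path, URL or numpy image
detections = results.pandas().xyxy[0] # DataFrame of boxes with class names
pedestrians = detections[detections["name"] == "person"]
print(pedestrians[["xmin", "ymin", "xmax", "ymax", "confidence"]])
```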
- Either sensor-based approaches (like LIDAR or RADAR) or vision-based approaches (like stereo cameras or monocular video) can be used to estimate the depth of objects. If monocular depth estimation works well, the cost of such a system could be reduced considerably.
- A network trained on human pose can give alerts about the direction of motion of pedestrians. For this purpose, human images labelled by direction of motion are to be used.
- Similarly, transfer learning of CNNs can achieve traffic-sign recognition and driver-drowsiness detection. Traffic-sign recognition involves two phases: detection and localization, then a text description of the localized image. With driver eye and face monitoring plus vehicle lane-position monitoring, the state of the driver can be assessed.
- A lane-detection system behind the lane-departure warning can be built using the Canny edge detector and the Hough transform to detect lane lines in real-time camera images from the front of the automobile (see the sketch after this list).
- With the help of cameras and sensors, the driver could be alerted about vehicles approaching from the driver's blind spot.
- With the above approaches, a system that assists the driver in real time to prevent collisions and mitigate accidents could be developed.
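A sketch of the classic Canny-plus-Hough lane detector mentioned above, using OpenCV; the thresholds and the triangular region of interest are illustrative choices, not tuned values from the project.

```python
# Hypothetical lane-detection sketch: Canny edges, a region of interest,
# then a probabilistic Hough transform. Thresholds are illustrative.
import cv2
import numpy as np

frame = cv2.imread("dashcam_frame.jpg")
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(cv2.GaussianBlur(gray, (5, 5), 0), 50, 150)

# Keep only the lower triangle of the image, where lane lines appear.
mask = np.zeros_like(edges)
h, w = edges.shape
cv2.fillPoly(mask, [np.array([[0, h], [w, h], [w // 2, h // 2]])], 255)
edges = cv2.bitwise_and(edges, mask)

lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=50,
                        minLineLength=40, maxLineGap=100)
for x1, y1, x2, y2 in (lines.reshape(-1, 4) if lines is not None else []):
    cv2.line(frame, (x1, y1), (x2, y2), (0, 255, 0), 3)  # draw detected lanes
cv2.imwrite("lanes.jpg", frame)
```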
---
layout: single
title: "VR Shooter with Dodging agent"
description: "To make a VR shooter game with agents trained by reinforcement learning"
---

**Goal:**

To make a VR shooter game with agents trained by reinforcement learning.

**Objective:**

Investigate factors like reward signals, training practices and environment design that favour cooperation or competition in a multi-agent RL setting.

**Method:**

- Using the ML-Agents framework in Unity3D, agents are trained with RL algorithms like Proximal Policy Optimization (PPO), Soft Actor-Critic (SAC) and others (a sketch of the environment loop follows this list).
- Reinforcement Learning (RL) has mainly looked at the single-agent setting, akin to a control system.
- The field of multi-agent RL is nascent, with papers being published nearly every day. It also draws parallels with daily human life: we learn to choose between working as a team and competing with one another to achieve our goals, whether hunting animals in the Stone Age or playing a game of football today.
- This project aims to find out what roles an agent can learn to take up in a multi-agent setting.
- RL is known for its highly unstable training phase, and subtle choices in the environment setup and reward signals give rise to different behaviours in agents.
- We plan to implement a shooter game, like Counter-Strike, where a team of agents will have to take up different roles, like pursuing an opponent, giving covering fire, holding sniper positions and so on.
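For context, driving a Unity build through ML-Agents' low-level Python API looks roughly like the loop below. The build name is a placeholder, the random actions stand in for the policy being trained, and in practice the PPO/SAC training would be run through ML-Agents' bundled trainers rather than a hand-rolled loop.

```python
# Hypothetical sketch of the mlagents_envs low-level loop with random
# actions; the build name is a placeholder and the real PPO/SAC training
# is normally run through ML-Agents' trainers.
from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment(file_name="ShooterBuild")  # placeholder Unity build
env.reset()
behavior = list(env.behavior_specs)[0]  # first registered agent behavior
spec = env.behavior_specs[behavior]

for step in range(100):
    decision_steps, terminal_steps = env.get_steps(behavior)
    # Random actions stand in for the policy being trained.
    action = spec.action_spec.random_action(len(decision_steps))
    env.set_actions(behavior, action)
    env.step()
env.close()
```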