The system we propose can answer questions like the ones shown in the picture.
- Introduction
- Scope
- Dataset
- Demo
- Deploying in your machine
- Models Proposed
- Results
- Conclusion and Future Work
Visual Question Answering (VQA) is a research area about building an AI system that answers questions, posed in natural language, about an image. A system that solves this task demonstrates a more general understanding of images: it must be able to answer completely different questions about an image, oftentimes addressing different regions of it. Consider the example: for the image, our AI system should be able to localize and detect the subject referenced in the question, and should have some common-sense knowledge to answer it.
When an image is loaded into the system and a corresponding question is given in natural language, the system we built will localize the part of the image relevant to the question and find its answer. We were able to build this system for a given set of training and validation data. The system understands questions in English but not in any other language. It is assumed that for a given image the user will only ask questions relevant to that image; if a random, unrelated question is asked, the system will not be able to identify that it is a wrong question and will give an arbitrary answer.
The dataset can be downloaded from https://visualqa.org/download.html
Click on the image to watch the demo
Follow these steps to deploy it locally (a consolidated Colab cell is sketched after the steps):
- Download the repository and upload it to your Google Drive
- Open a Google Colab notebook with a GPU runtime
- Mount your Google Drive in the Colab notebook
- Change directory to the project folder
- Install the requirements using the command
!pip install -r requirements.txt
- Run the main file using the following command
!python VQA_main.py
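For convenience, the steps above can be run as a single Colab cell. This is only a sketch: the project folder path under Drive is an assumption and should be adjusted to wherever you uploaded the repository.

```python
# Consolidated Colab cell for the steps above (the Drive path is an assumed example).
from google.colab import drive

drive.mount('/content/drive')          # mount Google Drive
%cd /content/drive/MyDrive/VQA         # change to the project folder (adjust the path)
!pip install -r requirements.txt       # install the requirements
!python VQA_main.py                    # run the main file
```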
After executing all the above commands, you should get output like this.
Open the "Running on" link in the output to check the result.
Two models, Parallel Co-Attention and Alternating Co-Attention, are proposed in this project.
In the Parallel Co-Attention model, the question and the image are processed in parallel, attending to each other simultaneously, and the answer for a given image-question pair is predicted from the jointly attended features (see the sketch below).
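The following is a minimal PyTorch sketch of parallel co-attention in the style of Lu et al. (2016); the framework, layer names, and dimensions are illustrative assumptions, not the exact implementation used in this project.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelCoAttention(nn.Module):
    """Parallel co-attention: image and question attend to each other simultaneously."""
    def __init__(self, v_dim=512, q_dim=512, k=256):
        super().__init__()
        self.W_b = nn.Linear(q_dim, v_dim, bias=False)  # affinity transform
        self.W_v = nn.Linear(v_dim, k, bias=False)
        self.W_q = nn.Linear(q_dim, k, bias=False)
        self.w_hv = nn.Linear(k, 1)                     # scores over image regions
        self.w_hq = nn.Linear(k, 1)                     # scores over question words

    def forward(self, V, Q):
        # V: (batch, num_regions, v_dim) image region features
        # Q: (batch, num_words, q_dim)   question word features (e.g. LSTM outputs)
        C = torch.tanh(torch.bmm(self.W_b(Q), V.transpose(1, 2)))   # (batch, T, N) affinity
        H_v = torch.tanh(self.W_v(V) + torch.bmm(C.transpose(1, 2), self.W_q(Q)))
        H_q = torch.tanh(self.W_q(Q) + torch.bmm(C, self.W_v(V)))
        a_v = F.softmax(self.w_hv(H_v), dim=1)          # attention over regions
        a_q = F.softmax(self.w_hq(H_q), dim=1)          # attention over words
        v_att = (a_v * V).sum(dim=1)                    # attended image feature
        q_att = (a_q * Q).sum(dim=1)                    # attended question feature
        return v_att, q_att
```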
- Iterates the above question-image attention process using multiple attention layers, alternating between attending to the image and to the question
- Extracts more fine-grained visual attention information for answer prediction (a minimal sketch follows this list)
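Below is a corresponding minimal sketch of alternating co-attention, again with PyTorch and the dimensions as assumptions; the project's actual code may differ (for example, its question features come from an LSTM, as noted in the conclusion).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionStep(nn.Module):
    """Attend over a feature sequence X, guided by a summary vector g."""
    def __init__(self, x_dim, g_dim, k=256):
        super().__init__()
        self.W_x = nn.Linear(x_dim, k, bias=False)
        self.W_g = nn.Linear(g_dim, k, bias=False)
        self.w_h = nn.Linear(k, 1)

    def forward(self, X, g):
        # X: (batch, seq_len, x_dim) features, g: (batch, g_dim) guidance vector
        H = torch.tanh(self.W_x(X) + self.W_g(g).unsqueeze(1))  # (batch, seq_len, k)
        a = F.softmax(self.w_h(H), dim=1)                       # attention weights
        return (a * X).sum(dim=1)                               # attended summary

class AlternatingCoAttention(nn.Module):
    """Alternate: summarize question -> attend image -> re-attend question."""
    def __init__(self, dim=512, k=256):
        super().__init__()
        self.step_q1 = AttentionStep(dim, dim, k)
        self.step_v = AttentionStep(dim, dim, k)
        self.step_q2 = AttentionStep(dim, dim, k)

    def forward(self, V, Q):
        # V: (batch, num_regions, dim) image features, Q: (batch, num_words, dim) question features
        zero = Q.new_zeros(Q.size(0), Q.size(2))
        s = self.step_q1(Q, zero)        # 1) summarize the question with no guidance
        v_att = self.step_v(V, s)        # 2) attend to image regions guided by the question
        q_att = self.step_q2(Q, v_att)   # 3) re-attend the question guided by the image
        return v_att, q_att
```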
Model | Accuracy |
---|---|
Baseline | 57% |
Alternating Co-Attention | 60% |
Parallel Co-Attention | 63% |
Out of the three models implemented, the Parallel Co-Attention model gives the best results, showing that giving equal emphasis to the question and the image is important. We achieved an accuracy of 63% with Parallel Co-Attention, which can be further improved by training for more epochs and with a better dataset.
- The Alternating Co-Attention and Parallel Co-Attention models were implemented using LSTMs.
- Among the three models (including the baseline), Parallel Co-Attention is the most accurate and gives better results than the others.
- The final accuracy of the built model is 63%.
The accuracy can still be improved by increasing the number of epochs and training with more and better datasets.
In the future, better accuracy can be achieved with better datasets and more training. The model could then be used to build a visual assistant for blind people. For example, a blind person could take a picture and ask a question like "What is there in this picture?"; the question would be converted to text using a speech-to-text API, and the corresponding answer would be sent back to the user using a text-to-speech API, acting as visual support for them.