Learns an MLP for VQA
This code implements the VQA MLP baseline from *Revisiting Visual Question Answering Baselines*.
Features / Method | VQA val accuracy (%) | VQA test-dev accuracy (%)
---|---|---
MCBP | - | 66.4
Baseline - MLP | - | 64.9
ImageNet - MLP | 63.62 | 65.9
This README is a work in progress.
The MLP is implemented in Torch and depends on the following packages: torch/nn, torch/nngraph, torch/cutorch, torch/cunn, torch/image, torch/tds, lua-cjson, nninit, torch-word-emb, torch-hdf5, torchx.
After installing Torch, you can install or update these dependencies by running the following:

```
luarocks install nn
luarocks install nngraph
luarocks install image
luarocks install tds
luarocks install cutorch
luarocks install cunn
luarocks install lua-cjson
luarocks install nninit
luarocks install torch-word-emb
luarocks install torchx
```
Install torch-hdf5 by following the instructions here.
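If you prefer a single command, a minimal shell loop (a sketch, assuming `luarocks` is on your PATH) installs all of the rocks above; torch-hdf5 is excluded since it needs its own build steps:

```
# Install every luarocks dependency in one pass (torch-hdf5 is handled separately)
for rock in nn nngraph image tds cutorch cunn lua-cjson nninit torch-word-emb torchx; do
    luarocks install "$rock"
done
```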
Clone the repository along with its submodules:

```
git clone --recursive https://github.com/arunmallya/simple-vqa.git
```
- Create a data/ folder and symlink or place the following datasets in it: vqa -> the VQA dataset root, and coco -> the COCO dataset root. coco is needed only if you plan to extract and use your own features; it is not required if you use the cached features below. A minimal symlink sketch follows this list.
- Download the Word2Vec model file from here. This is needed to encode sentences into vectors. Place the .bin file in the data/models/ folder.
- Download the cached ResNet-152 ImageNet features for the VQA dataset splits and place them in data/feats/: features
- Download the VQA lite annotations and place them in data/vqa/Annotations/. These are required because the original VQA annotations do not fit within luajit's 2GB memory limit.
- Download the MLP models trained on the VQA train set and place them in checkpoint/: models

At this point, your data/ folder should contain the models/, feats/, coco/, and vqa/ folders.
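For concreteness, here is a minimal sketch of the resulting layout, assuming your datasets live at the placeholder paths /path/to/VQA and /path/to/COCO:

```
mkdir -p data/models data/feats checkpoint
ln -s /path/to/VQA  data/vqa    # VQA dataset root
ln -s /path/to/COCO data/coco   # optional: only for extracting your own features
# Expected layout after the downloads above:
#   data/models/  <- the Word2Vec .bin file
#   data/feats/   <- cached ResNet-152 ImageNet features
#   data/vqa/     <- VQA dataset, with the lite annotations in Annotations/
#   data/coco/    <- COCO dataset (optional)
#   checkpoint/   <- downloaded MLP models
```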
For example, to run the model trained on the VQA train set with ImageNet features on the VQA val set:

```
th eval.lua -eval_split val \
    -eval_checkpoint_path checkpoint/MLP-imagenet-train.t7
```
In general, the command is:

```
th eval.lua -eval_split (train/val/test-dev/test-final) \
    -eval_checkpoint_path <model-path>
```
This will dump the results in checkpoint/ as a .json file, and additionally as a results.zip file in the case of test-dev and test-final. The results.zip can be uploaded to CodaLab for evaluation.
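For instance, a test-dev run that produces the uploadable archive could look like the following (reusing the downloaded ImageNet checkpoint from above):

```
# Writes a .json results file and results.zip (test-dev/test-final only) into checkpoint/
th eval.lua -eval_split test-dev \
    -eval_checkpoint_path checkpoint/MLP-imagenet-train.t7
```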
To train your own MLP on ImageNet features, run:

```
th train.lua -im_feat_types imagenet -im_feat_dims 2048
```
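A rough end-to-end sketch, training and then evaluating on val; the checkpoint path here is an assumption, so substitute whatever file train.lua reports writing:

```
# Train the MLP on ImageNet features (2048-d), then evaluate on the val split.
th train.lua -im_feat_types imagenet -im_feat_dims 2048
# Hypothetical checkpoint path; use the file actually written by train.lua.
th eval.lua -eval_split val -eval_checkpoint_path checkpoint/MLP-imagenet-train.t7
```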