This page describes the basic prerequisites for running OpenVQA, covering the hardware, software, and dataset setup.
A machine with at least one GPU (>= 8GB of memory), 20GB of RAM, and 50GB of free disk space is required. We strongly recommend using an SSD to guarantee high-speed I/O.
The following packages are required to build the project correctly.
- Python >= 3.5
- CUDA >= 9.0 and cuDNN
- PyTorch >= 0.4.1 with CUDA (PyTorch 1.x is also supported).
- spaCy, with the GloVe vectors initialized as follows:
```bash
$ pip install -r requirements.txt
$ wget https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-2.1.0/en_vectors_web_lg-2.1.0.tar.gz -O en_vectors_web_lg-2.1.0.tar.gz
$ pip install en_vectors_web_lg-2.1.0.tar.gz
```
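As a quick sanity check (a minimal sketch, not part of the OpenVQA codebase), the following snippet verifies that PyTorch can see the GPU and that the GloVe package loads correctly:

```python
# Minimal sanity check for the software setup above (not part of OpenVQA itself).
import torch
import en_vectors_web_lg

# PyTorch should detect at least one CUDA-capable GPU.
print('CUDA available:', torch.cuda.is_available())
print('GPU count:', torch.cuda.device_count())

# The GloVe package should load and expose 300-D word vectors.
nlp = en_vectors_web_lg.load()
print('GloVe dim:', nlp('hello')[0].vector.shape[0])  # expected: 300
```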
The following datasets should be prepared before running the experiments.
Note that if you only want to run experiments on one specific dataset, you can focus on the setup for that and skip the rest.
- Image Features
The image features are extracted using the bottom-up-attention strategy, with each image represented by a dynamic number (from 10 to 100) of 2048-D features. The features for each image are stored in a `.npz` file. You can prepare the visual features yourself or download the extracted features from OneDrive or BaiduYun. The download contains three files: train2014.tar.gz, val2014.tar.gz, and test2015.tar.gz, corresponding to the features of the train/val/test images of VQA-v2, respectively. A short sketch for inspecting one of these files follows the directory tree below.
All the image feature files should be unzipped and placed in the `data/vqa/feats` folder to form the following tree structure:
```
|-- data
|   |-- vqa
|   |   |-- feats
|   |   |   |-- train2014
|   |   |   |   |-- COCO_train2014_...jpg.npz
|   |   |   |   |-- ...
|   |   |   |-- val2014
|   |   |   |   |-- COCO_val2014_...jpg.npz
|   |   |   |   |-- ...
|   |   |   |-- test2015
|   |   |   |   |-- COCO_test2015_...jpg.npz
|   |   |   |   |-- ...
```
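For reference, a feature file can be inspected with NumPy as follows. This is a hedged sketch: the key name 'x' follows the common bottom-up-attention `.npz` layout and is an assumption here, so list the available keys first if your files differ.

```python
# Hedged sketch: inspect one extracted feature file.
import glob
import numpy as np

# Pick an arbitrary feature file from the train split (example path layout only).
path = sorted(glob.glob('data/vqa/feats/train2014/*.npz'))[0]
feat = np.load(path)
print('keys:', list(feat.files))

# The per-region feature array is commonly stored under the key 'x' in the
# bottom-up-attention format; this key name is an assumption here.
x = feat['x']
print('feature array shape:', x.shape)  # expect 10-100 regions of 2048-D features (possibly transposed)
```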
- QA Annotations
Download all the annotation JSON files for VQA-v2, including the train questions, val questions, test questions, train answers, and val answers.
In addition, we use the VQA samples from Visual Genome to augment the training set. These samples are pre-processed with two rules:
- Select the QA pairs whose corresponding images appear in the MS-COCO train and val splits;
- Select the QA pairs whose answers appear in the processed answer list (i.e., answers that occur more than 8 times in the VQA-v2 answers).
We provide our processed VG questions and annotation files; you can download them from OneDrive or BaiduYun.
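For illustration only, the two rules above roughly correspond to a filter like the following. This is a hedged sketch: the record fields and input containers are assumptions, not the authors' actual pre-processing script.

```python
# Hedged sketch of the two filtering rules above. The record fields
# ('image_id', 'question', 'answer') and the input containers are assumptions
# for illustration only; the released VG_questions.json / VG_annotations.json
# were produced by the authors' own pre-processing.
def filter_vg_samples(vg_records, coco_trainval_image_ids, answer_vocab):
    kept = []
    for rec in vg_records:
        # Rule 1: keep only QA pairs whose image is in the MS-COCO train/val splits.
        if rec['image_id'] not in coco_trainval_image_ids:
            continue
        # Rule 2: keep only QA pairs whose answer is in the processed answer list
        # (answers occurring more than 8 times in VQA-v2).
        if rec['answer'] not in answer_vocab:
            continue
        kept.append(rec)
    return kept
```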
All the QA annotation files should be unzipped and placed in the `data/vqa/raw` folder to form the following tree structure:
```
|-- data
|   |-- vqa
|   |   |-- raw
|   |   |   |-- v2_OpenEnded_mscoco_train2014_questions.json
|   |   |   |-- v2_OpenEnded_mscoco_val2014_questions.json
|   |   |   |-- v2_OpenEnded_mscoco_test2015_questions.json
|   |   |   |-- v2_OpenEnded_mscoco_test-dev2015_questions.json
|   |   |   |-- v2_mscoco_train2014_annotations.json
|   |   |   |-- v2_mscoco_val2014_annotations.json
|   |   |   |-- VG_questions.json
|   |   |   |-- VG_annotations.json
```
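Once the files are in place, they can be sanity-checked by loading one question/annotation pair and confirming the entries line up (a minimal sketch; the 'questions' and 'annotations' top-level keys follow the standard VQA-v2 JSON format):

```python
# Quick consistency check on one question/annotation pair (minimal sketch).
import json

with open('data/vqa/raw/v2_OpenEnded_mscoco_train2014_questions.json') as f:
    questions = json.load(f)['questions']
with open('data/vqa/raw/v2_mscoco_train2014_annotations.json') as f:
    annotations = json.load(f)['annotations']

print('train questions:', len(questions))
print('train annotations:', len(annotations))

# Every annotation should reference a question_id that exists in the questions file.
qids = {q['question_id'] for q in questions}
assert all(a['question_id'] in qids for a in annotations)
```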