
Installation

This page describes the basic prerequisites for running OpenVQA, covering the hardware, software, and dataset setup.

Hardware & Software Setup

A machine with at least one GPU (>= 8 GB of VRAM), 20 GB of RAM, and 50 GB of free disk space is required. We strongly recommend using an SSD to ensure high-speed I/O.
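As an optional sanity check (a minimal sketch, assuming an NVIDIA GPU and a Linux shell), you can inspect the available GPU memory, RAM, and disk space with standard tools:

$ nvidia-smi --query-gpu=name,memory.total --format=csv   # GPU model and total VRAM
$ free -h                                                 # available RAM
$ df -h .                                                 # free disk space on the current partition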

The following packages are required to build the project correctly.

$ pip install -r requirements.txt
$ wget https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-2.1.0/en_vectors_web_lg-2.1.0.tar.gz -O en_vectors_web_lg-2.1.0.tar.gz
$ pip install en_vectors_web_lg-2.1.0.tar.gz
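To verify that the word-vector model installed correctly, you can try loading it from Python (a quick hedged check; spaCy model packages installed via pip expose a top-level load() function, and the printed shape should have 300 as its second dimension):

$ python -c "import en_vectors_web_lg; nlp = en_vectors_web_lg.load(); print(nlp.vocab.vectors.shape)"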

Dataset Setup

The following datasets should be prepared before running the experiments.

Note that if you only want to run experiments on one specific dataset, you can prepare that dataset and skip the rest.

VQA-v2

  • Image Features

The image features are extracted using the bottom-up-attention strategy, with each image represented by a dynamic number (from 10 to 100) of 2048-D features. The features for each image are stored in a .npz file. You can extract the visual features yourself or download the pre-extracted features from OneDrive or BaiduYun. The download consists of three archives: train2014.tar.gz, val2014.tar.gz, and test2015.tar.gz, corresponding to the features of the train/val/test images of VQA-v2, respectively.

Unzip all the image feature archives and place their contents in the data/vqa/feats folder to form the following tree structure (see the example commands after the tree):

|-- data
	|-- vqa
	|  |-- feats
	|  |  |-- train2014
	|  |  |  |-- COCO_train2014_...jpg.npz
	|  |  |  |-- ...
	|  |  |-- val2014
	|  |  |  |-- COCO_val2014_...jpg.npz
	|  |  |  |-- ...
	|  |  |-- test2015
	|  |  |  |-- COCO_test2015_...jpg.npz
	|  |  |  |-- ...
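As a sketch of the unpacking step (assuming the three archives were downloaded to the current directory and that each one unpacks into its own train2014/, val2014/, or test2015/ folder), the commands could look like this; the final line is an optional check that prints the arrays stored in one feature file:

$ mkdir -p data/vqa/feats
$ tar -xzvf train2014.tar.gz -C data/vqa/feats
$ tar -xzvf val2014.tar.gz -C data/vqa/feats
$ tar -xzvf test2015.tar.gz -C data/vqa/feats
$ python -c "import glob, numpy as np; f = np.load(sorted(glob.glob('data/vqa/feats/train2014/*.npz'))[0]); print({k: f[k].shape for k in f.files})"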
  • QA Annotations

Download all the annotation JSON files for VQA-v2, including the train questions, val questions, test questions, train answers, and val answers.

In addition, we use the VQA samples from Visual Genome to augment the training set. These samples were pre-processed with two rules:

  1. Select the QA pairs whose corresponding images appear in the MS-COCO train and val splits;
  2. Select the QA pairs whose answers appear in the processed answer list (i.e., answers occurring more than 8 times among all VQA-v2 answers).

We provide our processed VG questions and annotations files; you can download them from OneDrive or BaiduYun.

Unzip all the QA annotation files and place them in the data/vqa/raw folder to form the following tree structure (see the example commands after the tree):

|-- data
	|-- vqa
	|  |-- raw
	|  |  |-- v2_OpenEnded_mscoco_train2014_questions.json
	|  |  |-- v2_OpenEnded_mscoco_val2014_questions.json
	|  |  |-- v2_OpenEnded_mscoco_test2015_questions.json
	|  |  |-- v2_OpenEnded_mscoco_test-dev2015_questions.json
	|  |  |-- v2_mscoco_train2014_annotations.json
	|  |  |-- v2_mscoco_val2014_annotations.json
	|  |  |-- VG_questions.json
	|  |  |-- VG_annotations.json
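A minimal sketch for putting the annotation files in place, assuming the VQA-v2 downloads are .zip archives matching the pattern below (archive names vary by download source and are illustrative) and the Visual Genome files arrive as plain JSON:

$ mkdir -p data/vqa/raw
$ for f in v2_*.zip; do unzip "$f" -d data/vqa/raw; done   # unpack each downloaded VQA-v2 annotation archive
$ mv VG_questions.json VG_annotations.json data/vqa/raw    # move the processed Visual Genome files into place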

GQA

CLEVR