Data Preparation for Text-Image Retrieval

Annotation Files

We pre-processed the annotations of the various datasets and unified them into .json format. These annotation files are stored under the datasets/ directory of this repo. To use our inference code properly, you should use the same annotation files; detailed instructions follow.

Dataset list:

MSCOCO
FLICKR30K
DOCCI
IIW
ShareGPT4v
DCI
Urban1k
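Before running evaluation, it can help to confirm that every dataset root matches the layouts described below. The sketch assumes, purely for illustration, that all seven roots sit under one common data directory; EXPECTED_LAYOUT and missing_paths are hypothetical names, not part of this repo, and they only spot-check a few paths per dataset.

```python
from pathlib import Path

# Hypothetical spot-check table: for each dataset root (named as in the trees
# in this document), a few relative paths that the layouts expect to exist.
EXPECTED_LAYOUT = {
    "coco": ["images/val2017", "annotations/captions_val2017.json"],
    "flickr30k-images": ["flickr30k_val.json", "flickr30k_test.json"],
    "docci": ["images", "annotations/test_annotations.json"],
    "imageinwords": ["dci", "docci", "docci_aar", "finegrained_annotations.json"],
    "share4v": ["sa_000000/images"],
    "dci": [
        "densely_captioned_images/annotations",
        "densely_captioned_images/photos",
        "densely_captioned_images/splits.json",
    ],
    "Urban1k": ["images", "annotations/annotations.json"],
}


def missing_paths(data_root: str) -> dict[str, list[str]]:
    """Return, per dataset, the expected paths that are absent under data_root."""
    root = Path(data_root)
    report = {}
    for dataset, rel_paths in EXPECTED_LAYOUT.items():
        absent = [p for p in rel_paths if not (root / dataset / p).exists()]
        if absent:
            report[dataset] = absent
    return report


# Example: print(missing_paths("/path/to/data"))
```

An empty dict means every spot-checked path was found.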

MSCOCO dataset

$coco/
|–– images/
|–––– val2017/
|–––––– 000000134722.jpg
|–––––– 000000177015.jpg
|–––––– ...
|–– annotations/
|–––– captions_val2017.json

Step 1. Download the validation images from COCO 2017 Val Images and unzip them to coco/images/val2017/.

Step 2. Download the 2017 Val annotations and place captions_val2017.json under coco/annotations/.
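As a quick check after both steps, the captions can be grouped by image using the standard COCO captions schema (an "images" list with id/file_name entries and an "annotations" list with image_id/caption entries). This is an independent sketch, not the repo's own data loader; load_coco_captions is a hypothetical name.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_coco_captions(coco_root: str) -> dict[str, list[str]]:
    """Map each val2017 image file name to its reference captions."""
    root = Path(coco_root)
    with open(root / "annotations" / "captions_val2017.json", encoding="utf-8") as f:
        data = json.load(f)

    # Standard COCO captions format: images carry id/file_name,
    # annotations carry image_id/caption.
    id_to_file = {img["id"]: img["file_name"] for img in data["images"]}
    captions = defaultdict(list)
    for ann in data["annotations"]:
        captions[id_to_file[ann["image_id"]]].append(ann["caption"])

    # Report images that are annotated but missing from images/val2017/.
    image_dir = root / "images" / "val2017"
    missing = [name for name in captions if not (image_dir / name).is_file()]
    if missing:
        print(f"{len(missing)} annotated images not found under {image_dir}")
    return dict(captions)
```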

FLICKR30K dataset

$flickr30k-images/
|––  2217728745.jpg
|––  ...
|––  flickr30k_val.json
|––  flickr30k_test.json

Step 1. Download the Flickr30K dataset and unzip it under flickr30k-images/. The images and annotation files should then be laid out as shown above.
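The internal schema of flickr30k_val.json and flickr30k_test.json is not described here, so the sketch below only counts the images and checks that both split files parse as JSON, assuming their top level is a list or an object. check_flickr30k is a hypothetical helper.

```python
import json
from pathlib import Path


def check_flickr30k(root: str) -> dict[str, int]:
    """Count *.jpg images and the top-level entries of each split file."""
    base = Path(root)
    summary = {"images": sum(1 for _ in base.glob("*.jpg"))}
    for split in ("val", "test"):
        with open(base / f"flickr30k_{split}.json", encoding="utf-8") as f:
            data = json.load(f)  # raises if the file is not valid JSON
        # len() works for both list- and object-shaped top-level JSON.
        summary[split] = len(data)
    return summary
```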

DOCCI dataset

$docci/
|––  images/
|––––  test_01427.jpg
|––––  test_01428.jpg
|––––  ...
|––  annotations/
|–––– test_annotations.json

Step 1. Download the DOCCI images and unzip them under docci/images/. Only the 5K test images are needed here.

Step 2. Copy test_annotations.json from this repo into docci/annotations/. This annotation file maps every test image to its fine-grained captions.
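If the unzipped DOCCI archive also contains images from other splits, one way to keep only the test images is to copy the files whose names start with test_, matching the names in the tree above. The source folder and the copy_docci_test_images helper are hypothetical; the returned count should be roughly the 5K test images mentioned in Step 1.

```python
import shutil
from pathlib import Path


def copy_docci_test_images(extracted_dir: str, docci_root: str) -> int:
    """Copy test_*.jpg files found anywhere under extracted_dir into
    docci_root/images/ and return how many were copied."""
    dst = Path(docci_root) / "images"
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    # The pattern matches whole file names, so e.g. qual_test_*.jpg is skipped.
    for img in sorted(Path(extracted_dir).rglob("test_*.jpg")):
        shutil.copy2(img, dst / img.name)
        count += 1
    return count
```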

IIW dataset

$imageinwords/
|–– dci/
|–– docci/
|–– docci_aar/
|–– finegrained_annotations.json

Download the human-annotated data following the IIW instructions, covering IIW-400, DCI-Test, and DOCCI-Test:

Step 1. Download DCI to path_to_dci_dataset.

Step 2. Download the DOCCI images and AAR images from the DOCCI dataset. Unzip them to path_to_docci_dataset/images and path_to_docci_dataset/images_aar, respectively.

Step 3. Copy finegrained_annotations.json from this repo into imageinwords/.
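The steps above download DCI and DOCCI to their own locations, while the tree shows dci/, docci/ and docci_aar/ inside imageinwords/. One way to bridge the two, which is our assumption rather than something the repo prescribes, is to symlink the downloaded folders into place. The mapping in the docstring (in particular docci/ to the DOCCI images folder and docci_aar/ to images_aar) is inferred from the folder names and should be checked against the inference code; link_iiw_sources is a hypothetical helper.

```python
import os
from pathlib import Path


def link_iiw_sources(iiw_root: str, dci_dir: str, docci_dir: str) -> None:
    """Symlink downloaded DCI/DOCCI folders into the imageinwords/ layout.

    Assumed mapping (not confirmed by the repo):
      imageinwords/dci       -> dci_dir
      imageinwords/docci     -> docci_dir/images
      imageinwords/docci_aar -> docci_dir/images_aar
    """
    root = Path(iiw_root)
    root.mkdir(parents=True, exist_ok=True)
    targets = {
        "dci": Path(dci_dir),
        "docci": Path(docci_dir) / "images",
        "docci_aar": Path(docci_dir) / "images_aar",
    }
    for name, target in targets.items():
        link = root / name
        if link.exists() or link.is_symlink():
            continue  # leave existing folders or links untouched
        os.symlink(target.resolve(), link, target_is_directory=True)
```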

ShareGPT4v dataset

$share4v/
|–– sa_000000/
|–––– images/
|–––––– sa_1.jpg
|–––––– sa_2.jpg
|–––––– ...
|–– sa_000001/
|–– ...

Step 1. Download tar files from SA-1B to share4v/

Step 2. Extract all tar files.
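Extracting many SA-1B archives by hand is tedious; a short loop over share4v/ with Python's standard tarfile module can do it. The sketch assumes each sa_XXXXXX.tar should be unpacked into share4v/sa_XXXXXX/images/ to match the tree above; extract_sa1b_tars is a hypothetical name, and any non-image files in the archives are extracted alongside the images.

```python
import tarfile
from pathlib import Path


def extract_sa1b_tars(share4v_root: str) -> list[str]:
    """Extract every sa_*.tar in share4v_root into <stem>/images/ and
    return the stems that were processed."""
    root = Path(share4v_root)
    done = []
    for tar_path in sorted(root.glob("sa_*.tar")):
        out_dir = root / tar_path.stem / "images"
        out_dir.mkdir(parents=True, exist_ok=True)
        with tarfile.open(tar_path) as tar:
            # Use the safer "data" extraction filter where this Python provides it.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(out_dir, filter="data")
            else:
                tar.extractall(out_dir)
        done.append(tar_path.stem)
    return done
```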

For the annotations, we re-saved the top 10k samples of share-captioner_coco_lcs_sam_1246k_1107.json to dataloaders/share4v/share4v_sam_10k.json.
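The repo already ships share4v_sam_10k.json, so regenerating it is optional. If you need to, here is a minimal sketch, assuming the source file is a single top-level JSON list and that the 10k samples are the first 10,000 entries in file order; save_first_n is a hypothetical helper and may not reproduce the shipped file byte for byte.

```python
import json
from pathlib import Path


def save_first_n(src_json: str, dst_json: str, n: int = 10_000) -> int:
    """Write the first n records of a top-level JSON list to dst_json;
    return the number of records written."""
    with open(src_json, encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON list")
    subset = records[:n]
    Path(dst_json).parent.mkdir(parents=True, exist_ok=True)
    with open(dst_json, "w", encoding="utf-8") as f:
        json.dump(subset, f)
    return len(subset)
```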

DCI dataset

$dci/
|–– densely_captioned_images/
|–––– annotations/
|–––– photos/
|–––– splits.json

Download the data following the DCI instructions:

Step 1. Download dci.tar.gz and extract it into dci/densely_captioned_images/.

Step 2. Download the archive sa_000138.tar and extract the images to the dci/densely_captioned_images/photos folder.
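Step 2 asks for the images only. If the archive also holds non-image files (SA-1B archives are, to our understanding, distributed with per-image JSON annotations), the sketch below extracts just the image members and flattens any directory prefix, which we assume is what the photos/ folder expects. extract_images_only is a hypothetical helper built on the standard tarfile module.

```python
import shutil
import tarfile
from pathlib import Path


def extract_images_only(tar_path: str, photos_dir: str,
                        suffixes: tuple[str, ...] = (".jpg", ".jpeg")) -> int:
    """Write only the image members of tar_path into photos_dir (flattened)
    and return the number of images written."""
    out = Path(photos_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if not member.isfile() or not member.name.lower().endswith(suffixes):
                continue  # skip directories and non-image files such as .json
            src = tar.extractfile(member)
            if src is None:
                continue
            # Keep only the base name, which also avoids path traversal.
            with src, open(out / Path(member.name).name, "wb") as dst:
                shutil.copyfileobj(src, dst)
            count += 1
    return count
```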

Urban1k dataset

$Urban1k/
|––  images/
|––––  221.jpg
|––––  222.jpg
|––––  ...
|––  annotations/
|–––– annotations.json

Step 1. Download Urban1K and unzip it. Put only the images (not the caption folder) under Urban1k/images/.

Step 2. Copy annotations.json from this repo into Urban1k/annotations/. This annotation file maps each image to its corresponding long caption.
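For Step 1, here is a sketch that copies only image files out of the unzipped Urban1K folder, so the caption text files are left behind. The extracted_dir argument and copy_urban1k_images are hypothetical, and the set of accepted image suffixes is an assumption.

```python
import shutil
from pathlib import Path

# Assumed set of image extensions; everything else (e.g. caption .txt files) is skipped.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def copy_urban1k_images(extracted_dir: str, urban1k_root: str) -> int:
    """Recursively copy image files from extracted_dir into
    urban1k_root/images/ and return how many were copied."""
    dst = Path(urban1k_root) / "images"
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(Path(extracted_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            shutil.copy2(path, dst / path.name)
            count += 1
    return count
```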
