Samples from the COCO Caption dataset. (Image credit: "https://arxiv.org/pdf/1504.00325.pdf")

Microsoft COCO Dataset (Retrieval)

Description

The Microsoft COCO dataset contains over one and a half million captions describing over 330,000 images. For the training and validation images, five independent human-generated captions are provided for each image.

Task

Cross-modal retrieval: (1) image-text: given an image as the query, retrieve texts from a gallery; (2) text-image: given a text as the query, retrieve images from a gallery.
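The two retrieval directions can be illustrated with a small similarity matrix. This is only a sketch: the scores are made up, the helper name `rank_gallery` is hypothetical, and a real model would produce the similarities from its own image and text encoders.

```python
from typing import List, Sequence


def rank_gallery(scores: Sequence[float]) -> List[int]:
    """Return gallery indices ordered from most to least similar to the query."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy similarity matrix (made-up values): rows are images, columns are captions.
similarity = [
    [0.9, 0.2, 0.1],  # image 0
    [0.3, 0.8, 0.4],  # image 1
]

# (1) image-text: an image is the query, the captions form the gallery.
image_to_text = [rank_gallery(row) for row in similarity]

# (2) text-image: a caption is the query, the images form the gallery.
num_texts = len(similarity[0])
text_to_image = [
    rank_gallery([row[j] for row in similarity]) for j in range(num_texts)
]

print(image_to_text)
print(text_to_image)
```

Both directions read the same matrix: image-text ranks along a row, text-image ranks along a column.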

Metrics

The common metric is recall@k: the percentage of queries for which a correct match appears among the top k retrieved results.

We use TR to denote the image-text retrieval recall score and IR to denote the text-image retrieval recall score.
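As a rough illustration of recall@k, the sketch below counts a query as a hit when any of its correct gallery items appears in the top k results, which is the usual convention when an image has several ground-truth captions. The function name `recall_at_k` and the toy rankings are assumptions made for this example, not part of any evaluation script.

```python
from typing import Sequence, Set


def recall_at_k(rankings: Sequence[Sequence[int]],
                positives: Sequence[Set[int]],
                k: int) -> float:
    """Percentage of queries with at least one correct item in the top k."""
    hits = sum(
        1 for ranked, correct in zip(rankings, positives)
        if any(item in correct for item in ranked[:k])
    )
    return 100.0 * hits / len(rankings)


# Toy example (made-up rankings): three image queries over six captions.
rankings = [
    [0, 3, 1, 2, 4, 5],  # query 0: a correct caption is ranked first
    [4, 2, 0, 1, 3, 5],  # query 1: first correct caption at rank 2
    [0, 1, 2, 3, 5, 4],  # query 2: first correct caption at rank 5
]
positives = [{0, 1}, {2, 3}, {4, 5}]

tr_at_1 = recall_at_k(rankings, positives, k=1)
tr_at_5 = recall_at_k(rankings, positives, k=5)
print(f"TR@1 = {tr_at_1:.1f}, TR@5 = {tr_at_5:.1f}")
```

The same function gives IR@k when the rankings are lists of image indices retrieved for each caption query.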

Leaderboard

(Ranked by TR@1.)

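| Rank | Model | TR@1 | TR@5 | TR@10 | IR@1 | IR@5 | IR@10 | Resources |
|------|-------|------|------|-------|------|------|-------|-----------|
| 1 | BLIP | 82.4 | 95.4 | 97.9 | 65.1 | 86.3 | 91.8 | paper, code, demo, blog |
| 2 | X-VLM | 81.2 | 95.6 | 98.2 | 63.4 | 85.8 | 91.5 | paper, code |
| 3 | ALBEF | 77.6 | 94.3 | 97.2 | 60.7 | 84.3 | 90.5 | paper, code, blog |
| 4 | ALIGN | 77.0 | 93.5 | 96.9 | 59.9 | 83.3 | 89.8 | paper |
| 5 | VinVL | 75.4 | 92.9 | 96.2 | 58.8 | 83.5 | 90.3 | paper, code |
| 6 | OSCAR | 73.5 | 92.2 | 96.0 | 57.5 | 82.8 | 89.8 | paper, code |
| 7 | UNITER | 65.7 | 88.6 | 93.8 | 52.9 | 79.9 | 88.0 | paper, code |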

Auto-Downloading

cd lavis/datasets/download_scripts && python download_coco.py

References

"Microsoft COCO Captions: Data Collection and Evaluation Server", Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollár, C. Lawrence Zitnick. arXiv:1504.00325.