Samples from Flickr30k dataset (Image credit: "https://bryanplummer.com/Flickr30kEntities/").

Flickr30K Dataset (Retrieval)

Description

The Flickr30k dataset contains more than 31k images collected from Flickr, each paired with 5 reference sentences provided by human annotators.

Task

Cross-modal retrieval: (1) image-text: given an image as the query, retrieve texts from a gallery; (2) text-image: given a text as the query, retrieve images from a gallery.
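Both directions reduce to ranking a gallery by similarity to the query. The sketch below illustrates this with cosine similarity over toy embeddings; the vectors and the helper names (`cosine`, `rank_gallery`) are made up for illustration and are not part of LAVIS or of any model on the leaderboard.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_gallery(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine(query, item) for item in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


# Hypothetical 2-d embeddings: one image query and a gallery of three texts.
image_emb = [1.0, 0.0]
text_embs = [[0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]

# Image-text retrieval: rank the texts for the image query.
print(rank_gallery(image_emb, text_embs))
```

Text-image retrieval works the same way with the roles swapped: a text embedding is the query and the image embeddings form the gallery.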

Metrics

The common metric is recall@k: the percentage of queries for which a correct match appears among the top k retrieved results.

We use TR to denote the image-text retrieval recall score and IR to denote the text-image retrieval recall score.
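As a minimal sketch of how recall@k can be computed, assume each query already has a ranked list of gallery indices and a set of correct gallery indices. The function name `recall_at_k` and the toy data are hypothetical, and counting a query as a hit when at least one correct item is in the top k is an assumed convention (for TR, each image has 5 correct captions).

```python
def recall_at_k(rankings, positives, k):
    """Percentage of queries with at least one correct item in the top k.

    rankings:  one ranked list of gallery indices per query (best first).
    positives: one set of correct gallery indices per query.
    """
    hits = sum(
        1
        for ranked, correct in zip(rankings, positives)
        if any(idx in correct for idx in ranked[:k])
    )
    return 100.0 * hits / len(rankings)


# Toy example: three queries over a gallery of four items (made-up rankings).
rankings = [[3, 0, 1, 2], [1, 2, 0, 3], [0, 3, 2, 1]]
positives = [{0}, {1}, {2}]

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(rankings, positives, k):.1f}")
```

Applying the function to image queries ranked against captions gives TR@k; applying it to caption queries ranked against images gives IR@k.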

Leaderboard

(Ranked by TR@1.)

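| Rank | Model | TR@1 | TR@5 | TR@10 | IR@1 | IR@5 | IR@10 | Resources |
|------|--------|------|-------|-------|------|------|-------|-------------------------|
| 1 | BLIP | 97.2 | 99.9 | 100.0 | 87.5 | 97.7 | 98.9 | paper, code, demo, blog |
| 2 | X-VLM | 97.1 | 100.0 | 100.0 | 86.9 | 97.3 | 98.7 | paper, code |
| 3 | ALBEF | 95.9 | 99.8 | 100.0 | 85.6 | 97.5 | 98.9 | paper, code, blog |
| 4 | ALIGN | 95.3 | 99.8 | 100.0 | 84.9 | 97.4 | 98.6 | paper |
| 5 | VILLA | 87.9 | 97.5 | 98.8 | 76.3 | 94.2 | 96.8 | paper, code |
| 6 | UNITER | 87.3 | 98.0 | 99.2 | 75.6 | 94.1 | 96.8 | paper, code |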

Auto-Downloading

cd lavis/datasets/download_scripts && python download_flickr.py

References

Bryan A. Plummer, Liwei Wang, Christopher M. Cervantes, Juan C. Caicedo, Julia Hockenmaier, and Svetlana Lazebnik, Flickr30K Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models, IJCV, 123(1):74-93, 2017. [paper]