ColPali: Efficient Document Retrieval with Vision Language Models 👀

[Model card] [ViDoRe Benchmark] [ViDoRe Leaderboard] [Demo] [Blog Post]

Tip

If you want to try the pre-trained ColPali model on your own documents, you can use the vidore-benchmark repository. It comes with a Python package and a CLI tool for convenient evaluation. You can also use the code provided in the model cards on the Hugging Face Hub.

Associated Paper

This repository contains the code used to train the vision retrievers presented in the paper ColPali: Efficient Document Retrieval with Vision Language Models. In particular, it contains the training code for ColPali, a vision retriever based on the ColBERT architecture.
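ColBERT-style retrievers score a query against a document with late interaction (MaxSim): each query token embedding is matched to its most similar document embedding, and these maxima are summed. In ColPali, the document side is the set of image-patch embeddings of a page. The pure-Python sketch below illustrates only the scoring rule on toy vectors. It assumes the embeddings are already computed, the helper names (`maxsim_score`, `rank_pages`) are hypothetical, and it is not the colpali-engine implementation, which works on batched tensors.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embedding dimensions differ")
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query: Sequence[Vector], page: Sequence[Vector]) -> float:
    """ColBERT-style late interaction: for every query token embedding, keep its
    best match among the page's patch embeddings, then sum those maxima."""
    if not query or not page:
        raise ValueError("query and page need at least one embedding each")
    return sum(max(dot(q, d) for d in page) for q in query)


def rank_pages(query: Sequence[Vector], pages: Sequence[Sequence[Vector]]) -> List[int]:
    """Return page indices sorted from most to least relevant."""
    scores = [maxsim_score(query, page) for page in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy 2-d embeddings: two query tokens, two pages with different patch counts.
    query = [[1.0, 0.0], [0.0, 1.0]]
    page_a = [[0.9, 0.1], [0.2, 0.8]]  # scores about 0.9 + 0.8
    page_b = [[0.5, 0.5]]              # scores 0.5 + 0.5
    print(rank_pages(query, [page_b, page_a]))  # [1, 0]
```

Note that pages may have different numbers of patch embeddings; the score is still comparable because it is a sum over the query tokens, not over the page's embeddings.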

Setup

We used Python 3.11.6 and PyTorch 2.2.2 to train and test our models, but the codebase is expected to be compatible with Python >=3.9 and recent PyTorch versions.

The evaluation codebase depends on a few Python packages, which can be installed with the following command:

pip install colpali-engine

To keep the package lightweight, only the essential dependencies are installed by default. To use the ColPali training script, you must also install the optional training dependencies with the following command:

pip install "colpali-engine[train]"

Warning

For ColPali model versions above v1.0, make sure to install the colpali-engine package from source or with a version above v0.2.0.
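To check which colpali-engine release is installed before loading a newer model, you can query the package metadata with the standard-library `importlib.metadata` module. The sketch below is a hypothetical helper, not part of the repository: it reads "above v0.2.0" literally as strictly newer, and its `parse_release` compares only the numeric release segment, a simplification (the `packaging` library handles full PEP 440 versions).

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def installed_version(dist_name: str = "colpali-engine") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def parse_release(ver: str) -> Tuple[int, ...]:
    """Extract the numeric release segment, e.g. "0.3.0rc1" -> (0, 3, 0).
    Simplification: pre-release, dev and local suffixes are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_newer_than(installed: str, minimum: str) -> bool:
    """True if `installed` is strictly newer than `minimum` (release segments only)."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "0.3" compares like "0.3.0".
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a > b


if __name__ == "__main__":
    current = installed_version()
    if current is None:
        print("colpali-engine is not installed")
    elif is_newer_than(current, "0.2.0"):
        print(f"colpali-engine {current} is newer than 0.2.0")
    else:
        print(f"colpali-engine {current} may be too old for ColPali models above v1.0")
```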

Usage

The scripts/ directory contains scripts to run training and inference.

Inference

While there is an inference script in this repository, it's recommended to run inference using the vidore-benchmark package.

Training

All the model configs used are in scripts/configs/ and rely on the configue package for straightforward configuration. They are meant to be used with the scripts/train/train_colbert.py script.

Example 1: Local training

USE_LOCAL_DATASET=0 python scripts/train/train_colbert.py scripts/configs/pali/train_colpali_docmatix_hardneg_model.yaml

or using accelerate:

accelerate launch scripts/train/train_colbert.py scripts/configs/pali/train_colpali_docmatix_hardneg_model.yaml
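In the first command, the USE_LOCAL_DATASET=0 prefix sets an environment variable for that single process only. We assume (check the training code to confirm) that the script reads this flag to choose between a local copy of the training data and datasets hosted on the Hugging Face Hub. The sketch below illustrates that pattern; the function names, the default value, the local folder and the Hub namespace are all hypothetical.

```python
import os
from typing import Mapping, Optional


def use_local_dataset(environ: Optional[Mapping[str, str]] = None, default: str = "1") -> bool:
    """Read a USE_LOCAL_DATASET-style flag; only the value "1" selects local data.
    The default of "1" is an assumption, not taken from the training script."""
    env = os.environ if environ is None else environ
    return env.get("USE_LOCAL_DATASET", default) == "1"


def resolve_dataset(
    name: str,
    local_dir: str = "./data_dir",   # hypothetical local folder
    hub_namespace: str = "vidore",   # hypothetical Hub namespace
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return a local path or a Hub dataset id depending on the flag."""
    if use_local_dataset(environ):
        return os.path.join(local_dir, name)
    return f"{hub_namespace}/{name}"


if __name__ == "__main__":
    # With USE_LOCAL_DATASET=0 in the environment this prints a Hub-style id.
    print(resolve_dataset("my_train_set"))
```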

Example 2: Training on a SLURM cluster

sbatch --nodes=1 --cpus-per-task=16 --mem-per-cpu=32GB --time=20:00:00 --gres=gpu:1  -p gpua100 --job-name=colidefics --output=colidefics.out --error=colidefics.err --wrap="accelerate launch scripts/train/train_colbert.py scripts/configs/pali/train_colpali_docmatix_hardneg_model.yaml"

sbatch --nodes=1  --time=5:00:00 -A cad15443 --gres=gpu:8  --constraint=MI250 --job-name=colpali --wrap="python scripts/train/train_colbert.py scripts/configs/pali/train_colpali_docmatix_hardneg_model.yaml"
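Both submissions pass the training command to sbatch through its --wrap option. If you submit many variants, building the argument list programmatically avoids quoting mistakes. This is a hypothetical Python sketch, not part of the repository: `build_sbatch` is an illustrative name, and it uses sbatch's long-form options (--partition for -p, --account for -A). It only prints the command; passing `argv` to `subprocess.run` would submit it.

```python
import shlex
from typing import List, Mapping

# Training command from the README, run inside the SLURM allocation.
TRAIN_CMD = (
    "accelerate launch scripts/train/train_colbert.py "
    "scripts/configs/pali/train_colpali_docmatix_hardneg_model.yaml"
)


def build_sbatch(wrap: str, options: Mapping[str, str]) -> List[str]:
    """Build an sbatch argv using long-form options (--key=value) and --wrap."""
    argv = ["sbatch"]
    for key, value in options.items():
        argv.append(f"--{key}={value}")
    argv.append(f"--wrap={wrap}")
    return argv


if __name__ == "__main__":
    # Same resources as the single-A100 sbatch example.
    argv = build_sbatch(
        TRAIN_CMD,
        {
            "nodes": "1",
            "cpus-per-task": "16",
            "mem-per-cpu": "32GB",
            "time": "20:00:00",
            "gres": "gpu:1",
            "partition": "gpua100",
            "job-name": "colidefics",
            "output": "colidefics.out",
            "error": "colidefics.err",
        },
    )
    # Print a shell-safe command line.
    print(shlex.join(argv))
```

Keeping each option as one `--key=value` list element means no shell quoting is needed when the list is passed directly to `subprocess.run`.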

Paper result reproduction

To reproduce the results from the paper, check out the v0.1.1 tag or install the corresponding colpali-engine release:

pip install colpali-engine==0.1.1

Citation

ColPali: Efficient Document Retrieval with Vision Language Models

Authors: Manuel Faysse*, Hugues Sibille*, Tony Wu*, Bilel Omrani, Gautier Viaud, Céline Hudelot, Pierre Colombo

(* Denotes Equal Contribution)

@misc{faysse2024colpaliefficientdocumentretrieval,
      title={ColPali: Efficient Document Retrieval with Vision Language Models}, 
      author={Manuel Faysse and Hugues Sibille and Tony Wu and Bilel Omrani and Gautier Viaud and Céline Hudelot and Pierre Colombo},
      year={2024},
      eprint={2407.01449},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2407.01449}, 
}
