arXiv: https://arxiv.org/abs/2305.19595
conda deactivate # deactivate any active environments
conda create -n dac python=3.8.13 # create the conda environment
conda activate dac # activate the environment
conda install -c conda-forge libjpeg-turbo
conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.3.1 -c pytorch
pip install -r requirements.txt
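Optionally, run a quick sanity check inside the `dac` environment to confirm the pinned builds resolved and CUDA is visible (a minimal, repo-independent sketch):

```python
# Sanity check for the environment created above.
import torch
import torchvision

print("torch:", torch.__version__)               # expected 1.12.1
print("torchvision:", torchvision.__version__)   # expected 0.13.1
print("CUDA available:", torch.cuda.is_available())
```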
Download Conceptual Captions 3M training and validation splits from https://ai.google.com/research/ConceptualCaptions/download
After data preparation, place the data in DAC/CC3M_data/training
and DAC/CC3M_data/validation.
Download train_with_cap.csv and val_with_cap.csv from https://drive.google.com/drive/folders/1WosT_kdam1ymWjVSK2ezyydLoqmm0LdX?usp=sharing
and place them in DAC/CC3M_data/.
Prepare the VL-Checklist dataset as described in https://github.com/om-ai-lab/VL-CheckList/blob/main/DATASETS.md
Then move it to DAC/vl_datasets/.
If you followed the instructions correctly, you should have the following folders inside vl_datasets: 'hake', 'swig', 'vg'.
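Before moving on, you can verify the layout with a small check over the paths listed above (a minimal sketch; it assumes you run it from the directory containing DAC/, adjust the root otherwise):

```python
# Check that the CC3M and VL-Checklist folders described above are in place.
import os

expected = [
    "DAC/CC3M_data/training",
    "DAC/CC3M_data/validation",
    "DAC/CC3M_data/train_with_cap.csv",
    "DAC/CC3M_data/val_with_cap.csv",
    "DAC/vl_datasets/hake",
    "DAC/vl_datasets/swig",
    "DAC/vl_datasets/vg",
]
for path in expected:
    print(("OK      " if os.path.exists(path) else "MISSING ") + path)
```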
First, navigate to the src directory:
cd src
mkdir DAC/quality_captions/
python3 training/main.py --create_quality_captions --save_data --batch-size 1 --workers 0
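This step writes one generated (quality) caption per CC3M training image into DAC/quality_captions/. Purely as an illustration of what a single captioning call looks like (the repo's main.py implements the actual pipeline; the BLIP checkpoint and image path below are arbitrary examples, not necessarily what DAC uses):

```python
# Hedged illustration only: caption one image with an off-the-shelf captioner.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("DAC/CC3M_data/training/example.jpg").convert("RGB")  # hypothetical path
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```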
mkdir DAC/SAM_dense/
python3 training/main.py --create_SAM --save_data --batch-size 1 --workers 0 --model_SAM /path/to/sam_vit_h_4b8939.pth
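This step uses the SAM checkpoint to build dense, region-level data under DAC/SAM_dense/. As a hedged sketch of the mask-generation part only (using the official segment-anything package; this is not the repo's full pipeline), the regions for one image can be obtained like this:

```python
# Hedged sketch: generate SAM masks for one image and cut out the largest region crops.
# Crops like these can then be captioned to obtain dense, region-level text.
import numpy as np
from PIL import Image
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="/path/to/sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = np.array(Image.open("DAC/CC3M_data/training/example.jpg").convert("RGB"))  # hypothetical path
masks = mask_generator.generate(image)  # list of dicts with 'segmentation', 'bbox', 'area', ...

for m in sorted(masks, key=lambda m: m["area"], reverse=True)[:5]:
    x, y, w, h = map(int, m["bbox"])    # bbox is XYWH
    crop = image[y:y + h, x:x + w]
    print(crop.shape)
```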
mkdir DAC/LLM_dense/
python3 create_LLM_dense.py
Prepare the VL-Checklist dataset as described in https://github.com/om-ai-lab/VL-CheckList/blob/main/DATASETS.md
Then move it to DAC/vl_checklist_images_root_folder/.
If you followed the instructions correctly, you should have the following folders inside vl_checklist_images_root_folder: 'hake', 'swig', 'vg'.
Prepare the ARO dataset as described in https://github.com/mertyg/vision-language-models-are-bows
Then move it to DAC/aro/.
The model will be saved in DAC/Outputs/exp_name/checkpoints
To train a network with quality captions and one of the dense-caption sources (a sketch of the POS-based negatives follows the commands):
- SAM density:
python3 training/main.py --epochs 5 --name exp_name --lora 4 --use_only_quality_captions --batch-size 32 --mil_dense_negs --vl_negs --neg_type rand_both --auto_neg_types NOUN ADP ADJ VERB --mil_batch 10 --pretrained openai --mil_dense ../SAM_dense/
- LLM density:
python3 training/main.py --epochs 5 --name exp_name --lora 4 --use_only_quality_captions --batch-size 32 --mil_dense_negs --vl_negs --neg_type rand_both --auto_neg_types NOUN ADP ADJ VERB --mil_batch 10 --pretrained openai --mil_dense ../LLM_dense/
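The --vl_negs / --auto_neg_types flags enable negative captions in which words with the listed part-of-speech tags (NOUN, ADP, ADJ, VERB) are perturbed. As a hedged sketch of just the tag selection (the training code implements the actual negative generation):

```python
# Hedged sketch: find the words that are candidates for perturbation, by spaCy POS tag.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def candidate_positions(caption, neg_types=("NOUN", "ADP", "ADJ", "VERB")):
    doc = nlp(caption)
    return [(tok.i, tok.text, tok.pos_) for tok in doc if tok.pos_ in neg_types]

print(candidate_positions("a brown dog sitting on a wooden bench"))
# One selected word would then be replaced (e.g. "on" -> "under") to form the negative caption.
```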
You can download our checkpoints of DAC_SAM and DAC_LLM from here: https://drive.google.com/drive/folders/1DmHeV8oWiMwtkaTH-nruMyjBiuJvcwnv?usp=sharing
All VL-Checklist result JSONs will be saved in DAC/eval_jsons/clip/exp_name/
and the results will be printed.
To produce the VL-Checklist evaluation results for the experiment exp_name, run the following commands:
mkdir vl_checklist_accuracy_jsons_folder
python3 training/main.py --lora 4 --pretrained openai --eval_vl_cklist --eval_only --resume /path/to/checkpoint --vl_checklist_images_root_folder DAC/vl_checklist_images_root_folder/
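For reference, VL-Checklist accuracy is the fraction of test items for which the model scores the positive caption above the negative one. A hedged sketch of that comparison with a stock open_clip model (the command above applies it to the resumed DAC checkpoint instead):

```python
# Hedged sketch of the VL-Checklist scoring rule with a stock CLIP model from open_clip.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

def positive_beats_negative(image_path, pos_caption, neg_caption):
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    text = tokenizer([pos_caption, neg_caption])
    with torch.no_grad():
        img_feat = model.encode_image(image)
        txt_feat = model.encode_text(text)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
        pos_score, neg_score = (img_feat @ txt_feat.T).squeeze(0)
    return bool(pos_score > neg_score)

# Split accuracy = mean of positive_beats_negative(...) over the split's (image, pos, neg) triplets.
```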
To print the ARO evaluation results for the experiment exp_name, run the following command:
python3 aro_clip_lora_eval.py --lora 4 --resume /path/to/checkpoint
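ARO uses the same image-to-text scoring, except each image comes with one correct caption and one or more perturbed alternatives (swapped attributes, relations, or word order), and the prediction counts as correct only when the true caption scores highest. A hedged sketch of the decision rule, reusing a scoring function like the one above:

```python
# Hedged sketch of the ARO decision rule; `score(image_path, caption)` stands for the
# normalized image-text similarity computed as in the VL-Checklist sketch above.
def aro_correct(image_path, true_caption, perturbed_captions, score):
    true_score = score(image_path, true_caption)
    return all(true_score > score(image_path, c) for c in perturbed_captions)
```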