This repository contains the official code for the CVPR 2024 paper "MICap: A Unified Model for Identity-aware Movie Descriptions".
Run the following commands to clone the repository:
git clone https://github.com/katha-ai/MovieIdentityCaptioner-CVPR2024.git
cd MovieIdentityCaptioner-CVPR2024
Install the required conda environment by running the following command:
conda env create -f conda_env.yml
Create a micap_data folder and set the data_dir flag in the config_base.yaml file to the path of this folder. Place every item in the table below, except the SPICE Jar file and the Checkpoints, inside the micap_data folder, and enter their relative paths in the config file under the flags named in the Instructions column.
Features | Instructions |
---|---|
Clip Features | The unzipped folder path should be filled in for the input_clip_dir flag in the config_base.yaml file |
Face Features | The unzipped folder path should be filled in for the input_arc_face_dir flag in the config_base.yaml file |
I3D Features | The unzipped folder path should be filled in for the input_fc_dir flag in the config_base.yaml file |
Face Clusters | The unzipped file path should be filled in for the input_arc_face_clusters flag in the config_base.yaml file |
MICap Json | The unzipped file path should be filled in for the input_json flag in the config_base.yaml file |
Bert Text Embeddings | The unzipped folder path (fillin_data/bert_text_gender_embedding) should be filled in for the bert_embedding_dir flag in the config_base.yaml file |
H5 label file | The unzipped file path (LSMDC16_labels_fillin_new_augmented.h5) should be filled in for the input_label_h5 flag in the config_base.yaml file |
Tokenizer | The unzipped folder path should be filled in for the tokenizer_path flag in the config_base.yaml file |
SPICE Jar file | The unzipped file should be placed in the iSPICE directory |
Checkpoints | Folder containing the checkpoints for full captioning and joint-training full captioning (CIDEr score), and for fitb and joint-training fitb (class accuracy) |
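
Before training, it can help to confirm that every path flag from the table resolves to something inside the data directory. The following is a minimal sketch, not part of the repository: the flag names come from the table above, while the function name find_missing_paths and the assumption that the config is a flat mapping of flag names to values are hypothetical.

```python
from pathlib import Path

# Flag names are taken from the table above. Treating the config as a flat
# dict of flag -> value is an assumption about config_base.yaml's layout.
PATH_FLAGS = [
    "input_clip_dir",
    "input_arc_face_dir",
    "input_fc_dir",
    "input_arc_face_clusters",
    "input_json",
    "bert_embedding_dir",
    "input_label_h5",
    "tokenizer_path",
]


def find_missing_paths(config):
    """Return (flag, reason) pairs for flags that are unset or point at nothing.

    Paths are resolved relative to config["data_dir"], as the setup
    instructions describe.
    """
    data_dir = Path(config["data_dir"])
    problems = []
    for flag in PATH_FLAGS:
        value = config.get(flag)
        if not value:
            problems.append((flag, "flag is not set"))
        elif not (data_dir / value).exists():
            problems.append((flag, f"{data_dir / value} does not exist"))
    return problems
```

If the config is loaded into a dict (for example with PyYAML's yaml.safe_load, assuming PyYAML is available in the environment), an empty list from find_missing_paths means every listed flag points at an existing file or folder.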
The run_type flag in the config_base.yaml file can be adjusted to determine the task (either fitb, fc only, or both) for training MICap.
Make sure the overfit and checkpoint flags are set to False. Also, ensure that the path to each feature, relative to the data directory, is set correctly in the config_base.yaml file.
Once the YAML file is set, run the command:
python train_mod.py
To evaluate a pretrained model, set the checkpoint flag to True in the config_base.yaml file.
The run_type flag in the config_base.yaml file can be adjusted to specify the task for evaluation.
Once the YAML file is set, run the command:
python train_mod.py
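
Because training and evaluation share the same entry point and differ only in their flags, a quick sanity check of the config can catch a wrong setting before a long run. The sketch below is illustrative only: the function check_flags and the mode labels "train" and "eval" are hypothetical, and the config is assumed to be a flat dict. It does not validate the value of run_type, since the exact strings accepted by the code are not stated here.

```python
def check_flags(config, mode):
    """Return a list of flag problems for the given mode.

    mode is "train" or "eval"; these labels are hypothetical and exist
    only in this sketch, not in config_base.yaml.
    """
    if mode not in ("train", "eval"):
        raise ValueError(f"unknown mode: {mode!r}")

    errors = []

    # Training expects checkpoint to be False; evaluation expects True.
    expected_checkpoint = mode == "eval"
    if config.get("checkpoint") is not expected_checkpoint:
        errors.append(f"checkpoint should be {expected_checkpoint} for {mode}")

    # The instructions only require overfit to be False for training.
    if mode == "train" and config.get("overfit") is not False:
        errors.append("overfit should be False for train")

    # run_type selects the task in both modes, so it must be present.
    if "run_type" not in config:
        errors.append("run_type is not set")

    return errors
```

An empty list means the flags match the instructions for that mode; any returned message names the flag to correct in config_base.yaml.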
If this project helps your research, please consider citing our paper with the following BibTeX:
@inproceedings{raajesh2024micap,
title={MICap: A Unified Model for Identity-aware Movie Descriptions},
author={Raajesh, Haran and Desanur, Naveen Reddy and Khan, Zeeshan and Tapaswi, Makarand},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={14011--14021},
year={2024}
}