This repo contains the source code for my independent project (outside course scope) at the Pioneer Centre for Artificial Intelligence, Denmark.
Project title: Meme-text retrieval: a new dataset and a cross-modal embedder
Main supervisor: Serge Belongie
Co-supervisor: Peter Ebert Christensen
Paper: Large Vision-Language Models for Knowledge-Grounded Data Annotation of Memes
The proposed dataset is split into training_set.json and validation_set.json. Each entry includes a link to the corresponding meme image.
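A minimal sketch of loading one split and reading an entry. The field names (`img_url`, `meme_captions`) are assumptions for illustration; check the actual JSON files for the real schema.

```python
import json

def load_split(path):
    """Load one dataset split (a JSON list of meme entries)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

# Write a tiny stand-in file so the sketch is self-contained;
# the field names below are assumed, not taken from the repo.
example = [{"img_url": "https://example.com/meme.jpg",
            "meme_captions": ["when the build finally passes"]}]
with open("training_set.json", "w", encoding="utf-8") as f:
    json.dump(example, f)

entries = load_split("training_set.json")
print(len(entries), entries[0]["img_url"])
```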
We used CLIP and LLaVA-1.6 in our experiments. Please refer to their original repositories for details.
The following instructions are for Linux users.
- Clone this repository and navigate to meme_text_retrieval_p1 folder
git clone https://github.com/Seefreem/meme_text_retrieval_p1.git
cd meme_text_retrieval_p1
- Install Packages
conda create -n meme_text python=3.10 -y
conda activate meme_text
pip install --upgrade pip # enable PEP 660 support
pip install -e .
- Install additional packages for training cases
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
Run the following command for data annotation:
conda activate meme_text
cd data_annotation
python gpt_4o.py --start-id 0 --dataset meme_text_retrieval --prompt-type gpt-4o-all-data
Once you have the responses from GPT-4o, use post_processing.ipynb to extract the annotation fields and check their validity.
Some entries will usually have missing information. We recommend filtering those out and annotating them again.
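The filter-and-re-annotate step above can be sketched as follows. The required field names here are hypothetical placeholders, not the actual keys produced by gpt_4o.py; adapt them to your annotation schema.

```python
# Assumed field names -- replace with the keys your annotations actually use.
REQUIRED_FIELDS = ["meme_captions", "visual_elements", "humor_explanation"]

def split_by_completeness(annotations, required=REQUIRED_FIELDS):
    """Separate complete entries from ones that need re-annotation."""
    complete, missing = [], []
    for entry in annotations:
        # An entry is complete only if every required field is present
        # and non-empty.
        if all(entry.get(field) for field in required):
            complete.append(entry)
        else:
            missing.append(entry)
    return complete, missing

annotations = [
    {"meme_captions": ["a"], "visual_elements": ["b"], "humor_explanation": "c"},
    {"meme_captions": [], "visual_elements": ["b"], "humor_explanation": "c"},
]
complete, missing = split_by_completeness(annotations)
```

The `missing` list is what you would feed back into the annotation script for another pass.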
Run the following command to filter out template-based memes:
cd evaluation
python get_template_based_memes.py --dataset figmemes
After filtering, the code generates an HTML file for visualizing the paired templates and instances.
Run the following command to fine-tune CLIP without hyperparameter search (set "sweep" to True to enable hyperparameter optimization):
cd fine_tune
python fine_tune_clip.py --epochs 20 --warmup-epochs 1 --sweep False
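For intuition, the objective that CLIP fine-tuning optimizes is the symmetric contrastive (InfoNCE) loss over matched image-text pairs. The NumPy sketch below only illustrates that loss on toy embeddings; actual training goes through fine_tune_clip.py.

```python
import numpy as np

def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss: matching pairs sit on the diagonal
    of the cosine-similarity matrix and should get the highest logits."""
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = image_emb @ text_emb.T / temperature
    idx = np.arange(len(logits))
    # Log-softmax cross-entropy in both directions (image->text, text->image).
    lsm_i2t = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    lsm_t2i = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    return -(lsm_i2t[idx, idx].mean() + lsm_t2i[idx, idx].mean()) / 2

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))
loss_aligned = clip_loss(img, img)                       # perfectly matched pairs
loss_random = clip_loss(img, rng.normal(size=(4, 8)))    # unrelated pairs
```

Perfectly aligned pairs give a much lower loss than random pairings, which is what the fine-tuning loop pushes toward.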
Run the following command to test the fine-tuned CLIP on your target dataset:
python retrieval_test.py --test-data "the JSON file of your dataset" --root "root directory of images" --text_type meme_captions
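Retrieval tests of this kind typically report Recall@K: the fraction of texts whose paired image appears in the top K retrieved results. A minimal sketch on toy embeddings (not the actual retrieval_test.py implementation):

```python
import numpy as np

def recall_at_k(text_emb, image_emb, k=1):
    """Text-to-image Recall@K, assuming text i is paired with image i."""
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    sims = text_emb @ image_emb.T          # one row of similarities per text
    ranked = np.argsort(-sims, axis=1)     # best-matching images first
    targets = np.arange(len(text_emb))     # ground-truth pairing
    hits = (ranked[:, :k] == targets[:, None]).any(axis=1)
    return hits.mean()

emb = np.eye(4)                   # toy embeddings: each text matches one image
r1 = recall_at_k(emb, emb, k=1)   # perfect retrieval on this toy setup
```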