This section describes the pseudo-language label generation module, which generates pseudo-language labels in an unsupervised setting.
The single-source scenario uses only pseudo-template labels derived from Pseudo-Q.
The multi-source scenario uses pseudo-template labels, pseudo-relation labels, and pseudo-caption labels, derived from Pseudo-Q, RelTR, and CLIPCap / M2, respectively.
The generation of pseudo-template labels is detailed in pseudo_template_label_generation.
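For intuition, the three label sources yield different kinds of referring expressions for the same image region. The example below is purely illustrative; the exact phrasing and storage format depend on the Pseudo-Q templates, RelTR's predicted triplets, and the captioner, and may differ from what the scripts in this repo actually produce.

```python
# Purely illustrative example of the three pseudo-label types for one image region;
# the actual label formats produced by the scripts in this repo may differ.
pseudo_labels = {
    "pseudo_template": "man on the left",                 # Pseudo-Q template-based expression
    "pseudo_relation": "man riding horse",                # RelTR subject-predicate-object triplet
    "pseudo_caption":  "a man is riding a brown horse",   # CLIPCap / M2 generated caption
}
```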
First, complete the environment preparation of the RelTR model as instructed in the RelTR README. Also make sure that the visual grounding image data preparation is complete, and download the split subset of Pseudo-Q according to the dataset split described in the Pseudo-Q README. Once the above is done, replace the dataset and output directories in inference_gen_pseudo_relation_label.py (a sketch of this edit follows the command below). Finally, run inference_gen_pseudo_relation_label.py with the following command:
python inference_gen_pseudo_relation_label.py
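The snippet below is only a sketch of the kind of path edit intended; the variable names are hypothetical and may not match the actual names used in inference_gen_pseudo_relation_label.py.

```python
# Hypothetical path settings inside inference_gen_pseudo_relation_label.py;
# variable names are illustrative and may not match the actual script.
image_dir  = "/path/to/visual_grounding/images"      # prepared visual grounding image data
split_file = "/path/to/pseudo_q_split/train.json"    # Pseudo-Q split subset downloaded above
output_dir = "/path/to/pseudo_relation_labels"       # where the pseudo-relation labels are written
```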
The generation of pseudo-caption labels is detailed in pseudo_caption_label_generation.
First, complete the environment preparation of the CLIPCap model as instructed in the CLIP_prefix_caption README. Also make sure that the visual grounding image data preparation is complete, and download the split subset of Pseudo-Q according to the dataset split described in the Pseudo-Q README. Once the above is done, replace the dataset and output directories in clip_prefix_captioning_for_dataset.py. Finally, run clip_prefix_captioning_for_dataset.py with the following command:
python clip_prefix_captioning_for_dataset.py
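Conceptually, this step produces one caption per image of the chosen split and stores the results for the later pairing step. The sketch below assumes a hypothetical generate_caption helper standing in for CLIPCap (or M2) inference and an image-name-to-caption JSON output; the real script's structure and output format may differ.

```python
import json
import os

def generate_caption(image_path):
    """Hypothetical stand-in: replace with actual CLIPCap / M2 inference."""
    return "a photo"  # placeholder caption

image_dir   = "/path/to/visual_grounding/images"           # hypothetical path
output_path = "/path/to/pseudo_caption_labels/captions.json"  # hypothetical path

# Caption every image in the split and store the results as image_name -> caption.
captions = {}
for fname in sorted(os.listdir(image_dir)):
    if fname.lower().endswith((".jpg", ".jpeg", ".png")):
        captions[fname] = generate_caption(os.path.join(image_dir, fname))

with open(output_path, "w") as f:
    json.dump(captions, f)
```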
Since the generated captions do not include bounding box information, we use a language parser such as spaCy to parse each generated caption and extract its subject. We then pair the subject with the object detection labels used for the pseudo-template labels. If a match is found, the caption is assigned the bounding box of the matching object detection label.
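A minimal sketch of this parsing-and-matching step, assuming spaCy with the en_core_web_sm model and a simple list of (label, box) detections reused from the pseudo-template stage; the actual code in caption_generation.py may differ.

```python
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def extract_subject(caption):
    """Return the lowercased, lemmatized subject of the caption, or None."""
    doc = nlp(caption)
    for token in doc:
        if token.dep_ in ("nsubj", "nsubjpass"):
            return token.lemma_.lower()
    # Captions are often plain noun phrases; fall back to a noun root.
    for token in doc:
        if token.dep_ == "ROOT" and token.pos_ in ("NOUN", "PROPN"):
            return token.lemma_.lower()
    return None

def pair_caption_with_box(caption, detections):
    """Match the caption's subject against object detection labels and return
    the bounding box of the first matching detection.

    `detections` is assumed to be a list of (label, box) pairs reused from the
    pseudo-template label stage; the real data structure may differ."""
    subject = extract_subject(caption)
    if subject is None:
        return None
    for label, box in detections:
        if subject == label.lower() or subject in label.lower().split():
            return box
    return None

# Example: "a man is riding a horse" -> subject "man" -> box of the "man" detection.
# pair_caption_with_box("a man is riding a horse",
#                       [("man", [10, 20, 120, 240]), ("horse", [50, 60, 300, 280])])
```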
Therefore, first replace the file paths with your dataset and output directories in caption_generation.py, and verify the correctness of the code using one of the commands in generate_caption_data_all.sh. After this check passes, use the following script to pair captions with bounding boxes for all datasets (a simple check of the paired output is sketched after the command below):
bash generate_caption_data_all.sh
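After the script finishes, a quick sanity check of the paired output can catch path or parsing problems early. The sketch below assumes the output is a JSON file whose entries contain a caption string and a 4-number box; the actual file name and output format of caption_generation.py may differ.

```python
import json

# Hypothetical output path; adjust to your configured output directory.
output_file = "/path/to/pseudo_caption_labels/paired_captions.json"

with open(output_file) as f:
    records = json.load(f)

# Assumed record format: {image_id: {"caption": str, "bbox": [x1, y1, x2, y2]}, ...}
bad = [k for k, v in records.items()
       if not v.get("caption") or len(v.get("bbox", [])) != 4]
print(f"{len(records)} records, {len(bad)} with a missing caption or malformed bbox")
```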
The implementation pipeline for M2 is the same as above.