The official repository for "A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks"
- FairFace dataset for debiasing embeddings with image inputs: Link
- FACET dataset for zero-shot classification: Link
- Flickr30K dataset for text-to-image retrieval: Link
- COCO2014 caption dataset for image captioning: Link
- Bias-in-bios dataset is included in the script with the `dataset` package.
- Profession list for text-to-image generation is included in the `external/codi` directory.
Run
sh install_packages.sh
Run
sh embedding_preprocessing.sh
- `--target`: a list of components to debias. List `image` and `text` for encoder-only structures (zero-shot classification, text-to-image retrieval), and `encoder` and `decoder` for generative models (image captioning, text-to-image generation).
- `--t`: a confidence threshold used to select low-confidence samples.
- `--{COMPONENT}_prune_num`: the number of features to prune for each component. A value from `50` to `100` is recommended.
- `--mode`: use `--mode sfid` to run SFID. Any other value runs the baseline code without debiasing (e.g., `--mode base`).
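To make the threshold and pruning flags more concrete, here is a minimal sketch of one plausible reading of them: samples whose confidence falls below the threshold are treated as low-confidence, and the `prune_num` most important features of every embedding are replaced with the mean of those features over the low-confidence samples. This is an illustration only and not the repository's implementation. The function names, the `<` comparison, the use of a mean, and the way importance and confidence scores are obtained are all assumptions.

```python
from statistics import mean


def low_confidence_indices(confidences, t):
    """Indices of samples whose confidence is below the threshold t (assumed rule)."""
    return [i for i, c in enumerate(confidences) if c < t]


def top_k_features(importances, k):
    """Indices of the k features with the highest importance scores."""
    order = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
    return sorted(order[:k])


def impute_features(embeddings, importances, confidences, t, prune_num):
    """Return a copy of embeddings with the top prune_num features replaced
    by their mean over the low-confidence samples (hypothetical helper)."""
    low = low_confidence_indices(confidences, t)
    if not low:
        raise ValueError("no sample falls below the confidence threshold")
    pruned = top_k_features(importances, prune_num)
    fill = {j: mean(embeddings[i][j] for i in low) for j in pruned}
    return [[fill.get(j, v) for j, v in enumerate(row)] for row in embeddings]


# Toy data: 4 samples with 3 features each (values are made up).
embeddings = [
    [1.0, 0.25, 5.0],
    [0.75, 0.5, 1.0],
    [0.25, 0.5, 3.0],
    [0.5, 0.125, 7.0],
]
importances = [0.7, 0.1, 0.2]          # assumed per-feature importance scores
confidences = [0.95, 0.6, 0.55, 0.99]  # assumed per-sample confidence scores

debiased = impute_features(embeddings, importances, confidences, t=0.7, prune_num=1)
```

In this toy run, samples 1 and 2 fall below `t=0.7`, feature 0 is the most important one, and it is overwritten in every row with the mean of that feature over the two low-confidence samples. The other features are left unchanged.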
python src/run_zc_retrieval.py --target image text --base ViT-B/32 --mode sfid --image_prune_num 50 --text_prune_num 50 --t 0.7
python src/run_zc_retrieval.py --target image text --base RN50 --mode sfid --image_prune_num 50 --text_prune_num 50 --t 0.7
python src/run_clip_cap.py --target decoder --mode sfid --decoder_prune_num 50 --t 0.9
python src/run_blip.py --target decoder --mode sfid --decoder_prune_num 50 --t 0.9
python src/run_sd.py --target decoder --mode sfid --decoder_num 50 --t 0.5
python src/run_codi.py --target decoder --mode sfid --decoder_num 50 --t 0.5
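Since a prune number between 50 and 100 is recommended, it can be convenient to generate one command per value rather than typing each by hand. The sketch below builds such command lines for `src/run_zc_retrieval.py`, following the flag pattern used in the commands for that script. The helper name `build_command` and the intermediate value 75 are assumptions for illustration, and the commands are only printed, not executed.

```python
import shlex


def build_command(script, targets, prune_num, t, mode="sfid", base=None):
    """Assemble one command line as a list of arguments (hypothetical helper)."""
    cmd = ["python", script, "--target", *targets]
    if base is not None:
        cmd += ["--base", base]
    cmd += ["--mode", mode]
    # One --{COMPONENT}_prune_num flag per debiased component.
    for component in targets:
        cmd += [f"--{component}_prune_num", str(prune_num)]
    cmd += ["--t", str(t)]
    return cmd


# Sweep a few prune numbers inside the recommended 50 to 100 range.
commands = [
    build_command("src/run_zc_retrieval.py", ["image", "text"], k, 0.7, base="ViT-B/32")
    for k in (50, 75, 100)
]

for cmd in commands:
    print(shlex.join(cmd))
```

Each entry in `commands` is an argument list, so it could be passed to `subprocess.run` directly once the printed commands look right.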