An implementation of quality-diversity generative sampling (QDGS), a framework that uses quality-diversity optimization to create synthetic data for learning representations. QDGS uses CMA-MAEGA with text prompts for fine-grained guidance over the quality objective and diversity measures of synthetic data sampling, without re-parameterizing or fine-tuning the generative model. With synthetic datasets generated by QDGS, we debias color-biased shape classifiers and facial recognition classifiers.
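At its core, QDGS scores each candidate sample with language prompts: one prompt defines the quality objective, and contrasting prompt pairs define the diversity measures. The following is a minimal sketch of that idea using OpenAI's CLIP; the prompt strings and function names are illustrative, not the repository's actual implementation, and images are assumed to already be CLIP-preprocessed tensors.

import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

def prompt_similarity(images, prompt):
    # Cosine similarity between CLIP image and text embeddings.
    text = clip.tokenize([prompt]).to(device)
    with torch.no_grad():
        img_feat = model.encode_image(images)
        txt_feat = model.encode_text(text)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    return (img_feat @ txt_feat.T).squeeze(-1)

def objective_and_measures(images):
    # Quality: agreement with the desired description of the data.
    quality = prompt_similarity(images, "a high-quality photo of a face")
    # Each measure contrasts two prompts, placing a sample along one
    # user-specified axis of diversity (example axes only).
    skin_tone = (prompt_similarity(images, "a photo of a person with dark skin")
                 - prompt_similarity(images, "a photo of a person with light skin"))
    hair_length = (prompt_similarity(images, "a photo of a person with long hair")
                   - prompt_similarity(images, "a photo of a person with short hair"))
    return quality, torch.stack([skin_tone, hair_length], dim=-1)

CMA-MAEGA then searches the generator's latent space for samples that score highly on the quality objective while filling an archive across the measure axes.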
This project uses Anaconda. Once Anaconda is installed, create the conda environment:
$ conda env create -f environment.yml
For reproducibility, a modified version of the AdaFace repository is included. To run the facial recognition experiments, install its required dependencies:
$ cd facial_recognition
$ pip install -r AdaFace/requirements.txt
Shapes Domain: The pretrained generator weights are provided in the repository and will be automatically loaded.
Facial Recognition Domain:
To run the facial recognition experiments, you must first download stylegan2-ffhq-256x256.pkl from the NVIDIA website. Place the .pkl file in the folder facial_recognition/pretrained.
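As a quick sanity check that the weights are in place, the pickle can be loaded with the standard StyleGAN2-ADA pattern. This sketch assumes the bundled dnnlib and torch_utils modules are importable and that a CUDA device is available:

import pickle
import torch

# 'G_ema' is the exponential-moving-average generator used for sampling.
with open("facial_recognition/pretrained/stylegan2-ffhq-256x256.pkl", "rb") as f:
    G = pickle.load(f)["G_ema"].to("cuda")

z = torch.randn([1, G.z_dim], device="cuda")  # random latent code
img = G(z, None)                              # NCHW image in [-1, 1]
print(img.shape)                              # expect [1, 3, 256, 256]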
The QDGS code builds on the LSI (StyleGAN2) experiments from the CMA-MAE repository, which, for replicability, includes dnnlib and torch_utils from the StyleGAN2-ADA repository, as well as the StyleGAN3+CLIP notebook and repository from the generative art community. We also include a modified version of the facial recognition training code from the AdaFace repository.
To generate data, activate the conda environment and run the generate_data.py script with the desired task argument (shapes or facial_recognition):
$ conda activate qdgs_exps
$ python3 generate_data.py --task [task]
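For example, to synthesize the face dataset used in the facial recognition experiments:

$ python3 generate_data.py --task facial_recognition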
To train and evaluate the shapes classifier, enter the shapes directory and run the training script. You will need to pass the name of the data directory as an argument; you can find this under shapes/data after you have run the data generation script.
$ cd shapes
$ sh train_eval.sh [data-directory]
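For example, if the generation step created shapes/data/qdgs_shapes (a hypothetical name; use whichever directory appears on your machine):

$ sh train_eval.sh qdgs_shapes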
The following instructions are adapted from the AdaFace repository.
- Download the desired dataset from the InsightFace links.
- Unzip the dataset to facial_recognition/data/faces_real.
- For preprocessing, run:
$ cd facial_recognition
$ python AdaFace/convert.py --rec_path data/faces_real --make_validation_memfiles
To train and evaluate the facial recognition classifier, enter the facial recognition directory and run the training script. You will need to pass the name of the data directory as an argument; you can find this under facial_recognition/data after you have run the data generation script.
$ cd facial_recognition
$ sh train_eval.sh [data-directory]
@inproceedings{chang2024quality,
title={Quality-Diversity Generative Sampling for Learning with Synthetic Data},
author={Chang, Allen and Fontaine, Matthew C and Booth, Serena and Matari{\'c}, Maja J and Nikolaidis, Stefanos},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2024}
}