Quality-Diversity Generative Sampling

An implementation of quality-diversity generative sampling (QDGS), which uses quality-diversity optimization to create synthetic data for learning representations. The framework uses CMA-MAEGA and text prompts for fine-grained guidance over the quality objective and diversity measures of synthetic data sampling, without re-parameterizing or fine-tuning the generative model. Using synthetic datasets generated by QDGS, we debias color-biased shape classifiers and facial recognition classifiers.
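
As a rough illustration of the prompt-guided scoring this enables, the sketch below scores generated images against text prompts with CLIP. It assumes OpenAI's clip package; the score function and the prompts are illustrative placeholders, not the repository's exact code.

import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

# One prompt defines the quality objective; a contrasting prompt pair defines
# a measure of diversity (both purely illustrative).
quality_prompt = clip.tokenize(["a high-quality photo of a face"]).to(device)
measure_prompts = clip.tokenize([
    "a photo of a person with dark skin",
    "a photo of a person with light skin",
]).to(device)

def score(images):
    # images: a batch already resized and normalized for CLIP (N, 3, 224, 224).
    # Gradients are kept so that CMA-MAEGA can branch on objective and
    # measure gradients with respect to the latent codes.
    img = model.encode_image(images)
    img = img / img.norm(dim=-1, keepdim=True)
    q = model.encode_text(quality_prompt)
    q = q / q.norm(dim=-1, keepdim=True)
    m = model.encode_text(measure_prompts)
    m = m / m.norm(dim=-1, keepdim=True)
    quality = (img @ q.T).squeeze(-1)   # similarity to the quality prompt
    measure = img @ m[0] - img @ m[1]   # position along the prompt-pair axis
    return quality, measure

CMA-MAEGA then searches the generator's latent space with these scores, filling an archive with samples that are both high-quality and spread across the measures.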

Installation

This project uses Anaconda to manage its environment.

Once installed, create and activate the conda environment:

$ conda env create -f environment.yml
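
Then activate it (the environment name comes from environment.yml and is used throughout):

$ conda activate qdgs_exps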

Facial Recognition Dependencies

For reproducibility, a modified version of the AdaFace repository is included. To run the facial recognition experiments, install its required dependencies:

$ cd facial_recognition
$ pip install -r AdaFace/requirements.txt

Pretrained Models and Additional Code

Shapes Domain: The pretrained generator weights are provided in the repository and will be automatically loaded.

Facial Recognition Domain: To run the facial recognition experiments, you must first download stylegan2-ffhq-256x256.pkl from the NVIDIA website. Place the .pkl file in the folder facial_recognition/pretrained.
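
Once the weights are in place, they can be loaded with the standard StyleGAN2-ADA pickle pattern. A minimal sketch, assuming the bundled dnnlib and torch_utils are on the import path (the repository's own loading code may differ):

import pickle
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Unpickling requires dnnlib and torch_utils (bundled with this repository)
# to be importable, since the pickle references their classes.
with open("facial_recognition/pretrained/stylegan2-ffhq-256x256.pkl", "rb") as f:
    G = pickle.load(f)["G_ema"].to(device)  # moving average of the generator

z = torch.randn(1, G.z_dim, device=device)  # a random latent code
img = G(z, None)  # c=None: FFHQ is unconditional; output images are in [-1, 1]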

The QDGS code builds on the LSI (StyleGAN2) experiments from the CMA-MAE repository, which include dnnlib and torch_utils from the StyleGAN2-ADA repository for replicability, and on the StyleGAN3+CLIP notebook and repository from the generative art community. We also include a modified version of the facial recognition training code from the AdaFace repository.

Data Generation

To generate data, activate the conda environment and run the generate_data.py script with the desired task argument (shapes or facial_recognition):

$ conda activate qdgs_exps
$ python3 generate_data.py --task [task]
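
For example, to generate the shapes dataset:

$ python3 generate_data.py --task shapes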

Shape Experiments

Training and Evaluation

To train and evaluate the shapes classifier, enter the shapes directory and run the training script. Pass the name of the data directory as an argument; you can find it under shapes/data after you have run the data generation script.

$ cd shapes
$ sh train_eval.sh [data-directory]
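
For example, assuming data generation produced a directory named shapes/data/qdgs_shapes (the name here is purely illustrative):

$ sh train_eval.sh qdgs_shapes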

Facial Recognition Experiments

Real Dataset Download

The following instructions are adapted from the AdaFace repository.

  1. Download the desired dataset from the InsightFace links.
  2. Unzip the dataset to facial_recognition/data/faces_real.
  3. For preprocessing, run:
$ cd facial_recognition
$ python AdaFace/convert.py --rec_path data/faces_real --make_validation_memfiles

Training and Evaluation

To train and evaluate the facial recognition classifier, enter the facial recognition directory and run the training script. Pass the name of the data directory as an argument; you can find it under facial_recognition/data after you have run the data generation script.

$ cd facial_recognition
$ sh train_eval.sh [data-directory]

Citation

@inproceedings{chang2024quality,
  title={Quality-Diversity Generative Sampling for Learning with Synthetic Data},
  author={Chang, Allen and Fontaine, Matthew C and Booth, Serena and Matari{\'c}, Maja J and Nikolaidis, Stefanos},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2024}
}
