Face editing with Style GAN 2 and facial segmentation (Ceteris Paribus Face Challenge Intercentrales 2022)
NOTE : This is work based on the winner team repository of Inter-Centrales 2022 AI competition: Ceteris Paribus Face Challenge: site of the competition plus the work of the second team composed by Thibault Le Sellier De Chezelles and Hédi Razgallah (original work can be found here.
This repository and the repositories it contains are licensed under the MIT license.
This repository uses third-party works:
- anycost-gan (MIT license)
- face-parsing.PyTorch (MIT license)
- DensePredictionTransformer / DPT (MIT license)
- encoder4editing
Licenses are provided in third_party_licenses
.
- Project images in latent space
- Modify bounds of the direction
- Find translations in latent space
- Saving pipeline for direction
- Solve bald issue
- Detect and improve bad translations
- Solve specific issues (manually or with other methods)
- Focused change by semantic segmentation
- Resolve some artifacts and background problems with depth estimation
- Refactor the code to make it more convenient
- Add a convenient config system
- Improve realism with GFP GAN
- Look for other repo to solve skin and age in data_challenge branch
- Fix artifact bugs (IN PROGRESS 🚧)
- Allow using GFP GAN and depth estimation on a subset of transformations/images
- Improve depth estimation speed with a lighter model
- Test GAN retraining and extract direction features
Note: you need a Nvidia GPU to run the processing. Only editor API with pre-computed latent vectors is available with CPU.
Clone this repository and pull the models required from postprocessing via LFS:
git clone [email protected]:valentingol/gan-face-editing.git
cd gan-face-editing
git lfs pull
Then, create a new virtual environment and install all the required packages:
pip install -e .
pip install -r requirements.txt
NOTE: to run on multiple GPUs, you should also install horovod (Mac or Linux only).
The original dataset of the competition is available here: drive dataset
Unzip the content in a folder called data/face_challenge
.
By default, a depth estimation is performed to correct artifacts of the backgrounds. You need download the model for depth estimation here: download and put it on postprocess/depth_segmentation/model/
.
Once the data are downloaded, you must compute the projected latent vectors of the images. It can take some time to compute as the script optimize the latent vector through multiple gradient descent steps but you can significantly reduce the time by reducing the number of iterations in configurations (0 iteration mean that you get the latent vector computed by the pre trained encoder). By default, it is 200 iterations.
python apps/project_images.py [--projection.n_iter=<n_iter>]
Optionally, can run following the script to launch the image editing API.
# if you have a Nvidia GPU:
FORCE_NATIVE=1 python editor_API.py
# otherwise:
python apps/editor.py
In this API you can visualize and edit the reconstructed images using attribute cursors to build the changes you want. Default translations are already available in this repository but you can edit your own with the button "Save trans". The translations will be saved at projection/run1/translations_vect
. You can also save the edited images with the button "Save img". The images will be saved in projection/run1/images_manual_edited
. You can have more information about creating your own translation in the "Save your own translations" section. Note that this repository already provide a lot of preconstructed translations.
You can now run the full pipeline to apply automatic translation on all the input images and apply three steps of postprocessing (in this order):
- encoder4editing GAN
- GFP GAN
- segmentation mixup (using ResNet)
- depth segmentation mixup (using ViT)
FORCE_NATIVE=1 python apps/run_pipeline.py
All steps of the pipeline can be run individually and the results after all steps are saved in res/run1
.
You can avoid using the pre-computed translation with --translation.use_precomputed=False
. Then, the only translations that will be used are the ones you have created with the editor API.
The pipeline of transformations include many transformations. All of them can be run individually even if the project was designed to work with apps/run_pipeline.py
in order to be consistent with custom configurations (see "Configurations" section).
IMPORTANT: To run encoder4editing you must download the pre trained model here: e4e_ffhq_encode.pt and put in in postprocess/encoder4editing/model/e4e_ffhq_encode.pt
.
The encoder4editing (e4e) encoder is specifically designed to complement existing image manipulation techniques performed over StyleGAN's latent space. It is based on the following paper: Designing an Encoder for StyleGAN Image Manipulation. It is used instead of AnycostGAN for some transformation if needed. By default, no transformation are made with this encoder.
You can run individually this step with:
python pipeline.encoder4editing.py
To enhance the quality of the areas of the face modified, you can incorporate a ready-to-use GAN, GFP-GAN. It was specifically trained to restore the quality of ancient images of faces, and is open-sourced : GFP GAN.
However, it only works well on faces, so you might want to use segmentation and use GFP-GAN only on the face part, or even the mask of the area of modifications.
In order to use it, you first have to download a trained model, with git lfs pull
or following this link, and store it, in postprocess/gpf_gan/model
. The code that calls it and runs this part of the pipeline is in pipeline/gpf_gan.py
.
To go further in the last idea, we apply a semantic segmentation on the original image and the edited images in order to find for each image and for each transformation the area where we expect to find the change. First you need to get the segmentation model by Git LFS (see Git LFS for details):
git lfs pull
Note than you can use an other model you want as long as it is compatible with the models used by face-parsing.PyTorch. You should modify the configurations to set the path of your new model.
Then you can run the following script to merge the previously edited images with the original ones by semantic segmentation:
python pipeline/segment.py
By default, the resulting images are in res/run1/output_images
Imortant: The translations need to have a valid prefix in their name to apply the segmentation. See the table in the section Save your own translations for more information.
In order to improve the segmentation, we perform depth estimation to correct artifacts of the backgrounds for the images coming from the segmentation. First, you need get the model for depth estimation. You can download it here: download and put it on postprocess/depth_segmentation/model/
.
The idea is to make a depth estimation with a transformer model (see DPT). Then we build the foreground mask with a K-means algorithm. This allows to extract a relevant foreground from the segmented image and to paste it on the background of the original image.
Then you can run the following script to merge the previously edited images with the original ones by depth estimation:
python pipeline/depth_segmentation.py
By default, the resulting images are in res/run1/output_images
This project use the rr-ml-config configuration system to configure all the main functions (in apps/
). You can find all the configurations on yaml files in the config/
folder. The default configs are in config/default/
folder and split on multiple files (creating multiple sub-configs) to make it easier to use. Then, you can create your own experiments by editing a file config/exp/my_exp.yaml
and add lines to overwriting some default configs (an example is provided in config/exp/base.yaml
, that is used by default). Then you can run the apps/
functions using your experiment configs. For instance:
FORCE_NATIVE=1 python apps/project_images.py --config config/exp/my_exp.yaml
FORCE_NATIVE=1 python apps/editor.py --config config/exp/my_exp.yaml
FORCE_NATIVE=1 python apps/run_pipeline.py --config config/exp/my_exp.yaml
You can also change all the configurations with command line arguments and combine the two. For instance:
python apps/project_images.py --config config/exp/my_exp.yaml --projection.n_iter=50 --projection.enc_reg_weight=0.5
FORCE_NATIVE=1 python apps/editor.py --config config/exp/my_exp.yaml --editor.n_style_to_change=8
FORCE_NATIVE=1 python apps/run_pipeline.py --config config/exp/my_exp.yaml --segmentation.margin=10
This make your experiments very convenient because you can set your main configurations in your configuration file and you no longer need to write all your configurations in the command line.
All your configurations are saved that allow full reproducibility of your experiments.
Each time you want to create a new experiments configuration, you need to overwrite the projection dir with the name of your "projection run" (e.g. projection/run_with_1000_iter
), the result dir (with the name of your "pipeline run" (e.g. res/test_with_domain_mixup
), and the save path with the name of your "global run", (e.g. configs/runs/1000_iter_proj_with_domain_mixup
). By default, it is "run1", "run1" and "run1&1" respectively.
You can modify your own dataset of images by create a new folder data/my_dataset_name
and add it in your configs:
# configs/exp/
data_dir: data/my_dataset_name
Then, you need to align the faces on your images following the FFHQ alignement and resize them to 512 * 512 pixels. First you need to install dlib and downoad the shape predictor, extract it, and place it in anycostgan/shape_predictor/shape_predictor_68_face_landmarks.dat
. Then you can run the following script. Note that data will be transformed in data_dir
in place so save a backup if you want before.
python apps/preprocess_images.py --config configs/exp/<exp_config_file>.yaml
Now you can project the images, use the editor and run your pipeline of transformations such as the section above.
To save the translations (= latent direction) you want in the projection/run1
folder, you can click on the "Save trans" button and set the name of the direction. In all cases, the pipeline will try to use your new translation to modify the images.
You can use this project for two purposes:
- apply automatically some changes in all the images with the same manner (by default)
- use already known characteristics of the images to adapt the changes (use
--translation.use_caracs_in_img=True
when running the pipeline or edit you configuration file, see "Configurations" section for details)
In this second case, the images should have a specific name indicating the characteristics of the image. The name should follow the rule presented here. Moreover, the translations vector should also have a specific name.
-
for 'cursor' transformation (min - max):
{carac}_{'min' or 'max'}
. ExampleBe_min
means cursor transformation to minimal bags under the eyes (Be_min). -
for default transformation (transformation that doesn't depend on the current value of the characteristic):
{carac}_{new_value}
. Example:Hc_0
means default transformation to black hair color (Hc_0) -
for specific transformation (transformation that depends on the current value of the characteristic) that will overwrite default transformation:
{carac}_{new_value}_fr_{old_value}
. ExampleA_0_fr_1
means specific transformation from intermediate age (A_1) to young age (A_0), 'fr' means 'from'.
All required transformations must be treated for all input characteristics to make a valid submission. Note that this repository contains enough default translations to make a submission.
Now, given an input image, the transformations to create characteristics that are not in the initial image can automatically be processed with the function pipeline/utils/translation/get_translations.py
.
Note that if you not used the characteristics in the image, you can name the translation vectors as you want.
Important: To use the semantic segmentation mixup, the algorithm should understand the part of the face you want to modify (and the part of the image you want to preserve from the original image). To do so, you need to add a particular prefix for the translation vectors: for instance for eyes change, you need to use the prefix 'N'. For instance N_0
is a valid translation that will only edit the eyes. The table of prefix is:
Part to modify | Prefix |
---|---|
All the face | A, Ch, Se or Sk |
Hair | B, Hc or Hs |
Nose | Pn |
Eyes | Bn or N |
Just under eyes | Be |
Lips | Bp |
If you don't use one of the prefix above, no segmentation mixup will be applied. You can use the prefix you want for custom transformations. More intuitive prefix will be available later (e.g 'eyes' for eyes, 'face' for all the face ...).
🚧
To retrain the GAN you need to install horovod (Mac or Linux only).
🚧