
GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis | ICLR'23

Zhenhui Ye, Ziyue Jiang, Yi Ren, Jinglin Liu, Jinzheng He, Zhou Zhao | Zhejiang University, ByteDance


This repository is the official PyTorch implementation of our ICLR-2023 paper, in which we propose GeneFace for generalized and high-fidelity audio-driven talking face generation. The inference pipeline is as follows:



Our GeneFace achieves better lip synchronization and expressiveness on out-of-domain audio. Watch this video for a clear lip-sync comparison against previous NeRF-based methods. You can also visit our project page for more details.

🔥MimicTalk Released

We have released the code of MimicTalk (https://github.com/yerfor/MimicTalk/), a state-of-the-art NeRF-based person-specific talking face method that achieves better visual quality and enables talking-style control.

GeneFace++ Released

We have released the code of GeneFace++ (https://github.com/yerfor/GeneFacePlusPlus/), an upgraded version of GeneFace that achieves better lip-sync, video quality, and system efficiency.

Update:

  • 2023.3.16 We released a major update in this release (a video demo is here), including: 1) a RAD-NeRF-based renderer, which can run inference in real time and be trained in 10 hours; 2) a PyTorch-based deep3d_reconstruction module, which is easier to install and is 8x faster than the previous TensorFlow-based version; 3) a pitch-aware audio2motion module, which generates better lip-synced landmarks; 4) fixes for several bugs that caused large memory usage. We will upload a paper about this update soon.
  • 2023.2.22 We released a one-minute demo video in which GeneFace is driven by a Chinese song generated by DiffSinger.
  • 2023.2.20 We released a stable 3D landmark post-processing strategy in inference/nerfs/lm3d_nerf_infer.py, which improves the stability and quality of the final results by a large margin.

Quick Start!

We provide pre-trained models and processed datasets of GeneFace in this release to enable a quick start. Below, we show how to run inference with the pre-trained models in four steps. If you want to train GeneFace on your own target-person video, please refer to the following sections (Prepare Environments, Prepare Datasets, and Train Models).

  • Step1. Create a new Python environment named geneface following the guide in docs/prepare_env/install_guide.md.

  • Step2. Download lrs3.zip and May.zip from the release and unzip them into the checkpoints directory.

  • Step3. Process the dataset of May.mp4 following the guide in docs/process_data/process_target_person_video.md. You should then see an output file named data/binary/videos/May/trainval_dataset.npy. (See the command sketch below.)
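For reference, here is a minimal sketch of Steps 1-3 as shell commands. The conda usage, Python version, and archive layout are assumptions on our part; the linked guides remain the authoritative instructions.

# Step 1 (sketch): create the environment; the Python version is an assumption, see docs/prepare_env/install_guide.md
conda create -n geneface python=3.9
conda activate geneface
# Step 2 (sketch): assuming lrs3.zip and May.zip each contain a top-level lrs3/ and May/ folder, matching the layout below
unzip lrs3.zip -d checkpoints/
unzip May.zip -d checkpoints/
# Step 3: follow docs/process_data/process_target_person_video.md to produce data/binary/videos/May/trainval_dataset.npy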

After the above steps, the structure of your checkpoints and data directories should look like this:

> checkpoints
    > lrs3
        > lm3d_vae_sync
        > syncnet
    > May
        > lm3d_postnet_sync
        > lm3d_radnerf
        > lm3d_radnerf_torso
> data
    > binary
        > videos
            > May
                trainval_dataset.npy
  • Step4. Run the scripts below:
bash scripts/infer_postnet.sh
bash scripts/infer_lm3d_radnerf.sh
# bash scripts/infer_radnerf_gui.sh # you can also use the GUI provided by RAD-NeRF

You can then find an output video at infer_out/May/pred_video/zozo.mp4.
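As an optional sanity check (not part of this repo's scripts), you can inspect the rendered file with a generic tool such as ffprobe:

ls -lh infer_out/May/pred_video/
ffprobe -v error -show_format infer_out/May/pred_video/zozo.mp4  # prints container and duration info if the render succeeded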

Prepare Environments

Please follow the steps in docs/prepare_env.

Prepare Datasets

Please follow the steps in docs/process_data.

Train Models

Please follow the steps in docs/train_models.

Train GeneFace on other target person videos

Apart from the May.mp4 provided in this repo, we also provide 8 target-person videos that were used in our experiments; you can download them at this link. To train on a new video named <video_id>.mp4, place it in the data/raw/videos/ directory, then create a new folder at egs/datasets/videos/<video_id> and edit its config files, following the provided example folder egs/datasets/videos/May. A sketch of these steps is shown below.
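As a concrete illustration, here is a minimal sketch of the steps above for a hypothetical video named Obama.mp4 (the name is only an example); the actual config edits should follow docs/process_data and docs/train_models.

# place the new target-person video (hypothetical name) where the pipeline expects it
cp Obama.mp4 data/raw/videos/Obama.mp4
# create a config folder for the new identity, starting from the provided May example
mkdir -p egs/datasets/videos/Obama
cp -r egs/datasets/videos/May/. egs/datasets/videos/Obama/
# then edit the copied config files (e.g., replace occurrences of "May" with "Obama") before processing and training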

You can also record your own video and train a unique GeneFace model for yourself!

Citation

@article{ye2023geneface,
  title={GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis},
  author={Ye, Zhenhui and Jiang, Ziyue and Ren, Yi and Liu, Jinglin and He, Jinzheng and Zhao, Zhou},
  journal={arXiv preprint arXiv:2301.13430},
  year={2023}
}

Acknowledgements

Our code is based on the following repos: