EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning

Zhiyuan Chen*, Jiajiong Cao*, Zhiquan Chen, Yuming Li, Chenguang Ma

*Equal Contribution.

Terminal Technology Department, Alipay, Ant Group.

📣 📣 Updates

24.07.12

(1) Gradio or WebUI version

Many developers are actively building projects around EchoMimic, and we are deeply grateful for their contributions. The repositories below have extended EchoMimic's capabilities and made it more versatile.

WebUI version from @greengerong: https://github.com/greengerong/EchoMimic

Gradio UI commit from @Robin021: https://github.com/BadToBest/EchoMimic/blob/main/webgui.py

Code contribution in issue from @O-O1024: antgroup#22

(2) Our Paper is Released!

arXiv link: https://arxiv.org/abs/2407.08136

Gallery

Audio Driven (Sing)

s_01.mp4

s_02.mp4

s_03.mp4

Audio Driven (English)

en_01.mp4

en_03.mp4

en_05.mp4

Audio Driven (Chinese)

ch_02.mp4

ch_03.mp4

ch_04.mp4

Landmark Driven

po_01.mp4

po_02.mp4

po_03.mp4

Audio + Selected Landmark Driven

ap_04.mp4

ap_05.mp4

ap_06.mp4

(Some demo images above are sourced from image websites. If there is any infringement, we will immediately remove them and apologize.)

Installation

Download the Code

  git clone https://github.com/BadToBest/EchoMimic
  cd EchoMimic

Python Environment Setup

Tested System Environment: CentOS 7.2 / Ubuntu 22.04, CUDA >= 11.7
Tested GPUs: A100 (80G) / RTX4090D (24G) / V100 (16G)
Tested Python Version: 3.8 / 3.10 / 3.11
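
As a quick sanity check before installing, the tested Python versions listed above can be compared against the running interpreter. This is a minimal sketch, not part of the repository; the helper name is_tested_python is hypothetical. An untested version may still work.

```python
import sys

# Minor versions listed as tested in this README.
TESTED_PYTHON_VERSIONS = {(3, 8), (3, 10), (3, 11)}


def is_tested_python(version_info=None):
    """Return True if the (major, minor) version is one of the tested ones."""
    info = sys.version_info if version_info is None else version_info
    return (info[0], info[1]) in TESTED_PYTHON_VERSIONS


if __name__ == "__main__":
    status = "tested" if is_tested_python() else "untested"
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]} is {status} with EchoMimic")
```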

Create conda environment (Recommended):

  conda create -n echomimic python=3.8
  conda activate echomimic

Install packages with pip:

  pip install -r requirements.txt

Download ffmpeg-static

Download and decompress ffmpeg-static, then set the FFMPEG_PATH environment variable to the extracted directory:

  export FFMPEG_PATH=/path/to/ffmpeg-4.4-amd64-static
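
To confirm that FFMPEG_PATH is set correctly before running inference, a small check along these lines may help. It is a sketch under one assumption: the extracted static build contains a file named ffmpeg at its top level. The helper find_ffmpeg is hypothetical and is not part of EchoMimic.

```python
import os
from pathlib import Path


def find_ffmpeg(env=None):
    """Return the ffmpeg binary under FFMPEG_PATH, or None if it cannot be found."""
    env = os.environ if env is None else env
    root = env.get("FFMPEG_PATH")
    if not root:
        return None
    # Assumption: the static build keeps the binary directly in its top-level folder.
    candidate = Path(root) / "ffmpeg"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    binary = find_ffmpeg()
    if binary:
        print(f"ffmpeg found at {binary}")
    else:
        print("FFMPEG_PATH is unset or does not contain an ffmpeg binary")
```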

Download pretrained weights

  git lfs install
  git clone https://huggingface.co/BadToBest/EchoMimic pretrained_weights

The pretrained_weights directory is organized as follows:

./pretrained_weights/
├── denoising_unet.pth
├── reference_unet.pth
├── motion_module.pth
├── face_locator.pth
├── sd-vae-ft-mse
│   └── ...
├── sd-image-variations-diffusers
│   └── ...
└── audio_processor
    └── whisper_tiny.pt

Here, denoising_unet.pth, reference_unet.pth, motion_module.pth, and face_locator.pth are the main EchoMimic checkpoints. The other models in this hub (sd-vae-ft-mse, sd-image-variations-diffusers, and audio_processor/whisper_tiny.pt) can also be downloaded from their original hubs, thanks to their authors' brilliant work.
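
After cloning the weights, it can be useful to check that the layout shown in the tree above is complete. The following is a minimal sketch: the entry list is copied from that tree, while the helper missing_weights and its default root are illustrative and not part of the repository.

```python
from pathlib import Path

# Entries copied from the pretrained_weights directory tree in this README.
EXPECTED_ENTRIES = [
    "denoising_unet.pth",
    "reference_unet.pth",
    "motion_module.pth",
    "face_locator.pth",
    "sd-vae-ft-mse",
    "sd-image-variations-diffusers",
    "audio_processor/whisper_tiny.pt",
]


def missing_weights(root="./pretrained_weights"):
    """Return the expected entries that do not exist under root."""
    base = Path(root)
    return [entry for entry in EXPECTED_ENTRIES if not (base / entry).exists()]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing entries:", ", ".join(missing))
    else:
        print("All expected entries are present")
```

An empty result only means the files and folders exist; it does not verify that the Git LFS download finished or that the checkpoints are intact.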

Audio-Driven Algorithm Inference

Run the Python inference scripts:

  python -u infer_audio2vid.py
  python -u infer_audio2vid_pose.py

Audio-Driven Algorithm Inference on Your Own Cases

Edit the inference config file ./configs/prompts/animation.yaml and add your own case under test_cases:

test_cases:
  "path/to/your/image":
    - "path/to/your/audio"

Then run the Python inference script:

  python -u infer_audio2vid.py
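
When there are many image and audio pairs, the test_cases entries shown above can be generated rather than typed by hand. The sketch below renders a mapping of image paths to audio paths in the same shape as that fragment. The helper render_test_cases is hypothetical, and it assumes the paths contain no double quotes or backslashes. It only produces the test_cases fragment; merging it into the rest of animation.yaml is left to the user.

```python
def render_test_cases(cases):
    """Render {image_path: [audio_path, ...]} as a test_cases YAML fragment."""
    lines = ["test_cases:"]
    for image, audios in cases.items():
        # Each image path is a key whose value is a list of audio paths.
        lines.append(f'  "{image}":')
        for audio in audios:
            lines.append(f'    - "{audio}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_test_cases({"path/to/your/image": ["path/to/your/audio"]}), end="")
```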

Run the Gradio UI

Thanks to the contribution from @Robin021:

  python webgui.py --server_port=3000

Release Plans

Status	Milestone	ETA
✅	Inference source code of the audio-driven algorithm released on GitHub	9th July, 2024
✅	Pretrained models trained on English and Mandarin Chinese to be released	9th July, 2024
🚀	Inference source code of the pose-driven algorithm released on GitHub	15th July, 2024
✅	Pretrained models with better pose control to be released	13th July, 2024
🚀	Pretrained models with better singing performance to be released	TBD
🚀	Accelerated models to be released	TBD
🚀	Large-scale, high-resolution Chinese-based talking head dataset	TBD

Acknowledgements

We would like to thank the contributors to the AnimateDiff, Moore-AnimateAnyone and MuseTalk repositories, for their open research and exploration.

We are also grateful to V-Express and hallo for their outstanding work in the area of diffusion-based talking heads.

If we have missed any open-source projects or related articles, we will add the corresponding acknowledgement promptly.

Citation

If you find our work useful for your research, please consider citing the paper:

@misc{chen2024echomimic,
  title={EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning},
  author={Zhiyuan Chen and Jiajiong Cao and Zhiquan Chen and Yuming Li and Chenguang Ma},
  year={2024},
  eprint={2407.08136},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

