- `python == 3.7`
- build folders like:

  ```
  audio2pose
  ├── codes
  │   └── audio2pose
  ├── datasets
  │   ├── trinity
  │   └── s2g
  └── outputs
      └── audio2pose
          ├── custom
          └── wandb
  ```
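If it is convenient to script the setup, a minimal sketch that creates the layout above (the paths simply mirror the tree; this script is not part of the repo):

```python
import os

# Create the folder layout shown above; exist_ok avoids errors on reruns.
for path in [
    "audio2pose/codes/audio2pose",
    "audio2pose/datasets/trinity",
    "audio2pose/datasets/s2g",
    "audio2pose/outputs/audio2pose/custom",
    "audio2pose/outputs/audio2pose/wandb",
]:
    os.makedirs(path, exist_ok=True)
```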
- download the framework scripts from BEAT to `codes/audio2pose/`
- run `pip install -r requirements.txt` in the path `./codes/audio2pose/`
- download the Trinity dataset to `datasets/trinity`
- build the data cache and calculate the mean and std for the given number of joints, FPS, and speakers using `/dataloader/preprocessing.ipynb` (the mean/std step is sketched below)
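The notebook is the authoritative preprocessing path; purely as an illustration of the mean/std computation, here is a rough sketch (the clip shapes, joint count, and file names are assumptions, not the notebook's actual code):

```python
import numpy as np

NUM_JOINTS = 75  # assumed joint count; set to match the skeleton you use

def compute_mean_std(clips):
    """clips: list of (frames, NUM_JOINTS * 3) pose arrays for the chosen speakers."""
    stacked = np.concatenate(clips, axis=0)  # (total_frames, channels)
    mean = stacked.mean(axis=0)
    std = stacked.std(axis=0) + 1e-8         # avoid division by zero when normalizing
    return mean, std

# Hypothetical usage:
# mean, std = compute_mean_std(loaded_clips)
# np.savez("datasets/trinity/mean_std.npz", mean=mean, std=std)
```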
- put `disco.py` under `./audio2pose/model/` and customize `disco_trainer.py` for contrastive learning (see the sketch below)
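`disco.py` and `disco_trainer.py` hold the project's actual objective; as a generic illustration of a contrastive (InfoNCE-style) loss, here is a minimal sketch (the function name, the audio/motion pairing, and the temperature value are assumptions, not the repo's code):

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.07):
    """Minimal InfoNCE: the i-th anchor should match the i-th positive.

    anchor, positive: (batch, dim) embeddings, e.g. audio and motion features.
    """
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    logits = a @ p.t() / temperature                    # (batch, batch) similarities
    targets = torch.arange(a.size(0), device=a.device)  # diagonal entries are positives
    return F.cross_entropy(logits, targets)
```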
- run `python train.py -c ./configs/disco_trinity_ae.yaml` to obtain the pretrained autoencoder (pretrained_ae) used for FID calculation.
- run `python train.py -c ./configs/disco_trinity.yaml` for training.
- run `python test.py -c ./configs/disco_trinity.yaml` for inference.
- load `./outputs/audio2pose/custom/exp_name/epoch_number/xxx.bvh` into Blender to visualize the test results (a scripted import is sketched below).
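Besides importing via Blender's GUI (File > Import > Motion Capture (.bvh)), the import can be scripted with Blender's Python API; a minimal sketch, keeping the placeholder path from above:

```python
import bpy

# Run inside Blender (Python console, or `blender --python this_script.py`).
# Replace the placeholder path with a real output file.
bpy.ops.import_anim.bvh(
    filepath="./outputs/audio2pose/custom/exp_name/epoch_number/xxx.bvh"
)
```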
- refer to *train and test DisCo* above for the bvh cache, and set `dataset: trinity` in the `.yaml` (see the sketch below).
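If you prefer to change the dataset entry programmatically rather than editing the file by hand, a small sketch with PyYAML (only the `dataset: trinity` key is from this README; the rest of the config's structure is left untouched):

```python
import yaml

# Point an existing config at the trinity dataset.
with open("./configs/disco_trinity.yaml") as f:
    cfg = yaml.safe_load(f)
cfg["dataset"] = "trinity"
with open("./configs/disco_trinity.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```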
DisCo is established for the following research project:
```bibtex
@inproceedings{liu2022disco,
  title={DisCo: Disentangled Implicit Content and Rhythm Learning for Diverse Co-Speech Gestures Synthesis},
  author={Liu, Haiyang and Iwamoto, Naoya and Zhu, Zihao and Li, Zhengqing and Zhou, You and Bozkurt, Elif and Zheng, Bo},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  pages={3764--3773},
  year={2022}
}
```