ACMMM '22: Speech Fusion to Face: Bridging the Gap Between Human's Vocal Characteristics and Facial Imaging
This is the official PyTorch implementation for the paper Speech Fusion to Face: Bridging the Gap Between Human's Vocal Characteristics and Facial Imaging.
The main paper of this work was published at ACM Multimedia 2022 as a full paper [acmdl][arxiv]. Please refer to our supplementary material for more details about this work.
We release low-resolution and mid-resolution versions of SF2F. Our baseline model, voice2face (paper), which generates low-resolution images by default, is also released. All released implementations are designed for and evaluated on the HQ-VoxCeleb dataset.
| Model | Output Resolution | VGGFace Score |
|---|---|---|
| voice2face | 64 | 15.47 |
| SF2F (no fuser) | 64 | 18.59 |
| SF2F | 64 | 19.49 |
| SF2F (no fuser) | 128 | 19.31 |
| SF2F | 128 | 20.10 |
Instructions on the training and testing of the above models are provided in GETTING_STARTED.

To give users of this repo a better understanding of our implementation, we introduce the key modules below.
Voice Encoders. The baseline voice encoder from voice2face is implemented as `V2F1DCNN` in models/voice_encoders.py. As mentioned in our main paper, we designed and implemented `Inception1DBlock` to improve the performance of the voice encoder. When the parameter `inception_mode` is set to `True`, `V2F1DCNN` is automatically built with `Inception1DBlock`, which yields our proposed 1D-Inception-based voice encoder. (Jump to code)
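For intuition, here is a minimal, hypothetical sketch of a 1D inception-style block; the actual `Inception1DBlock` in models/voice_encoders.py may differ in branch design, channel widths, and normalization.

```python
import torch
import torch.nn as nn

class Inception1DBlockSketch(nn.Module):
    """Hypothetical sketch of a 1D inception-style block: parallel Conv1d
    branches with different kernel sizes, concatenated along the channel
    dimension. Branch widths and normalization are illustrative assumptions."""
    def __init__(self, in_channels, branch_channels=64):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(in_channels, branch_channels, kernel_size=k,
                          padding=k // 2),  # odd kernels keep length
                nn.BatchNorm1d(branch_channels),
                nn.ReLU(inplace=True),
            )
            for k in (1, 3, 5, 7)  # multiple temporal receptive fields
        ])

    def forward(self, x):  # x: (batch, in_channels, time)
        return torch.cat([branch(x) for branch in self.branches], dim=1)

# Example: a mel-spectrogram batch of shape (B, n_mels, T)
block = Inception1DBlockSketch(in_channels=40)
out = block(torch.randn(8, 40, 100))  # -> (8, 256, 100)
```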
Face Decoders. The baseline face decoder is implemented as `V2FDecoder` in models/face_decoders.py. Our enhanced face decoder is implemented as `FaceGanDecoder` in the same file. (Jump to code)
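As a rough illustration only, decoders of this kind typically upsample a voice embedding into an image with transposed convolutions; the sketch below is a hypothetical simplification, not the exact `V2FDecoder` or `FaceGanDecoder` architecture.

```python
import torch
import torch.nn as nn

class FaceDecoderSketch(nn.Module):
    """Hypothetical sketch: project a voice embedding to a 4x4 feature map,
    then repeatedly upsample with ConvTranspose2d to a 64x64 RGB face."""
    def __init__(self, embed_dim=512):
        super().__init__()
        self.project = nn.Linear(embed_dim, 512 * 4 * 4)
        self.up = nn.Sequential(
            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1),  # 8x8
            nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),  # 16x16
            nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),   # 32x32
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),     # 64x64
            nn.Tanh(),  # image values in [-1, 1]
        )

    def forward(self, z):  # z: (batch, embed_dim)
        x = self.project(z).view(-1, 512, 4, 4)
        return self.up(x)

faces = FaceDecoderSketch()(torch.randn(8, 512))  # -> (8, 3, 64, 64)
```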
Embedding Fuser. Our proposed attention fuser is implemented as `AttentionFuserV1` in models/fusers.py. A graphical demonstration of the embedding fuser is shown below. (Jump to code)
Generative Models. All generative models in this repo are implemented as `EncoderDecoder` in models/encoder_decoder.py. The encoder, decoder, and fuser are initialized as attributes of the `EncoderDecoder` class. (Jump to code)
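The sketch below shows one plausible way these components compose in a forward pass; it is a simplified illustration under assumed input shapes, not the actual `EncoderDecoder` interface, which is built from a config.

```python
import torch
import torch.nn as nn

class EncoderDecoderSketch(nn.Module):
    """Hypothetical sketch of the composition: encode each voice segment,
    fuse the per-segment embeddings (or average them when no fuser is
    given), then decode the fused embedding into a face image."""
    def __init__(self, encoder, decoder, fuser=None):
        super().__init__()
        self.encoder, self.decoder, self.fuser = encoder, decoder, fuser

    def forward(self, mel_segments):  # (batch, num_segments, n_mels, time)
        b, s = mel_segments.shape[:2]
        # Encode all segments at once: (B * S, n_mels, T) -> (B * S, D)
        embeds = self.encoder(mel_segments.flatten(0, 1)).view(b, s, -1)
        fused = self.fuser(embeds) if self.fuser else embeds.mean(dim=1)
        return self.decoder(fused)  # (B, 3, H, W)
```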
FaceNet Perceptual Loss. The FaceNet perceptual loss is implemented as `FaceNetLoss` in models/perceptual.py. (Jump to code)
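For orientation, here is a simplified sketch of a FaceNet-based loss using the third-party facenet-pytorch package; it compares only the final 512-d embeddings, whereas the repo's `FaceNetLoss` may also use intermediate activations and different preprocessing.

```python
import torch
import torch.nn.functional as F
from facenet_pytorch import InceptionResnetV1  # pip install facenet-pytorch

class FaceNetEmbeddingLossSketch(torch.nn.Module):
    """Simplified sketch: penalize the distance between FaceNet embeddings
    of generated and ground-truth faces, with the recognizer frozen."""
    def __init__(self):
        super().__init__()
        self.facenet = InceptionResnetV1(pretrained='vggface2').eval()
        for p in self.facenet.parameters():
            p.requires_grad_(False)  # frozen feature extractor

    def forward(self, fake_faces, real_faces):
        # InceptionResnetV1 expects ~160x160 inputs normalized to [-1, 1]
        fake = F.interpolate(fake_faces, size=160, mode='bilinear',
                             align_corners=False)
        real = F.interpolate(real_faces, size=160, mode='bilinear',
                             align_corners=False)
        return F.l1_loss(self.facenet(fake), self.facenet(real))
```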
VGGFace Score. The VGGFace score is implemented in scripts/compute_vggface_score.py. (Jump to code)
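As a rough guide, an inception-score-style quantity over a face classifier's softmax outputs can be computed as sketched below; this assumes `probs` holds per-image identity predictions from a VGGFace classifier, and the repo's script may differ in splits and preprocessing.

```python
import numpy as np

def vggface_score_sketch(probs, num_splits=10, eps=1e-12):
    """Hypothetical sketch of an inception-score-style metric: given
    per-image identity softmax outputs `probs` of shape (N, num_classes),
    compute exp(E_x[KL(p(y|x) || p(y))]) per split and average."""
    scores = []
    for chunk in np.array_split(probs, num_splits):
        p_y = chunk.mean(axis=0, keepdims=True)  # marginal p(y) in the split
        kl = (chunk * (np.log(chunk + eps) - np.log(p_y + eps))).sum(axis=1)
        scores.append(np.exp(kl.mean()))
    return float(np.mean(scores)), float(np.std(scores))
```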
Retrieval Metrics. Retrieval metrics are implemented in utils/s2f_evaluator.py. (Jump to code)
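For illustration, the sketch below computes a recall@k-style retrieval metric by cosine similarity; it assumes query and gallery embeddings are index-aligned by identity, which may not match the exact protocol in utils/s2f_evaluator.py.

```python
import torch

def recall_at_k_sketch(query_embeds, gallery_embeds, k=10):
    """Hypothetical sketch of a retrieval metric: for each generated-face
    embedding, rank all ground-truth gallery embeddings by cosine
    similarity and count how often the matching identity (assumed to sit
    at the same index) appears in the top k."""
    q = torch.nn.functional.normalize(query_embeds, dim=1)
    g = torch.nn.functional.normalize(gallery_embeds, dim=1)
    sims = q @ g.t()                                # (N, N) cosine sims
    topk = sims.topk(k, dim=1).indices              # top-k gallery indices
    targets = torch.arange(q.size(0)).unsqueeze(1)  # true match index
    return (topk == targets).any(dim=1).float().mean().item()

# Example with random embeddings
r10 = recall_at_k_sketch(torch.randn(100, 512), torch.randn(100, 512), k=10)
```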
To learn about environment setup, data preparation, launch of training, visualization, and evaluation, please refer to GETTING_STARTED.
If you find this project useful in your research, please consider citing:
@inproceedings{bai2022speech,
title={Speech Fusion to Face: Bridging the Gap Between Human's Vocal Characteristics and Facial Imaging},
author={Bai, Yeqi and Ma, Tao and Wang, Lipo and Zhang, Zhenjie},
booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
pages={2042--2050},
year={2022}
}