
Speaker-independent-emotional-voice-conversion-based-on-conditional-VAW-GAN-and-CWT

This is the implementation of the Interspeech 2020 paper "Converting anyone's emotion: towards speaker-independent emotional voice conversion". Please cite our paper if you use the code.

Getting Started

Prerequisites

  • Ubuntu 16.04
  • Python 3.6
    • Tensorflow-gpu 1.5.0
    • PyWorld
    • librosa
    • soundfile
    • numpy 1.14.0
    • sklearn
    • glob
    • sprocket-vc
    • pycwt
    • scipy
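
For convenience, the Python packages above can be collected into a requirements file. This is only a sketch: the list above pins versions for tensorflow-gpu and numpy only, so the remaining packages are left unpinned, and `glob` is part of the Python standard library and needs no install.

```text
tensorflow-gpu==1.5.0
numpy==1.14.0
pyworld
librosa
soundfile
scikit-learn
sprocket-vc
pycwt
scipy
```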

Usage

  1. Prepare your dataset.
Please follow the file structure:

training_dir: ./data/wav/training_set/*/*.wav

evaluation_dir: ./data/wav/evaluation_set/*/*.wav

For example: "./data/wav/training_set/Angry/0001.wav"
  2. Activate your virtual environment.
source activate [your env]
  3. Train VAW-GAN for prosody.
./train_f0.sh
# Remember to change the source and target dir in "architecture-vawgan-vcc2016.json"
  4. Train VAW-GAN for spectrum.
./train_sp.sh
# Remember to change the source and target dir in "architecture-vawgan-vcc2016.json"
  5. Generate the converted emotional speech.
./convert.sh
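
The dataset layout from step 1 can be sketched as follows. "Angry" comes from the example path above; "Neutral" is an assumed second emotion category, and 0001.wav is a placeholder file name:

```shell
# Create the folder structure expected by the training/evaluation globs.
# "Angry" appears in the README example; "Neutral" is an assumption.
mkdir -p data/wav/training_set/Angry data/wav/training_set/Neutral
mkdir -p data/wav/evaluation_set/Angry data/wav/evaluation_set/Neutral

# Drop your wav files into the matching emotion folder, e.g.
# data/wav/training_set/Angry/0001.wav
touch data/wav/training_set/Angry/0001.wav   # empty placeholder for illustration

# Files are then picked up by the pattern ./data/wav/training_set/*/*.wav
ls data/wav/training_set/*/*.wav
```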

Note: The code is based on VAW-GAN Voice Conversion: https://github.com/JeremyCCHsu/vae-npvc/tree/vawgan