# Tacotron 2
PyTorch implementation of Google's Tacotron 2: Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

    └───tacotron2
        ├───content
        │   └───tacotron2
        │       └───filelists
        ├───filelists
        ├───outdir
        │   └───logdir
        ├───text
        │   ├───data_prepare
        │   └───__pycache__
        └───waveglow

- Step (0): Get your dataset. For the Persian language, the only open-source dataset is Mozilla Common Voice.
- Step (0.1): Note that you can also use our own dataset, which is available on Kaggle.
- Step (1): Add your own train and test data parameters in `filelists/`. Because the Mozilla audio is more than 211 hours long, we processed only a small portion of it: we converted the clips to WAV and removed files longer than 10 seconds. You can see the resulting lists in `filelists`.
- Step (2): Install the Python requirements or build the Docker image.
  - Install the Python requirements: `pip install -r requirements.txt`
- Step (3): Install CUDA and PyTorch 1.0.
- Step (4): Train the model using this command: `python train.py --output_directory='/content/tts-engine/gdrive/My Drive/outdir' --log_directory='/content/tts-engine/gdrive/My Drive/logdir'`
- Step (5): Synthesize audio using `tts-engine/tacotron2/inference.ipynb`.
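
The length filter described in Step (1) could be sketched roughly as follows. This is only an illustration, not the script used to build the lists in `filelists`: the helper names `wav_duration_seconds` and `build_filelist` are hypothetical, and the `path|transcript` line layout is an assumption about the filelist format. It relies only on the standard-library `wave` module, so it expects clips that have already been converted to PCM WAV.

```python
import wave

MAX_SECONDS = 10.0  # threshold from Step (1): drop clips longer than 10 seconds


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def build_filelist(entries, max_seconds=MAX_SECONDS):
    """Keep (wav_path, transcript) pairs whose audio is short enough.

    Returns one line per kept clip in the assumed "path|transcript" layout.
    """
    lines = []
    for wav_path, transcript in entries:
        if wav_duration_seconds(wav_path) <= max_seconds:
            lines.append(f"{wav_path}|{transcript}")
    return lines
```

Calling `build_filelist` with a list of `(wav_path, transcript)` pairs returns the lines to write into a file under `filelists/`; the exact file names there depend on your own train and test split.
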
I have posted some of the audio the model generated; you can listen to it on SoundCloud.
The model described by the authors can be divided into two parts:
- Spectrogram prediction network
- WaveNet vocoder
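
To make the two-part split concrete, here is a minimal sketch of how the stages chain together: the first stage maps text to a mel spectrogram, and the second maps that spectrogram to a waveform. Everything in it is a hypothetical stand-in. `synthesize`, `toy_predict_mel`, and `toy_vocode` are illustrative names, the toy functions only produce silence of a plausible shape, and none of this reflects the actual model code or its real dimensions.

```python
def synthesize(text, predict_mel, vocode):
    """Run the two stages in order: text -> mel spectrogram -> waveform."""
    mel = predict_mel(text)   # stage 1: spectrogram prediction network
    return vocode(mel)        # stage 2: vocoder


def toy_predict_mel(text, n_mels=4):
    """Hypothetical stand-in: one all-zero mel frame per input character."""
    return [[0.0] * n_mels for _ in text]


def toy_vocode(mel, samples_per_frame=2):
    """Hypothetical stand-in: a fixed number of silent samples per mel frame."""
    return [0.0] * (len(mel) * samples_per_frame)
```

For example, `synthesize("abc", toy_predict_mel, toy_vocode)` passes three frames from the first stage to the second. Because the stages only communicate through the spectrogram, either one can in principle be replaced independently.
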