nimamoradi/tts-engine

Tacotron 2

PyTorch implementation of Google's Tacotron 2: Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

Folder Structure

└───tacotron2
    ├───content
    │   └───tacotron2
    │       └───filelists
    ├───filelists
    ├───outdir
    │   └───logdir
    ├───text
    │   ├───data_prepare
    │   └───__pycache__
    └───waveglow
    

Setup

  • Step (0): Get your dataset. For the Persian language, the only open-source dataset is Mozilla Common Voice.

  • Step (0.1): Note that you can also use our own dataset, which is available on Kaggle.

  • Step (1): Add your own test and train data parameters in filelists/. Because the Mozilla Common Voice corpus contains more than 211 hours of audio, we processed only a small portion of it: we converted the clips to WAV and removed files longer than 10 seconds. The resulting file lists are in filelists/.

  • Step (2): Install the Python requirements or build the Docker image.

    • Install the Python requirements: pip install -r requirements.txt
  • Step (3): Install CUDA and PyTorch 1.0.

  • Step (4): Train the model with the following command:

python train.py --output_directory='/content/tts-engine/gdrive/My Drive/outdir' --log_directory='/content/tts-engine/gdrive/My Drive/logdir'
  • Step (5): Synthesize audio using tts-engine/tacotron2/inference.ipynb.
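
The length filter described in Step (1) can be sketched as follows. This is a minimal illustration, not the repository's actual preprocessing script: the helper names wav_duration_seconds and build_filelist are hypothetical, the clips are assumed to be already converted to PCM WAV, and the "path|transcript" line layout is an assumption about the file-list format, so check it against the files in filelists/ before relying on it.

```python
import wave

MAX_SECONDS = 10.0  # length threshold from Step (1)


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def build_filelist(entries, max_seconds=MAX_SECONDS):
    """Keep (wav_path, transcript) pairs whose audio is at most max_seconds long.

    Returns one text line per kept clip, in an assumed "path|transcript" layout.
    """
    lines = []
    for wav_path, transcript in entries:
        if wav_duration_seconds(wav_path) <= max_seconds:
            lines.append(f"{wav_path}|{transcript}")
    return lines
```

The returned lines can then be written to a text file under filelists/, one line per clip.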

Audio samples

I have posted some of the audio generated by the model; you can listen to the samples on SoundCloud.

Model

The model described by the authors can be divided into two parts:

  • Spectrogram prediction network
  • WaveNet vocoder
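
The way the two parts connect can be shown with a shape-level sketch. It contains no neural networks: predict_mel and vocode are hypothetical stand-ins that return zeros, and the constants N_MELS = 80, HOP_LENGTH = 256, and frames_per_char are illustrative assumptions, not values read from this repository's configuration.

```python
N_MELS = 80       # assumed number of mel channels per spectrogram frame
HOP_LENGTH = 256  # assumed number of audio samples covered by one frame


def predict_mel(text, frames_per_char=5):
    """Stand-in for the spectrogram prediction network: text -> mel frames.

    A real network decides the output length itself; a fixed number of
    frames per character is used here only to make the shapes concrete.
    """
    n_frames = len(text) * frames_per_char
    return [[0.0] * N_MELS for _ in range(n_frames)]


def vocode(mel):
    """Stand-in for the vocoder: mel frames -> waveform samples."""
    return [0.0] * (len(mel) * HOP_LENGTH)


def synthesize(text):
    """Chain the two stages: text -> mel spectrogram -> waveform."""
    return vocode(predict_mel(text))
```

The point of the split is that the first stage only has to produce a compact mel spectrogram, while the vocoder alone is responsible for turning those frames into audio samples.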
