
ROS package to generate very natural-sounding speech from text (text-to-speech, TTS). This package uses the Tacotron 2 deep learning model from Google.


IRES-ZC/tacotron2ros


tacotron2ros


Description

ROS package to generate very natural-sounding speech from text (text-to-speech, TTS). This package uses the Tacotron 2 deep learning model developed at Google (read more here). You can use it to give your robot a human-like voice, and it runs completely offline.


Installation

These instructions assume Ubuntu 18.04 with ROS 1 Melodic installed and a configured catkin workspace (catkin_ws).

The package dependencies must be installed and run in an isolated Python environment.

This lets you keep several Python module versions side by side (e.g. different TensorFlow/PyTorch versions) and use them with your ROS distro.

You have two options:

If you use a linux-x86_64 based system

On a normal laptop or PC, you can use Miniconda.

1. Install Dependencies

Set up the Conda Environment

First, install Miniconda, then create a new Python 3.6 environment:

$ sudo apt-get install libportaudio2

$ cd ~/catkin_ws/src

$ git clone https://github.com/IRES-ZC/tacotron2ros

$ catkin build

$ source ~/catkin_ws/devel/setup.bash 

$ cd ~/catkin_ws/src/tacotron2ros

$ conda env create -f environment.yml

OR

$ conda create -n tacotron2ros --file req.txt

2. Configure Dependencies

Next, the ROS node tacotron2ros.py must run with the Python interpreter from the environment you just created.

First, activate the environment and list the Python interpreters:

$ conda activate tacotron2ros

$ whereis python

You will find multiple versions on your system. Pick the one inside the tacotron2ros environment, for example:

/home/amer/miniconda3/envs/tacotron2ros/bin/python3.6

Next, change the hashbang (or shebang) line at the top of tacotron2ros.py. This line tells the system which interpreter should run the script. Replace

#! /usr/bin/env python

with the interpreter from the tacotron2ros environment, e.g.

#! /home/amer/miniconda3/envs/tacotron2ros/bin/python3.6
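Instead of editing the line by hand, you can rewrite it with a small Python helper. This is a minimal sketch: set_shebang is a hypothetical helper, not part of this package, and you must pass it the actual path of tacotron2ros.py in your checkout. By default it uses sys.executable, the interpreter running the helper. So it only picks the environment's Python if you run it with the environment activated.

```python
import sys
from pathlib import Path


def set_shebang(script_path, interpreter=None):
    """Point the shebang of script_path at `interpreter` (hypothetical helper).

    interpreter defaults to sys.executable, i.e. the absolute path of the
    Python running this helper, so calling it from the activated
    tacotron2ros environment selects that environment's interpreter.
    """
    interpreter = interpreter or sys.executable
    path = Path(script_path)
    lines = path.read_text().splitlines(keepends=True)
    shebang = "#! {}\n".format(interpreter)
    if lines and lines[0].startswith("#!"):
        lines[0] = shebang  # replace the existing shebang line
    else:
        lines.insert(0, shebang)  # the script had none, so add one
    # Rewriting in place keeps the file's permission bits (e.g. the executable bit).
    path.write_text("".join(lines))
    return shebang.rstrip("\n")
```

For example, activate the environment and start python. Then call set_shebang("<path-to>/tacotron2ros.py"). It returns the new first line so you can check it.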

If you use a linux-aarch64 based system

On Nvidia Jetson kits and Raspberry Pis, you will need to use a virtual environment or Miniforge, since Anaconda does not provide linux-aarch64 packages yet (see this issue).

1. Install Dependencies

Set up the Virtual Environment

First, install the virtualenv package and create a new Python 3.6 virtual environment:

$ sudo apt-get install libportaudio2

$ sudo apt-get install virtualenv

$ cd ~/catkin_ws/src

$ git clone https://github.com/IRES-ZC/tacotron2ros

$ catkin build

$ source ~/catkin_ws/devel/setup.bash 

$ cd ~/catkin_ws/src/tacotron2ros

$ python3 -m virtualenv -p python3.6 tacotron2ros


Next, activate the virtual environment:

$ source tacotron2ros/bin/activate

$ pip3 install -r requirements.txt

Deactivate the Virtual Environment

$ deactivate

2. Configure Dependencies

Next, the ROS node tacotron2ros.py must run with the Python interpreter from the virtual environment you just created.

First, activate the environment and list the Python interpreters:

$ source tacotron2ros/bin/activate

$ whereis python

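You will find multiple versions on your system. Pick the one inside the tacotron2ros virtual environment.

Next, change the hashbang (or shebang) line at the top of tacotron2ros.py. This line tells the system which interpreter should run the script. Replace

#! /usr/bin/env python

with the absolute path of the interpreter in the tacotron2ros virtual environment. Do not use a relative path such as tacotron2ros/bin/python3.6. The system resolves it against the current working directory, so it breaks whenever the node is started from a different directory. With the commands above, the virtual environment lives in ~/catkin_ws/src/tacotron2ros/tacotron2ros, e.g.

#! /home/<user>/catkin_ws/src/tacotron2ros/tacotron2ros/bin/python3.6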
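You can ask Python itself whether the interpreter you are about to put in the shebang belongs to the virtual environment. This sketch uses hypothetical helper names (in_virtualenv, shebang_line). It relies on standard behavior: inside a venv or virtualenv 20+ environment, sys.prefix differs from sys.base_prefix. Legacy virtualenv releases set sys.real_prefix instead. The check does not recognize conda environments, which are standalone installations.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a venv/virtualenv."""
    # venv and virtualenv >= 20: sys.prefix points at the environment while
    # sys.base_prefix points at the system installation.
    # Legacy virtualenv (< 20) sets sys.real_prefix instead.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base


def shebang_line():
    """Shebang pointing at the absolute path of the current interpreter."""
    return "#! " + sys.executable


if __name__ == "__main__":
    if in_virtualenv():
        print(shebang_line())
    else:
        print("Not inside a virtualenv; activate tacotron2ros first.", file=sys.stderr)
```

Save it as, say, check_env.py. Run it with python check_env.py after source tacotron2ros/bin/activate. It prints the absolute shebang line to use.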

Get the Pretrained Models

  1. Download the Tacotron 2 model published by Nvidia
  2. Download the WaveGlow model published by Nvidia
  3. Add these models to ~/catkin_ws/src/tacotron2ros/src/models
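Checking that the checkpoints are where the node expects them can prevent a confusing startup failure. A minimal sketch, assuming the default models directory above. missing_models is a hypothetical helper. The filenames you pass in are placeholders: use whatever names tacotron2ros.py actually loads.

```python
from pathlib import Path

# Default location from this README; adjust if your workspace lives elsewhere.
MODELS_DIR = Path.home() / "catkin_ws" / "src" / "tacotron2ros" / "src" / "models"


def missing_models(expected_files, models_dir=MODELS_DIR):
    """Return the entries of expected_files that are not regular files in models_dir."""
    models_dir = Path(models_dir)
    return [name for name in expected_files if not (models_dir / name).is_file()]
```

For example, missing_models(["<tacotron2-checkpoint>", "<waveglow-checkpoint>"]) returns an empty list once both files are in place.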

Run

$ cd ~/catkin_ws/src

$ source ~/catkin_ws/devel/setup.bash 

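$ roscore

roscore keeps running in the foreground, so start the node in a second terminal:

$ source ~/catkin_ws/devel/setup.bash

$ rosrun tacotron2ros tacotron2ros.py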

In another terminal, publish the text you want to synthesize:

$ rostopic pub /text2voice std_msgs/String "data: 'Hello there!, Nice day'"

You should hear the text spoken in a female voice from your speaker.
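Publishing text from a script is mostly a quoting problem. The message argument is YAML, and inside a YAML single-quoted string a literal ' must be written as ''. A minimal sketch, assuming a sourced ROS environment with rostopic on PATH. yaml_single_quoted, rostopic_pub_args and speak are hypothetical helpers. Passing an argument list to subprocess avoids shell quoting altogether.

```python
import subprocess


def yaml_single_quoted(text):
    """Quote text as a YAML single-quoted scalar ('' stands for a literal ')."""
    return "'" + text.replace("'", "''") + "'"


def rostopic_pub_args(text, topic="/text2voice"):
    """Build the argv for publishing one std_msgs/String message with rostopic."""
    # -1 (--once) publishes the message once and exits,
    # instead of leaving rostopic pub running.
    return ["rostopic", "pub", "-1", topic, "std_msgs/String",
            "data: " + yaml_single_quoted(text)]


def speak(text):
    """Send text to the tacotron2ros node (needs a sourced ROS environment)."""
    # A list argument bypasses the shell, so no extra escaping is needed.
    subprocess.run(rostopic_pub_args(text), check=True)
```

For example, speak("It's a nice day") runs rostopic pub -1 /text2voice std_msgs/String "data: 'It''s a nice day'".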

To shut everything down and release resources after use:

Kill ROS:

$ rosnode kill -a; killall -9 rosmaster

Kill GPU processes: list the processes using the GPU, then kill the Python process by its PID:

$ sudo fuser -v /dev/nvidia*

$ kill -9 <python-PID>

Notes

  1. You do not need to activate the conda environment or virtualenv when running the ROS package.
  2. Make sure the hashbang in tacotron2ros.py points to the correct Python interpreter.
  3. Make sure you have added the pretrained models.
  4. If you want a custom model, language, or voice, refer to the Acknowledgements section.
  5. This implementation uses Nvidia CUDA to accelerate inference, so its performance depends on your hardware.
  6. To test Tacotron 2 without ROS, use the inference.ipynb notebook. Don't forget to activate your environment and select the matching kernel.

Acknowledgements

This work is based on Nvidia's implementation of Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions (found here). It was built as part of the Nour social robot project (found here).
