ROS package that generates very natural-sounding speech from text (text-to-speech, TTS). This package uses the Tacotron 2 deep learning model developed by Google; read more here. You can use it to give your robot a human-like voice, and it runs completely offline.
This guide assumes you have Ubuntu 18.04 and ROS 1 Melodic installed, with a catkin_ws configured.
We need to create an isolated environment to install and run the package dependencies.
This allows you to install multiple Python modules, e.g. different TensorFlow/PyTorch versions, and use them alongside your ROS distro.
If you are on a normal laptop or PC, you can use Miniconda.
1. Install Dependencies
Set up the Conda Environment
First, install Miniconda and create a new Python 3.6 environment:
$ sudo apt-get install libportaudio2
$ cd ~/catkin_ws/src
$ git clone https://github.com/IRES-ZC/tacotron2ros
$ catkin build
$ source ~/catkin_ws/devel/setup.bash
$ cd ~/catkin_ws/src/tacotron2ros
$ conda env create -f environment.yml
OR
$ conda create -n tacotron2ros --file req.txt
2. Configure Dependencies
Now we need the ROS node tacotron2ros.py to use the Python interpreter from the environment we created.
First, activate your environment:
$ conda activate tacotron2ros
$ whereis python
You will find multiple versions on your system. Use the one inside the environment, e.g.
/home/amer/miniconda3/envs/tacotron2ros/bin/python3.6
Next, change the hashbang (or shebang) line at the top of tacotron2ros.py, which indicates which interpreter should run the script, from
#! /usr/bin/env python
to the interpreter of the tacotron2ros environment, e.g.
#! /home/amer/miniconda3/envs/tacotron2ros/bin/python3.6
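If you prefer not to edit the file by hand, the shebang change can be scripted. The following is a minimal sketch, not part of the package: `set_shebang` is a hypothetical helper, and the script path passed to it is whatever location tacotron2ros.py has in your checkout.

```python
import os


def set_shebang(script_path, interpreter):
    """Point the first line of script_path at the given interpreter."""
    # A shebang needs an absolute path; "~" and relative paths are not expanded.
    if not os.path.isabs(interpreter):
        raise ValueError("interpreter must be an absolute path: %r" % interpreter)

    with open(script_path, "r") as handle:
        lines = handle.readlines()

    shebang = "#!" + interpreter + "\n"
    if lines and lines[0].startswith("#!"):
        lines[0] = shebang          # replace the existing shebang
    else:
        lines.insert(0, shebang)    # the script had none, so add one

    with open(script_path, "w") as handle:
        handle.writelines(lines)
    return shebang.rstrip("\n")
```

Run with the environment activated, `set_shebang("path/to/tacotron2ros.py", sys.executable)` (after `import sys`) writes the interpreter of the active environment into the script.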
Nvidia Jetson Kits and Raspberry Pis
You will need to use a virtual environment or Miniforge, since Anaconda does not support linux-aarch64 yet (see this issue).
1. Install Dependencies
Set up the Virtual Environment
First, install the virtualenv package and create a new Python 3.6 virtual environment:
$ sudo apt-get install libportaudio2
$ sudo apt-get install virtualenv
$ cd ~/catkin_ws/src
$ git clone https://github.com/IRES-ZC/tacotron2ros
$ catkin build
$ source ~/catkin_ws/devel/setup.bash
$ cd ~/catkin_ws/src/tacotron2ros
$ python3 -m virtualenv -p python3.6 tacotron2ros
Next, activate the virtual environment and install the requirements:
$ source tacotron2ros/bin/activate
$ pip3 install -r requirements.txt
Deactivate the Virtual Environment
$ deactivate
2. Configure Dependencies
Now we need the ROS node tacotron2ros.py to use the Python interpreter from the virtual environment we created.
First, activate your environment:
$ source tacotron2ros/bin/activate
$ whereis python
You will find multiple versions in your system.
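Next, change the hashbang (or shebang) line at the top of tacotron2ros.py, which indicates which interpreter should run the script, from
#! /usr/bin/env python
to the interpreter of the tacotron2ros virtual environment. Use its absolute path, since a relative path such as tacotron2ros/bin/python3.6 is resolved against the current working directory, e.g.
#! /home/<user>/catkin_ws/src/tacotron2ros/tacotron2ros/bin/python3.6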
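A quick way to confirm the shebang points at a usable interpreter is to read it back and check the path. This is a small sketch under the assumption that the shebang holds a direct interpreter path; `shebang_interpreter` and `shebang_is_usable` are hypothetical helper names, not part of the package.

```python
import os
import shlex


def shebang_interpreter(script_path):
    """Return the interpreter path named on the script's first line, or None."""
    with open(script_path, "r") as handle:
        first_line = handle.readline().strip()
    if not first_line.startswith("#!"):
        return None
    parts = shlex.split(first_line[2:])
    return parts[0] if parts else None


def shebang_is_usable(script_path):
    """True when the shebang names an existing, executable, absolute path."""
    interpreter = shebang_interpreter(script_path)
    return bool(
        interpreter
        and os.path.isabs(interpreter)
        and os.path.isfile(interpreter)
        and os.access(interpreter, os.X_OK)
    )
```

For a shebang of the form `#! /usr/bin/env python` the check only validates `/usr/bin/env`, so it is most useful after you have switched to the environment's own interpreter path.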
- Download the Tacotron 2 model published by Nvidia
- Download the WaveGlow model published by Nvidia
- Add these models to ~/catkin_ws/src/tacotron2ros/src/models
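Before starting the node, it can help to confirm that both downloads actually landed in the models directory. The sketch below only lists what is there; `list_model_files` is a hypothetical helper, and it makes no assumption about the checkpoint file names.

```python
import os


def list_model_files(models_dir):
    """Return (name, size_in_MB) for every regular file in models_dir."""
    models_dir = os.path.expanduser(models_dir)
    found = []
    for name in sorted(os.listdir(models_dir)):
        path = os.path.join(models_dir, name)
        if os.path.isfile(path):
            found.append((name, os.path.getsize(path) / 1e6))
    return found
```

Calling `list_model_files("~/catkin_ws/src/tacotron2ros/src/models")` should return two entries, one per model; a very small size usually indicates an interrupted download.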
$ cd ~/catkin_ws/src
$ source ~/catkin_ws/devel/setup.bash
$ roscore
$ rosrun tacotron2ros tacotron2ros.py
In another terminal, publish the text you want to synthesize:
$ rostopic pub /text2voice std_msgs/String "data: 'Hello there!, Nice day'"
You should hear a female voice speaking the same text from your speaker.
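The same publish step can be driven from a script by wrapping the `rostopic pub` command shown above. This is a minimal sketch, assuming a sourced ROS environment and a running roscore; `build_pub_command` and `say` are hypothetical helper names.

```python
import json
import subprocess


def build_pub_command(text, topic="/text2voice"):
    """Build the argument list for a one-shot `rostopic pub` call."""
    # json.dumps yields a double-quoted string that is also a valid YAML scalar,
    # so quotes and other punctuation inside the text survive intact.
    payload = "data: " + json.dumps(text)
    return ["rostopic", "pub", "--once", topic, "std_msgs/String", payload]


def say(text, topic="/text2voice"):
    """Publish text once and return the exit code of rostopic."""
    return subprocess.call(build_pub_command(text, topic))
```

For example, `say("Hello there!, Nice day")` sends the same message as the command above and exits after publishing once.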
If you want to close and release resources after usage:
Kill ROS
$ rosnode kill -a & killall -9 rosmaster
Kill GPU Processes
$ sudo fuser -v /dev/nvidia*
$ kill -9 <the-python-PID>
- There is no need to activate any conda or virtualenv environment while running the package with ROS.
- Make sure the hashbang points to your Python interpreter correctly.
- Add the pretrained models.
- If you want a custom model/language/voice, refer to the acknowledgements section.
- This implementation uses Nvidia CUDA to accelerate inference, and its performance depends on your hardware.
- To test Tacotron 2 without ROS, use the inference.ipynb notebook; don't forget to activate your environment and select the proper kernel.
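Since inference relies on CUDA, a quick check that the environment can see the GPU may save some debugging. The sketch below assumes the environment installs PyTorch, which the Nvidia Tacotron 2 implementation is built on; `cuda_status` is a hypothetical helper name.

```python
def cuda_status():
    """Describe whether PyTorch can see a CUDA device in this environment."""
    try:
        import torch  # assumption: the tacotron2ros environment provides PyTorch
    except ImportError:
        return "PyTorch is not importable; is the tacotron2ros environment active?"
    if not torch.cuda.is_available():
        return "PyTorch found, but no CUDA device is available."
    return "CUDA device available: " + torch.cuda.get_device_name(0)


print(cuda_status())
```

Run it with the environment's interpreter (the one named in the hashbang) so that it reports on the same installation the ROS node uses.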
This work is based on Nvidia's implementation of Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, found here, and was built as part of the Nour social robot project, found here.