Python text-to-speech library with built-in voice effects and support for multiple TTS engines.
# Example: Use gTTS with a vocoder effect to speak in a robotic voice
from voicebox import SimpleVoicebox
from voicebox.tts import gTTS
from voicebox.effects import Vocoder, Normalize
voicebox = SimpleVoicebox(
    tts=gTTS(),
    effects=[Vocoder.build(), Normalize()],
)
voicebox.say('Hello, world! How are you today?')
pip install voicebox-tts
- Install the PortAudio library for audio playback.
  - On Debian/Ubuntu: sudo apt install libportaudio2
- Install dependencies for whichever TTS engine(s) you want to use (see section below).
Classes for supported TTS engines are located in the voicebox.tts package.
Amazon Polly 🌐
Online TTS engine from AWS.
- Class: voicebox.tts.AmazonPolly
- Setup: pip install "voicebox-tts[amazon-polly]"
ElevenLabs 🌐
Online TTS engine with very realistic voices and support for voice cloning.
- Class: voicebox.tts.ElevenLabsTTS
- Setup:
eSpeak NG 🌐
Offline TTS engine with a good number of options.
- Class: voicebox.tts.ESpeakNG
- Setup:
  - On Debian/Ubuntu: sudo apt install espeak-ng
Google Cloud Text-to-Speech 🌐
Powerful online TTS engine offered by Google Cloud.
- Class: voicebox.tts.GoogleCloudTTS
- Setup: pip install "voicebox-tts[google-cloud-tts]"
gTTS 🌐
Online TTS engine used by Google Translate.
- Class: voicebox.tts.gTTS
- Setup: pip install "voicebox-tts[gtts]"
  - Install ffmpeg or libav for pydub.
🤗 Parler TTS 🌐
Offline TTS engine released by Hugging Face that uses a promptable deep learning model to generate speech.
- Class: voicebox.tts.ParlerTTS
- Setup: pip install git+https://github.com/huggingface/parler-tts.git
PicoTTS
Very basic offline TTS engine.
- Class: voicebox.tts.PicoTTS
- Setup:
  - On Debian/Ubuntu: sudo apt install libttspico-utils
pyttsx3 🌐
Offline TTS engine wrapper with support for the built-in TTS engines on Windows (SAPI5) and macOS (NSSpeechSynthesizer), as well as espeak on Linux. By default, it will use the most appropriate engine for your platform.
- Class: voicebox.tts.Pyttsx3TTS
- Setup: pip install "voicebox-tts[pyttsx3]"
  - On Debian/Ubuntu: sudo apt install espeak
Built-in effect classes are located in the voicebox.effects package and can be imported like:
from voicebox.effects import CoolEffect
Here is a non-exhaustive list of fun effects:
- Glitch creates a glitchy sound by randomly repeating small chunks of audio.
- RingMod can be used to create choppy, Doctor Who Dalek-like effects.
- Vocoder is useful for making monotone, robotic voices.
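To give a feel for what two of these effects do to a signal, here is a minimal pure-Python sketch of ring modulation and chunk repetition. It is an illustration of the general techniques only, not voicebox's implementation: the function names, parameters (carrier_hz, chunk_size, repeat_prob, max_repeats, seed) and the test tone are all assumptions made for this example.

```python
import math
import random


def ring_mod(samples, sample_rate, carrier_hz=30.0):
    """Multiply each sample by a sine-wave carrier (classic ring modulation)."""
    return [
        s * math.sin(2.0 * math.pi * carrier_hz * i / sample_rate)
        for i, s in enumerate(samples)
    ]


def glitch(samples, chunk_size, repeat_prob=0.5, max_repeats=3, seed=None):
    """Walk the signal chunk by chunk, randomly repeating some chunks."""
    rng = random.Random(seed)
    out = []
    for start in range(0, len(samples), chunk_size):
        chunk = samples[start:start + chunk_size]
        out.extend(chunk)
        if rng.random() < repeat_prob:
            # Append one or more extra copies of this chunk
            for _ in range(rng.randint(1, max_repeats)):
                out.extend(chunk)
    return out


# A 0.1 s, 440 Hz test tone at 8 kHz stands in for TTS output (hypothetical input)
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * i / sample_rate) for i in range(800)]

modulated = ring_mod(tone, sample_rate)
glitched = glitch(tone, chunk_size=80, seed=42)
```

A low carrier frequency makes the amplitude rise and fall many times per second, which is where the choppy character comes from. Repeating chunks can only lengthen the signal, never shorten it.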
There is also support for all the awesome audio plugins in Spotify's pedalboard library using the special PedalboardEffect wrapper, e.g.:
from voicebox import SimpleVoicebox
from voicebox.effects import PedalboardEffect
import pedalboard
voicebox = SimpleVoicebox(
    effects=[
        PedalboardEffect(pedalboard.Reverb()),
        ...,
    ]
)
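Since effects are passed as a list, a natural way to picture them is as a chain in which each effect processes the output of the previous one. The sketch below illustrates that idea in plain Python, under the assumption that effects run in list order. The helper names (remove_dc, peak_normalize, apply_effects) are hypothetical and are not part of voicebox or pedalboard.

```python
def remove_dc(samples):
    """Subtract the mean so the signal is centred on zero."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def peak_normalize(samples, peak=1.0):
    """Scale the signal so its largest absolute value equals `peak`."""
    largest = max(abs(s) for s in samples)
    if largest == 0.0:
        return list(samples)  # Silence: nothing to scale
    return [s * peak / largest for s in samples]


def apply_effects(samples, effects):
    """Feed the output of each effect into the next one, in list order."""
    for effect in effects:
        samples = effect(samples)
    return samples


raw = [0.1, 0.3, 0.5, 0.3, 0.1]  # Quiet signal with a positive DC offset
processed = apply_effects(raw, [remove_dc, peak_normalize])
```

Order matters in a chain like this: normalizing before removing the offset would scale against a peak that includes the offset, which is one reason to put level-fixing steps last.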
# PicoTTS is used to say "Hello, world!"
from voicebox import SimpleVoicebox
voicebox = SimpleVoicebox()
voicebox.say('Hello, world!')
Some pre-built voiceboxes are available in the voicebox.examples package.
They can be imported into your own code, and you can run them to demo:
# Voice of GLaDOS from the Portal video game series
python -m voicebox.examples.glados "optional message"
# Voice of the OOM-9 command battle droid from Star Wars: Episode I
python -m voicebox.examples.battle_droid "optional message"
# Use eSpeak NG at 120 WPM and en-us voice as the TTS engine
from voicebox import reliable_tts
from voicebox.tts import ESpeakConfig, ESpeakNG, gTTS

# Wrap multiple TTSs in retries and caches
tts = reliable_tts(
    ttss=[
        # Prefer using online TTS first
        gTTS(),
        # Fall back to offline TTS if online TTS fails
        ESpeakNG(ESpeakConfig(speed=120, voice='en-us')),
    ],
)

# Add some voice effects
from voicebox.effects import Vocoder, Glitch, Normalize

effects = [
    Vocoder.build(),  # Make a robotic, monotone voice
    Glitch(),         # Randomly repeat small sections of audio
    Normalize(),      # Remove DC and make volume consistent
]

# Build audio sink
from voicebox.sinks import Distributor, SoundDevice, WaveFile

sink = Distributor([
    SoundDevice(),           # Send audio to playback device
    WaveFile('speech.wav'),  # Save audio to speech.wav file
])

# Build the voicebox
from voicebox import ParallelVoicebox
from voicebox.voiceboxes.splitter import SimpleSentenceSplitter

# Parallel voicebox doesn't block the main thread
voicebox = ParallelVoicebox(
    tts,
    effects,
    sink,
    # Split text into sentences to reduce time to first speech
    text_splitter=SimpleSentenceSplitter(),
)

# Speak!
voicebox.say('Hello, world!')

# Wait for all audio to finish playing before exiting
voicebox.wait_until_done()
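Two ideas in the example above, falling back from an online engine to an offline one and splitting text into sentences, can be sketched without any audio dependencies. The code below is only an illustration under stated assumptions: split_sentences, synthesize_with_fallback, and the two stand-in engines are hypothetical names, the retry count is arbitrary, and caching is left out. It does not show how reliable_tts or SimpleSentenceSplitter are implemented.

```python
import re


def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]


def synthesize_with_fallback(text, engines, retries=1):
    """Return (engine_name, audio) from the first engine that succeeds."""
    errors = []
    for name, engine in engines:
        for _ in range(1 + retries):
            try:
                return name, engine(text)
            except Exception as exc:  # Broad on purpose: any failure triggers fallback
                errors.append((name, exc))
    raise RuntimeError(f'All engines failed: {errors}')


# Hypothetical stand-ins for real TTS engines
def online_engine(text):
    raise ConnectionError('network unavailable')


def offline_engine(text):
    return f'<audio for {text!r}>'


engines = [('online', online_engine), ('offline', offline_engine)]
results = [
    synthesize_with_fallback(sentence, engines)
    for sentence in split_sentences('Hello, world! How are you today?')
]
```

Because each sentence is synthesized on its own, the first one can start playing while later ones are still being generated, which is the motivation the example gives for splitting.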
python -m voicebox -h # Print command help
python -m voicebox "Hello, world!" # Basic usage