Speech engines

Introduction

Many of the original text-to-speech engines are no longer maintained, as the technology has moved almost completely to cloud implementations. The instructions provided here are for guidance only, as few people have installed these engines recently. If you find any errors, please let us know.

Flite

The simplest and fastest TTS engine is flite, available from here. After compiling, point to the flite binary with the voice_text_file parameter and set voice_text = flite in mh.ini. See here for how to install flite on Raspberry Pi.
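Before pointing MisterHouse at the binary, it can help to confirm that flite runs on its own. The following is a minimal sketch, assuming flite's standard -t option (speak the given text); the helper names flite_command and speak are hypothetical and are not part of MisterHouse.

```python
import shutil
import subprocess


def flite_command(text, flite_path="flite"):
    """Build the argument list that asks flite to speak `text` (-t option)."""
    return [flite_path, "-t", text]


def speak(text, flite_path="flite"):
    """Speak `text` with flite; return False if the binary cannot be found."""
    resolved = shutil.which(flite_path)
    if resolved is None:
        # Not on PATH, or the given path is not an executable file.
        return False
    result = subprocess.run(flite_command(text, resolved))
    return result.returncode == 0
```

Calling speak("Hello from flite", "/path/to/flite") with the same path you intend to put in mh.ini should produce audible speech if the build succeeded.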

Festival

The Festival speech engine is available from here. Various voices and languages are available. Instructions for installing Festival on Ubuntu and Raspberry Pi can be found here. Compiling Festival can be a little tricky, so use the RPM files if you can.

Once you have downloaded and installed or compiled Festival, you can test it with the following commands:

  echo 'Hello from Festival' | ./festival --tts
  ./festival --tts ../examples/example.sable

  ./festival --server &
  echo '(SayText "Hello from the festival client")' | ./festival_client

You can also run the client, or a simple telnet session, from a different box, but you first have to create a /usr/lib/festival/lib/siteinit.scm file listing the boxes that you want to authorize, e.g. (set! server_access_list '("localhost" "house\\.isl\\.net")). See the Festival documentation for more details.
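As an alternative to festival_client or telnet, a few lines of Python can send the same SayText expression to the server. This is a minimal sketch, assuming the server listens on Festival's default port 1314 and that the calling host is in server_access_list; the function names are hypothetical.

```python
import socket


def saytext_command(text):
    """Return a Festival SayText expression with the text safely quoted."""
    # Escape backslashes first, then double quotes, so the Scheme string stays valid.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '(SayText "%s")' % escaped


def send_to_festival(text, host="localhost", port=1314, timeout=5.0):
    """Send one SayText command to a running `festival --server`."""
    command = saytext_command(text) + "\n"
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(command.encode("utf-8"))
        try:
            # Read whatever reply arrives first; give up quietly on timeout.
            return conn.recv(1024).decode("utf-8", errors="replace")
        except socket.timeout:
            return ""
```

For example, send_to_festival("Hello from the festival client", host="house.isl.net") would target the remote box named in the access list above. A connection that opens but produces no speech usually points to the access list rather than the network.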

Once you have the festival server running, you can enable MisterHouse to use it with mh.ini parameters voice_text and festival_port.

To set the default voices, find your siteinit.scm file and have fun with the following:

  (set! voice_default 'voice_us1_mbrola)
  (set! voice_male1   'voice_kal_diphone)
  (set! voice_male2   'voice_us2_mbrola)
  (set! voice_female  'voice_kal_diphone)

This depends upon which voices you have installed on your system. Some voices are don_diphone, kal_diphone, ked_diphone, rab_diphone, us1_mbrola, us2_mbrola, and us3_mbrola.

AT&T Natural Voices

AT&T Natural Voices is now distributed by wizzardsoftware. If you have the Linux binary, use the voice_text_naturalvoice parameter to point to where you have it installed and set voice_text=naturalvoice.

If you only have the Windows binary, you can use Wine to run it from Linux. On a 1.2 GHz Celeron, time-to-speech is about 1 second, versus about 0.4 seconds for the native Linux binary. See bin/mh.ini for examples of these parameters:

  voice_text=NaturalVoiceWine
  voice_text_naturalvoice=path_to_windows_voices
  wine_path=path_to_wine

Other speech engines and voices

Free Speech Engines

MBROLA has very nice US English male and female voices.
Festival voices

Paid-for Speech Engines

Cepstral. Set mh.ini parameters voice_text=swift and modify voice_text_swift to point to the swift binary.
IBM ViaVoice is no longer available; however, instructions for legacy implementations can be found here.

Tips and known bugs

Ricky Buchanan reports that ESD and festival --server do not work together, and suggests instead editing /etc/festival.scm and adding these lines at the top:

 (Parameter.set 'Audio_Method 'Audio_Command)
 (Parameter.set 'Audio_Command "/usr/bin/esdcat -m -r $SR $FILE")

Make sure that /usr/bin/esdcat is the correct path to the esdcat program on your system.
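If esdcat lives somewhere else, the two lines above need the real path. The following is a minimal sketch that looks up esdcat on the PATH and prints the lines with that path filled in; the helper names are hypothetical, and the Scheme text is copied from the lines above.

```python
import shutil


def esd_audio_lines(esdcat_path):
    """Return the two festival.scm lines for the given esdcat path."""
    return [
        "(Parameter.set 'Audio_Method 'Audio_Command)",
        "(Parameter.set 'Audio_Command \"%s -m -r $SR $FILE\")" % esdcat_path,
    ]


def find_esdcat():
    """Locate esdcat on PATH; returns None when it is not installed."""
    return shutil.which("esdcat")


if __name__ == "__main__":
    path = find_esdcat()
    if path is None:
        print("esdcat was not found on PATH")
    else:
        print("\n".join(esd_audio_lines(path)))
```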
