rec.py contains a basic SpeechRecorder clone written in Python which should be
system-agnostic. This tool can be used to record a set of utterances provided in
a script file, and comes with a sample script as used in the CMU Arctic speech
database.
This repo is intended to support students on the Speech Synthesis course at the University of Edinburgh as they work through building a unit selection voice from their own recordings.
The original version of this tool was written by Tim Loderhose.
To get the code for SpeechRecorder, you can either download and unzip this
repository by clicking the green Code > Download ZIP button above, or clone it
directly using git:
git clone https://github.com/dan-wells/speechrecorder
On Windows, you will probably need to install Git for Windows and then run the command above in a Git Bash or Git CMD terminal.
Python is required to run rec.py, along with some non-standard packages. We
have tested most of the recent versions of Python 3, so it shouldn't matter too
much which you have available or choose to install.
Note: rec.py is not compatible with Python 2.
Python installation instructions for different platforms are given below.
Windows
To install Python on Windows, download an installer from the Python website.
The easiest option is to follow the standard installation process. If you want to customise the install, make sure you tick the checkbox to install tcl/tk and IDLE – this provides the graphical user interface libraries used by SpeechRecorder.
You may also want to check Add Python 3.x to PATH, so that you can easily launch Python programmes from the command line.
Linux
Python 3 is probably already installed on your system, and should have the
tkinter GUI package available. If not, you may need to install it through your
package manager, possibly alongside the PortAudio library for audio handling.
For example, the required packages on Ubuntu might be:
sudo apt install python3-tk libportaudio2
Mac
Mac users may prefer to use the original SpeechRecorder! If not, please follow these instructions to install Python 3.
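Whichever platform you are on, a quick check like the following can confirm that the interpreter is Python 3 and that tkinter can be imported before you install anything else. This is only a sketch: the function name check_environment is made up for illustration and is not part of rec.py.

```python
import sys


def check_environment():
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    # rec.py is not compatible with Python 2.
    if sys.version_info[0] < 3:
        problems.append("Python 3 is required, found %d.%d" % sys.version_info[:2])
    # tkinter provides the GUI libraries used by SpeechRecorder.
    # Importing the module does not open a window or need a display.
    try:
        import tkinter  # noqa: F401
    except ImportError:
        problems.append("tkinter is not available (on Ubuntu, try python3-tk)")
    return problems


if __name__ == "__main__":
    found = check_environment()
    print("Environment looks OK" if not found else "\n".join(found))
```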
If you installed pip alongside Python, then installing all the necessary
dependencies for SpeechRecorder could be as simple as running:
pip install -r requirements.txt
It might be preferable not to install these dependencies globally, however. The following instructions explain how to create a Python virtual environment to keep your system tidy.
Virtual environments in Python
Virtual environments are a way of encapsulating Python packages so that projects with different requirements (for example two projects which use different versions of the same package) do not conflict with each other.
After installing Python, you can create a virtual environment using the standard
venv module:
python3 -m venv sr-env
This will create a new directory sr-env containing a local copy of the Python
interpreter and space to install new packages. To use this local Python, we must
activate the environment:
- Windows:
sr-env\Scripts\activate.bat
- Linux/Mac:
source sr-env/bin/activate
If you see (sr-env) somewhere in your command prompt, then it worked! You can
now run the pip install command listed above to install the required Python
packages in your new virtual environment, leaving the system Python unchanged.
Once you're finished with SpeechRecorder, you can run the deactivate command
to exit the Python virtual environment.
Note: You will need to run the activate command whenever you want to use
this particular Python environment, after navigating in the terminal to wherever
you created the sr-env directory. In general, you might want to keep all your
virtual environments in one place, or perhaps create this one inside the
directory containing the code for SpeechRecorder.
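If the prompt does not make it obvious whether the environment is active, you can also ask Python itself. The sketch below relies on the fact that an environment created with venv sets sys.prefix to the environment directory while sys.base_prefix keeps pointing at the system installation. The function name in_virtualenv is only illustrative.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter is running inside a venv environment."""
    # Outside a virtual environment the two prefixes are identical.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Interpreter prefix:", sys.prefix)
    print("Inside a virtual environment:", in_virtualenv())
```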
Run SpeechRecorder with python rec.py (possibly after activating your virtual
environment). You will be presented with a screen showing the first utterance in
the file utts.data. Recorded audio will be saved to
recordings/${prompt_label}_${take}.wav.
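Because every take is saved under the naming pattern above, a short script can report which prompts already have recordings. The following is a sketch under two assumptions that are not confirmed by rec.py itself: that the take part of the file name is a plain integer, and that it follows the last underscore. The helper latest_takes is hypothetical.

```python
from pathlib import Path


def latest_takes(recdir):
    """Map each prompt label to the highest take number found in recdir."""
    takes = {}
    for wav in Path(recdir).glob("*.wav"):
        # Split on the last underscore only, since prompt labels such as
        # arctic_a0001 contain underscores themselves.
        label, sep, take = wav.stem.rpartition("_")
        if not sep or not take.isdecimal():
            continue  # not a <label>_<take>.wav file, so skip it
        takes[label] = max(takes.get(label, -1), int(take))
    return takes


if __name__ == "__main__":
    for label, take in sorted(latest_takes("recordings").items()):
        print(label, "latest take:", take)
```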
The following commands are available:
- up and down to move between utterances
- space to start/stop recording. Multiple recordings will produce multiple takes
- down while recording to immediately record the next utterance (without stopping in between).
  - Warning: This may lead to bad utterance segmentations
- p to listen to the recorded audio (plays the latest take)
- q to quit.
Run python rec.py --help to see additional options.
The tool should parse any script file matching the format of the provided
utts.data, as described in the unit selection voice building recipe.
To record your own script, create a new file using the same format and pass it
to SpeechRecorder: python rec.py --script my_script
You might want to split the provided utts.data into multiple files and use
this method to record prompts in multiple sessions. This will help to overcome
the limited interface provided by this version of SpeechRecorder, so that you
don't have to scroll through the first 100 prompts to pick up where you left
off!
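One way to do the splitting is with a few lines of Python. The sketch below assumes that the script file holds exactly one prompt per line, which you should verify against your own copy of utts.data. The function name split_script, the chunk size and the output file names are all illustrative choices.

```python
from pathlib import Path


def split_script(script_path, chunk_size=100, out_prefix="utts_part"):
    """Split a script file into numbered files of at most chunk_size prompts."""
    script_path = Path(script_path)
    # Assumption: one prompt per line. Blank lines are dropped.
    text = script_path.read_text(encoding="utf-8")
    lines = [line for line in text.splitlines() if line.strip()]
    out_paths = []
    for start in range(0, len(lines), chunk_size):
        part = start // chunk_size + 1
        out = script_path.with_name("%s%02d.data" % (out_prefix, part))
        out.write_text("\n".join(lines[start:start + chunk_size]) + "\n",
                       encoding="utf-8")
        out_paths.append(out)
    return out_paths
```

Each resulting file can then be passed to rec.py through the --script option, one recording session at a time.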
If you want to save recorded audio files to a different output directory,
specify it using the --recdir option.
On Windows, once you've set up your Python environment and downloaded SpeechRecorder, everything might Just Work. On Linux, you may need to do some additional configuration so that SpeechRecorder knows which audio devices it should use.
To list available audio devices, run python rec.py --show-devices. The
required in_device index is marked by > and out_device by <. If the same
device handles both input and output, it will be marked with *. When you're
ready to record, pass the appropriate device indices through the --audio-in
and --audio-out options (both default to 0). You can also pass the device
names instead of indices. If the input and output audio device are the same, you
can also use the convenience option --audio-device with the device name and the
script figures out the indices for you.
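To illustrate what resolving a device name to an index involves, here is a minimal sketch. It assumes a device list shaped like the dictionaries returned by sounddevice.query_devices() (keys name, max_input_channels and max_output_channels). Whether rec.py uses that library or this matching rule internally is an assumption, and both find_device and the example device names are made up.

```python
def find_device(devices, name, kind="input"):
    """Return the index of the first device whose name contains `name`.

    `kind` is "input" or "output" and selects which channel count must be
    greater than zero for the device to qualify.
    """
    key = "max_input_channels" if kind == "input" else "max_output_channels"
    for index, device in enumerate(devices):
        if name.lower() in device["name"].lower() and device[key] > 0:
            return index
    raise ValueError("no %s device matching %r" % (kind, name))


if __name__ == "__main__":
    # Hypothetical device list, for illustration only.
    example = [
        {"name": "Built-in Audio", "max_input_channels": 2, "max_output_channels": 2},
        {"name": "USB Microphone", "max_input_channels": 1, "max_output_channels": 0},
    ]
    print(find_device(example, "usb", kind="input"))
```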
By default, recorded audio will be sampled at 44.1 kHz. If you need higher/lower
quality recordings, pass the desired sampling rate through the --sr option.
Audio is recorded in mono by default. Pass --channels 2 for stereo recording.
The default bit depth is 16-bit, and can be set via --bits to either 16 or 24
bits. You should test 24-bit on your particular setup because the support
is hardware and platform dependent.
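These three settings together determine how much disk space uncompressed recordings take up. As a rough guide, the sketch below computes the size of the raw PCM audio data (excluding the WAV header) for a given duration. The function name wav_data_bytes is illustrative only.

```python
def wav_data_bytes(seconds, sr=44100, channels=1, bits=16):
    """Approximate size in bytes of uncompressed PCM audio data."""
    frames = round(seconds * sr)      # one frame holds one sample per channel
    bytes_per_sample = bits // 8      # 2 bytes for 16-bit, 3 bytes for 24-bit
    return frames * channels * bytes_per_sample


if __name__ == "__main__":
    # Defaults: 44.1 kHz, mono, 16-bit.
    print(wav_data_bytes(1))                       # 88200
    print(wav_data_bytes(1, channels=2, bits=24))  # 264600
```

At the default settings, one hour of recorded audio therefore comes to roughly 318 MB.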
If the current implementation does not support the format you need, you can run
the following command in a Python interpreter to list alternative formats:
soundfile.available_subtypes('WAV'). Select one of the keys from the resulting
dictionary and pass it to the subtype argument of the sf.SoundFile object
created in the rec() function of rec.py.
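The lookup described above might be sketched as follows. The helper pick_subtype is hypothetical, and the mapping from a bit depth to a PCM_<bits> key is an assumption about how such an option could be translated, not a description of what rec.py does. The import is guarded so that the snippet still runs if soundfile is not installed.

```python
try:
    import soundfile as sf
except (ImportError, OSError):
    sf = None  # soundfile or its underlying libsndfile library is unavailable


def pick_subtype(bits, available):
    """Return the PCM subtype key for a bit depth, if the format offers it."""
    wanted = "PCM_%d" % bits
    if wanted not in available:
        raise ValueError("subtype %s is not available" % wanted)
    return wanted


if __name__ == "__main__" and sf is not None:
    subtypes = sf.available_subtypes("WAV")  # dict of subtype key -> description
    print(sorted(subtypes))
    print(pick_subtype(24, subtypes))
```

The returned key is the kind of value that would be passed as the subtype argument of sf.SoundFile.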