Basic SpeechRecorder clone

rec.py contains a basic SpeechRecorder clone written in Python which should be system-agnostic. This tool can be used to record a set of utterances provided in a script file, and comes with a sample script as used in the CMU Arctic speech database.

This repo is intended to support students on the Speech Synthesis course at the University of Edinburgh as they work through building a unit selection voice from their own recordings.

The original version of this tool was written by Tim Loderhose.

Setup

Downloading SpeechRecorder

To get the code for SpeechRecorder, you can either download and unzip this repository by clicking the green Code > Download ZIP button above, or clone it directly using git:

git clone https://github.com/dan-wells/speechrecorder

On Windows, you will probably need to install Git for Windows and then run the command above in a Git Bash or Git CMD terminal.

Installing Python

Python is required to run rec.py, along with some non-standard packages. We have tested most of the recent versions of Python 3, so it shouldn't matter too much which you have available or choose to install.

Note: rec.py is not compatible with Python 2.

Installation instructions for each platform are given below.

Windows

To install Python on Windows, download an installer from the Python website.

The easiest option is to follow the standard installation process. If you want to customise the install, make sure you tick the checkbox to install tcl/tk and IDLE – this provides the graphical user interface libraries used by SpeechRecorder.

You may also want to check Add Python 3.x to PATH, so that you can easily launch Python programs from the command line.

Linux

Python 3 is probably already installed on your system, and should have the tkinter GUI package available. If not, you may need to install it through your package manager, possibly alongside the PortAudio library for audio handling.

For example, the required packages on Ubuntu might be:

sudo apt install python3-tk libportaudio2

Mac

Mac users may prefer to use the original SpeechRecorder! If not, please follow these instructions to install Python 3.

Python dependencies

If you installed pip alongside Python, then installing all the necessary dependencies for SpeechRecorder could be as simple as running:

pip install -r requirements.txt

It might be preferable not to install these dependencies globally, however. The next section explains how to create a Python virtual environment to keep your system tidy.

Virtual environments in Python

Virtual environments are a way of encapsulating Python packages so that projects with different requirements (for example two projects which use different versions of the same package) do not conflict with each other.

After installing Python, you can create a virtual environment using the standard venv module:

python3 -m venv sr-env

This will create a new directory sr-env containing a local copy of the Python interpreter and space to install new packages. To use this local Python, we must activate the environment:

  • Windows: sr-env\Scripts\activate.bat
  • Linux/Mac: source sr-env/bin/activate

If you see (sr-env) somewhere in your command prompt, then it worked! You can now run the pip install command listed above to install the required Python packages in your new virtual environment, leaving the system Python unchanged.

Once you're finished with SpeechRecorder, you can run the deactivate command to exit the Python virtual environment.

Note: You will need to run the activate command whenever you want to use this particular Python environment, after navigating in the terminal to wherever you created the sr-env directory. In general, you might want to keep all your virtual environments in one place, or perhaps create this one inside the directory containing the code for SpeechRecorder.

Usage

Run SpeechRecorder with python rec.py (after activating your virtual environment, if you created one). You will be presented with a screen showing the first utterance in the file utts.data. Recorded audio will be saved to recordings/${prompt_label}_${take}.wav.

The following commands are available:

  • up and down to move between utterances
  • space to start/stop recording. Multiple recordings will produce multiple takes
  • down while recording to immediately record the next utterance (without stopping in between).
    • Warning: This may lead to bad utterance segmentations
  • p to listen to the recorded audio (plays the latest take)
  • q to quit.

Run python rec.py --help to see additional options.
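Because each prompt can have several takes, you may later want to find the most recent take for every prompt. The following is a minimal sketch, not part of rec.py. It assumes the take in recordings/${prompt_label}_${take}.wav is a plain integer after the last underscore, and the helper name latest_takes is made up for illustration.

```python
from pathlib import Path


def latest_takes(filenames):
    """Map each prompt label to the filename of its highest-numbered take.

    Assumes names look like '<prompt_label>_<take>.wav' with an integer take;
    anything that does not match is skipped.
    """
    best = {}  # label -> (take number, filename)
    for name in filenames:
        path = Path(name)
        if path.suffix.lower() != ".wav":
            continue
        # Split on the LAST underscore, since labels such as 'arctic_a0001'
        # contain underscores themselves.
        label, sep, take = path.stem.rpartition("_")
        if not sep or not label or not take.isdigit():
            continue
        if label not in best or int(take) > best[label][0]:
            best[label] = (int(take), path.name)
    return {label: name for label, (_, name) in best.items()}


if __name__ == "__main__":
    recdir = Path("recordings")  # default output directory
    if recdir.is_dir():
        takes = latest_takes(p.name for p in recdir.iterdir())
        for label, name in sorted(takes.items()):
            print(label, name)
```

Takes are compared as numbers rather than strings, so take 10 is correctly treated as later than take 9.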

Recording scripts

The tool should parse any script file matching the format of the provided utts.data, as described in the unit selection voice building recipe. To record your own script, create a new file using the same format and pass it to SpeechRecorder: python rec.py --script my_script

You might want to split the provided utts.data into multiple files and use this method to record prompts in multiple sessions. This will help to overcome the limited interface provided by this version of SpeechRecorder, so that you don't have to scroll through the first 100 prompts to pick up where you left off!
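Splitting the script by hand is tedious, so here is a rough sketch of one way to do it in Python. It is not part of this repository: the function name split_script and the output naming (utts_part01.data and so on) are made up for illustration, and it assumes the script holds one prompt per line, which you should check against your own file.

```python
from pathlib import Path


def split_script(script_path, prompts_per_file=100, out_dir="."):
    """Write consecutive chunks of a script to numbered files; return their paths."""
    script_path = Path(script_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Assumption: one prompt per non-blank line, copied through unchanged.
    with open(script_path, encoding="utf-8") as f:
        prompts = [line for line in f if line.strip()]

    parts = []
    for start in range(0, len(prompts), prompts_per_file):
        # e.g. utts.data -> utts_part01.data, utts_part02.data, ...
        part_name = f"{script_path.stem}_part{len(parts) + 1:02d}{script_path.suffix}"
        part_path = out_dir / part_name
        with open(part_path, "w", encoding="utf-8") as f:
            f.writelines(prompts[start:start + prompts_per_file])
        parts.append(part_path)
    return parts
```

For example, calling split_script("utts.data", 100, "sessions") would write sessions/utts_part01.data and so on, and each session could then be recorded with python rec.py --script sessions/utts_part01.data.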

If you want to save recorded audio files to a different output directory, specify it using the --recdir option.

Audio device configuration

On Windows, once you've set up your Python environment and downloaded SpeechRecorder, everything might Just Work. On Linux, you may need to do some additional configuration so that SpeechRecorder knows which audio devices it should use.

To list available audio devices, run python rec.py --show-devices. The required in_device index is marked by > and out_device by <. If the same device handles both input and output, it will be marked with *. When you're ready to record, pass the appropriate device indices through the --audio-in and --audio-out options (both default to 0).

By default, recorded audio will be sampled at 44.1 kHz. If you need higher/lower quality recordings, pass the desired sampling rate through the --sr option. Audio is recorded in mono by default. Pass --channels 2 for stereo recording.

The default bit depth is 16-bit, and is not so easily configurable. If you need to change this, first list the alternative formats in a Python interpreter: run import soundfile, then soundfile.available_subtypes('WAV'). Select one of the keys from the resulting dictionary and pass it to the subtype argument of the sf.SoundFile object created in the rec() function of rec.py.
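After changing any of these settings, it can be worth checking that a recording really has the sampling rate, channel count and bit depth you expected. The sketch below uses Python's standard wave module rather than anything from rec.py; the helper name describe_wav is made up for illustration. Note that the wave module only reads integer PCM files, so it may raise wave.Error if you chose a floating-point subtype.

```python
import wave


def describe_wav(path):
    """Return sample rate (Hz), channel count, bit depth and duration of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            # Sample width is reported in bytes per sample.
            "bit_depth": wav.getsampwidth() * 8,
            "duration_s": wav.getnframes() / wav.getframerate(),
        }
```

For example, describe_wav("recordings/arctic_a0001_1.wav") (a hypothetical file name) should report a sample rate of 44100, 1 channel and a bit depth of 16 for a recording made with the default settings described above.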