SileroVAD-ELAN integrates the voice activity detection methods offered by Silero-VAD (Silero Team 2021) into ELAN, allowing users to apply voice activity detection to multimedia sources linked to ELAN transcripts directly from within ELAN's user interface.
SileroVAD-ELAN makes use of several other open-source applications and utilities, including ELAN and Python 3.
SileroVAD-ELAN is written in Python 3, and also depends on the following Python packages, all of which should be installed in a virtual environment:
- Silero-VAD, installed with all of its dependencies. This can be done with `pip` and a clone of the current Silero-VAD GitHub repository.
- soundfile (for Windows 10 only)
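Under Windows 10, the following commands can be used to fetch SileroVAD-ELAN and install the necessary Python packages:

```shell
# Fetch SileroVAD-ELAN and create a virtual environment for it
git clone https://github.com/l12maro/SileroVAD-Elan
cd SileroVAD-Elan
python3 -m virtualenv venv-silerovad
# Activate the environment (this form works in a Bash-style shell such as Git Bash;
# in cmd.exe, run venv-silerovad\Scripts\activate instead)
source ./venv-silerovad/Scripts/activate
# Fetch Silero-VAD and install the Python packages
git clone https://github.com/snakers4/silero-vad.git
pip install silero
pip install -q torchaudio
pip install soundfile  # Windows 10 only
```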
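Before moving on, it can help to confirm that the activated virtual environment actually provides the packages listed above. The following is a minimal sketch and not part of SileroVAD-ELAN: it assumes the import names `torch` and `torchaudio` (torchaudio pulls in torch as a dependency) and `soundfile`, and the helper names `missing_modules` and `check_environment` are hypothetical.

```python
"""Hypothetical sanity check (not part of SileroVAD-ELAN): confirm that the
virtual environment can find the packages SileroVAD-ELAN depends on.
The import names below are assumptions."""
import importlib.util
import sys

# Assumed import names: torchaudio installs torch as a dependency.
REQUIRED_MODULES = ["torch", "torchaudio"]
# soundfile is only listed as a dependency under Windows 10.
WINDOWS_ONLY_MODULES = ["soundfile"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment(platform=sys.platform):
    """Return the required modules missing on the given platform."""
    names = list(REQUIRED_MODULES)
    if platform.startswith("win"):
        names += WINDOWS_ONLY_MODULES
    return missing_modules(names)


if __name__ == "__main__":
    missing = check_environment()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages found.")
```

Run it with the virtual environment's Python; an empty result means every assumed module can be found, although it does not prove that the installed versions work together.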
Once all of these tools and packages have been installed, SileroVAD-ELAN can be made available to ELAN as follows:
- Edit the file `SileroVAD-elan.sh` to specify a Unicode-friendly language and locale (if `en_US.UTF-8` isn't available on your computer).
- To make SileroVAD-ELAN available to ELAN, move your SileroVAD-ELAN directory into ELAN's `extensions` directory. This directory is found in different places under different operating systems:
  - Under macOS, right-click on `ELAN_6.4` in your `/Applications` folder and select "Show Package Contents", then copy your `SileroVAD-ELAN` folder into `ELAN_6.4.app/Contents/app/extensions`.
  - Under Linux, copy your `SileroVAD-ELAN` folder into `ELAN_6-4/app/extensions`.
  - Under Windows, copy your `SileroVAD-ELAN` folder into `C:\Users\AppData\Local\ELAN_6-4\app\extensions`.
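The copy step can also be scripted. The sketch below is a hypothetical helper, not part of SileroVAD-ELAN: the function names are illustrative, and because the location of the ELAN installation varies, the caller supplies the folder that contains `ELAN_6.4.app` (macOS, e.g. `/Applications`) or `ELAN_6-4` (Linux and Windows). Only the relative paths listed above for ELAN 6.4 are built in.

```python
"""Hypothetical helper (not part of SileroVAD-ELAN) that copies the
SileroVAD-ELAN folder into ELAN's extensions directory. The ELAN install
root must be supplied by the user."""
import shutil
import sys
from pathlib import Path

# Extensions directory relative to the install root, as listed for ELAN 6.4.
EXTENSIONS_SUBDIR = {
    "darwin": Path("ELAN_6.4.app", "Contents", "app", "extensions"),
    "linux": Path("ELAN_6-4", "app", "extensions"),
    "win32": Path("ELAN_6-4", "app", "extensions"),
}


def extensions_dir(elan_root, platform=sys.platform):
    """Return ELAN's extensions directory beneath the given install root."""
    key = "linux" if platform.startswith("linux") else platform
    if key not in EXTENSIONS_SUBDIR:
        raise ValueError(f"Unsupported platform: {platform}")
    return Path(elan_root) / EXTENSIONS_SUBDIR[key]


def install_extension(source_dir, elan_root, platform=sys.platform):
    """Copy source_dir into ELAN's extensions directory and return the new path."""
    source = Path(source_dir)
    target = extensions_dir(elan_root, platform) / source.name
    # copytree creates missing parent folders; dirs_exist_ok (Python 3.8+)
    # lets a re-run overwrite files from an earlier copy.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

For example, `install_extension("SileroVAD-ELAN", "/Applications", platform="darwin")` would copy the folder into `/Applications/ELAN_6.4.app/Contents/app/extensions/SileroVAD-ELAN`. Keeping the install root as a parameter avoids hard-coding a user-specific path.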
Once ELAN is restarted, it will include 'Silero voice activity detection' in the list of Recognizers found under the 'Recognizer' tab in Annotation Mode. The user interface for this recognizer allows users to enter the settings needed to apply voice activity detection to a selected WAV audio recording that has been linked to the current ELAN transcript. Additional settings (e.g., the speech vs. non-speech threshold and constant adjustments to the start and end times of recognized speech segments) can also be configured through the recognizer interface.
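To make the effect of the threshold and the constant start/end adjustments concrete, here is a simplified, self-contained sketch. It is not SileroVAD-ELAN's code: it assumes a hypothetical list of per-frame speech probabilities with a fixed frame length, and the function and parameter names are illustrative. (Silero-VAD's own `get_speech_timestamps` exposes comparable settings, such as `threshold` and `speech_pad_ms`.)

```python
"""Hypothetical illustration (not SileroVAD-ELAN's implementation) of a
speech vs. non-speech threshold and constant start/end adjustments.
Per-frame probabilities and a fixed frame length are assumptions."""
from typing import List, Optional, Sequence, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def threshold_segments(probs: Sequence[float], frame_sec: float,
                       threshold: float = 0.5) -> List[Segment]:
    """Group consecutive frames whose speech probability is >= threshold."""
    segments: List[Segment] = []
    start: Optional[float] = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i * frame_sec                    # speech begins at frame i
        elif p < threshold and start is not None:
            segments.append((start, i * frame_sec))  # speech ended before frame i
            start = None
    if start is not None:                            # speech runs to the last frame
        segments.append((start, len(probs) * frame_sec))
    return segments


def adjust_segments(segments: Sequence[Segment], duration: float,
                    start_offset: float = 0.0,
                    end_offset: float = 0.0) -> List[Segment]:
    """Add constant offsets to starts and ends, clamp to the recording,
    and drop segments that end up empty."""
    adjusted: List[Segment] = []
    for start, end in segments:
        new_start = max(0.0, start + start_offset)
        new_end = min(duration, end + end_offset)
        if new_end > new_start:
            adjusted.append((new_start, new_end))
    return adjusted
```

With per-frame probabilities `[0.1, 0.8, 0.9, 0.2, 0.7]`, 0.5-second frames and a threshold of 0.5, `threshold_segments` yields `[(0.5, 1.5), (2.0, 2.5)]`; `adjust_segments(..., duration=2.5, start_offset=-0.25, end_offset=0.25)` widens them to `[(0.25, 1.75), (1.75, 2.5)]`. Widening can make neighbouring segments touch or overlap, so a fuller implementation might merge them afterwards.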
Once these settings have been entered in SileroVAD-ELAN, pressing the Start button will begin applying Silero-VAD's voice activity detection to the selected audio recording. When that process is complete, if no errors occurred, ELAN will allow the user to load the resulting tier of automatically recognized speech segments into the current transcript.
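The resulting tier is handed back to ELAN as a file. As an assumption, the sketch below writes segments in the AVATech TIER XML layout (a `TIER` root with one `span` per segment, times in seconds) that local ELAN recognizers commonly produce; SileroVAD-ELAN's actual output code may differ, the function name `segments_to_tier_xml` is hypothetical, and the element and attribute names should be checked against ELAN's recognizer documentation.

```python
"""Hypothetical sketch (not SileroVAD-ELAN's code): serialize speech segments
as a tier file for ELAN. The TIER/span/v layout is assumed to follow the
AVATech tier XML format used by local ELAN recognizers."""
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def segments_to_tier_xml(segments: Iterable[Tuple[float, float]],
                         tier_name: str = "SileroVAD",
                         label: str = "") -> str:
    """Return an XML string with one <span> (start/end in seconds) per segment."""
    ET.register_namespace("xsi", XSI)  # emit the xsi: prefix instead of ns0:
    tier = ET.Element("TIER", {
        f"{{{XSI}}}noNamespaceSchemaLocation": "file:avatech-tier.xsd",
        "columns": tier_name,
    })
    for start, end in segments:
        span = ET.SubElement(tier, "span", start=f"{start:.3f}", end=f"{end:.3f}")
        ET.SubElement(span, "v").text = label  # annotation value (empty by default)
    return ET.tostring(tier, encoding="unicode")
```

The returned string could then be written, with an XML declaration and UTF-8 encoding, to whatever output file path the recognizer interface supplies.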
This is an alpha release of SileroVAD-ELAN; it has only been tested under Windows 10 with Python 3.9.