EC2 Installation Walkthrough
In this guide I will explain how to set up OpenDcd with Kaldi on EC2 and decode open-source models trained on the LibriSpeech corpus. For this walkthrough I used a large instance with four cores and 15GB of memory. OpenDcd is very memory efficient for both decoding and graph construction, and this is easily enough to decode with the large 4-gram model.
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install -y gcc-4.9 g++-4.9 cpp-4.9 subversion make zlib1g-dev automake libtool autoconf libatlas3-base
Due to a bug in gcc 4.8, we install gcc 4.9 and create symbolic links to make it the system default:
sudo ln -s /usr/bin/g++-4.9 /usr/bin/g++
sudo ln -s /usr/bin/gcc-4.9 /usr/bin/gcc
sudo ln -s -f bash /bin/sh
svn co https://svn.code.sf.net/p/kaldi/code/trunk kaldi
cd kaldi/tools
make
cd ../src
./configure
For decent runtime performance it is essential to edit the kaldi.mk file and add the -O2 switch. Then type make to build Kaldi, optionally specifying the number of cores:
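One way to make that edit is with sed. The snippet below demonstrates the substitution on a sample line, since the exact contents of kaldi.mk depend on your configure run; it assumes the default flags contain -O0, so check your file and adjust the pattern if needed. In practice you would run the same sed command against kaldi.mk inside kaldi/src.

```shell
# demonstrate the flag substitution on a sample kaldi.mk line;
# against the real file this would be: sed -i 's/-O0/-O2/g' kaldi.mk
printf 'CXXFLAGS = -Wall -O0 -DKALDI_PARANOID\n' > kaldi.mk.sample
sed -i 's/-O0/-O2/g' kaldi.mk.sample
cat kaldi.mk.sample
```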
make -j4
git clone https://github.com/edobashira/opendcd.git
cd opendcd/3rdparty
make
cd ../src/bin
make -j4
There are two graph construction methods: in the first, we take a set of Kaldi component transducers as the input to the build process; in the second, we take a raw language model and lexicon and build everything from scratch. In this recipe we will use the pre-built models from kaldi-asr.org, and therefore the first method.
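In Kaldi terms, the component transducers and models for the first method follow standard naming conventions (lexicon L.fst, grammar G.fst, the decision tree, and the transition/acoustic model final.mdl). A quick sketch for sanity-checking that the pieces are present before starting the build; the file names are Kaldi conventions, but adjust the paths for wherever your downloaded models live:

```shell
# check that the expected component files exist in the current directory
for f in L.fst G.fst tree final.mdl; do
  if [ -e "$f" ]; then echo "found $f"; else echo "missing $f"; fi
done
```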
We need three sets of models: the language model and lexicon, the acoustic model, and the models used in the iVector extractor.
In modern neural-network-based speech recognition the decoding pipeline consists of three steps: feature extraction, state likelihood computation, and the search algorithm.
First we will grab a set of utterances from OpenSLR.
wget http://www.openslr.org/resources/12/test-clean.tar.gz
tar -zxf test-clean.tar.gz
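The archive unpacks into per-speaker, per-chapter directories of FLAC audio plus transcript files. A quick way to sanity-check the extraction (the path below assumes the standard LibriSpeech/test-clean/&lt;speaker&gt;/&lt;chapter&gt;/*.flac layout):

```shell
# count the extracted utterances to confirm the archive unpacked correctly
find LibriSpeech/test-clean -name '*.flac' | wc -l
```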