Merge pull request #4847 from sendream/master
Add recipe of Tibetan Amdo dialect
Showing 28 changed files with 1,418 additions and 0 deletions.
@@ -0,0 +1,11 @@
About the XBMU-AMDO31 corpus

XBMU-AMDO31 is an open-source Amdo Tibetan speech corpus published by Northwest Minzu University.

The XBMU-AMDO31 dataset is a speech recognition corpus of the Tibetan Amdo dialect. The open-source corpus contains 31 hours of speech data and resources for building speech recognition systems, including transcribed texts and a Tibetan pronunciation lexicon. (The lexicon is a Lhasa-dialect Tibetan lexicon, reused for the Amdo dialect because of the uniformity of the Tibetan language.) The dataset can be used to train a model for Amdo Tibetan Automatic Speech Recognition (ASR).

The database can be downloaded from OpenSLR:
http://www.openslr.org/133/

For more details, please visit:
https://huggingface.co/datasets/syzym/xbmu_amdo31

This recipe includes several ASR models trained on XBMU-AMDO31.
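The corpus can also be fetched from the Hugging Face mirror above on the command line. A minimal sketch (it assumes the huggingface_hub CLI is installed; downloading the archive from the OpenSLR page works just as well):

# Sketch: download the Hugging Face mirror of the corpus into a local directory.
pip install -U "huggingface_hub[cli]"
huggingface-cli download syzym/xbmu_amdo31 --repo-type dataset --local-dir xbmu_amdo31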
@@ -0,0 +1,8 @@
%WER 46.16 [ 15522 / 33628, 380 ins, 2208 del, 12934 sub ] exp/mono/decode_test/wer_10_0.0
%WER 24.60 [ 8274 / 33628, 330 ins, 860 del, 7084 sub ] exp/tri1/decode_test/wer_13_0.0
%WER 24.42 [ 8213 / 33628, 323 ins, 847 del, 7043 sub ] exp/tri2/decode_test/wer_13_0.0
%WER 22.93 [ 7712 / 33628, 336 ins, 814 del, 6562 sub ] exp/tri3a/decode_test/wer_12_0.0
%WER 20.17 [ 6783 / 33628, 275 ins, 764 del, 5744 sub ] exp/tri4a/decode_test/wer_15_0.0
%WER 19.03 [ 6400 / 33628, 292 ins, 667 del, 5441 sub ] exp/tri5a/decode_test/wer_14_0.0
%WER 15.45 [ 5196 / 33628, 229 ins, 646 del, 4321 sub ] exp/nnet3/tdnn_sp/decode_test/wer_16_0.0
%WER 15.57 [ 5235 / 33628, 244 ins, 575 del, 4416 sub ] exp/chain/tdnn_1a_sp/decode_test/wer_11_0.0
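Each line is Kaldi's standard scoring output: %WER, then [ total errors / total reference words, insertions, deletions, substitutions ], then the decode directory and the LM-scale / word-insertion-penalty setting of the reported scoring file. A summary like this is typically collected with something along these lines (the directory globs are assumed to match the experiment layout above):

# Sketch: report the best WER per decode directory from the per-weight scoring files.
for x in exp/*/decode_test exp/*/*/decode_test; do
  [ -d $x ] && grep WER $x/wer_* | utils/best_wer.sh
done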
@@ -0,0 +1,15 @@
# you can change cmd.sh depending on what type of queue you are using.
# If you have no queueing system and want to run on a local machine, you
# can change all instances of 'queue.pl' to 'run.pl' (but be careful and run
# commands one by one: most recipes will exhaust the memory on your
# machine).  queue.pl works with GridEngine (qsub).  slurm.pl works
# with slurm.  Different queues are configured differently, with different
# queue names and different ways of specifying things like memory;
# to account for these differences you can create and edit the file
# conf/queue.conf to match your queue's configuration.  Search for
# conf/queue.conf in http://kaldi-asr.org/doc/queue.html for more information,
# or search for the string 'default_config' in utils/queue.pl or utils/slurm.pl.

export train_cmd="queue.pl --mem 2G"
export decode_cmd="queue.pl --mem 4G"
export mkgraph_cmd="queue.pl --mem 8G"
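For a single machine without a queueing system, the same three variables can simply point at run.pl instead, as the comments above describe. A minimal local-machine variant (memory is then limited only by the machine itself):

# Sketch: local-machine variant of cmd.sh using run.pl instead of queue.pl.
export train_cmd="run.pl"
export decode_cmd="run.pl"
export mkgraph_cmd="run.pl"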
@@ -0,0 +1,5 @@
beam=11.0  # beam for decoding.  Was 13.0 in the scripts.
first_beam=8.0  # beam for 1st-pass decoding in SAT.
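These values are picked up when a decoding script is pointed at this file via the standard --config option handled by utils/parse_options.sh. A rough sketch for one of the GMM systems (directory names are illustrative, not taken from the recipe's run.sh):

# Sketch: decode a test set with a triphone system, reading beams from conf/decode.config.
steps/decode.sh --config conf/decode.config --nj 5 --cmd "$decode_cmd" \
  exp/tri1/graph data/test exp/tri1/decode_test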
@@ -0,0 +1,2 @@
--use-energy=false   # only non-default option.
--sample-frequency=16000
@@ -0,0 +1,10 @@
# config for high-resolution MFCC features, intended for neural network training.
# Note: we keep all cepstra, so it has the same info as filterbank features,
# but MFCC is more easily compressible (because less correlated), which is why
# we prefer this method.
--use-energy=false   # use average of log energy, not energy.
--sample-frequency=16000  # the corpus audio is sampled at 16 kHz
--num-mel-bins=40   # similar to Google's setup.
--num-ceps=40   # there is no dimensionality reduction.
--low-freq=40    # low cutoff frequency for mel bins
--high-freq=-200  # high cutoff frequency, relative to the Nyquist of 8000 (i.e. 7800)
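These feature configurations are consumed by Kaldi's feature-extraction wrappers. A rough sketch of how the high-resolution features for the speed-perturbed training set might be produced (the directory names are assumptions, chosen to match the data/train_sp_hires name used by the chain script below):

# Sketch: extract 40-dim high-resolution MFCCs for neural-network training.
utils/copy_data_dir.sh data/train_sp data/train_sp_hires
steps/make_mfcc.sh --mfcc-config conf/mfcc_hires.conf --nj 10 --cmd "$train_cmd" data/train_sp_hires
steps/compute_cmvn_stats.sh data/train_sp_hires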
@@ -0,0 +1 @@
# configuration file for apply-cmvn-online, used when invoking online2-wav-nnet3-latgen-faster.
@@ -0,0 +1,4 @@
--sample-frequency=16000
--simulate-first-pass-online=true
--normalization-right-context=25
--frames-per-chunk=10
@@ -0,0 +1 @@
--sample-frequency=16000
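The filename of this one-line configuration is not shown in this view; if it is the pitch configuration (the network input dimension of 43 in the chain script below suggests 40 MFCCs plus 3 pitch features), MFCC-plus-pitch features are typically produced with something like the following sketch (data directory names are assumptions):

# Sketch: MFCC + 3-dim pitch features, as commonly used for the GMM training stages.
steps/make_mfcc_pitch.sh --mfcc-config conf/mfcc.conf --pitch-config conf/pitch.conf \
  --nj 10 --cmd "$train_cmd" data/train
steps/compute_cmvn_stats.sh data/train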
@@ -0,0 +1 @@
tuning/run_tdnn_1a.sh
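In git, a symbolic link is stored as a one-line blob containing its target, which is what this single-line file is: a link (conventionally local/chain/run_tdnn.sh in Kaldi recipes, though the path is not shown in this view) pointing at tuning/run_tdnn_1a.sh. Recreating it by hand would look like:

# Sketch: recreate the symlink to the current tuning script (link path assumed).
cd local/chain && ln -sf tuning/run_tdnn_1a.sh run_tdnn.sh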
@@ -0,0 +1,184 @@
#!/usr/bin/env bash

# This script is based on run_tdnn_7h.sh in the swbd chain recipe.

set -e

# configs for 'chain'
affix=
stage=0
train_stage=-10
get_egs_stage=-10
dir=exp/chain/tdnn_1a  # Note: _sp will get added to this
decode_iter=

# training options
num_epochs=4
initial_effective_lrate=0.001
final_effective_lrate=0.0001
max_param_change=2.0
final_layer_normalize_target=0.5
num_jobs_initial=1
num_jobs_final=2
minibatch_size=128
frames_per_eg=150,110,90
remove_egs=true
common_egs_dir=
xent_regularize=0.1

# End configuration section.
echo "$0 $*"  # Print the command line for logging

. ./cmd.sh
. ./path.sh
. ./utils/parse_options.sh

if ! cuda-compiled; then
  cat <<EOF && exit 1
This script is intended to be used with GPUs, but you have not compiled Kaldi with CUDA.
If you want to use GPUs (and have them), go to src/, and configure and make on a machine
where "nvcc" is installed.
EOF
fi

# The iVector-extraction and feature-dumping parts are the same as the standard
# nnet3 setup, and you can skip them by setting "--stage 8" if you have already
# run those things.

dir=${dir}${affix:+_$affix}_sp
train_set=train_sp
ali_dir=exp/tri5a_sp_ali
treedir=exp/chain/tri6_7d_tree_sp
lang=data/lang_chain

# if we are using the speed-perturbed data we need to generate
# alignments for it.
#local/nnet3/run_ivector_common.sh --stage $stage || exit 1;

if [ $stage -le 7 ]; then
  # Get the alignments as lattices (this gives the LF-MMI training more freedom).
  # Use the same number of jobs as the alignments.
  nj=$(cat $ali_dir/num_jobs) || exit 1;
  steps/align_fmllr_lats.sh --nj $nj --cmd "$train_cmd" data/$train_set \
    data/lang exp/tri5a exp/tri5a_sp_lats
  rm exp/tri5a_sp_lats/fsts.*.gz  # save space
fi

if [ $stage -le 8 ]; then
  # Create a version of the lang/ directory that has one state per phone in the
  # topo file.  [Note: it really has two states; the first one is only repeated
  # once, the second one has zero or more repeats.]
  rm -rf $lang
  cp -r data/lang $lang
  silphonelist=$(cat $lang/phones/silence.csl) || exit 1;
  nonsilphonelist=$(cat $lang/phones/nonsilence.csl) || exit 1;
  # Use our special topology... note that later on we may have to tune this
  # topology.
  steps/nnet3/chain/gen_topo.py $nonsilphonelist $silphonelist >$lang/topo
fi

if [ $stage -le 9 ]; then
  # Build a tree using our new topology.  This is the critically different
  # step compared with other recipes.
  steps/nnet3/chain/build_tree.sh --frame-subsampling-factor 3 \
      --context-opts "--context-width=2 --central-position=1" \
      --cmd "$train_cmd" 5000 data/$train_set $lang $ali_dir $treedir
fi

if [ $stage -le 10 ]; then
  echo "$0: creating neural net configs using the xconfig parser";

  num_targets=$(tree-info $treedir/tree | grep num-pdfs | awk '{print $2}')
  learning_rate_factor=$(echo "print (0.5/$xent_regularize)" | python)

  mkdir -p $dir/configs
  cat <<EOF > $dir/configs/network.xconfig
input dim=100 name=ivector
input dim=43 name=input
# please note that it is important to have the input layer with the name=input
# as the layer immediately preceding the fixed-affine-layer, to enable
# the use of short notation for the descriptor
fixed-affine-layer name=lda input=Append(-1,0,1,ReplaceIndex(ivector, t, 0)) affine-transform-file=$dir/configs/lda.mat
# the first splicing is moved before the lda layer, so no splicing here
relu-batchnorm-layer name=tdnn1 dim=625
relu-batchnorm-layer name=tdnn2 input=Append(-1,0,1) dim=625
relu-batchnorm-layer name=tdnn3 input=Append(-1,0,1) dim=625
relu-batchnorm-layer name=tdnn4 input=Append(-3,0,3) dim=625
relu-batchnorm-layer name=tdnn5 input=Append(-3,0,3) dim=625
relu-batchnorm-layer name=tdnn6 input=Append(-3,0,3) dim=625
## adding the layers for chain branch
relu-batchnorm-layer name=prefinal-chain input=tdnn6 dim=625 target-rms=0.5
output-layer name=output include-log-softmax=false dim=$num_targets max-change=1.5
# adding the layers for xent branch
# This block prints the configs for a separate output that will be
# trained with a cross-entropy objective in the 'chain' models... this
# has the effect of regularizing the hidden parts of the model.  We use
# 0.5 / args.xent_regularize as the learning-rate factor; this factor is
# suitable as it means the xent final layer learns at a rate independent
# of the regularization constant, and the 0.5 was tuned so as to make the
# relative progress similar in the xent and regular final layers.
relu-batchnorm-layer name=prefinal-xent input=tdnn6 dim=625 target-rms=0.5
output-layer name=output-xent dim=$num_targets learning-rate-factor=$learning_rate_factor max-change=1.5
EOF
  steps/nnet3/xconfig_to_configs.py --xconfig-file $dir/configs/network.xconfig --config-dir $dir/configs/
fi

if [ $stage -le 11 ]; then
  if [[ $(hostname -f) == *.clsp.jhu.edu ]] && [ ! -d $dir/egs/storage ]; then
    utils/create_split_dir.pl \
      /export/b0{5,6,7,8}/$USER/kaldi-data/egs/aishell-$(date +'%m_%d_%H_%M')/s5c/$dir/egs/storage $dir/egs/storage
  fi

  steps/nnet3/chain/train.py --stage $train_stage \
    --cmd "$decode_cmd" \
    --feat.online-ivector-dir exp/nnet3/ivectors_${train_set} \
    --feat.cmvn-opts "--norm-means=false --norm-vars=false" \
    --chain.xent-regularize $xent_regularize \
    --chain.leaky-hmm-coefficient 0.1 \
    --chain.l2-regularize 0.00005 \
    --chain.apply-deriv-weights false \
    --chain.lm-opts="--num-extra-lm-states=2000" \
    --egs.dir "$common_egs_dir" \
    --egs.stage $get_egs_stage \
    --egs.opts "--frames-overlap-per-eg 0" \
    --egs.chunk-width $frames_per_eg \
    --trainer.num-chunk-per-minibatch $minibatch_size \
    --trainer.frames-per-iter 1500000 \
    --trainer.num-epochs $num_epochs \
    --trainer.optimization.num-jobs-initial $num_jobs_initial \
    --trainer.optimization.num-jobs-final $num_jobs_final \
    --trainer.optimization.initial-effective-lrate $initial_effective_lrate \
    --trainer.optimization.final-effective-lrate $final_effective_lrate \
    --trainer.max-param-change $max_param_change \
    --cleanup.remove-egs $remove_egs \
    --feat-dir data/${train_set}_hires \
    --tree-dir $treedir \
    --lat-dir exp/tri5a_sp_lats \
    --dir $dir || exit 1;
fi

if [ $stage -le 12 ]; then
  # Note: it might appear that this $lang directory is mismatched, and it is as
  # far as the 'topo' is concerned, but this script doesn't read the 'topo' from
  # the lang directory.
  utils/mkgraph.sh --self-loop-scale 1.0 data/lang_test $dir $dir/graph
fi

graph_dir=$dir/graph
if [ $stage -le 13 ]; then
  for test_set in dev test; do
    steps/nnet3/decode.sh --acwt 1.0 --post-decode-acwt 10.0 \
      --nj 5 --cmd "$decode_cmd" \
      --online-ivector-dir exp/nnet3/ivectors_$test_set \
      $graph_dir data/${test_set}_hires $dir/decode_${test_set} || exit 1;
  done
fi

exit;
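As a rough usage note (assuming the recipe's top-level run.sh has already produced the tri5a system, the speed-perturbed data, and the i-vectors this script expects; the symlink path is the conventional one and not shown in this view), the chain script can be launched directly, or resumed at a later stage via the options handled by utils/parse_options.sh:

# Sketch: run the chain recipe from the recipe's top-level directory.
local/chain/run_tdnn.sh
# Sketch: resume from the network-config stage, giving the model directory a different suffix.
local/chain/tuning/run_tdnn_1a.sh --stage 10 --affix 1b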