Phone based LF-MMI training #19

Open · wants to merge 9 commits into master
Conversation

csukuangfj
Collaborator

Phone-based LF-MMI training is easier than wordpiece-based LF-MMI training,
so I would like to get a working version of phone-based MMI training first.

@danpovey
Collaborator

danpovey commented Aug 23, 2021 via email

@csukuangfj
Collaborator Author

Here are the results I have so far with this pull request:

HLG decoding (1best, no LM rescoring)

(with model averaging from epoch-43.pt to epoch-49.pt)

[test-clean-no_rescore] %WER 3.69% [1941 / 52576, 263 ins, 137 del, 1541 sub ]
[test-other-no_rescore] %WER 7.35% [3849 / 52343, 522 ins, 269 del, 3058 sub ]

HLG decoding (1best) + 4-gram LM rescoring

(with model averaging from epoch-42.pt to epoch-49.pt)

[test-clean-lm_scale_1.1] %WER 3.33% [1753 / 52576, 309 ins, 92 del, 1352 sub ]
[test-other-lm_scale_1.2] %WER 6.77% [3542 / 52343, 601 ins, 207 del, 2734 sub ]
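
For reference, a minimal sketch of what averaging epoch-42.pt through epoch-49.pt could look like. It assumes each checkpoint is a dict storing the parameters under a "model" key (icefall has its own checkpoint-averaging helper; this is only an illustration, not the code used for these numbers):

from pathlib import Path
import torch

def average_checkpoints(filenames):
    # Average the "model" state_dicts of several checkpoints.
    # Assumes each .pt file is a dict with a "model" key holding the
    # parameters; adjust the key if your checkpoints differ.
    n = len(filenames)
    avg = torch.load(filenames[0], map_location="cpu")["model"]
    for name in filenames[1:]:
        state = torch.load(name, map_location="cpu")["model"]
        for k in avg:
            avg[k] += state[k]
    for k in avg:
        if avg[k].is_floating_point():
            avg[k] = avg[k] / n
        else:
            # Integer buffers (e.g. batch counters) are floor-divided.
            avg[k] = avg[k] // n
    return avg

# e.g. epochs 42 to 49 inclusive (paths are hypothetical)
filenames = [Path(f"exp/epoch-{i}.pt") for i in range(42, 50)]
averaged_state_dict = average_checkpoints(filenames)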

The plans for the next few days are:

(1) Training with attention decoder.

Unlike training with BPE units, where a word has only one decomposition, a word may have multiple pronunciations
with phone-based units. My plan is to keep only the first pronunciation when a word has more than one (see the sketch after this list).

(2) Instead of training a TDNN-LSTM model as a forced-alignment model, integrate the changes from lhotse
(lhotse-speech/lhotse#379) to use the alignment information contained in the supervisions.

(3) Replace phone-based MMI training with BPE-based MMI training.
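
A minimal sketch of the first-pronunciation selection mentioned in (1). It assumes a plain lexicon.txt with one "WORD phone1 phone2 ..." entry per line; the function name keep_first_pronunciation is hypothetical and not part of this PR:

from typing import Dict, List

def keep_first_pronunciation(lexicon_path: str) -> Dict[str, List[str]]:
    # Read a lexicon.txt and keep only the first pronunciation of each word.
    # Each line is assumed to look like: WORD phone1 phone2 ...
    lexicon: Dict[str, List[str]] = {}
    with open(lexicon_path, "r", encoding="utf-8") as f:
        for line in f:
            fields = line.strip().split()
            if not fields:
                continue
            word, phones = fields[0], fields[1:]
            # Later pronunciations of an already-seen word are dropped.
            if word not in lexicon:
                lexicon[word] = phones
    return lexicon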

@danpovey
Collaborator

danpovey commented Sep 7, 2021

OK, that's great.
I'm hoping that once we incorporate the alignment information, we'll find that the BPE-based LF-MMI training starts to converge. Fingers crossed!

@csukuangfj
Collaborator Author

Now it supports using an attention decoder along with MMI training.

The tensorboard log for the command below is available at

https://tensorboard.dev/experiment/Wd049TyrRdyvOkcOiD32FQ/#scalars&_smoothingWeight=0

export CUDA_VISIBLE_DEVICES="0,1,2,3"

./conformer_mmi_phone/train.py \
  --full-libri 1 \
  --max-duration 200 \
  --bucketing-sampler 1 \
  --concatenate-cuts 0 \
  --world-size 4

@danpovey
Collaborator

Wow, great progress!
