Phone based LF-MMI training #19

Open · wants to merge 9 commits into master
Conversation

csukuangfj
Collaborator

Phone-based LF-MMI training is easier than wordpiece-based LF-MMI training,
so I would like to get a working version of phone-based MMI training first.

@danpovey
Collaborator

danpovey commented Aug 23, 2021 via email

@csukuangfj
Collaborator Author

Here are the results I have so far with this pull request:

HLG decoding (1best, no LM rescoring)

(with model averaging from epoch-43.pt to epoch-49.pt)

[test-clean-no_rescore] %WER 3.69% [1941 / 52576, 263 ins, 137 del, 1541 sub ]
[test-other-no_rescore] %WER 7.35% [3849 / 52343, 522 ins, 269 del, 3058 sub ]

HLG decoding (1best) + 4-gram LM rescoring

(with model averaging from epoch-42.pt to epoch-49.pt)

[test-clean-lm_scale_1.1] %WER 3.33% [1753 / 52576, 309 ins, 92 del, 1352 sub ]
[test-other-lm_scale_1.2] %WER 6.77% [3542 / 52343, 601 ins, 207 del, 2734 sub ]
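
For reference, a minimal sketch of what averaging epoch-42.pt through epoch-49.pt could look like. It assumes each checkpoint is a dict storing the parameters under a "model" key (icefall has its own checkpoint-averaging helper; this is only an illustration, not the code used for these numbers):

from pathlib import Path
import torch

def average_checkpoints(filenames):
    # Average the "model" state_dicts of several checkpoints.
    # Assumes each .pt file is a dict with a "model" key holding the
    # parameters; adjust the key if your checkpoints differ.
    n = len(filenames)
    avg = torch.load(filenames[0], map_location="cpu")["model"]
    for name in filenames[1:]:
        state = torch.load(name, map_location="cpu")["model"]
        for k in avg:
            avg[k] += state[k]
    for k in avg:
        if avg[k].is_floating_point():
            avg[k] = avg[k] / n
        else:
            # Integer buffers (e.g. batch counters) are floor-divided.
            avg[k] = avg[k] // n
    return avg

# e.g. epochs 42 to 49 inclusive (paths are hypothetical)
filenames = [Path(f"exp/epoch-{i}.pt") for i in range(42, 50)]
averaged_state_dict = average_checkpoints(filenames)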

The plans for the next few days are:

(1) Training with attention decoder.

Unlike training with BPE units, where a word has only one decomposition, a word may have multiple pronunciations
with phone-based units. My plan is to keep only the first pronunciation when a word has more than one (see the sketch after this list).

(2) Instead of training a TDNN-LSTM model as a forced-alignment model, integrate the changes from lhotse
(lhotse-speech/lhotse#379) to use the alignment information contained in the supervisions.

(3) Replace phone-based MMI training with BPE-based MMI training.
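
A minimal sketch of the first-pronunciation selection mentioned in (1). It assumes a plain lexicon.txt with one "WORD phone1 phone2 ..." entry per line; the function name keep_first_pronunciation is hypothetical and not part of this PR:

from typing import Dict, List

def keep_first_pronunciation(lexicon_path: str) -> Dict[str, List[str]]:
    # Read a lexicon.txt and keep only the first pronunciation of each word.
    # Each line is assumed to look like: WORD phone1 phone2 ...
    lexicon: Dict[str, List[str]] = {}
    with open(lexicon_path, "r", encoding="utf-8") as f:
        for line in f:
            fields = line.strip().split()
            if not fields:
                continue
            word, phones = fields[0], fields[1:]
            # Later pronunciations of an already-seen word are dropped.
            if word not in lexicon:
                lexicon[word] = phones
    return lexicon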

@danpovey
Collaborator

danpovey commented Sep 7, 2021

OK, that's great.
I'm hoping that once we incorporate the alignment information, we'll find that the BPE-based LF-MMI training starts to converge. Fingers crossed!

@csukuangfj
Collaborator Author

Now it supports using an attention decoder along with MMI training.

The tensorboard log for the command below is available at

https://tensorboard.dev/experiment/Wd049TyrRdyvOkcOiD32FQ/#scalars&_smoothingWeight=0

export CUDA_VISIBLE_DEVICES="0,1,2,3"

./conformer_mmi_phone/train.py \
  --full-libri 1 \
  --max-duration 200 \
  --bucketing-sampler 1 \
  --concatenate-cuts 0 \
  --world-size 4

@danpovey
Collaborator

Wow, great progress!
