the output of LSTM #13

Open
wqn628 opened this issue Jun 20, 2016 · 9 comments

Comments

@wqn628

wqn628 commented Jun 20, 2016

First, thanks for all your help. I have been confused by the modeled units for a while. For instance, here is the unit.txt:
image
I wonder why we should model the first phone and the second phone. Neither of them exists in my training labels. Can I delete them and not model them?
Any help would be appreciated.

@wqn628
Author

wqn628 commented Jun 20, 2016

@yajiemiao

@wqn628
Author

wqn628 commented Jun 20, 2016

Why should we add noise entries to the lexicon (noise phonemes to the units)?

@yajiemiao
Owner

If they truly don't exist in your training data, you can safely delete them.
Be careful, though: by default, Eesen maps OOV words in your training transcripts to
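
Before deleting units, it may help to confirm that they really never appear in the training labels. Below is a minimal sketch, not Eesen's own tooling. It assumes (this is not confirmed in the thread) that the units file holds one `name id` pair per line and that the label file holds an utterance id followed by integer unit ids. The function name and the sample data are hypothetical.

```python
def unused_units(unit_lines, label_lines):
    """Return names of units that never occur in the training labels."""
    # Assumed units format: "<unit-name> <integer-id>" on each line.
    units = {}
    for line in unit_lines:
        parts = line.split()
        if len(parts) == 2:
            units[int(parts[1])] = parts[0]

    # Assumed label format: "<utt-id> <id> <id> ..." on each line.
    seen = set()
    for line in label_lines:
        seen.update(int(tok) for tok in line.split()[1:])

    # Report unused units in id order.
    return [name for idx, name in sorted(units.items()) if idx not in seen]


# Hypothetical data: two noise-like units that no utterance uses.
units = ["NOISE_A 1", "NOISE_B 2", "AA 3", "B 4"]
labels = ["utt1 3 4 3", "utt2 4"]
print(unused_units(units, labels))  # ['NOISE_A', 'NOISE_B']
```

If units are removed, the remaining ids probably need to be renumbered so that they stay contiguous and match the size of the network output layer; treat that as an assumption to verify against your own setup.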

@wqn628
Author

wqn628 commented Jun 23, 2016

Thanks a lot.
In addition, Eesen models monophones directly. Can triphones be used as the modeling units in Eesen? @yajiemiao
In previous acoustic models (GMM-HMM, DNN/LSTM-HMM), triphones have outperformed monophones by a large margin.

@chenzhehuai

chenzhehuai commented Jun 23, 2016

Using triphones as the modeling unit in Eesen is possible. You might further generate context labels (fstcomposecontext), as in an HMM system, and replace the labels in T.fst with triphone labels.
The final WFST then becomes T ∘ C ∘ LG.

@wqn628
Author

wqn628 commented Jun 23, 2016

Sorry, I don't get it. Do you mean that I should generate the triphones with the hybrid pipeline, or with the "fstcomposecontext" command? The first or the second? @chenzhehuai

@chenzhehuai

Clustered triphones should be generated from the hybrid system through clustering, while the context in the WFST can be generated by fstcomposecontext, with an extra mapping from triphones to clustered triphones.
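
To illustrate the label mapping only (this is not what fstcomposecontext does internally, and it is not part of Eesen), here is a rough sketch that expands a monophone sequence into triphone labels and then maps them to clustered units. The `left-center+right` naming, the `sil` boundary symbol, and the cluster table are all assumptions made for the example; a real cluster table would come from the hybrid system's clustering.

```python
def to_triphones(phones, boundary="sil"):
    """Expand a monophone sequence into left-center+right triphone labels."""
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


def to_clustered(triphones, cluster_map):
    """Map each triphone to its clustered unit; keep unmapped ones unchanged."""
    return [cluster_map.get(tri, tri) for tri in triphones]


# Hypothetical example: one utterance and a tiny cluster table.
tri = to_triphones(["k", "ae", "t"])
print(tri)  # ['sil-k+ae', 'k-ae+t', 'ae-t+sil']
print(to_clustered(tri, {"k-ae+t": "ae_c1"}))  # ['sil-k+ae', 'ae_c1', 'ae-t+sil']
```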

@yajiemiao
Owner

An even simpler way is to generate forced alignments with the GMM-HMM and take the CD states as CI CTC labels. With this, there is no need to consider context dependency in decoding.
I haven't run such an experiment, so I'm not sure how well this would work in practice.
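
One possible reading of this suggestion, sketched under assumptions: a forced alignment gives one CD state id per frame, and a CTC label sequence could be derived by collapsing consecutive repeats. The function name, the `offset` parameter, and the sample alignment are hypothetical, and this has not been validated in any experiment.

```python
from itertools import groupby


def alignment_to_ctc_labels(frame_states, offset=0):
    """Collapse a per-frame state alignment into a CTC label sequence.

    Consecutive identical state ids are merged into a single label.
    `offset` shifts the ids, e.g. if index 0 must be kept for the blank
    (an assumption about the target toolkit, not a confirmed requirement).
    """
    return [state + offset for state, _ in groupby(frame_states)]


# Hypothetical per-frame CD state ids from a forced alignment.
alignment = [5, 5, 5, 9, 9, 2, 2, 2, 2, 9]
print(alignment_to_ctc_labels(alignment))            # [5, 9, 2, 9]
print(alignment_to_ctc_labels(alignment, offset=1))  # [6, 10, 3, 10]
```

Note that collapsing repeats also merges the rare case where the same clustered state genuinely occurs twice in a row, so the resulting sequence is an approximation.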

@wqn628
Author

wqn628 commented Jun 28, 2016

Hello, in the decoding stage, the following problem occurs:

image
Can you tell me what happened and how I can solve it?
Thanks a lot.
@yajiemiao
