KWS with CTCloss training and CTC prefix beam search detection. #135
…CTC model's result in README; For now CTC model runtime is not supported yet.
Good job! Thank you so much for this valuable update. We would appreciate it if you could help fix the lint errors, such as the trailing whitespace.
Here is a demo. All models are trained with CTC loss, and detection in the demo runs in streaming fashion.
https://www.modelscope.cn/studios/thuduj12/KWS_Nihao_Xiaojing/summary
Thanks for your constructive contribution to this project. Could you please
make some revisions to the code, following the flake8 format check?
examples/hi_xiaowen/s0/run_ctc.sh
Outdated
--checkpoint $score_checkpoint \
--score_file $result_dir/score.txt \
--num_workers 8 \
--keywords 嗨小问,你好问问 \
It would be better to replace the Chinese characters with Latin characters in the code.
Has been done in the latest commit.
examples/hi_xiaowen/s0/run_ctc.sh
Outdated
--lexicon_file data/lexicon.txt

python wekws/bin/compute_det_ctc.py \
--keywords 嗨小问,你好问问 \
It would be better to replace the Chinese characters with Latin characters in the code.
Done.
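One way to follow the review request above is to pass the keywords as ASCII `\uXXXX` escapes and decode them inside the script. The sketch below is only an illustration of that idea using Python's standard `unicode_escape` codec. The helper names are hypothetical, and the thread does not say which approach the latest commit actually took.

```python
def to_ascii_escapes(keywords: str) -> str:
    """Encode non-ASCII characters as \\uXXXX escapes so a script can stay ASCII-only."""
    return keywords.encode("unicode_escape").decode("ascii")


def from_ascii_escapes(escaped: str) -> str:
    """Decode \\uXXXX escapes back into the original characters."""
    return escaped.encode("ascii").decode("unicode_escape")


# Keyword string taken from the run_ctc.sh snippet in the review.
original = "嗨小问,你好问问"
escaped = to_ascii_escapes(original)
restored = from_ascii_escapes(escaped)
print(escaped)
print(restored == original)
```

The escaped string contains only ASCII characters, so it can be placed in a shell script without raw Chinese text, and the Python entry point can restore it before looking the keyword up in the lexicon.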
examples/hi_xiaowen/s0/README.md
Outdated
| DS_TCN(spec_aug) | CTC | 0.056574 | 0.056856 |

Comparison between DS_TCN(Pretrained with Wenetspeech, 23 epoch)
Is it possible to release the pre-trained model so that people can reproduce the experiment?
The pretrained models are released in the latest commit (see the README).
I ran some more experiments on a private dataset.
The results are as follows:
As we can see, the CTC-KWS model outperforms the max-pooling model with the DS_TCN backbone.
`import librosa ... def kws_process(wav_path): ... if __name__ == '__main__': ...`
Even the bundled audio cannot trigger a wake-up. Is something wrong?
Please check which model you used. In kws_demo I provide two models: one for "Hi_XiaoWen" and the other for "Nihao_Xiaojing". Check that first. The demo also has a web server, so you can verify it at https://www.modelscope.cn/studios/thuduj12/KWS_Nihao_Xiaojing/summary
I got two models: one is 23.pt (keyword 你好问问 or 嗨小问) and the other is avg_30.pt (keyword 你好晓静). Am I right? I want to reproduce the results of your web server on my computer, so please help. Thanks a lot!
Here are two models you can use: https://github.com/duj12/kws_demo/tree/master/model. The code is also there.
Great job! Let's merge and refine in the future.
Hi, when I ran run_fsmn_ctc.sh I encountered the following problem:
Right now, importing the FSMN ONNX model may still have issues, and the CTC model runtime is not supported yet.
On August 24, 2023, at 13:14, Dapannnnn wrote:
I ran some more experiments on a private dataset:

| | positive (hello_xiaojing) | negative (noise) |
|---|---|---|
| train | 18 speakers, 2219 segments | 55 hours |
| dev | 2 speakers, 248 segments | 12 hours |
| test | 4 speakers, 474 segments | 24 hours |

The results are as follows:

| backbone | loss | 1-FRR (%) | FAR (/24h) | Threshold |
|---|---|---|---|---|
| ds_tcn | maxpooling | 81.1 | 2 | 0.88 |
| ds_tcn | ctc | 89.7 | 1 | 0.02 |
| fsmn | ctc | 93.3 | 2 | 0.018 |

As we can see, the CTC-KWS model outperforms the max-pooling model with the DS_TCN backbone. Since the FSMN and DS_TCN models were pretrained on different data (and with different feature pipelines and different numbers of epochs), it is hard to say which backbone is better. However, the pretrained FSMN (released by Alibaba at https://modelscope.cn/models/damo/speech_charctc_kws_phone-xiaoyun/summary ) is better than the DS_TCN model I pretrained (23.pt at https://modelscope.cn/datasets/thuduj12/mobvoi_kws_transcription/files ). So I recommend using the pretrained FSMN model to train your models.
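For readers unfamiliar with the two metrics in the results above, here is a minimal sketch of how 1-FRR (%) and FAR (/24h) could be computed at a given threshold. The thread does not describe the exact scoring procedure, so the function name, the input layout (one best score per positive utterance, one score per candidate trigger in the negative audio) and the sample numbers are all assumptions.

```python
from typing import Sequence, Tuple


def recall_and_far(
    pos_scores: Sequence[float],
    neg_trigger_scores: Sequence[float],
    neg_hours: float,
    threshold: float,
    per_hours: float = 24.0,
) -> Tuple[float, float]:
    """Return (1-FRR in percent, false alarms per `per_hours` hours).

    pos_scores: best keyword score of each positive utterance (assumed layout).
    neg_trigger_scores: score of every candidate trigger found in negative audio.
    neg_hours: total duration of the negative audio in hours.
    """
    accepted = sum(1 for s in pos_scores if s >= threshold)
    recall = 100.0 * accepted / len(pos_scores)
    false_alarms = sum(1 for s in neg_trigger_scores if s >= threshold)
    far = false_alarms * per_hours / neg_hours
    return recall, far


# Made-up scores, only to show the call; they are not from the experiments above.
recall, far = recall_and_far(
    pos_scores=[0.9, 0.5, 0.01, 0.3],
    neg_trigger_scores=[0.05, 0.001],
    neg_hours=24.0,
    threshold=0.02,
)
print(recall, far)
```

Raising the threshold lowers FAR but also lowers 1-FRR, which is why each row of the results reports the threshold it was measured at.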
Hi, when I ran run_fsmn_ctc.sh I encountered the following problem:
This error is reported when exporting to ONNX. I think it is related to the input settings, but I don't know how to set the input correctly for the FSMN model.
Hi! Could you point to the paper that this pull request implements? I have another question: how can I transfer it to Google Speech Commands? Thanks in advance.
For now, the lexicon.txt and tokens.txt in Hi_XiaoWen's fsmn_CTC model only support Chinese KWS. You need to build a new lexicon and token dictionary; you may be able to use resources from English ASR projects. tokens.txt lists the CTC model's output units, and lexicon.txt maps words to those output units.
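To make the roles of the two files concrete, the sketch below turns a keyword into the sequence of output-unit IDs that a CTC model would need to emit. It assumes a common ASR layout that is not confirmed in this thread: each tokens.txt line is `unit id`, and each lexicon.txt line is `word unit1 unit2 ...`. The English entries and function names are hypothetical examples, not files from the PR.

```python
from typing import Dict, Iterable, List


def load_tokens(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'unit id' lines into a unit -> id table (assumed format)."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            table[parts[0]] = int(parts[1])
    return table


def load_lexicon(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse 'word unit1 unit2 ...' lines, keeping the first entry per word."""
    lexicon = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            lexicon.setdefault(parts[0], parts[1:])
    return lexicon


def keyword_to_ids(words: List[str], lexicon, tokens) -> List[int]:
    """Map a keyword (list of words) to the unit IDs the model should output."""
    return [tokens[unit] for word in words for unit in lexicon[word]]


# Hypothetical English resources, for illustration only.
tokens = load_tokens(["<blk> 0", "HH 1", "EY 2", "JH 3", "AA 4", "R 5", "V 6", "IH 7", "S 8"])
lexicon = load_lexicon(["HEY HH EY", "JARVIS JH AA R V IH S"])
ids = keyword_to_ids(["HEY", "JARVIS"], lexicon, tokens)
print(ids)
```

A word missing from the lexicon raises `KeyError` here, which is a useful early check that a new keyword is covered by the model's output units.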
Hi, could you please describe the detailed steps for creating new customized keywords?
Yes, I also want to know how to quickly add custom Chinese keywords based on kws_demo. It would be great if the author could briefly explain that.
This PR adds KWS training with CTC loss and detection with CTC prefix beam search.
It aims to improve the robustness of the KWS model and to support customized keywords with limited data.
For now, only the hi_xiaowen data has running scripts.
I redid the ds_tcn experiment with max-pooling loss.
Then I added the ds_tcn model with CTC loss, and added an FSMN backbone.
Finally, I added a streaming scoring script to simulate the real detection case of a CTC model.
All results can be found in README.md.
Note: the CTC model can be exported to ONNX (the output is the softmax of the logits), but the runtime is not supported yet.
I plan to first develop a Python script that runs in streaming fashion (online feature extraction and so on), and then the ONNX C++ runtime and pybind.
When the runtime is ready, I will create a new PR.
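As background for the detection step, here is a minimal sketch of textbook CTC prefix beam search over per-frame softmax posteriors, followed by a simple check for whether a keyword's unit sequence appears in a high-scoring prefix. It is not the PR's implementation: the function names, the contiguous-match rule, the threshold and the toy posteriors are assumptions made for illustration.

```python
from collections import defaultdict


def ctc_prefix_beam_search(probs, beam_size=3, blank=0):
    """probs: list of frames, each a list of per-unit posteriors.
    Returns [(prefix, probability)], best first."""
    # For each prefix keep (prob. ending in blank, prob. ending in non-blank).
    beams = {(): (1.0, 0.0)}
    for frame in probs:
        nxt = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for unit, p in enumerate(frame):
                if unit == blank:
                    nxt[prefix][0] += (p_b + p_nb) * p
                elif prefix and unit == prefix[-1]:
                    # A repeat collapses unless a blank separated the two units.
                    nxt[prefix][1] += p_nb * p
                    nxt[prefix + (unit,)][1] += p_b * p
                else:
                    nxt[prefix + (unit,)][1] += (p_b + p_nb) * p
        ranked = sorted(nxt.items(), key=lambda kv: kv[1][0] + kv[1][1], reverse=True)
        beams = {pre: (pb, pnb) for pre, (pb, pnb) in ranked[:beam_size]}
    return [(pre, pb + pnb) for pre, (pb, pnb) in beams.items()]


def contains_keyword(prefix, keyword_ids):
    """True if keyword_ids occurs as a contiguous run inside prefix."""
    n = len(keyword_ids)
    key = tuple(keyword_ids)
    return any(tuple(prefix[i:i + n]) == key for i in range(len(prefix) - n + 1))


def detect(probs, keyword_ids, threshold, beam_size=3):
    """Return (triggered, score) using the first beam that holds the keyword."""
    for prefix, score in ctc_prefix_beam_search(probs, beam_size):
        if score >= threshold and contains_keyword(prefix, keyword_ids):
            return True, score
    return False, 0.0


# Toy posteriors over 3 units (0 = blank); made-up numbers.
frames = [[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
best_prefix, best_score = ctc_prefix_beam_search(frames)[0]
hit, hit_score = detect(frames, [1, 2], threshold=0.5)
miss, _ = detect(frames, [2, 1], threshold=0.5)
print(best_prefix, hit, miss)
```

In a streaming setting the outer loop over `probs` would run as new frames arrive, with the beams carried over between chunks and the keyword check applied after each update.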