Add document for Mandarin model. #364
Could you please provide some tips for Mandarin training, especially where it differs from English training, e.g. data preparation and language model configuration?
deep_speech_2/README.md
Outdated
@@ -398,7 +398,7 @@ For more information about the DeepSpeech2 training on PaddleCloud, please refer

 ## Training for Mandarin Language

-TODO: to be added
+The steps of training, evaluation and inference for Mandarin ASR model is same with English ASR model. We have provided an example for Mandarin data which using Aishell dataset and you can find it in ```examples/aishell```. As mentioned above, you can execute ```sh run_data.sh```, ```sh run_train.sh```, ```sh run_test.sh``` and ```sh run_infer.sh``` to do data preparation, training, test and inference correspondingly. We have also tuned a setting to get better model performance (not the best), and you can execute ```sh run_infer_golden.sh``` to show some speech-to-text decoding results.
- is same with --> is the same to
- for Mandarin data which using Aishell dataset and you can find it in --> for Mandarin training with Aishell in
- you can execute --> please execute
- test --> testing
- We have also tuned a setting to get better model performance .... --> We have also prepared a pre-trained model (downloaded in `./models/aishell/download_model.sh`) for users to try with `sh run_infer_golden.sh` and `sh run_test_golden.sh`.
Almost LGTM.
deep_speech_2/README.md
Outdated
@@ -398,7 +398,7 @@ For more information about the DeepSpeech2 training on PaddleCloud, please refer

 ## Training for Mandarin Language

-TODO: to be added
+The key steps of training for Mandarin Language are same to that of English Language and we have also provided an example for Mandarin training with Aishell in ```examples/aishell```. As mentioned above, please execute ```sh run_data.sh```, ```sh run_train.sh```, ```sh run_test.sh``` and ```sh run_infer.sh``` to do data preparation, training, test and inference correspondingly. We have also prepared a pre-trained model (downloaded by ./models/aishell/download_model.sh) for users to try with ```sh run_infer_golden.sh``` and ```sh run_test_golden.sh```. Notice that, different from English LM, the Mandarin LM is character based and please run ```tools/tune.py``` to find an optimal setting.
- Language --> language
- test --> testing
- character based --> character-based
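
To illustrate what "character-based" means for the Mandarin LM mentioned in the diff above, here is a minimal Python sketch. It assumes the LM text corpus is prepared as space-separated tokens, so a character-level model sees one character per token while a word-level (English-style) model sees one word per token. The helper names `to_char_tokens` and `to_word_tokens` are hypothetical and are not part of the DeepSpeech2 repository; the actual data preparation in `examples/aishell` may differ.

```python
def to_char_tokens(text: str) -> str:
    """Split a transcript into space-separated characters.

    Whitespace in the input is dropped, so every remaining character
    becomes one token (assumed input format for a character-level LM).
    """
    return " ".join(ch for ch in text if not ch.isspace())


def to_word_tokens(text: str) -> str:
    """Collapse runs of whitespace so each word is one token
    (assumed input format for a word-level, English-style LM)."""
    return " ".join(text.split())


if __name__ == "__main__":
    # Mandarin: one token per character.
    print(to_char_tokens("今天天气很好"))
    # English: one token per word.
    print(to_word_tokens("the  weather is   nice"))
```

Because the token unit changes from words to characters, decoding settings tuned for the English LM would not necessarily carry over, which is consistent with the suggestion in the diff to run `tools/tune.py` again for Mandarin.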
Fix #361