Add document for Mandarin model. #364

Merged
pkuyym merged 4 commits into PaddlePaddle:develop on Nov 3, 2017

@pkuyym pkuyym commented Oct 11, 2017

Fix #361

@xinghai-sun xinghai-sun left a comment

Could you please provide some tips for Mandarin training, especially for the parts that might differ from English training, e.g. data preparation and language model configuration?

@@ -398,7 +398,7 @@ For more information about the DeepSpeech2 training on PaddleCloud, please refer

## Training for Mandarin Language

-TODO: to be added
+The steps of training, evaluation and inference for Mandarin ASR model is same with English ASR model. We have provided an example for Mandarin data which using Aishell dataset and you can find it in ```examples/aishell```. As mentioned above, you can execute ```sh run_data.sh```, ```sh run_train.sh```, ```sh run_test.sh``` and ```sh run_infer.sh``` to do data preparation, training, test and inference correspondingly. We have also tuned a setting to get better model performance (not the best), and you can execute ```sh run_infer_golden.sh``` to show some speech-to-text decoding results.

  1. is same with --> is the same to
  2. for Mandarin data which using Aishell dataset and you can find it in --> for Mandarin training with Aishell in
  3. you can execute --> please execute
  4. test --> testing
  5. We have also tuned a setting to get better model performance ... --> We have also prepared a pre-trained model (downloaded in ./models/aishell/download_model.sh) for users to try with sh run_infer_golden.sh and sh run_test_golden.sh.

@xinghai-sun xinghai-sun left a comment

Almost LGTM.

@@ -398,7 +398,7 @@ For more information about the DeepSpeech2 training on PaddleCloud, please refer

## Training for Mandarin Language

-TODO: to be added
+The key steps of training for Mandarin Language are same to that of English Language and we have also provided an example for Mandarin training with Aishell in ```examples/aishell```. As mentioned above, please execute ```sh run_data.sh```, ```sh run_train.sh```, ```sh run_test.sh``` and ```sh run_infer.sh``` to do data preparation, training, test and inference correspondingly. We have also prepared a pre-trained model (downloaded by ./models/aishell/download_model.sh) for users to try with ```sh run_infer_golden.sh``` and ```sh run_test_golden.sh```. Notice that, different from English LM, the Mandarin LM is character based and please run ```tools/tune.py``` to find an optimal setting.

Language --> language
test --> testing
character based --> character-based
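
For readers following the paragraph under review, the four stages it names can be sketched as a small wrapper script. This is only an illustration, not part of the pull request: the script names and the `examples/aishell` directory come from the proposed text, while `EXAMPLE_DIR`, `DRY_RUN`, `run_stage` and `run_pipeline` are hypothetical names introduced here. The sketch assumes it is started from the repository root and defaults to a dry run, since the real stages download data and train a model.

```shell
#!/bin/sh
# Sketch: run the Aishell example stages in the order the paragraph lists.
# EXAMPLE_DIR and DRY_RUN are hypothetical knobs, not options of the project.
EXAMPLE_DIR="${EXAMPLE_DIR:-examples/aishell}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run_stage() {
    # $1 is the stage script name, e.g. run_data.sh
    if [ "$DRY_RUN" = "1" ]; then
        echo "sh $1"
    else
        # Run inside a subshell so the caller's working directory is untouched.
        (cd "$EXAMPLE_DIR" && sh "$1") || return 1
    fi
}

run_pipeline() {
    # Data preparation, training, testing, inference; stop at the first failure.
    for script in run_data.sh run_train.sh run_test.sh run_infer.sh; do
        run_stage "$script" || { echo "stage failed: $script" >&2; return 1; }
    done
}

run_pipeline
```

With the default `DRY_RUN=1` the script prints the four `sh ...` commands in order. Setting `DRY_RUN=0` would execute them inside `EXAMPLE_DIR`, which assumes the repository and its dependencies are already set up.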

@pkuyym pkuyym merged commit bab3be4 into PaddlePaddle:develop Nov 3, 2017