[TTS] Add tutorial for TTS data prep scripts #6922
Sorry for the late review. LGTM, added some nitpicks.
"source": [
"In this tutorial, we will prepare a dataset using our [TTS Dataset Processing Scripts](https://github.com/NVIDIA/NeMo/tree/main/scripts/dataset_processing/tts) and use it for training a FastPitch model.\n",
"\n",
"**This tutorial uses a different workflow than all other existing TTS tutorials. The scripts and classes used are all experimental and not yet ready for production**"
nit: missing a period at the end of the sentence.
{
"cell_type": "markdown",
"source": [
"# Dataset Prepration"
s/Prepration/Preparation/
"cell_type": "code",
"source": [
"import IPython.display as ipd\n",
"from matplotlib.pyplot import imshow"
Remove this import, since nothing else in the notebook uses it.
"source": [
"We can use [create_speaker_map.py](https://github.com/NVIDIA/NeMo/blob/main/scripts/dataset_processing/tts/create_speaker_map.py) to easily create a mapping from speaker ID strings to integer indices that will be used at training time.\n",
"\n",
"The script will simply sort the speaker IDs and assign them numbers [0, num_speakers) in alphabetical order."
s/[0, num_speakers)/`[0, num_speakers)`/
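For readers of this thread, the mapping described in the cell above can be sketched in a few lines. This is only an illustration of "sort the IDs, then number them"; it is not the code of `create_speaker_map.py`, and the function name and speaker IDs below are hypothetical.

```python
import json


def build_speaker_map(speaker_ids):
    """Map each unique speaker ID string to an index in [0, num_speakers)."""
    # sorted() on strings orders by code point, which matches alphabetical
    # order for IDs that share the same case and character set.
    unique_sorted = sorted(set(speaker_ids))
    return {speaker: index for index, speaker in enumerate(unique_sorted)}


# Hypothetical speaker IDs, with one repeated as it would be across utterances.
speakers = ["p231", "p225", "p231", "p228"]
speaker_map = build_speaker_map(speakers)
print(json.dumps(speaker_map, indent=2))
```

Because the indices depend only on the sorted set of IDs, rebuilding the map from the same speakers gives the same assignment, while adding a new speaker can shift the indices of the speakers that sort after it.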
{
"cell_type": "markdown",
"source": [
"Before training FastPitch, we need to compute some features for every audio file. The default [config file](https://github.com/NVIDIA/NeMo/blob/main/examples/tts/conf/feature/feature_44100.yaml) we will use has parameters for computing the **pitch** and **energy** of every audio frame."
nit: the font formatting for pitch and energy is inconsistent: bold (**pitch**, **energy**) in some places and a different style in others.
I went through the tutorial to try to make the formatting more consistent.
- Use bold the first time an important vocab term is mentioned.
- Use `code` when it refers to specific code, variable name, file, etc.
- Use italics to emphasize any other key words.
"For training it is beneficial for us to *normalize* our features. The most standard approach is to apply **mean-variance normalization** so that each feature has a mean of 0 and variance of 1. To do this we need to compute the *dataset statistics* with the mean and variance of each feature.\n",
"\n",
"For TTS it also helps\n",
"* Normalize features using speaker-level statistics\n",
missing a period.
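As a concrete reference for the normalization cell quoted above, here is a minimal sketch of mean-variance normalization with speaker-level statistics. It uses only the Python standard library; the function names, the `(speaker_id, values)` layout, and the pitch-like numbers are assumptions made for illustration, not the tutorial's scripts or data format.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def speaker_stats(utterances):
    """Pool one feature's values per speaker and return {speaker: (mean, std)}."""
    pooled = defaultdict(list)
    for speaker, values in utterances:
        pooled[speaker].extend(values)
    return {speaker: (fmean(values), pstdev(values)) for speaker, values in pooled.items()}


def normalize(values, mean, std, eps=1e-8):
    """Shift to zero mean and scale to unit variance; eps guards against std == 0."""
    scale = max(std, eps)
    return [(value - mean) / scale for value in values]


# Hypothetical per-frame feature values for three utterances from two speakers.
utterances = [
    ("spk_a", [100.0, 120.0]),
    ("spk_b", [200.0, 220.0, 240.0]),
    ("spk_a", [140.0]),
]
stats = speaker_stats(utterances)
for speaker, values in utterances:
    mean, std = stats[speaker]
    print(speaker, [round(v, 3) for v in normalize(values, mean, std)])
```

Computing the statistics per speaker rather than over the whole dataset means each speaker's features are centered on that speaker's own average, so the normalized values describe deviation from the speaker's typical value rather than differences between speakers.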
Signed-off-by: Ryan <[email protected]>
207a14f to c3c0b5b
What does this PR do?
Add a tutorial demonstrating end-to-end data preparation and training with the new TTS preprocessing scripts and data loader.
Collection: [TTS]