[TTS] Update TTS tutorials, Simplification of testing Mixer-TTS and FastPitch (#3680)

* update notebooks

Signed-off-by: Oktai Tatanov <[email protected]>

* small fix in FastPitch_Finetuning.ipynb

Signed-off-by: Oktai Tatanov <[email protected]>

* update notebooks

Signed-off-by: Oktai Tatanov <[email protected]>

* fix in Inference_ModelSelect.ipynb

Signed-off-by: Oktai Tatanov <[email protected]>

* fix librosa

Signed-off-by: Oktai Tatanov <[email protected]>

* fix style

Signed-off-by: Oktai Tatanov <[email protected]>

* update jenkinsfile, remove unnecessary line in fastpitch

Signed-off-by: Oktai Tatanov <[email protected]>
Oktai15 authored Feb 16, 2022
1 parent dc2ae7f commit 7231aca
Showing 9 changed files with 170 additions and 99 deletions.
8 changes: 6 additions & 2 deletions Jenkinsfile
@@ -2304,7 +2304,9 @@ pipeline {
 model.input_fft.n_layer=2 \
 model.output_fft.d_inner=384 \
 model.output_fft.n_layer=2 \
-~trainer.check_val_every_n_epoch'
+~trainer.check_val_every_n_epoch \
+~model.text_normalizer \
+~model.text_normalizer_call_kwargs'
 }
 }
 stage('Mixer-TTS') {
@@ -2320,7 +2322,9 @@ pipeline {
 model.train_ds.dataloader_params.num_workers=1 \
 model.validation_ds.dataloader_params.batch_size=4 \
 model.validation_ds.dataloader_params.num_workers=1 \
-~trainer.check_val_every_n_epoch'
+~trainer.check_val_every_n_epoch \
+~model.text_normalizer \
+~model.text_normalizer_call_kwargs'
 }
 }
 stage('Hifigan') {
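Note on the Jenkinsfile change: in Hydra's override grammar a leading `~` deletes a key from the composed config, so both test stages now drop `model.text_normalizer` and `model.text_normalizer_call_kwargs` in addition to `trainer.check_val_every_n_epoch`. The sketch below imitates that effect on a plain nested dict. It is not Hydra's implementation; the helper `delete_key` and the placeholder entries (`max_epochs`, `n_speakers`, and the normalizer values) are assumptions made only for illustration.

```python
from copy import deepcopy


def delete_key(config: dict, dotted_key: str) -> dict:
    """Return a copy of `config` without the entry named by `dotted_key`."""
    result = deepcopy(config)
    *parents, leaf = dotted_key.split(".")
    node = result
    for name in parents:
        node = node[name]  # walk down to the parent of the key to delete
    del node[leaf]
    return result


# Toy config: the values are placeholders, not the real FastPitch YAML.
config = {
    "trainer": {"max_epochs": 1, "check_val_every_n_epoch": 1},
    "model": {
        "text_normalizer": {"placeholder": True},
        "text_normalizer_call_kwargs": {"placeholder": True},
        "n_speakers": 1,
    },
}

# The three deletions requested by the updated CI command line.
for key in (
    "trainer.check_val_every_n_epoch",
    "model.text_normalizer",
    "model.text_normalizer_call_kwargs",
):
    config = delete_key(config, key)

print(config)
```

Going by the commit title, the likely intent is that the CI runs no longer need to construct a text normalizer, but the diff itself does not state this.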
2 changes: 0 additions & 2 deletions nemo/collections/tts/models/fastpitch.py
@@ -197,8 +197,6 @@ def parser(self):
     def parse(self, str_input: str, normalize=True) -> torch.tensor:
         if self.training:
             logging.warning("parse() is meant to be called in eval mode.")
-        if str_input[-1] not in [".", "!", "?"]:
-            str_input = str_input + "."
 
         if normalize and self.text_normalizer_call is not None:
             str_input = self.text_normalizer_call(str_input, **self.text_normalizer_call_kwargs)
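Note on the `fastpitch.py` change: the two deleted lines appended a period whenever the input did not already end in `.`, `!` or `?`. The sketch below reproduces only that deleted logic in a standalone function, to show what `parse()` no longer does. The helper name `append_period_if_missing` is hypothetical and does not exist in NeMo.

```python
def append_period_if_missing(text: str) -> str:
    """Standalone copy of the check that this commit deletes from parse()."""
    if text[-1] not in [".", "!", "?"]:
        text = text + "."
    return text


print(append_period_if_missing("Hello world"))
print(append_period_if_missing("Is it on?"))

# The deleted check indexed text[-1], so an empty string raised IndexError.
try:
    append_period_if_missing("")
except IndexError as error:
    print("empty input:", type(error).__name__)
```

Judging from the diff alone, text now reaches the normalizer and tokenizer unchanged, so a caller that relies on terminal punctuation would have to add it before calling `parse()`.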
2 changes: 1 addition & 1 deletion nemo/collections/tts/torch/data.py
@@ -736,7 +736,7 @@ def __init__(
 json. Each line should contain the following:
 "audio_filepath": <PATH_TO_WAV>,
 "duration": <Duration of audio clip in seconds> (Optional),
-"mel_filepath": <PATH_TO_LOG_MEL_PT> (Optional)
+"mel_filepath": <PATH_TO_LOG_MEL> (Optional, can be in .npy (numpy.save) or .pt (torch.save) format)
 sample_rate (int): The sample rate of the audio. Or the sample rate that we will resample all files to.
 n_segments (int): The length of audio in samples to load. For example, given a sample rate of 16kHz, and
 n_segments=16000, a random 1 second section of audio from the clip will be loaded. The section will
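Note on the `data.py` docstring change: `mel_filepath` may now point at either a `.npy` file (written with `numpy.save`) or a `.pt` file (written with `torch.save`). A minimal sketch of how a reader of the manifest could dispatch on the file extension follows. This is not the loader NeMo uses; the function names and the manifest paths are hypothetical, and `numpy` and `torch` are imported lazily so that only the library actually needed must be installed.

```python
import json
from pathlib import Path


def pick_mel_loader(mel_filepath: str) -> str:
    """Name the loader that matches the file extension."""
    suffix = Path(mel_filepath).suffix
    if suffix == ".npy":
        return "numpy.load"
    if suffix == ".pt":
        return "torch.load"
    raise ValueError(f"unsupported mel file extension: {suffix!r}")


def load_mel(mel_filepath: str):
    """Load a precomputed log-mel spectrogram from a .npy or .pt file."""
    loader = pick_mel_loader(mel_filepath)
    if loader == "numpy.load":
        import numpy as np  # needed only for .npy files

        return np.load(mel_filepath)
    import torch  # needed only for .pt files

    return torch.load(mel_filepath)


# One manifest line in the documented layout; the paths are made up.
manifest_line = (
    '{"audio_filepath": "wavs/sample.wav", "duration": 1.0, '
    '"mel_filepath": "mels/sample.npy"}'
)
entry = json.loads(manifest_line)
print(pick_mel_loader(entry["mel_filepath"]))
```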
211 changes: 140 additions & 71 deletions tutorials/tts/FastPitch_Finetuning.ipynb

Large diffs are not rendered by default.

22 changes: 11 additions & 11 deletions tutorials/tts/FastPitch_MixerTTS_Training.ipynb
@@ -50,10 +50,10 @@
 "3. Connect to an instance with a GPU (Runtime -> Change runtime type -> select \"GPU\" for hardware accelerator)\n",
 "4. Run this cell to set up dependencies# .\n",
 "\"\"\"\n",
+"BRANCH = 'main'\n",
 "# # If you're using Colab and not running locally, uncomment and run this cell.\n",
 "# !apt-get install sox libsndfile1 ffmpeg\n",
 "# !pip install wget unidecode\n",
-"# BRANCH = 'main'\n",
 "# !python -m pip install git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[all]"
 ]
 },
@@ -91,7 +91,7 @@
 "\n",
 "FastPitch is non-autoregressive model for mel-spectrogram generation based on FastSpeech, conditioned on fundamental frequency contours. For more details about model, please refer to the original [paper](https://arxiv.org/abs/2006.06873). NeMo re-implementation of FastPitch additionally uses unsupervised speech-text [aligner](https://arxiv.org/abs/2108.10447) which was originally implemented in [FastPitch 1.1](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/FastPitch).\n",
 "\n",
-"### MixerTTS\n",
+"### Mixer-TTS\n",
 "\n",
 "Mixer-TTS is another non-autoregressive model for mel-spectrogram generation. It is structurally similar to FastPitch: duration prediction, pitch prediction, unsupervised TTS alignment framework, but the main difference is that Mixer-TTS is based on the [MLP-Mixer](https://arxiv.org/abs/2105.01601) architecture adapted for speech synthesis.\n",
 "\n",
@@ -226,9 +226,9 @@
 "\n",
 "# additional files\n",
 "!mkdir -p tts_dataset_files && cd tts_dataset_files \\\n",
-"&& wget https://raw.githubusercontent.com/NVIDIA/NeMo/main/scripts/tts_dataset_files/cmudict-0.7b_nv22.01 \\\n",
-"&& wget https://raw.githubusercontent.com/NVIDIA/NeMo/main/scripts/tts_dataset_files/heteronyms-030921 \\\n",
-"&& wget https://raw.githubusercontent.com/NVIDIA/NeMo/main/nemo_text_processing/text_normalization/en/data/whitelist_lj_speech.tsv \\\n",
+"&& wget https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/scripts/tts_dataset_files/cmudict-0.7b_nv22.01 \\\n",
+"&& wget https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/scripts/tts_dataset_files/heteronyms-030921 \\\n",
+"&& wget https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/nemo_text_processing/text_normalization/en/data/whitelist_lj_speech.tsv \\\n",
 "&& cd .."
 ]
 },
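Note on the `$BRANCH` substitution: IPython expands `$name` in `!` shell lines from the notebook's Python namespace, so the `wget` URLs above resolve only if `BRANCH` is defined as a real Python variable. That is presumably why the same commit moves `BRANCH = 'main'` out of the commented-out Colab block. The sketch below builds the same URLs in plain Python; the helper `raw_url` is hypothetical, while the repository paths are taken from the diff.

```python
BRANCH = "main"


def raw_url(branch: str, repo_path: str) -> str:
    """Build the raw.githubusercontent.com URL for a file in NVIDIA/NeMo."""
    return f"https://raw.githubusercontent.com/NVIDIA/NeMo/{branch}/{repo_path}"


# Files fetched by the "additional files" cell of the tutorial.
for repo_path in (
    "scripts/tts_dataset_files/cmudict-0.7b_nv22.01",
    "scripts/tts_dataset_files/heteronyms-030921",
    "nemo_text_processing/text_normalization/en/data/whitelist_lj_speech.tsv",
):
    print(raw_url(BRANCH, repo_path))
```

Changing `BRANCH` in one place then keeps the downloaded scripts, configs and data files consistent with the NeMo version installed by the `pip install ...@$BRANCH` line.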
@@ -251,10 +251,10 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"!wget https://raw.githubusercontent.com/NVIDIA/NeMo/main/examples/tts/fastpitch.py\n",
+"!wget https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/examples/tts/fastpitch.py\n",
 "\n",
 "!mkdir -p conf && cd conf \\\n",
-"&& wget https://raw.githubusercontent.com/NVIDIA/NeMo/main/examples/tts/conf/fastpitch_align_v1.05.yaml \\\n",
+"&& wget https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/examples/tts/conf/fastpitch_align_v1.05.yaml \\\n",
 "&& cd .."
 ]
 },
@@ -392,10 +392,10 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"!wget https://raw.githubusercontent.com/NVIDIA/NeMo/main/examples/tts/mixer_tts.py\n",
+"!wget https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/examples/tts/mixer_tts.py\n",
 "\n",
 "!mkdir -p conf && cd conf \\\n",
-"&& wget https://raw.githubusercontent.com/NVIDIA/NeMo/main/examples/tts/conf/mixer-tts.yaml \\\n",
+"&& wget https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/examples/tts/conf/mixer-tts.yaml \\\n",
 "&& cd .."
 ]
 },
@@ -533,7 +533,7 @@
 "id": "2d9745fc",
 "metadata": {},
 "source": [
-"### MixerTTS\n",
+"### Mixer-TTS\n",
 "\n",
 "Now we are ready for training our model! Let's try to train Mixer-TTS.\n",
 "\n",
@@ -601,7 +601,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.8.12"
+"version": "3.8.6"
 }
 },
 "nbformat": 4,
6 changes: 3 additions & 3 deletions tutorials/tts/Inference_DurationPitchControl.ipynb
@@ -46,11 +46,11 @@
 "3. Connect to an instance with a GPU (Runtime -> Change runtime type -> select \"GPU\" for hardware accelerator)\n",
 "4. Run this cell to set up dependencies.\n",
 "\"\"\"\n",
+"BRANCH = 'main'\n",
 "# # If you're using Google Colab and not running locally, uncomment and run this cell.\n",
 "# !apt-get install sox libsndfile1 ffmpeg\n",
 "# !pip install wget unidecode\n",
-"# BRANCH = 'main'\n",
-"# !python -m pip install git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[tts]"
+"# !python -m pip install git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[all]"
 ]
 },
 {
@@ -504,7 +504,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.8.10"
+"version": "3.8.6"
 }
 },
 "nbformat": 4,
6 changes: 3 additions & 3 deletions tutorials/tts/Inference_ModelSelect.ipynb
@@ -46,11 +46,11 @@
 "3. Connect to an instance with a GPU (Runtime -> Change runtime type -> select \"GPU\" for hardware accelerator)\n",
 "4. Run this cell to set up dependencies.\n",
 "\"\"\"\n",
+"BRANCH = 'main'\n",
 "# # If you're using Google Colab and not running locally, uncomment and run this cell.\n",
 "# !apt-get install sox libsndfile1 ffmpeg\n",
 "# !pip install wget unidecode\n",
-"# BRANCH = 'main'\n",
-"# !python -m pip install git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[tts]"
+"# !python -m pip install git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[all]"
 ]
 },
 {
@@ -410,4 +410,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 4
-}
+}
6 changes: 3 additions & 3 deletions tutorials/tts/Tacotron2_Training.ipynb
@@ -58,7 +58,7 @@
 "# # If you're using Colab and not running locally, uncomment and run this cell.\n",
 "# !apt-get install sox libsndfile1 ffmpeg\n",
 "# !pip install wget unidecode\n",
-"# !python -m pip install git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[tts]"
+"# !python -m pip install git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[all]"
 ]
 },
 {
@@ -316,7 +316,7 @@
 "provenance": []
 },
 "kernelspec": {
-"display_name": "Python 3",
+"display_name": "Python 3 (ipykernel)",
 "language": "python",
 "name": "python3"
 },
@@ -330,7 +330,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.8.5"
+"version": "3.8.6"
 }
 },
 "nbformat": 4,
6 changes: 3 additions & 3 deletions tutorials/tts/TalkNet_Training.ipynb
@@ -50,10 +50,10 @@
 "3. Connect to an instance with a GPU (Runtime -> Change runtime type -> select \"GPU\" for hardware accelerator)\n",
 "4. Run this cell to set up dependencies# .\n",
 "\"\"\"\n",
+"BRANCH = 'main'\n",
 "# # If you're using Colab and not running locally, uncomment and run this cell.\n",
 "# !apt-get install sox libsndfile1 ffmpeg\n",
 "# !pip install wget unidecode pysptk\n",
-"# BRANCH = 'main'\n",
 "# !python -m pip install git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[all]"
 ]
 },
@@ -496,7 +496,7 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "Python 3",
+"display_name": "Python 3 (ipykernel)",
 "language": "python",
 "name": "python3"
 },
@@ -510,7 +510,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.8.11"
+"version": "3.8.6"
 }
 },
 "nbformat": 4,
