Add wavernn example pipeline #749
Conversation
```python
bits = 16 if self.mode == 'MOL' else self.n_bits

x = (x + 1.) * (2 ** bits - 1) / 2
x = torch.clamp(x, min=0, max=2 ** bits - 1)

return mel.squeeze(0), x.int().squeeze(0)
```
This converts the representation of a waveform from [-1, 1] to a 16-bit integer representation. For instance, this is already done in load_wav. Since this is an important step and can be generalized, let's make it into a function within torchaudio. One point of discussion is whether we add it directly to WaveRNN.
This function has been added as `normalized_waveform_to_bits` in processing.py.
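As a rough illustration of what such a conversion does (a pure-Python sketch, not the actual `normalized_waveform_to_bits` implementation in processing.py, which operates on torch tensors):

```python
def normalized_waveform_to_bits_sketch(samples, bits=16):
    """Map samples in [-1.0, 1.0] to integers in [0, 2**bits - 1].

    Illustrative sketch only; names and details here are hypothetical.
    """
    max_val = 2 ** bits - 1
    out = []
    for x in samples:
        q = (x + 1.0) * max_val / 2.0         # rescale [-1, 1] -> [0, max_val]
        q = min(max(q, 0.0), float(max_val))  # clamp, mirroring torch.clamp
        out.append(int(q))                    # truncate to integer, like x.int()
    return out

print(normalized_waveform_to_bits_sketch([-1.0, 0.0, 1.0]))  # [0, 32767, 65535]
```

With `bits=16`, silence (0.0) lands on 32767 and the extremes map to 0 and 65535, matching the clamp-and-scale arithmetic in the snippet above.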
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #749      +/-   ##
==========================================
+ Coverage   89.87%   89.88%   +0.01%
==========================================
  Files          34       34
  Lines        2666     2660       -6
==========================================
- Hits         2396     2391       -5
+ Misses        270      269       -1
```
Continue to review full report at Codecov.
Force-pushed from 210945c to 9429ff0
By the way, can you add a README.md discussing the pipeline?
It'd be nice to get a baseline by comparing the error you get here to the output obtained by Griffin-Lim, say, and in other norms too, e.g. L^1 and L^2.
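For reference, L^1 and L^2 errors between a reconstruction and the original waveform can be computed as below (a minimal pure-Python sketch; in the actual pipeline one would compare torch tensors, e.g. WaveRNN output against a Griffin-Lim reconstruction):

```python
import math

def waveform_errors(reference, estimate):
    """Return (L1, L2) errors between two equal-length sample sequences."""
    assert len(reference) == len(estimate)
    diffs = [r - e for r, e in zip(reference, estimate)]
    l1 = sum(abs(d) for d in diffs)            # L^1 norm of the difference
    l2 = math.sqrt(sum(d * d for d in diffs))  # L^2 norm of the difference
    return l1, l2

print(waveform_errors([1.0, 2.0], [0.0, 0.0]))  # (3.0, 2.236...)
```

Reporting both norms gives a sense of whether errors are spread out (L^1 close to L^2 per sample) or concentrated in a few large deviations (L^2 dominated by outliers).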
Since this is not the original WaveRNN model, I'd recommend renaming it to "FatchordWaveRNN" or something similar.
Force-pushed from 9429ff0 to bfbf39f
Force-pushed from dc0fd1b to 04bfe24
Force-pushed from 79b653b to 6f8660a
LGTM. Minor things to address:
- fix JIT in WaveRNN in a separate pull request (Fix output type of upsampling #801)
- change two command-line parameters
- fix the default-value format in the docstring of the WaveRNN model (Update form of default value in docstring #802)
…ials Fix formatting and clean up tutorial on quantized transfer learning
Co-authored-by: Shen Li <[email protected]>
This is a reference example using the WaveRNN model to train on LJSpeech. The structure is inspired by #632 and WaveRNN.
There are at least a few more things to do:
Add torchaudio transforms on mel-spectrogram. Related to #446
Stack:
- Add MelResNet Block #705, #751
- Add Upsampling Block #724
- Add WaveRNN Model #735
- Add example pipeline with WaveRNN #749
Remove underscore of wavernn model #810
cc @cpuhrsch @zhangguanheng66