
Add MelResNet Block #705

Merged: 10 commits merged into pytorch:master on Jun 16, 2020
Conversation

@jimchen90 (Contributor) commented Jun 8, 2020

This MelResNet block is part of the WaveRNN model. For now, the test only validates the output dimensions of this block; further tests will be added once the other blocks are combined. A hedged sketch of the block's structure is included after the stack list below.
Related to #446

Stack:
1. Add MelResNet Block #705 #751
2. Add Upsampling Block #724
3. Add WaveRNN Model #735
4. Add example pipeline with WaveRNN #749
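For readers skimming this thread, here is a minimal sketch of what a MelResNet block of this kind looks like. It follows the structure of fatchord's WaveRNN (referenced later in this conversation) and the argument names quoted in the review comments below; it is illustrative only and not the exact code merged in this PR.

```python
import torch
from torch import nn


class ResBlock(nn.Module):
    """1-D convolutional residual block: two Conv1d/BatchNorm1d pairs plus a skip connection."""

    def __init__(self, num_dims: int = 128) -> None:
        super().__init__()
        self.resblock_model = nn.Sequential(
            nn.Conv1d(num_dims, num_dims, kernel_size=1, bias=False),
            nn.BatchNorm1d(num_dims),
            nn.ReLU(inplace=True),
            nn.Conv1d(num_dims, num_dims, kernel_size=1, bias=False),
            nn.BatchNorm1d(num_dims),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        return self.resblock_model(x) + residual


class MelResNet(nn.Module):
    """Stack of ResBlocks that turns a mel spectrogram into conditioning features."""

    def __init__(self, res_blocks: int = 10, input_dims: int = 100,
                 hidden_dims: int = 128, output_dims: int = 128, pad: int = 2) -> None:
        super().__init__()
        kernel_size = pad * 2 + 1  # the first Conv1d trims `pad` frames from each side
        layers = [
            nn.Conv1d(input_dims, hidden_dims, kernel_size=kernel_size, bias=False),
            nn.BatchNorm1d(hidden_dims),
            nn.ReLU(inplace=True),
        ]
        layers += [ResBlock(hidden_dims) for _ in range(res_blocks)]
        layers.append(nn.Conv1d(hidden_dims, output_dims, kernel_size=1))
        self.melresnet_model = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (batch, input_dims, time) -> (batch, output_dims, time - 2 * pad)
        return self.melresnet_model(x)
```

The residual connection and the `pad`-derived kernel size both come up again in the review discussion below.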

@codecov (bot) commented Jun 8, 2020

Codecov Report

Merging #705 into master will increase coverage by 0.09%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #705      +/-   ##
==========================================
+ Coverage   89.05%   89.15%   +0.09%     
==========================================
  Files          28       29       +1     
  Lines        2467     2489      +22     
==========================================
+ Hits         2197     2219      +22     
  Misses        270      270              
Impacted Files | Coverage Δ
torchaudio/models/__init__.py | 100.00% <100.00%> (ø)
torchaudio/models/_wavernn.py | 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ab733e7...9eaefd5.

@mthrok (Collaborator) left a comment:

I know this is a draft, but I added some comments before I forget.
Good work!

class ResBlock(nn.Module):
    r"""
    Args:
        num_dims (int, optional): Number of compute dimensions in ResBlock. (Default: ``128``)
Collaborator:
I do not see the default value declared in the function signature. Did you forget?

@jimchen90 (Contributor, Author) commented Jun 10, 2020:

Thanks for the suggestion. I will update the file to address the comments, and the default value will be added to the signature.
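For illustration only, one way the documented default could be declared in the signature (a sketch, not the PR's actual code):

```python
from torch import nn


class ResBlock(nn.Module):
    r"""
    Args:
        num_dims (int, optional): Number of compute dimensions in ResBlock. (Default: ``128``)
    """

    def __init__(self, num_dims: int = 128) -> None:  # default now matches the docstring
        super().__init__()
        self.num_dims = num_dims
```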

@pytest.mark.parametrize('batch_size', [2])
@pytest.mark.parametrize('num_features', [200])
@pytest.mark.parametrize('input_dims', [100])
@pytest.mark.parametrize('output_dims', [128])
Collaborator:
Since you are not really parameterizing the variables, it's better to put them inside the test definition.
If you add more parameters, please use parameterized.expand instead of pytest.mark.parametrize.

Contributor (Author):
I will check it. Thanks.
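A sketch of what the reviewer's suggestion could look like, assuming a MelResNet with the constructor arguments documented later in this thread (res_blocks, input_dims, hidden_dims, output_dims, pad); the import path and exact values are illustrative:

```python
import torch

from torchaudio.models import MelResNet  # illustrative import; adjust to where the class lives


def test_melresnet_output_shape():
    """Validate the output dimensions of the MelResNet block."""
    # Plain constants instead of single-value pytest parametrizations.
    batch_size = 2
    num_features = 200
    input_dims = 100
    output_dims = 128
    res_blocks, hidden_dims, pad = 10, 128, 2

    model = MelResNet(res_blocks, input_dims, hidden_dims, output_dims, pad)
    x = torch.rand(batch_size, input_dims, num_features)
    out = model(x)

    # The first Conv1d (kernel_size = 2 * pad + 1, no padding) removes `pad` frames per side.
    assert out.size() == (batch_size, output_dims, num_features - 2 * pad)
```

If several parameter combinations were needed later, the `parameterized` package's `parameterized.expand` decorator could supply them, as suggested above.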

torchaudio/models/wavernn.py (outdated review thread, resolved)
@jimchen90 requested a review from vincentqb on June 10, 2020
@vincentqb (Contributor) commented:
For the docstring, let's follow transformer as an example.

@jimchen90 (Contributor, Author) commented Jun 12, 2020

Updates:

  1. Use a sequential container in the model.
  2. Update the test function by putting the variables inside the function.
  3. Add a docstring to each class and forward function.

@vincentqb (Contributor) left a comment:

I've pointed out a few nits, but this looks good to me overall :)

torchaudio/models/wavernn.py (4 outdated review threads, resolved)
@vincentqb (Contributor) commented:
Can you rebase? I see a few unrelated errors that have been fixed on master.

E       RuntimeError: Integer division of tensors using div or / is no longer supported, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
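This failure comes from a PyTorch deprecation unrelated to this PR (it was already fixed on master). For context, a small illustration of the migration the message asks for; the tensor names here are made up:

```python
import torch

lengths = torch.tensor([16000, 22050])
hop_length = 256

# Old pattern that the RuntimeError above complains about for integer tensors:
# frames = lengths / hop_length

frames = torch.floor_divide(lengths, hop_length)  # integer (floor) division, same as lengths // hop_length
ratio = torch.true_divide(lengths, hop_length)    # explicit floating-point division
```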

(https://github.com/G-Wang/WaveRNN-Pytorch)

Args:
    num_dims: the number of compute dimensions in the input (default=128).
Contributor:

Thinking out loud: we decided in our conventions to use n_* (see the README). What is written here aligns with wav2letter, though. We may need to revisit these conventions.


@jimchen90 (Contributor, Author) commented Jun 12, 2020

Updates:

  1. Make the output more readable:
       residual = x
       return self.resblock_model(x) + residual
       return self.melresnet_model(x)
  2. Rebase and update the docstrings.
  3. Add a pad variable. Wang's WaveRNN doesn't expose it and hardcodes it to 2 inside the MelResNet function, while fatchord's WaveRNN takes pad as a parameter. There is no functional difference between the two, but exposing pad as a variable makes the model more general.

@vincentqb (Contributor) commented Jun 15, 2020

Tests should pass after rebasing; see #720.

@jimchen90 marked this pull request as ready for review on June 15, 2020
@vincentqb (Contributor) commented Jun 16, 2020

@jimchen90 -- While we are working on implementing the full pipeline, let's make the wavernn module private by prefixing the file with an underscore, _wavernn.py. This will allow us to merge the PR even though all the steps are not yet finished.

Please also add a fourth step in the description:

Add example pipeline with wavernn (ongoing)
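As a rough illustration (an assumption, not the PR's exact diff), keeping the module file private while still exposing the finished blocks could look like this in torchaudio/models/__init__.py:

```python
# torchaudio/models/__init__.py -- illustrative sketch only; the actual export
# list and class names in the merged PR may differ.
from ._wavernn import ResBlock, MelResNet

__all__ = ["ResBlock", "MelResNet"]
```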

@vincentqb (Contributor) left a comment:

LGTM :)

@jimchen90 merged commit 4318fc5 into pytorch:master on Jun 16, 2020
@@ -29,3 +29,23 @@ def test_mfcc(self):
        out = model(x)

        assert out.size() == (batch_size, num_classes, 2)


class TestMelResNet:
Collaborator:
This test does not subclass unittest.TestCase so it won't run in fbcode.

Contributor:

Good catch! @jimchen90 -- can you send a follow-up pull request to update this?

Contributor (Author):

Yes. I will update it.
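A minimal sketch of the follow-up being discussed: subclassing unittest.TestCase so the test is picked up by unittest-style runners (e.g. in fbcode). The constructor arguments and import path are illustrative assumptions matching the earlier sketches, not the repository's exact code:

```python
import unittest

import torch

from torchaudio.models import MelResNet  # illustrative import path


class TestMelResNet(unittest.TestCase):
    def test_output_shape(self):
        batch_size, input_dims, num_features = 2, 100, 200
        res_blocks, hidden_dims, output_dims, pad = 10, 128, 128, 2

        model = MelResNet(res_blocks, input_dims, hidden_dims, output_dims, pad)
        out = model(torch.rand(batch_size, input_dims, num_features))

        # torch.Size is a tuple subclass, so this comparison works as expected.
        self.assertEqual(out.size(), (batch_size, output_dims, num_features - 2 * pad))
```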

@PetrochukM commented Jun 18, 2020

Thanks for your contribution!

Respectfully, the WaveRNN paper never mentions "upsampling" or "residual block" (the only mention is in reference to WaveNet) or "mel" or "spectrogram". It would be inaccurate to say:

> It is a block used in WaveRNN. WaveRNN is based on the paper "Efficient Neural Audio Synthesis".

The "MelResNet" block is never mentioned or referenced in the paper... and it would be inaccurate to state it was.

input_dims: the number of input sequence (default=100).
hidden_dims: the number of compute dimensions (default=128).
output_dims: the number of output sequence (default=128).
pad: the number of kernal size (pad * 2 + 1) in the first Conv1d layer (default=2).
Contributor:

nit: typo, "kernal" should be "kernel"

@jimchen90 (Contributor, Author) commented Jun 18, 2020

> Thanks for your contribution!
>
> Respectfully, the WaveRNN paper never mentions "upsampling" or "residual block" (the only mention is in reference to WaveNet) or "mel" or "spectrogram". It would be inaccurate to say:
>
> > It is a block used in WaveRNN. WaveRNN is based on the paper "Efficient Neural Audio Synthesis".
>
> The "MelResNet" block is never mentioned or referenced in the paper... and it would be inaccurate to state it was.

Thank you so much for pointing this out, @PetrochukM.
Yes, you are right. The WaveRNN paper doesn't mention "upsampling" or "residual block". I will open a PR to update the reference. 🙂
By the way, do you have any suggestions for a reference that could be cited here? Any idea is appreciated.
This model is based on fatchord's model.

@PetrochukM commented Jun 19, 2020

Happy to help!

With regard to references: there are no papers written about "fatchord's model" that I know of. From my understanding, there also hasn't been any formal evaluation of "fatchord's model". From an identity perspective, I don't even know fatchord's name, so it's hard to give credit.

I'm also a bit perplexed. I don't think it makes sense to implement a model that has not been published, peer-reviewed, or evaluated.

mpc001 pushed a commit to mpc001/audio that referenced this pull request on Aug 4, 2023: "Simple example to demonstrate parameter server training pattern"