
Add MelResNet Block #705

Merged: 10 commits merged into pytorch:master on Jun 16, 2020
Conversation

@jimchen90 (Contributor) commented Jun 8, 2020

This MelResNet block is part of the WaveRNN model. For now, the test only validates the output dimensions of this block; further tests will be added once the other blocks are combined. A hedged sketch of the block's structure is included after the stack list below.
Related to #446

Stack:
1. Add MelResNet Block #705 #751
2. Add Upsampling Block #724
3. Add WaveRNN Model #735
4. Add example pipeline with WaveRNN #749
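For readers skimming this thread, here is a minimal sketch of what a MelResNet block of this kind looks like. It follows the structure of fatchord's WaveRNN (referenced later in this conversation) and the argument names quoted in the review comments below; it is illustrative only and not the exact code merged in this PR.

```python
import torch
from torch import nn


class ResBlock(nn.Module):
    """1-D convolutional residual block: two Conv1d/BatchNorm1d pairs plus a skip connection."""

    def __init__(self, num_dims: int = 128) -> None:
        super().__init__()
        self.resblock_model = nn.Sequential(
            nn.Conv1d(num_dims, num_dims, kernel_size=1, bias=False),
            nn.BatchNorm1d(num_dims),
            nn.ReLU(inplace=True),
            nn.Conv1d(num_dims, num_dims, kernel_size=1, bias=False),
            nn.BatchNorm1d(num_dims),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        return self.resblock_model(x) + residual


class MelResNet(nn.Module):
    """Stack of ResBlocks that turns a mel spectrogram into conditioning features."""

    def __init__(self, res_blocks: int = 10, input_dims: int = 100,
                 hidden_dims: int = 128, output_dims: int = 128, pad: int = 2) -> None:
        super().__init__()
        kernel_size = pad * 2 + 1  # the first Conv1d trims `pad` frames from each side
        layers = [
            nn.Conv1d(input_dims, hidden_dims, kernel_size=kernel_size, bias=False),
            nn.BatchNorm1d(hidden_dims),
            nn.ReLU(inplace=True),
        ]
        layers += [ResBlock(hidden_dims) for _ in range(res_blocks)]
        layers.append(nn.Conv1d(hidden_dims, output_dims, kernel_size=1))
        self.melresnet_model = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (batch, input_dims, time) -> (batch, output_dims, time - 2 * pad)
        return self.melresnet_model(x)
```

The residual connection and the `pad`-derived kernel size both come up again in the review discussion below.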

@codecov (bot) commented Jun 8, 2020

Codecov Report

Merging #705 into master will increase coverage by 0.09%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #705      +/-   ##
==========================================
+ Coverage   89.05%   89.15%   +0.09%     
==========================================
  Files          28       29       +1     
  Lines        2467     2489      +22     
==========================================
+ Hits         2197     2219      +22     
  Misses        270      270              
Impacted Files | Coverage Δ
torchaudio/models/__init__.py | 100.00% <100.00%> (ø)
torchaudio/models/_wavernn.py | 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ab733e7...9eaefd5.

@mthrok (Collaborator) left a comment:

I know this is a draft, but I added some comments before I forget.
Good work!

class ResBlock(nn.Module):
    r"""
    Args:
        num_dims (int, optional): Number of compute dimensions in ResBlock. (Default: ``128``)
Collaborator:
I do not see the default value declared in the function signature. Did you forget?

@jimchen90 (Contributor, Author) commented Jun 10, 2020:

Thanks for the suggestion. I will update the file to address the comments, and the default value will be added to the signature.
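For illustration only, one way the documented default could be declared in the signature (a sketch, not the PR's actual code):

```python
from torch import nn


class ResBlock(nn.Module):
    r"""
    Args:
        num_dims (int, optional): Number of compute dimensions in ResBlock. (Default: ``128``)
    """

    def __init__(self, num_dims: int = 128) -> None:  # default now matches the docstring
        super().__init__()
        self.num_dims = num_dims
```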

@pytest.mark.parametrize('batch_size', [2])
@pytest.mark.parametrize('num_features', [200])
@pytest.mark.parametrize('input_dims', [100])
@pytest.mark.parametrize('output_dims', [128])
Collaborator:
Since you are not really parameterizing the variables, it's better to put them inside the test definition.
If you add more parameters, please use parameterized.expand instead of pytest.mark.parametrize.

Contributor (Author):
I will check it. Thanks.
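A sketch of what the reviewer's suggestion could look like, assuming a MelResNet with the constructor arguments documented later in this thread (res_blocks, input_dims, hidden_dims, output_dims, pad); the import path and exact values are illustrative:

```python
import torch

from torchaudio.models import MelResNet  # illustrative import; adjust to where the class lives


def test_melresnet_output_shape():
    """Validate the output dimensions of the MelResNet block."""
    # Plain constants instead of single-value pytest parametrizations.
    batch_size = 2
    num_features = 200
    input_dims = 100
    output_dims = 128
    res_blocks, hidden_dims, pad = 10, 128, 2

    model = MelResNet(res_blocks, input_dims, hidden_dims, output_dims, pad)
    x = torch.rand(batch_size, input_dims, num_features)
    out = model(x)

    # The first Conv1d (kernel_size = 2 * pad + 1, no padding) removes `pad` frames per side.
    assert out.size() == (batch_size, output_dims, num_features - 2 * pad)
```

If several parameter combinations were needed later, the `parameterized` package's `parameterized.expand` decorator could supply them, as suggested above.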

torchaudio/models/wavernn.py (outdated review thread, resolved)
@jimchen90 requested a review from vincentqb on June 10, 2020
@vincentqb (Contributor) commented:
For the docstring, let's follow transformer as an example.

@jimchen90 (Contributor, Author) commented Jun 12, 2020

Updates:

  1. Use a sequential container in the model.
  2. Update the test function by putting the variables inside the function.
  3. Add a docstring to each class and forward function.

@vincentqb (Contributor) left a comment:

I've pointed out a few nits, but this looks good to me overall :)

torchaudio/models/wavernn.py (4 outdated review threads, resolved)
@vincentqb (Contributor) commented:
Can you rebase? I see a few unrelated errors that have been fixed on master.

E       RuntimeError: Integer division of tensors using div or / is no longer supported, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
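This failure comes from a PyTorch deprecation unrelated to this PR (it was already fixed on master). For context, a small illustration of the migration the message asks for; the tensor names here are made up:

```python
import torch

lengths = torch.tensor([16000, 22050])
hop_length = 256

# Old pattern that the RuntimeError above complains about for integer tensors:
# frames = lengths / hop_length

frames = torch.floor_divide(lengths, hop_length)  # integer (floor) division, same as lengths // hop_length
ratio = torch.true_divide(lengths, hop_length)    # explicit floating-point division
```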

(https://github.com/G-Wang/WaveRNN-Pytorch)

Args:
    num_dims: the number of compute dimensions in the input (default=128).
Contributor:

Thinking out loud: we decided in our conventions to use n_* (see the README). What is written here aligns with wav2letter, though. We may need to revisit these conventions.


@jimchen90 (Contributor, Author) commented Jun 12, 2020

Updates:

  1. Make the output more readable:
       residual = x
       return self.resblock_model(x) + residual
       return self.melresnet_model(x)
  2. Rebase and update the docstrings.
  3. Add a pad variable. Wang's WaveRNN doesn't expose it and hardcodes it to 2 inside the MelResNet function, while fatchord's WaveRNN takes pad as a parameter. There is no functional difference between the two, but exposing pad as a variable makes the model more general.

@vincentqb (Contributor) commented Jun 15, 2020

Tests should pass after rebasing; see #720.

@jimchen90 marked this pull request as ready for review on June 15, 2020
@vincentqb (Contributor) commented Jun 16, 2020

@jimchen90 -- While we are working on implementing the full pipeline, let's make the wavernn module private by prefixing the file with an underscore, _wavernn.py. This will allow us to merge the PR even though all the steps are not yet finished.

Please also add a fourth step in the description:

Add example pipeline with wavernn (ongoing)
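As a rough illustration (an assumption, not the PR's exact diff), keeping the module file private while still exposing the finished blocks could look like this in torchaudio/models/__init__.py:

```python
# torchaudio/models/__init__.py -- illustrative sketch only; the actual export
# list and class names in the merged PR may differ.
from ._wavernn import ResBlock, MelResNet

__all__ = ["ResBlock", "MelResNet"]
```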

@vincentqb (Contributor) left a comment:

LGTM :)

@jimchen90 merged commit 4318fc5 into pytorch:master on Jun 16, 2020
@@ -29,3 +29,23 @@ def test_mfcc(self):
        out = model(x)

        assert out.size() == (batch_size, num_classes, 2)


class TestMelResNet:
Collaborator:
This test does not subclass unittest.TestCase so it won't run in fbcode.

Contributor:

Good catch! @jimchen90 -- can you send a follow-up pull request to update this?

Contributor (Author):

Yes. I will update it.
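A minimal sketch of the follow-up being discussed: subclassing unittest.TestCase so the test is picked up by unittest-style runners (e.g. in fbcode). The constructor arguments and import path are illustrative assumptions matching the earlier sketches, not the repository's exact code:

```python
import unittest

import torch

from torchaudio.models import MelResNet  # illustrative import path


class TestMelResNet(unittest.TestCase):
    def test_output_shape(self):
        batch_size, input_dims, num_features = 2, 100, 200
        res_blocks, hidden_dims, output_dims, pad = 10, 128, 128, 2

        model = MelResNet(res_blocks, input_dims, hidden_dims, output_dims, pad)
        out = model(torch.rand(batch_size, input_dims, num_features))

        # torch.Size is a tuple subclass, so this comparison works as expected.
        self.assertEqual(out.size(), (batch_size, output_dims, num_features - 2 * pad))
```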

@PetrochukM commented Jun 18, 2020

Thanks for your contribution!

Respectfully, the WaveRNN paper never mentions "upsampling" or "residual block" (the only mention is in reference to WaveNet) or "mel" or "spectrogram". It would be inaccurate to say:

> It is a block used in WaveRNN. WaveRNN is based on the paper "Efficient Neural Audio Synthesis".

The "MelResNet" block is never mentioned or referenced in the paper... and it would be inaccurate to state it was.

input_dims: the number of input sequence (default=100).
hidden_dims: the number of compute dimensions (default=128).
output_dims: the number of output sequence (default=128).
pad: the number of kernal size (pad * 2 + 1) in the first Conv1d layer (default=2).
Contributor:

nit: typo, "kernal" should be "kernel"

@jimchen90 (Contributor, Author) commented Jun 18, 2020

> Thanks for your contribution!
>
> Respectfully, the WaveRNN paper never mentions "upsampling" or "residual block" (the only mention is in reference to WaveNet) or "mel" or "spectrogram". It would be inaccurate to say:
>
> > It is a block used in WaveRNN. WaveRNN is based on the paper "Efficient Neural Audio Synthesis".
>
> The "MelResNet" block is never mentioned or referenced in the paper... and it would be inaccurate to state it was.

Thank you so much for pointing this out, @PetrochukM.
Yes, you are right. The WaveRNN paper doesn't mention "upsampling" or "residual block". I will open a PR to update the reference. 🙂
By the way, do you have any suggestions for a reference that could be cited here? Any idea is appreciated.
This model is based on fatchord's model.

@PetrochukM commented Jun 19, 2020

Happy to help!

With regard to references: there are no papers written about "fatchord's model" that I know of. From my understanding, there also hasn't been any formal evaluation of "fatchord's model". From an identity perspective, I don't even know fatchord's name, so it's hard to give credit.

I'm also a bit perplexed. I don't think it makes sense to implement a model that has not been published, peer-reviewed, or evaluated.

mpc001 pushed a commit to mpc001/audio that referenced this pull request on Aug 4, 2023: "Simple example to demonstrate parameter server training pattern"