
Inference of Multiband MelGAN (v2) with ForwardTacotron #346

Closed
prajwaljpj opened this issue Mar 21, 2022 · 4 comments
Labels
question Further information is requested

Comments

@prajwaljpj

I have trained a ForwardTacotron text2mel model and I would like to integrate it with ParallelWaveGAN.
For now we have extracted the generated mel_post (the mel spectrogram after the Postnet) from here and saved it as a .npy file (alifiya_esp_1.npy.zip). Then we use StandardScaler to normalize the data from here (1 and 2) and infer through here.
This is the sample output for the corresponding numpy file (alifiya_esp_1.wav.zip).
The same mel works fine with Griffin-Lim.

Where am I going wrong?

@kan-bayashi kan-bayashi added the question Further information is requested label Mar 22, 2022
@kan-bayashi
Owner

kan-bayashi commented Mar 22, 2022

Not sure, but the following points might differ:

  • log basis (I use log10 as the default)
  • fmin and fmax for the mel basis (I use 80-7600 Hz as the default)
  • normalization (I use mean-variance normalization with statistics computed over the training data)

If you use the same mel basis, maybe the log basis is different.
https://github.com/as-ideas/ForwardTacotron/blob/3bcaf3569ea2379ff995403b31f280720df3f03d/utils/dsp.py#L71-L87
https://github.com/as-ideas/ForwardTacotron/blob/3bcaf3569ea2379ff995403b31f280720df3f03d/utils/dsp.py#L105-L107

You can change the log basis to match the feature extraction condition.
Related: #169 (comment)
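For reference, converting a natural-log mel spectrogram to log10 is just a constant rescaling, since log10(exp(x)) = x / ln(10). A minimal sketch (the function name and shapes are illustrative, not from either codebase):

```python
import numpy as np

def ln_mel_to_log10(mel_ln: np.ndarray) -> np.ndarray:
    """Convert a natural-log mel spectrogram to log10.

    log10(exp(x)) = x / ln(10), so this is a constant rescaling
    that can be applied to the text2mel output at inference time.
    """
    return mel_ln / np.log(10.0)

# Sanity check: agrees with np.log10 on the same magnitudes.
mags = np.array([0.01, 0.1, 1.0, 10.0])
print(np.allclose(ln_mel_to_log10(np.log(mags)), np.log10(mags)))  # True
```

This is numerically equivalent to the `np.log10(np.exp(...))` trick mentioned later in the thread, but avoids the intermediate `exp`, which can overflow for large values.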

@prajwaljpj
Author

Is there a way to circumvent this without re-training the text2mel model?

@redhood95

  • log basis (I use log10 as the default)
  • normalization (I use mean-variance normalization with statistics computed over the training data)

We have tried to fix this by applying np.log10(np.exp(ft_mel_output)) and normalizing it with mean-variance normalization via StandardScaler.
Are there any other changes we can make at inference time to fix this?
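One thing worth double-checking on the normalization side: fitting a StandardScaler on the single utterance being synthesized yields different statistics than the mean and variance computed over the vocoder's training data. A minimal sketch of applying training-time statistics instead (the `mean` and `scale` arrays are assumed to come from the vocoder's saved stats; the function name is illustrative):

```python
import numpy as np

def normalize_mel(mel_log10: np.ndarray, mean: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Mean-variance normalize a (frames, n_mels) log10-mel spectrogram
    using statistics computed over the vocoder's training data,
    rather than refitting a scaler on the utterance being synthesized."""
    return (mel_log10 - mean) / scale

# Per-dimension stats broadcast across the frame axis.
mel = np.random.default_rng(0).normal(size=(120, 80))
mean, scale = mel.mean(axis=0), mel.std(axis=0)
print(normalize_mel(mel, mean, scale).shape)  # (120, 80)
```

The point of the design is that the vocoder was trained on features standardized with one fixed set of statistics, so inference must use those same statistics, not ones re-estimated per utterance.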

@kan-bayashi
Owner

If fmin / fmax are different, there is no way to use the pretrained model.
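To illustrate why: fmin and fmax determine where the mel filters are placed on the frequency axis, so an 80-dim mel extracted with one range is not a simple remapping of one extracted with another. A rough sketch using the HTK mel formula (an assumption; libraries differ in the exact mel scale they use):

```python
import numpy as np

def hz_to_mel(f):
    # HTK mel scale: m = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def filter_edges(n_mels, fmin, fmax):
    # Edge frequencies (Hz) of the triangular mel filters:
    # n_mels + 2 points equally spaced on the mel scale.
    return mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))

# Same n_mels, different fmin/fmax: the filters cover different
# frequency ranges, so the resulting 80-dim features are not interchangeable.
a = filter_edges(80, 80.0, 7600.0)
b = filter_edges(80, 0.0, 11025.0)
print(np.allclose(a, b))  # False
```

Since the filterbank projection is lossy, there is no exact way to convert features from one basis to the other at inference time without the original linear spectrogram.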
