Input output number of mel #10

wjc2830 · 2024-12-31T11:58:33Z

Hi Haohe, this is really awesome work. After checking your code, I have a question about the data pre-processing part. I noticed that a mel spectrogram with a height of 128 is fed into the model to match the AudioMAE input dimension, while the output mel spectrogram has a height of 64. So I want to confirm: during training, are there two kinds of mel spectrograms, one for extracting AudioMAE features and one for computing the diffusion loss? If so, would you mind sharing your scripts for extracting both? I tried using the code in your repo to produce a 64-bin mel, but I failed to recover waveforms through the vocoder included in your decoder.ckpt. Thanks in advance for your reply!

wjc2830 · 2025-01-02T09:34:40Z

No need, I found it. Thanks again for your contribution!

wjc2830 · 2025-01-02T09:45:06Z

BTW, I am still curious about the training strategy. Did you really use two types of mels, one for input and one for output?

haoheliu · 2025-01-17T00:08:45Z

Hi @wjc2830, if I remember correctly, yes. It's not the best, but at least it works. There is a lot of room for improvement in future work!
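For readers following the thread, here is a minimal sketch of what "two types of mels" could look like in practice: one waveform turned into a 128-bin log-mel (the AudioMAE-side input) and a 64-bin log-mel (the diffusion/vocoder-side target). Only the 128 and 64 bin counts come from this thread. The sample rate, FFT size, hop length, HTK-style mel formula, log floor, and the helper names `mel_filterbank` and `log_mel` are all assumptions for illustration, not the repository's actual settings.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other conventions exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to sr/2."""
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel(wave, sr, n_fft, hop, n_mels):
    """Return a (frames, n_mels) log-mel power spectrogram."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(np.clip(mel, 1e-5, None))  # floor avoids log(0)

# Hypothetical settings: 1 second of a 440 Hz tone at 16 kHz
sr, n_fft, hop = 16000, 1024, 160
wave = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)

mel_128 = log_mel(wave, sr, n_fft, hop, n_mels=128)  # AudioMAE-side input
mel_64 = log_mel(wave, sr, n_fft, hop, n_mels=64)    # diffusion/vocoder-side target
print(mel_128.shape, mel_64.shape)
```

Both spectrograms share the same time axis and differ only in the number of mel bins. Note that a vocoder can generally only invert mels computed with the same settings it was trained on, so the 64-bin features need to match the vocoder's own configuration rather than the assumed values above.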
