Input output number of mel #10

Open
wjc2830 opened this issue Dec 31, 2024 · 3 comments

Comments


wjc2830 commented Dec 31, 2024

Hi Haohe, this is really awesome work. After checking your code, I have a question about the data pre-processing part. I notice that a mel map with a height of 128 is fed into the model to match the AudioMAE input dimension, while the output mel map has a height of 64. So I want to confirm: during training, are there two kinds of mel maps, one for extracting AudioMAE features and one for computing the diffusion loss? If so, would you mind sharing your scripts for extracting both kinds? I tried using the mel extraction in your code to produce 64-bin mels, but I failed to recover waveforms through the vocoder included in your decoder.ckpt. Thanks for your prompt reply!


wjc2830 commented Jan 2, 2025

No need, I found it. Thx again for your contribution!


wjc2830 commented Jan 2, 2025

BTW, I am still curious about the training strategy. Did you really use two types of mels, one for input and one for output?

@haoheliu
Owner

Hi @wjc2830, if I remember correctly, yes. It's not the best approach, but at least it works, and there is a lot of room for improvement in future work!
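
For anyone landing on this thread, here is a minimal sketch of what a "two types of mels" setup could look like: one waveform is turned into a 128-bin log-mel (the AudioMAE-style input) and a 64-bin log-mel (the diffusion/vocoder target). This is not the repository's preprocessing script. The function names (`mel_filterbank`, `log_mel`, `two_mels`), the HTK-style mel formula, and every parameter value (16 kHz sample rate, `n_fft=1024`, `hop=160`, natural log with a `1e-5` floor) are assumptions chosen only for illustration.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other conventions such as Slaney exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fmax = sr / 2.0 if fmax is None else fmax
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 edge frequencies, evenly spaced on the mel scale.
    edges = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel(wav, sr, n_fft, hop, n_mels):
    """Log-mel spectrogram of a 1-D waveform, shape (n_frames, n_mels)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wav) - n_fft) // hop
    frames = np.stack(
        [wav[t * hop : t * hop + n_fft] * window for t in range(n_frames)]
    )
    mag = np.abs(np.fft.rfft(frames, axis=1))        # (n_frames, n_fft // 2 + 1)
    mel = mag @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    return np.log(np.clip(mel, 1e-5, None))          # floor avoids log(0)


def two_mels(wav, sr=16000, n_fft=1024, hop=160):
    """Return (128-bin mel for the encoder input, 64-bin mel for the target).

    All STFT settings here are placeholders, not the repository's values.
    """
    mel_in = log_mel(wav, sr, n_fft, hop, n_mels=128)
    mel_out = log_mel(wav, sr, n_fft, hop, n_mels=64)
    return mel_in, mel_out


if __name__ == "__main__":
    t = np.arange(16000) / 16000.0
    wav = np.sin(2.0 * np.pi * 440.0 * t)  # one second of a 440 Hz tone
    mel_in, mel_out = two_mels(wav)
    print(mel_in.shape, mel_out.shape)
```

A vocoder generally only inverts mels computed with the same settings it was trained on (sample rate, FFT size, hop length, mel range, log and normalization convention), so in practice the 64-bin branch would need to match whatever configuration the checkpoint's vocoder expects rather than the placeholder values above.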
