Hi Haohe, this is really awesome work. After checking your code, I have a question about the data pre-processing part. I notice that a mel map with a height of 128 is fed into the model to match the AudioMAE input dimension, while the output mel map has a height of 64. So I want to confirm whether training uses two kinds of mel maps: one for extracting AudioMAE features and one for computing the diffusion loss. If so, would you mind sharing your scripts for extracting both kinds? I tried to use the mel extraction in your code to produce a 64-bin mel map, but I could not recover waveforms from it with the vocoder included in your decoder.ckpt. Thanks in advance for your reply!
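To make the question concrete, here is a minimal sketch of what I mean by two kinds of mel maps computed from the same spectrum. It is pure Python and only illustrative: the sample rate, FFT size, HTK-style mel formula, flat dummy spectrum, and all function names are my own assumptions, not the repository's actual preprocessing.

```python
import math

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate, fmin=0.0, fmax=None):
    """Return triangular mel filters as a list of n_mels rows,
    each with n_fft // 2 + 1 weights in [0, 1]."""
    if fmax is None:
        fmax = sample_rate / 2.0
    n_bins = n_fft // 2 + 1
    # n_mels + 2 points equally spaced on the mel scale, mapped back to Hz.
    m_lo, m_hi = hz_to_mel(fmin), hz_to_mel(fmax)
    edges = [mel_to_hz(m_lo + (m_hi - m_lo) * i / (n_mels + 1))
             for i in range(n_mels + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for i in range(n_mels):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        row = []
        for f in bin_freqs:
            rising = (f - left) / (center - left)
            falling = (right - f) / (right - center)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank

def log_mel_frame(power_spectrum, bank, eps=1e-5):
    """Project one power-spectrum frame onto the filterbank and take the log."""
    return [math.log(max(eps, sum(w * p for w, p in zip(row, power_spectrum))))
            for row in bank]

# Hypothetical settings, chosen only for illustration.
SAMPLE_RATE = 16000
N_FFT = 1024
power_frame = [1.0] * (N_FFT // 2 + 1)  # flat dummy spectrum for one frame

bank_128 = mel_filterbank(128, N_FFT, SAMPLE_RATE)
bank_64 = mel_filterbank(64, N_FFT, SAMPLE_RATE)
mel_128 = log_mel_frame(power_frame, bank_128)  # e.g. for the feature extractor
mel_64 = log_mel_frame(power_frame, bank_64)    # e.g. for the diffusion target
print(len(mel_128), len(mel_64))  # prints: 128 64
```

The two outputs differ only in the number of mel filters applied to the same frame. A vocoder generally expects mels produced with exactly the settings it was trained on, so a mismatch in any of these parameters could explain why my 64-bin attempt did not reconstruct audio.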