1. What semantic value does the HuBERT model you trained add if its targets come from the first RVQ layer of the acoustic tokenizer? In other words, the acoustic model is already exposed to that information.
2. What was the sampling rate of the input audio for the semantic model? Is it the same for the acoustic model?
Thank you for your interest in our work.
1. Our training approach follows HuBERT's, with one modification: the targets produced by the acoustic unit discovery step. Instead of running k-means clustering on MFCCs, we use the codes from the first VQ (vector quantization) layer of the codec.
2. 16 kHz.
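To make answer 1 concrete: in HuBERT's first iteration, k-means on MFCCs yields one discrete label per frame, and the model predicts those labels on masked frames. The sketch below shows how first-RVQ-layer codes could stand in for those labels for 16 kHz input. It is a minimal illustration, not the authors' code. The codec's sample rate and hop (`codec_sample_rate`, `codec_hop`), the codebook size, and all function names are hypothetical. Random arrays stand in for real codec output and model predictions. The loss uses a plain softmax over logits, which simplifies HuBERT's projection/cosine-similarity scoring.

```python
import numpy as np

HUBERT_SAMPLE_RATE = 16_000  # input sampling rate given in the reply
# (kernel, stride) of the seven conv layers in the HuBERT / wav2vec 2.0 waveform
# encoder; total stride is 5 * 2**6 = 320 samples (20 ms at 16 kHz).
CONV_LAYERS = [(10, 5)] + [(3, 2)] * 4 + [(2, 2)] * 2
HUBERT_HOP = 320


def hubert_num_frames(num_samples: int) -> int:
    """Number of frames the conv front-end emits for `num_samples` of audio."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        length = (length - kernel) // stride + 1
    return length


def codec_frame_indices(num_frames: int, codec_sample_rate: int, codec_hop: int) -> np.ndarray:
    """For each HuBERT frame, index of the codec frame containing its start time.

    Integer arithmetic avoids floating-point rounding at frame boundaries.
    """
    start_samples = np.arange(num_frames) * HUBERT_HOP  # in 16 kHz samples
    return (start_samples * codec_sample_rate) // (HUBERT_SAMPLE_RATE * codec_hop)


def first_layer_targets(codes, num_samples, codec_sample_rate, codec_hop):
    """Hypothetical helper: one first-RVQ-layer code per HuBERT frame."""
    codes = np.asarray(codes)
    idx = codec_frame_indices(hubert_num_frames(num_samples), codec_sample_rate, codec_hop)
    return codes[np.minimum(idx, len(codes) - 1)]


def masked_prediction_loss(logits, targets, mask):
    """Cross-entropy averaged over masked frames only (HuBERT-style masked loss)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]
    return float(nll[mask].mean())


rng = np.random.default_rng(0)
num_samples = 16_000  # 1 s of 16 kHz audio
codebook_size = 1024  # hypothetical first-layer codebook size
# Stand-in for the codec's first RVQ layer, assuming a 16 kHz codec with hop 320 (50 Hz).
codes = rng.integers(0, codebook_size, size=50)
targets = first_layer_targets(codes, num_samples, codec_sample_rate=16_000, codec_hop=320)
logits = rng.normal(size=(len(targets), codebook_size))  # stand-in for model predictions
mask = np.zeros(len(targets), dtype=bool)
mask[10:30] = True  # one masked span
loss = masked_prediction_loss(logits, targets, mask)
print(len(targets), loss)
```

If the codec's frame rate differs from HuBERT's 50 Hz grid (for example a hypothetical 24 kHz codec with hop 320, i.e. 75 Hz), `codec_frame_indices` picks the codec frame covering each HuBERT frame's start. This is one simple alignment choice among several.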
Hey, first of all, great work!
Two things bug me though: