The details on HuBERT-General-Audio #7

vican9000 · 2024-09-18T13:54:25Z

Hey, first of all, great work!

Two things bug me though:

What is the semantic value of the HuBERT model you trained if its targets come from the first RVQ layer of the acoustic tokenizer? The acoustic model is already exposed to that information.
What was the sampling rate of the input audio for the semantic model? Is it the same for the acoustic model?

zhenye234 · 2024-09-26T14:30:58Z

Thank you for your interest in our work.
1. Our training approach follows that of HuBERT, with one modification: the target of the acoustic unit discovery step. Instead of running k-means clustering on MFCCs, we use the first VQ (vector quantization) layer of the codec.
2. 16 kHz.
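Point 1 can be illustrated with a minimal sketch. It assumes that the codec's first RVQ layer assigns each frame to its nearest codeword under squared L2 distance, and that those indices stand in for the MFCC k-means cluster IDs as HuBERT's masked-prediction targets. The function names, array shapes, and toy codebook below are hypothetical and are not taken from the project's code.

```python
import numpy as np

def first_rvq_targets(latents: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each frame to the index of its nearest first-layer codeword.

    latents:  (T, D) codec encoder outputs, one row per frame (assumed layout)
    codebook: (K, D) first-layer RVQ codebook (hypothetical)
    returns:  (T,) integer unit IDs, used here in place of MFCC k-means cluster IDs
    """
    # Squared L2 distance from every frame to every codeword: shape (T, K)
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def masked_prediction_loss(logits: np.ndarray, targets: np.ndarray, mask: np.ndarray) -> float:
    """HuBERT-style cross-entropy, averaged over masked frames only.

    logits: (T, K) model scores over the K units; targets: (T,); mask: (T,) bool
    """
    # Numerically stable log-softmax over the unit dimension
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-likelihood of each frame's target unit
    nll = -log_probs[np.arange(len(targets)), targets]
    return float(nll[mask].mean())

# Toy example: 4 well-separated 2-D codewords (hypothetical values)
codebook = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
latents = codebook[[3, 3, 1, 2, 0]] + 0.1  # frames lying close to codewords 3, 3, 1, 2, 0
targets = first_rvq_targets(latents, codebook)
print(targets)  # [3 3 1 2 0]

mask = np.array([True, False, True, True, False])  # frames the model must predict
uniform_logits = np.zeros((5, 4))  # an untrained model: uniform over the 4 units
print(masked_prediction_loss(uniform_logits, targets, mask))  # log(4), about 1.386
```

The argmin over squared distances is the usual nearest-neighbour lookup a VQ layer performs. Whether this codec computes it exactly this way (for example with cosine distance or projected codebooks) is an assumption. The loss is also kept to masked frames only, which follows the original HuBERT recipe.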
