Higher than GT on UTMOS? #1

Open
lastapple opened this issue Sep 2, 2024 · 5 comments
Comments

@lastapple

Shocking. Is the GT really worse than the generated audio?
image

@zhenye234
Owner

Thank you for your interest and observation!
It's becoming more common for synthesized results to outperform ground truth in TTS. For instance, NS2, StyleTTS2, and VALL-E 2 have all reported cases where the generated outputs score higher than the original recordings. Our approach focuses on the shortcomings of existing codecs, and addressing them significantly improves TTS performance.

Moreover, we've open-sourced the codec's checkpoint, making it easy for you to replicate our experiments using VALL-E (https://github.com/lifeiteng/vall-e). I've also just uploaded my VALL-E results for you to listen to (https://drive.google.com/file/d/1irlGr-5fpnPwIzHMkMTGbU5T3OpiPsIS/view?usp=sharing). I look forward to your thoughts and further discussion.

@patriotyk

It sounds really fantastic. As I understand it, this can also be used with StyleTTS2? Do you have an example of how it could be applied?

@zhenye234
Owner

Thank you for your question! StyleTTS2 is trained end-to-end, so it might be challenging to apply our approach directly. Our method might be more applicable to non-autoregressive (NAR) TTS models like NS2, but I'm not sure whether it will work. It would be interesting to explore whether unifying semantic and acoustic representations could further improve NAR audio generation models.

@patriotyk

So, as I understand it, the biggest problem in StyleTTS2 is the vocoder? But maybe it could be replaced with a codec-based one?

@zhenye234
Owner

You're right
