Suggestion: support GPT-SoVITS as TTS (fast voice cloning, so users can talk to their favorite voice rather than a generic AI voice). #92

insufficient-will · 2024-09-07T10:36:38Z

Congratulations and many thanks first! I think the project has great potential to become a popular foundation.
If you deem it appropriate, would you support GPT-SoVITS as well?

I know the project already supports a lot of TTS engines, but GPT-SoVITS offers something different: it lets users clone their favorite voice very efficiently.

Talking to AI is inspiring, but hearing responses in a particular voice is what really intrigues people, and it could be one of the ultimate goals for people who are willing to talk to a machine. GPT-SoVITS can produce a decent voice clone from a few clips in a few minutes, which makes it an ideal addition to the existing TTS solutions.

Best wishes!

rs545837 · 2024-09-07T15:58:09Z

Did you ever take a look at StyleTTS2?

insufficient-will · 2024-09-08T07:21:31Z

It looks promising. I am in dire need of voice cloning and multilingual support. Here is some additional context for the issue.

Use case
I am making AI-voiced audiobooks and RAG. My audience is a group of third-person-shooter gacha gamers (Snowbreak). I will clone the characters' voices and use them either to narrate a book or to answer questions.

The TTS has to excel at voice cloning. A pre-trained voice won't do, because the audience doesn't want a generic voice; each listener wants their own favorite one.

The TTS should also support multilingual scenarios, especially Chinese, English, Italian (the game has a popular character with an Italian background) and, if possible, Hindi (for an AI bot; I don't know why a bot is popular in a gacha game, but it happens).

To expand on this a bit: for professional use cases, such as medical consulting, a pre-trained voice will do, because what matters is the accuracy of the content, not the voice. But for everyday use cases, emotional engagement comes into play, and that is not limited to gacha games.

Limitations
Amount of training data needed for a voice clone.
Training hardware requirements and training time.
For all of these, the less the better.

Current Solution
GPT-SoVITS. It can do a decent clone with 10-50 clips of 3-10 seconds each, in 10 minutes on an RTX 3090. It is not perfect yet, as explained below.
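
To make those numbers concrete: 10-50 clips of 3-10 seconds each works out to somewhere between 30 seconds and roughly 8 minutes of reference audio. Below is a minimal sketch of a pre-flight check on a list of clip durations. The thresholds are taken from the figures above; the function and class names are hypothetical and are not part of GPT-SoVITS.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the comment above (10-50 clips, 3-10 seconds each).
MIN_CLIPS, MAX_CLIPS = 10, 50
MIN_SECONDS, MAX_SECONDS = 3.0, 10.0


@dataclass
class DatasetReport:
    clip_count: int
    total_seconds: float
    out_of_range: List[int]  # indices of clips shorter or longer than allowed
    count_ok: bool           # True if the number of clips is within 10-50


def check_clip_durations(durations: List[float]) -> DatasetReport:
    """Summarize a candidate voice-clone dataset given clip lengths in seconds."""
    out_of_range = [
        i for i, d in enumerate(durations)
        if not (MIN_SECONDS <= d <= MAX_SECONDS)
    ]
    return DatasetReport(
        clip_count=len(durations),
        total_seconds=sum(durations),
        out_of_range=out_of_range,
        count_ok=MIN_CLIPS <= len(durations) <= MAX_CLIPS,
    )


if __name__ == "__main__":
    # Twelve usable 5-second clips plus one clip that is too short.
    report = check_clip_durations([5.0] * 12 + [1.5])
    print(report)
```

The durations themselves would have to come from whatever audio tooling is already in use; the check only needs a list of numbers.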

Current options
Voice clone quality: everyone claims theirs is the best. I won't judge, but I have tried some of the available methods, and they don't come close to my current solution.
CN support: ChatTTS, Melo, and GPT-SoVITS are OK. Parler is not OK.
EN support: of course all are OK.
Italian and Hindi: of course none is OK.
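
The same comparison can be written down as a small lookup table, which makes it easy to ask which engines cover a given set of languages. This is only a sketch: the entries reflect my own experience listed above rather than any official support matrix, and the language codes and function name are assumptions chosen for illustration.

```python
from typing import Dict, List, Set

# Language coverage as reported in the list above (not an official matrix).
# Codes are illustrative: "zh" = Chinese, "en" = English.
SUPPORT: Dict[str, Set[str]] = {
    "ChatTTS": {"zh", "en"},
    "Melo": {"zh", "en"},
    "GPT-SoVITS": {"zh", "en"},
    "Parler": {"en"},
}


def engines_supporting(required: Set[str]) -> List[str]:
    """Return the engines whose reported coverage includes every required language."""
    return sorted(name for name, langs in SUPPORT.items() if required <= langs)


if __name__ == "__main__":
    print(engines_supporting({"zh", "en"}))        # Chinese and English
    print(engines_supporting({"zh", "en", "it"}))  # adding Italian leaves nothing
```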

It looks like StyleTTS2 could be my savior after all.

andimarafioti · 2024-09-09T14:58:11Z

Hey, I would be more than ok adding support for this TTS. If you want to do it I think it would be cool, I would review it 👍

We are still discussing a bit where to take this library next, thank you for sharing your ideas!

insufficient-will · 2024-09-10T08:38:59Z

Right now I am using SillyTavern, Kobold, and GPT-SoVITS to do a kind of speech-to-speech (with the voice I cloned). But it's slow even on a 3090; maybe a 4090 can do better? I have tried this HF speech-to-speech on a Mac, and it is a much better experience. Wherever you are heading, may fortune favor your path.
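
For anyone picking this up, here is a rough sketch of the shape such a setup takes: speech recognition, then a language model, then a swappable TTS stage where a cloned voice would plug in. Everything in it is hypothetical (`TTSBackend`, `DummyTTS`, `run_turn`), and it does not reflect the actual interfaces of this library, SillyTavern, Kobold, or GPT-SoVITS.

```python
from typing import Callable, Iterator, List, Protocol


class TTSBackend(Protocol):
    """Hypothetical interface: any TTS engine that yields audio chunks for a text."""

    def synthesize(self, text: str) -> Iterator[bytes]: ...


class DummyTTS:
    """Stand-in backend: 'speaks' by emitting each sentence as UTF-8 bytes."""

    def synthesize(self, text: str) -> Iterator[bytes]:
        for sentence in text.split(". "):
            if sentence:
                yield sentence.encode("utf-8")


def run_turn(
    audio_in: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: TTSBackend,
) -> List[bytes]:
    """One conversational turn: audio -> transcript -> reply text -> audio chunks."""
    transcript = stt(audio_in)
    reply = llm(transcript)
    return list(tts.synthesize(reply))


if __name__ == "__main__":
    # Toy stages so the sketch runs without any models installed.
    chunks = run_turn(
        b"hi",
        stt=lambda audio: audio.decode("utf-8"),
        llm=lambda text: "Echo. " + text,
        tts=DummyTTS(),
    )
    print(chunks)
```

Yielding chunks per sentence rather than one finished clip is a deliberate choice here: playback can start before the whole reply is synthesized, which matters when the TTS stage is the slow one.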

PaParaZz1 · 2024-10-08T09:40:52Z

Thanks for this awesome project. Based on a similar pipeline, we have released a Chinese speech-to-speech project named CleanS2S, which supports more interesting and streaming interactions.

Here is a snapshot of this project:

Looking forward to more advice and feedback!
