AllTalk update Standard & Streaming generation #98
Update AllTalk Integration

* Added Standard Generation mode as an alternative to Streaming
* Integrated RVC (voice conversion) support with voice selection
* Added RVC pitch adjustment (-24 to +24)
* RVC controls automatically disable when using Streaming mode
* Standard generation mode set as default

This now opens up Kobold to all the TTS engines that AllTalk supports, as well as the RVC/Voice2voice pipeline. For example, you can use Piper TTS, which is very light on GPU/CPU RAM and resources, then use the RVC/Voice2voice pipeline to make the TTS output sound like any RVC-based voice you want. Full details are here: https://github.com/erew123/alltalk_tts/wiki/RVC-(Retrieval%E2%80%90based-Voice-Conversion), along with a link to 100,000+ voices.

Streaming generation only works with the Coqui XTTS engine,
Line 4184 & 11656
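To make the interaction between the two modes and the RVC controls concrete, here is a minimal sketch of the rules listed in the description. It is not the code from this PR: `build_tts_options`, `TtsOptions`, and the field names are hypothetical, and only the -24 to +24 range, the Standard default, and the "RVC is off in Streaming mode" rule come from the description.

```python
from dataclasses import dataclass
from typing import Optional

# Pitch range taken from the PR description; everything else is illustrative.
RVC_PITCH_MIN = -24
RVC_PITCH_MAX = 24


@dataclass
class TtsOptions:
    mode: str                 # "standard" or "streaming"
    rvc_voice: Optional[str]  # None means RVC is not applied
    rvc_pitch: int            # semitone-style offset, clamped to the allowed range


def build_tts_options(mode: str = "standard",
                      rvc_voice: Optional[str] = None,
                      rvc_pitch: int = 0) -> TtsOptions:
    """Hypothetical helper: normalise user choices into a settings object."""
    if mode not in ("standard", "streaming"):
        raise ValueError(f"unknown generation mode: {mode!r}")
    if mode == "streaming":
        # RVC controls are disabled in Streaming mode, so drop any RVC choices.
        return TtsOptions(mode=mode, rvc_voice=None, rvc_pitch=0)
    # Clamp the pitch into the supported range rather than rejecting it.
    clamped = max(RVC_PITCH_MIN, min(RVC_PITCH_MAX, int(rvc_pitch)))
    return TtsOptions(mode=mode, rvc_voice=rvc_voice, rvc_pitch=clamped)
```

Clamping instead of raising is a design choice made for this sketch; a UI slider bounded to the same range would make the clamp a no-op.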
Hi @erew123, I'm looking through your PR now. It seems like a lot has changed. At the moment it doesn't work, but that's probably due to the ad-hoc audio element you created. Before I look into that, I want to check some things. Previously, AllTalk returned raw audio data from the generate endpoint.
Is that no longer an option? I notice you now return a URL to a remote resource on the server.
I'm wondering if you might instead consider a param to the generation request like
so a second download is not needed? Happy to discuss further.
I got the non-streaming mode working correctly, and it now handles responses from both v1 and v2 based on the returned content type. However, streaming is still not working; I could not get even the official Colab example to stream. I'm seeing a lot of error 502 responses. Is there a simple setup I can use to get the streaming demo working?
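For readers following along, branching on the returned content type could look roughly like the sketch below. It assumes one response style carries audio bytes directly and the other carries JSON with a link to the audio; the function name and the `audio_url` field are placeholders, not AllTalk's actual response schema.

```python
import json
from typing import Tuple, Union


def interpret_generate_response(content_type: str,
                                body: bytes,
                                url_field: str = "audio_url"
                                ) -> Tuple[str, Union[bytes, str]]:
    """Decide how to treat a generate response from its Content-Type header.

    `url_field` is a placeholder key; substitute whatever the server returns.
    """
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type.startswith("audio/"):
        # The body already is the audio; it can be played without a second request.
        return ("audio", body)
    if media_type == "application/json":
        # The body describes where the audio lives; a second download is needed.
        payload = json.loads(body.decode("utf-8"))
        return ("url", payload[url_field])
    raise ValueError(f"unexpected content type: {content_type!r}")
```

Returning a tagged tuple keeps the caller's playback code in one place, whichever response style the server uses.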
With the XTTS API server, streaming is also not possible remotely, as it then plays the audio through the local speakers.
It's XTTS powered, but the server is different, so it might work? Not sure yet. Additionally, once it crashes it won't start again.
…t gets blocked by autoplay plugins)
Alright, I got the streaming endpoint working, but it seems like it doesn't really make much difference: the audio still needs to be fully generated on the backend before the streaming even starts. Anyway, I will add the streaming endpoint but buffer it synchronously first.
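As an illustration of "buffer it synchronously first", the idea is to drain the whole streamed response before handing anything to the player. The sketch below is a generic version of that idea, not the code that was merged; `buffer_stream` is a hypothetical name and the chunks stand in for whatever the HTTP client yields.

```python
from typing import Iterable


def buffer_stream(chunks: Iterable[bytes]) -> bytes:
    """Collect every chunk of a streamed response into one bytes object."""
    buffered = bytearray()
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            buffered.extend(chunk)
    # Only now is the audio handed on, so playback starts after the stream ends.
    return bytes(buffered)
```

This gives up the latency benefit of streaming, which matches the observation above that the backend finishes generating before it sends anything.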
Hi @LostRuins, I had to travel 100+ miles last night due to a break-in which I am dealing with, and hence I'm a little caught up. That aside, let me do what I can to answer questions.

The streaming setup should have worked for both AllTalk V1 and V2... at least it did when I was testing locally. I can re-test in a bit. Streaming can be funny on Colab at times, which I think has something to do with Cloudflare tunnels, and as mentioned, streaming only works with the XTTS engine. Also, some browsers are incapable of dealing with the PCM stream, e.g. Firefox, so it will not work there when getting a streaming response. Brave & Chrome should be fine.

There is a basic HTML test page available on the AllTalk API address. Both the Streaming and Standard AllTalk protocols, with code examples, are listed here: https://github.com/erew123/alltalk_tts/wiki#-api-documentation

As for how long WAVs will remain on the AllTalk server, that will depend on what duration someone sets to auto-delete older audio outputs. Default deletion is Disabled.

The difference between the AllTalk V1 and V2 protocol is that on a Standard response, the protocol/IP/port is not returned in the response:
I will get back to you at some point, but I currently have to go deal with police, window companies, insurance etc... :/ so I will take another look ASAP/when I am free to do so. (Sorry)
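On the V1/V2 point above: when a Standard response does not include the protocol/IP/port, a client has to combine the returned path with the server address it already knows. A minimal sketch, assuming the response carries either a full URL or a bare path; `resolve_audio_url` and the example host are hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def resolve_audio_url(server_base: str, returned: str) -> str:
    """Return a fetchable URL for the audio, whichever form the server sent."""
    parts = urlsplit(returned)
    if parts.scheme and parts.netloc:
        # Already absolute (protocol and host present): use it unchanged.
        return returned
    # Bare path: attach it to the server address the client is configured with.
    return urljoin(server_base.rstrip("/") + "/", returned.lstrip("/"))
```

For example, `resolve_audio_url("http://tts.example:8000", "/audio/out.wav")` yields `http://tts.example:8000/audio/out.wav`, while an already absolute URL passes through untouched.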
Oof, take care, no rush on this. I'll merge first; we can always review again as needed.
I saw two changes that differed between the "lite" dev branch and my index.html, so I matched those changes in a second commit to my fork. In other words, the changes below now exactly match your dev branch updates, and I noted this on commit erew123@c7c02a4. The image below shows what was incorrect/not matching, which I changed back to what it should be. (God, I hope that makes sense.)