AllTalk update Standard & Streaming generation #98

erew123 · 2024-12-13T17:34:59Z

Update AllTalk Integration

Added Standard Generation mode as an alternative to Streaming
Integrated RVC (voice conversion) support with voice selection
Added RVC pitch adjustment (-24 to +24)
RVC controls automatically disable when using Streaming mode
Standard generation mode set as default

This now opens up Kobold to all the TTS engines that AllTalk supports, as well as the RVC/Voice2voice pipeline. For example, you can use Piper TTS, which is very light on GPU/CPU and RAM, then use the RVC/Voice2voice pipeline to make the TTS output sound like any RVC-based voice you want. Full details are here https://github.com/erew123/alltalk_tts/wiki/RVC-(Retrieval%E2%80%90based-Voice-Conversion) along with a link to 100,000+ voices.

Streaming generation only works with the Coqui XTTS engine.
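The interaction between the two modes and the RVC controls can be sketched roughly as follows. This is only an illustration of the rules listed above (Standard as the default, RVC disabled while streaming, pitch limited to -24..+24); `buildGenerationOptions` and its field names are hypothetical and are not taken from the PR or from the AllTalk API.

```javascript
// Hypothetical helper: the function and field names are illustrative only.
const RVC_PITCH_MIN = -24;
const RVC_PITCH_MAX = 24;

function buildGenerationOptions({ mode = "standard", rvcVoice = null, rvcPitch = 0 } = {}) {
  const streaming = mode === "streaming";
  // RVC post-processing is only offered in Standard mode.
  const rvcEnabled = !streaming && rvcVoice !== null;
  // Clamp the requested pitch into the supported -24..+24 range.
  const pitch = Math.min(RVC_PITCH_MAX, Math.max(RVC_PITCH_MIN, Math.round(rvcPitch)));
  return {
    mode: streaming ? "streaming" : "standard",
    rvcEnabled,
    rvcVoice: rvcEnabled ? rvcVoice : null,
    rvcPitch: rvcEnabled ? pitch : 0,
  };
}
```

Calling `buildGenerationOptions({ mode: "streaming", rvcVoice: "some_voice", rvcPitch: 12 })` returns an object with `rvcEnabled` set to `false`, mirroring how the UI greys out the RVC controls in Streaming mode.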

I saw two places where the "lite" dev branch differed from my index.html, so I matched those changes in a second commit to my fork (erew123@c7c02a4). In other words, those lines now exactly match your dev branch updates. The image below shows what was incorrect/not matching, which I changed back to what it should be. (God, I hope that makes sense.)



LostRuins · 2024-12-14T02:33:18Z

Hi @erew123

Looking through your PR now. Seems like a lot has changed. At the moment it doesn't work, but that's probably due to the ad-hoc audio element you created. Before I look through that, I wanna check some things.

Previously, AllTalk returned raw audio data from the generate endpoint:

```javascript
fetch(localsettings.saved_alltalk_url + alltalk_gen_endpoint, {
    method: 'POST',
    body: formData, // send payload as FormData
})
.then(response => response.arrayBuffer())
.then(data => {
    return audioContext.decodeAudioData(data); // this is raw data
})
```

Is that no longer an option? I notice you now return a URL to a remote resource on the server:
`"output_file_path": "/content/alltalk_tts/outputs/audiofile_173413884301c0b.wav",`

To me that seems rather risky, since the API is no longer stateless. How long does that .wav stay valid for? Will the download expire? If it doesn't, won't that clutter AllTalk with old wav files from previous generations?
You used the exact same API endpoint. Does that mean you are breaking backwards compatibility with all clients using prior versions of AllTalk? It would also mean that once Lite is updated, users running old versions locally will suddenly see it break.
It also seems like a potential privacy risk: anyone with the URL can download anyone else's audio, and the files will potentially persist on the server side.

I'm wondering if you might instead consider a param to the generation request like `stateless: true`, where either

- it returns the raw wav data in the same request instead, or
- it returns a JSON payload containing the audio as a base64 encoded representation

so a second download is not needed? Happy to discuss further.
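To make the second option concrete, here is a minimal client-side sketch of decoding an inline base64 audio payload. The `stateless` parameter is only a proposal in this thread, and the `audio_base64` field name is an assumption made up for this example; neither is an existing AllTalk API.

```javascript
// Hypothetical response shape: { status: "success", audio_base64: "<base64 wav>" }
// The audio_base64 field name is an assumption, not part of the AllTalk API.
function decodeBase64Audio(payload) {
  if (!payload || typeof payload.audio_base64 !== "string") {
    throw new Error("response does not contain inline audio");
  }
  const binary = atob(payload.audio_base64); // base64 -> binary string
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  // ArrayBuffer, usable with audioContext.decodeAudioData like the raw response before
  return bytes.buffer;
}
```

The trade-off is that base64 inflates the payload by roughly a third compared with returning the raw wav bytes, but it keeps everything in a single JSON response and nothing needs to persist on the server.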

LostRuins · 2024-12-14T09:48:48Z

I got the non-streaming mode working correctly, and now handle responses from both v1 and v2 based on returned content type.
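For anyone following along, the idea of branching on the returned content type looks roughly like the sketch below. It is not the merged code: it assumes that a JSON content type means the server sent a file reference and that anything else is raw audio in the body, and `downloadFromPayload` is a hypothetical callback that performs the second download.

```javascript
// Returns true when the Content-Type header indicates a JSON body.
function isJsonResponse(contentType) {
  return typeof contentType === "string" &&
    contentType.toLowerCase().includes("application/json");
}

// response: a fetch() Response.
// downloadFromPayload: hypothetical callback that turns the JSON payload
// into an ArrayBuffer (for example by fetching the referenced wav file).
async function readGeneratedAudio(response, downloadFromPayload) {
  const contentType = response.headers.get("content-type");
  if (isJsonResponse(contentType)) {
    const payload = await response.json();
    return downloadFromPayload(payload);
  }
  // Otherwise the body already contains the audio bytes.
  return response.arrayBuffer();
}
```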

However, streaming is still not working. I could not get even the official Colab example to stream, and I am seeing a lot of error 502 responses. Is there a simple setup I can do to get the streaming demo working?

henk717 · 2024-12-14T09:55:11Z

With the XTTS API server, streaming is also not possible remotely, as it then plays the audio through the local speakers.

LostRuins · 2024-12-14T09:56:39Z

It's XTTS powered, but the server is different, so it might work? Not sure yet.

Additionally, once it crashes it won't start again:

```
ERROR:    Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 693, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/content/alltalk_tts/tts_server.py", line 201, in startup_shutdown
    await model_engine.setup()
  File "/content/alltalk_tts/system/tts_engines/vits/model_engine.py", line 170, in setup
    self.available_models = self.scan_models_folder()
  File "/content/alltalk_tts/system/tts_engines/vits/model_engine.py", line 319, in scan_models_folder
    print(f"[{self.branding}ENG] \033[91mWarning\033[0m: Model folder '{model_name}' is missing required")
UnboundLocalError: local variable 'model_name' referenced before assignment

ERROR:    Application startup failed. Exiting.
```


LostRuins · 2024-12-14T10:46:29Z

Alright I got the streaming endpoint working - but it seems like it doesn't really make much difference - the audio still needs to be fully generated on the backend before the streaming even starts.

Anyway, I will add the streaming endpoint in, but buffer it synchronously first.
Please review the final version and I will merge this. This should correctly support AllTalk v1 and v2.
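As a rough illustration of what "buffer first" means here, the sketch below reads a streamed `fetch` response to completion and joins the chunks into one buffer before any playback is attempted. It is a generic pattern written for this thread, not the code that was merged, and `bufferStream` and `concatChunks` are made-up names.

```javascript
// Join a list of Uint8Array chunks into a single Uint8Array.
function concatChunks(chunks) {
  const total = chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}

// Read the whole streamed body before handing it to the audio code,
// so no media element is needed while data is still arriving.
async function bufferStream(response) {
  const reader = response.body.getReader();
  const chunks = [];
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
  }
  return concatChunks(chunks).buffer;
}
```

Since the backend appears to finish generating before it starts sending, waiting for the full body costs little extra latency in practice.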

erew123 · 2024-12-14T10:46:55Z

Hi @LostRuins I had to travel 100+ miles last night due to a break-in which I am dealing with and hence a little caught up.

That aside, let me do what I can to answer questions.

The streaming setup should have worked for both AllTalk V1 and V2... at least it did when I was testing locally. I can re-test in a bit. Streaming can be funny on Colab at times, which I think is something to do with Cloudflare tunnels, and as mentioned, streaming only works with the XTTS engine. Also, some browsers are incapable of dealing with the PCM stream, e.g. Firefox, so they will not work when getting a streaming response. Brave & Chrome should be fine. There is a basic HTML test page available on the AllTalk API address.

Both Streaming and Standard AllTalk protocols, with code examples are listed here https://github.com/erew123/alltalk_tts/wiki#-api-documentation

As for how long WAV files will remain on the AllTalk server, that depends on the duration someone sets for auto-deleting older audio outputs:

Default deletion is Disabled.

The difference between the AllTalk V1 and V2 protocols is that, on a Standard response, V2 does not return the protocol/IP/port in the response:

- **AllTalk v2 API (Recommended):**
    - Returns relative file paths only
    - More flexible for different deployments
    - Example response: 
        ```json
        {
            "output_file": "/outputs/tts_output.wav",
            "status": "success"
        }
        ```
    - Best for:
        - New integrations
        - Modern web applications
        - Flexible deployment environments
        - Container-based systems

- **AllTalk v1 API (Legacy):**
    - Returns complete URLs with protocol and IP
    - Maintains older integration support
    - Example response:
        ```json
        {
            "output_file": "http://127.0.0.1:7851/outputs/tts_output.wav",
            "status": "success"
        }
        ```
    - Best for:
        - Existing integrations
        - Systems requiring full URLs
        - Direct file access needs
        - Backward compatibility
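A client that wants to accept both response styles can resolve the returned value against the configured server address. The sketch below is only an illustration: `resolveAudioUrl` is a made-up helper, and it relies on the standard `URL` constructor, which resolves a relative path against a base and leaves an already absolute URL unchanged.

```javascript
// outputFile: the "output_file" value from either response style shown above.
// baseUrl: the AllTalk server address configured in the client.
function resolveAudioUrl(outputFile, baseUrl) {
  // Relative paths (v2) are resolved against baseUrl;
  // absolute URLs (v1) are returned as-is.
  return new URL(outputFile, baseUrl).toString();
}

// v2 style: relative path
resolveAudioUrl("/outputs/tts_output.wav", "http://127.0.0.1:7851");
// v1 style: full URL
resolveAudioUrl("http://127.0.0.1:7851/outputs/tts_output.wav", "http://127.0.0.1:7851");
```

Both calls produce `http://127.0.0.1:7851/outputs/tts_output.wav`.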

I will get back to you at some point, but I currently have to go deal with police, window companies, insurance etc... :/ so I will take another look ASAP/when I am free to do so. (Sorry)

LostRuins · 2024-12-14T14:46:58Z

oof take care, no rush on this. i'll merge first, we can always review again as needed.

erew123 added 2 commits December 13, 2024 17:21

AllTalk match PR to current Dev branch

c7c02a4

Line 4184 & 11656

erew123 mentioned this pull request Dec 13, 2024

Re AllTalk & updating the API LostRuins/koboldcpp#1249

Open

LostRuins added the enhancement New feature or request label Dec 14, 2024

changed global dom events to direct function calls from the ui control event

b1c1f7d

fixed functionality of non-streaming to handle both alltalk v1 and v2

5edcd34

LostRuins added the help wanted Extra attention is needed label Dec 14, 2024

support streaming endpoint without creating a media player (since that gets blocked by autoplay plugins)

6fd0bbc

LostRuins approved these changes Dec 14, 2024


LostRuins merged commit 6e5f22f into LostRuins:dev Dec 14, 2024
