Realtime/Fastest way to generate stable voice audio locally #882
Unanswered
lev-laptinov asked this question in Q&A
Replies: 1 comment 1 reply
-
Can you specify which API endpoint you are using? The http://LINK/v1/tts endpoint expects "Content-Type": "application/msgpack". Also, is the inference speed you mentioned measured after generating audio from a chunk, or before?
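For reference, here is a minimal sketch of calling that endpoint with a msgpack body. The "text" / "references" / "use_memory_cache" field names, the "on" value, and the ormsgpack encoding are assumptions based on the client tooling shipped in the repo's tools/ directory, so please double-check them there:

```python
# Minimal sketch (not the project's official client): POST a msgpack-encoded
# TTS request to /v1/tts. Field names and accepted values are assumptions;
# verify them against the client script in the repo's tools/ directory.
import ormsgpack
import requests

with open("reference.wav", "rb") as f:
    ref_audio = f.read()

payload = {
    "text": "The sentence to synthesize.",
    # Reference audio plus its transcript, used to keep the voice stable.
    "references": [{"audio": ref_audio, "text": "Transcript of reference.wav"}],
    "use_memory_cache": "on",  # assumed flag/value for caching the encoded reference
    "format": "wav",
    "streaming": False,
}

resp = requests.post(
    "http://LINK/v1/tts",  # LINK = your pod's address
    data=ormsgpack.packb(payload),
    headers={"Content-Type": "application/msgpack"},
)
resp.raise_for_status()

with open("output.wav", "wb") as f:
    f.write(resp.content)
```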
-
I want to play the generated audio as fast as possible from my pod. I'm using runpod.io, where I run a Docker image of the GitHub repo with the start command:
python tools/api_server.py --llama-checkpoint-path checkpoints/fish-speech-1.5 --decoder-checkpoint-path checkpoints/fish-speech-1.5/firefly-gan-vq-fsq-8x1024-21hz-generator.pth --listen 0.0.0.0:8080 --compile
Then I send a request to this pod.
As I understand it, there is no way to get a stable voice other than using reference audio.
About streaming: as I understand it, the server returns generated chunks as they are produced.
I've tried it, but I didn't notice any difference; playback only starts once the whole audio has been written.
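For concreteness, the kind of incremental playback I have in mind is sketched below. This is not my exact code; the payload field names, the "pcm" format value, and the 44100 Hz mono int16 output are assumptions that need to be checked against the server's actual request schema:

```python
# Sketch of playing a streamed response as it arrives instead of waiting for
# the whole file. Assumes the server emits raw 16-bit PCM at 44100 Hz mono
# when streaming is enabled; both the "pcm" format value and the sample rate
# are assumptions to verify against the actual API.
import ormsgpack
import requests
import sounddevice as sd

payload = {
    "text": "A longer sentence, so that streaming is actually noticeable.",
    "format": "pcm",  # assumed: raw PCM; a "wav" stream would start with a header
    "streaming": True,
}

resp = requests.post(
    "http://LINK/v1/tts",
    data=ormsgpack.packb(payload),
    headers={"Content-Type": "application/msgpack"},
    stream=True,  # important: let requests hand chunks over as they arrive
)
resp.raise_for_status()

leftover = b""
with sd.RawOutputStream(samplerate=44100, channels=1, dtype="int16") as out:
    for chunk in resp.iter_content(chunk_size=4096):
        data = leftover + chunk
        usable = len(data) - (len(data) % 2)  # keep writes aligned to int16 samples
        if usable:
            out.write(data[:usable])
        leftover = data[usable:]
```

One detail: without stream=True on the client side, requests buffers the entire response body before returning it, so even a streaming server response would only start playing once it has fully arrived, which might explain what I'm seeing.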
I also enabled use_memory_cache, which does give a speed increase.
I have also tried fine-tuning, which brought the generation time for ~13 s of audio down from ~6 s to ~4 s.
Now, running on 2x RTX 4090 with --compile, I get roughly 10 s of audio in about 3 s.
So the main remaining improvement, as I see it, is streaming: is it possible to stream the audio?
Or am I using or understanding something wrong?