Live audio streaming output #5077
Conversation
All the demos for this PR have been deployed at https://huggingface.co/spaces/gradio-pr-deploys/pr-5077-all-demos. You can install the changes in this PR by running:

```shell
pip install https://gradio-builds.s3.amazonaws.com/ab0b17df50c938c4fa6ff0608805406025087774/gradio-3.39.0-py3-none-any.whl
```
Tested this out and it works great @aliabid94, even with multiple outputs! However, I'm concerned that it uses a completely different mechanism for streaming than regular generator functions do. Is this actually necessary? I don't have a concrete alternative right now, but among other things, this breaks the client for any route with streaming. You can try by running:

```python
import gradio as gr
from pydub import AudioSegment
import time

def stream_audio(lag):
    audio_file = 'test.mp3'  # Your audio file path
    audio = AudioSegment.from_mp3(audio_file)
    chunk_length = 1000
    chunks = []
    while len(audio) > chunk_length:
        chunks.append(audio[:chunk_length])
        audio = audio[chunk_length:]
    if len(audio):  # Ensure we don't end up with an empty chunk
        chunks.append(audio)

    def iter_chunks():
        for chunk in chunks:
            file_like_object = chunk.export(format="mp3")
            data = file_like_object.read()
            time.sleep(lag)
            yield data

    return iter_chunks(), "fixed response"

demo = gr.Interface(
    stream_audio,
    gr.Slider(0, 3, 0, label="lag", info="Duration before generating next second of audio. >1s to cause lag."),
    [gr.Audio(autoplay=True), gr.Textbox()],
)

if __name__ == "__main__":
    _, url, _ = demo.launch()
```

and then:

```python
from gradio_client import Client

client = Client(url)
result = client.predict(
    0,  # int | float (numeric value between 0 and 3) in 'lag' Slider component
    api_name="/predict",
)
print(result)
```
Also, the user-facing API of having to return the generator is a little different from how Gradio users are used to generating/streaming. I would have expected something like this as the API (directly returning the generator, plus setting `streaming=True` on the output):

```python
import gradio as gr
from pydub import AudioSegment
import time

def stream_audio(lag):
    ...
    for chunk in chunks:
        file_like_object = chunk.export(format="mp3")
        data = file_like_object.read()
        time.sleep(lag)
        yield data, "fixed response"

demo = gr.Interface(
    stream_audio,
    gr.Slider(0, 3, 0, label="lag", info="Duration before generating next second of audio. >1s to cause lag."),
    [gr.Audio(autoplay=True, streaming=True), gr.Textbox()],
)

if __name__ == "__main__":
    _, url, _ = demo.launch()
```
Ok so here's an idea that doesn't fix everything above, but I think it would allow you to use the above developer API. Steps: the developer writes a plain generator function:

```python
def stream_audio(lag):
    ...
    for chunk in chunks:
        file_like_object = chunk.export(format="mp3")
        data = file_like_object.read()
        time.sleep(lag)
        yield data, "fixed response"
```

and sets

```python
demo = gr.Interface(
    stream_audio,
    gr.Slider(0, 3, 0, label="lag", info="Duration before generating next second of audio. >1s to cause lag."),
    [gr.Audio(autoplay=True, streaming=True), gr.Textbox()],
)
```
StreamingResponse requires a generator that only yields bytes. We could "wrap" the generator with another generator that tosses out all the other outputs, but that would obviously ignore the intended user behaviour of setting those outputs. We also couldn't recover them: once the generator is handed to StreamingResponse, only FastAPI sees the values as they are yielded, so we can't send updates or anything with those other outputs.
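To make the constraint concrete, here's a stdlib-only toy sketch (not Gradio's actual code, all names are illustrative) of what such a bytes-only wrapper would do: the text value yielded alongside each audio chunk is simply dropped.

```python
def user_fn():
    # A user's generator that yields (audio_bytes, textbox_value) tuples.
    for i in range(3):
        yield (b"chunk%d" % i, f"lyrics line {i}")

def bytes_only(gen):
    # What we'd have to pass to StreamingResponse: keep the bytes,
    # discard every other output (the Textbox updates are lost).
    for audio, *_rest in gen:
        yield audio

stream = list(bytes_only(user_fn()))
```

The wrapper satisfies StreamingResponse's bytes-only contract, but there is no point at which the text values could be routed anywhere else.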
Suppose you wanted to create a demo that streamed music and also generated lyrics in real time for the streaming music. That would not be possible with this API, correct?
I think we need to do something like this. In the route, define a second generator that reads from a shared list:

```python
def stream_until_complete():
    chunks = pending_stream
    chunk = None
    index = 0
    while chunk is not StopIteration:
        yield chunk
        if index >= len(chunks):
            yield None
        else:
            chunk = chunks[index]
            index += 1
```

(code may need to be tweaked but this is the general idea) Then you pass `stream_until_complete` into StreamingResponse. The basic idea is that instead of directly passing our generator function to StreamingResponse (which would mean we lose the other outputs, as you said), we use our generator function to populate a list (potentially even multiple lists, if there are multiple streaming output components), and have a second generator that reads from that list, which is what gets passed into StreamingResponse. The benefits of this approach, I believe, would be (1) developers keep an API they are familiar with and (2) it allows use cases where multiple outputs stream together.
Ok, now I accept direct yielding from the function; see demo/stream_audio_out/ for an example. Ready for re-review @abidlabs
$code_stream_frames

Streaming can also be done in an output component. A `gr.Audio(streaming=True)` output component can take a stream of audio data yielded piece-wise by a generator function and combine the pieces into a single audio file.
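As a toy illustration of that piece-wise behaviour (plain Python, not the component's real implementation; the byte strings are stand-ins for encoded audio chunks), the yielded pieces end up accumulated into one combined output:

```python
def audio_generator():
    # stand-ins for successive encoded audio chunks
    for piece in (b"RIFF-head", b"frame-1", b"frame-2"):
        yield piece

def accumulate(gen):
    # mimics the streaming output component: each new chunk extends
    # the audio played so far, ending with one combined "file"
    combined = b""
    for piece in gen:
        combined += piece
    return combined

result = accumulate(audio_generator())
```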
Let's put the `stream_audio_out` example demo here (ideally after simplifying it a bit).
Here's a simplified demo you can use @aliabid94:

```python
import gradio as gr
from pydub import AudioSegment
import time

def stream_audio(audio_file, lag):
    audio = AudioSegment.from_mp3(audio_file)
    i = 0
    chunk_size = 1000
    while chunk_size * i < len(audio):
        chunk = audio[chunk_size * i : chunk_size * (i + 1)]
        i += 1
        if chunk:
            file = f"/tmp/{i}.mp3"
            chunk.export(file, format="mp3")
            time.sleep(lag)  # simulate a slow generator
            yield file, i

demo = gr.Interface(
    fn=stream_audio,
    inputs=[
        gr.Audio(type="filepath", label="Audio file to stream"),
        gr.Slider(
            0, 3, 0,
            label="lag",
            info="Duration before generating next second of audio. Set >1s to cause lag.",
        ),
    ],
    outputs=[
        gr.Audio(autoplay=True, streaming=True),  # streaming=True is needed to stream output audio
        gr.Textbox(),
    ],
)

if __name__ == "__main__":
    demo.queue().launch()
```
Noticing some small issues. Here's an adaptation of the code above:

```python
import gradio as gr
from pydub import AudioSegment
import time

def stream_audio(audio_file, lag):
    audio = AudioSegment.from_mp3(audio_file)
    i = 0
    chunk_size = 1000
    while chunk_size * i < len(audio):
        chunk = audio[chunk_size * i : chunk_size * (i + 1)]
        i += 1
        if chunk:
            file = f"/tmp/{i}.mp3"
            chunk.export(file, format="mp3")
            time.sleep(lag)  # simulate a slow generator
            yield file, file

demo = gr.Interface(
    fn=stream_audio,
    inputs=[
        gr.Audio(type="filepath", label="Audio file to stream"),
        gr.Slider(
            0, 3, 0,
            label="lag",
            info="Duration before generating next second of audio. Set >1s to cause lag.",
        ),
    ],
    outputs=[
        gr.Audio(autoplay=True, streaming=True),  # streaming=True is needed to stream output audio
        gr.Audio(autoplay=True, streaming=True),  # second streaming output
    ],
)

if __name__ == "__main__":
    demo.queue().launch()
```
Fixed.
I think it's because we were streaming 1-second chunks, which was too frequent. Increased to 3-second chunks in the demo and the breaks are much less noticeable imo.
I think I still hear them with the 3-second chunks, but it's very minor, so not a blocker imo.
Awesome PR @aliabid94!
🎉 Chromatic build completed! There are 0 visual changes to review.
This PR allows users to stream audio out. See demo/streaming_audio_out for an example that streams out pieces of an audio file second by second.
Fixes: #5110