
Configurable AudioWorklet process block size (higher than 128 samples)? #1503

Closed

josh83abc opened this issue Feb 23, 2018 · 16 comments

@josh83abc

josh83abc commented Feb 23, 2018

Hello everybody!

As far as I can tell, the AudioWorkletProcessor process block size is 128 samples, like any other AudioNode.

I haven't really tested the robustness of the audio stream, but this value seems pretty low to me compared with what you see in desktop music software (the process block size is usually 512 samples or more, and can easily be 1024 or 2048 samples).

I don't know if it is related, but I can hear frequent tiny audio glitches with the AudioWorklet sine demo when I switch from tab to tab: https://googlechromelabs.github.io/web-audio-samples/audio-worklet/basic/hello-audio-worklet.html

Also, in my app audio latency is not the most important aspect, so I would prefer a higher latency that allows a more robust audio stream.

Would it be possible to change the AudioWorkletProcessor process block size in the future? Is there a workaround?

PS: this issue also discusses it: #1466

@hoch
Member

hoch commented Feb 23, 2018

Internal buffering (e.g. a FIFO) in the processor might be the way to resolve the buffer size difference. Even if we changed the spec to accommodate a variable buffer size, the implementation would do the buffering internally anyway, because the other parts of WebAudio use 128 frames.

Having the same render quantum size is key to lower latency, and it was one of the design goals of AudioWorklet. I doubt the WG will change something this fundamental now, but it can be revisited later for V2.
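For illustration, here is a minimal sketch of that FIFO idea (illustrative only, not spec or implementation code; mono, and `blockSize`/`processBlock` are placeholder names). It buffers 128-frame render quanta into a larger block, processes the block in one go, and plays the result back out one quantum at a time, at the cost of `blockSize - 128` frames of latency:

```js
class RebufferingProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.blockSize = 512; // the size the DSP wants; must be a multiple of 128
    this.inBuf = new Float32Array(this.blockSize);
    this.outBuf = new Float32Array(this.blockSize); // starts out as silence
    this.filled = 0;  // frames accumulated in inBuf
    this.readPos = 0; // next frame of outBuf to play
  }

  // Placeholder for the expensive computation that needs blockSize frames.
  processBlock(block) {
    return block;
  }

  process(inputs, outputs) {
    const input = inputs[0][0];
    const output = outputs[0][0];
    if (!input || !output) return true;

    // Accumulate this 128-frame render quantum.
    this.inBuf.set(input, this.filled);
    this.filled += input.length;

    // Once a full block is buffered, process it in one go.
    if (this.filled === this.blockSize) {
      this.outBuf.set(this.processBlock(this.inBuf));
      this.filled = 0;
      this.readPos = 0;
    }

    // Play previously processed audio, blockSize - 128 frames behind the input.
    output.set(this.outBuf.subarray(this.readPos, this.readPos + output.length));
    this.readPos += output.length;
    return true;
  }
}
registerProcessor('rebuffering-processor', RebufferingProcessor);
```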

@hoch
Member

hoch commented Feb 23, 2018

Also the glitch is an implementation issue. If you have a repro case, please file a bug and cc me (hongchan@).

@sletz

sletz commented Feb 23, 2018

@josh83abc
Author

Thanks @sletz for this link!! Lots of very interesting stuff in it. Avoiding audio glitches is the main priority of my music player, so I'm very interested in understanding all the details.

@hoch, thanks a lot for all these technical details. I'm aware that the whole WebAudio graph is processed with a render quantum of 128 samples, and so is the AudioWorkletProcessor. For now I haven't really tested the audio robustness of the WebAudio graph rendering; it is pretty good for sure, but I will try to reach its limits. I will also test on iOS and Android. Tell me if I'm wrong, but I don't think a FIFO on top of the AudioWorkletProcessor will improve the robustness that much.

For the V.next, it would be interesting if the WebAudio render quantum size could be adjusted by the programmer when very low latency isn't needed. For instance, a render quantum of 1024 samples (~23 ms at 44.1 kHz) would be more than enough for my application. Actually, Flash uses this value to process audio; the latency is OK and there are no audio glitches at all, even if Chrome crashes.

@padenot
Member

padenot commented Feb 26, 2018

The internal block size of the Web Audio rendering graph provides a lower bound for the OS buffer size, not an upper bound (and even then, it's not strictly true).

The Web Audio API already has a mechanism to request a higher latency on an AudioContext, using AudioContextLatencyCategory.
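For reference, the hint is passed at construction time, either as a category or as a target latency in seconds, and the AudioContext reports what the UA actually picked:

```js
// Ask for a larger, glitch-resistant output buffer rather than low latency.
const ctx = new AudioContext({ latencyHint: 'playback' }); // or e.g. { latencyHint: 0.1 }
console.log(ctx.baseLatency, ctx.outputLatency); // what the UA actually chose
```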

Please open issues about implementations on the UA's bug tracker.

@josh83abc
Author

josh83abc commented Feb 26, 2018

Thanks a lot @padenot for making me aware of the latencyHint option of the AudioContext!! I totally missed it when I read the spec, but this is exactly what I was talking about: having an option to adjust the audio latency according to the needs of the app.
Amazing that it already exists!! WebAudio is dope :)

@rtoy
Member

rtoy commented Feb 26, 2018

Based on #1503 (comment), I think we can close out this issue. I don't think there's anything that needs to be done for the spec.

@padenot padenot closed this as completed Feb 27, 2018
@fr0m

fr0m commented Mar 23, 2018

With the old ScriptProcessorNode, the buffer size controlled how frequently the audioprocess event was dispatched; latencyHint doesn't affect that.

Since the AudioWorkletProcessor process block size is 128, process is triggered far more often than the audioprocess event of a ScriptProcessorNode with a buffer size of 1024 or higher.

This can cause high CPU usage when an expensive operation runs inside process, which is the situation in my project. So is there another way to control how often process is triggered?

@padenot
Member

padenot commented Mar 23, 2018

This can cause high CPU usage when an expensive operation runs inside process, which is the situation in my project. So is there another way to control how often process is triggered?

No. If it's too expensive when computing 128 frames, why would it be OK when computing 1024 frames? You have exactly the same amount of time per frame to compute the audio.

You probably simply need to optimize your code.

@sletz

sletz commented Mar 23, 2018

Because running the complete audio chain adds a fixed processing cost per buffer. When smaller buffers are used, that fixed cost is paid more often, so it ends up consuming more of the available CPU.
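For illustration: at 48 kHz, 128-frame callbacks fire 48000 / 128 = 375 times per second, while 1024-frame callbacks fire only about 47 times per second. If each callback carries a fixed overhead c (thread wakeup, graph traversal, call dispatch) on top of the per-frame work w, the total cost per second is callbacks × c + 48000 × w: the 128-frame configuration pays the fixed overhead eight times as often for the same amount of per-frame work.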

@fr0m

fr0m commented Mar 26, 2018

Thanks for the reply.

We use AudioWorklet for live audio streaming: encoded audio buffers are sent via WebSocket from the AudioWorkletProcessor. It's expensive because of the frequency of calls, not the per-frame work.

I can cache the buffer and send it out at a particular buffer size (see the sketch below), but it would be much better if the unnecessary process calls were not triggered at all. Will you take this situation into account?
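For the caching approach, here is a minimal sketch of batching via the processor's MessagePort (illustrative; class and parameter names are made up, and the receiving side, which would own the WebSocket, is not shown):

```js
class BatchingSenderProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.batchFrames = 1024; // 8 render quanta: one message instead of eight
    this.batch = new Float32Array(this.batchFrames);
    this.filled = 0;
  }

  process(inputs) {
    const input = inputs[0][0];
    if (!input) return true;
    this.batch.set(input, this.filled);
    this.filled += input.length;
    if (this.filled === this.batchFrames) {
      // Transfer the underlying buffer to avoid a copy, then start a new batch.
      // (A pre-allocated pool would avoid allocating on the audio thread.)
      this.port.postMessage(this.batch, [this.batch.buffer]);
      this.batch = new Float32Array(this.batchFrames);
      this.filled = 0;
    }
    return true;
  }
}
registerProcessor('batching-sender', BatchingSenderProcessor);
```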

@positonic

@sletz that benchmark link for glitches is dead; has it moved somewhere?

@sletz

sletz commented Mar 2, 2019

Which link?

@thedracle

thedracle commented Dec 14, 2022

Was this closed with a resolution of some kind?

I have a situation where I have a trained model that works on buffers that are at a minimum 512 frames, or multiples of it.

The processing time isn't the issue, but the network was trained/optimized for this frame size, and there doesn't seem to be an easy way to reconcile that with the requirement that all processing be done in 128-sample chunks.

Is it acceptable to set the latencyHint to "playback" and to block the callback until I've collected 512 frames, processed them, and then feed my collected frames out one at a time?

@padenot
Member

padenot commented Dec 15, 2022

Was this closed with a resolution of some kind?

This was in fact about latency, and it was closed because of this: #1503 (comment). We didn't update the issue's title; maybe we should have.

I have a situation where I have a trained model that works on buffers that are at a minimum 512 frames, or multiples of it.

The processing time isn't the issue, but the network was trained/optimized for this frame size, and there doesn't seem to be an easy way to reconcile that with the requirement that all processing be done in 128-sample chunks.

Is it acceptable to set the latencyHint to "playback" and to block the callback until I've collected 512 frames, processed them, and then feed my collected frames out one at a time?

There are three things you can do. Two you can do now, and one you'll be able to do later next year. It's not a huge effort to do the first two, but the best solution depends on the situation:

  • First, you can buffer internally: accumulate 384 frames of input over the first 3 callbacks, and then on the 4th callback, when you have 512 frames of input, run your model. This adds 3 blocks of latency; maybe that is acceptable for your use case. It works well if your model runs in less than 128 / sampleRate seconds. You can measure this easily; https://blog.paul.cx/post/profiling-firefox-real-time-media-workloads/ has instructions.
  • Otherwise, if your model takes more than 128 / sampleRate seconds to execute, you can offload the computation to a Web Worker, sending audio from the worklet to the Web Worker through a ring buffer (a sketch follows at the end of this comment). This is explained in https://blog.paul.cx/post/a-wait-free-spsc-ringbuffer-for-the-web/; the repo has two examples: efficiently sending audio to a Web Worker, and back to an AudioWorkletProcessor. If you don't need to play the audio out, this is the solution I'd recommend (you can skip sending the audio back to the AudioWorkletProcessor in this case).
  • Finally, sometime next year, we'll merge and implement a feature to change the block size of an AudioContext, in which case you'll be able to specify 512 and get buffers of 512 frames in your AudioWorkletProcessor. The specification part is done, but it's a big change in the implementations, so we've delayed merging the specification text for clarity.

In any case, you can never "block the callback"; that will cause problems, such as demoting the real-time audio thread from real-time priority to regular priority, causing all sorts of glitches. But if you're not playing the audio out anyway, you can set the latencyHint to "playback", and it might save some power (depending on the OS and implementation).
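For the second option, here is a minimal sketch of the wait-free SPSC handoff (illustrative; it assumes SharedArrayBuffer is available, i.e. the page is cross-origin isolated, it is far simpler than the production-quality ringbuf.js behind the post above, and `runModel` is a hypothetical consumer):

```js
// Shared by the AudioWorkletProcessor (producer) and a Web Worker (consumer).
class FloatRingBuffer {
  constructor(sab, capacity) {
    this.capacity = capacity;
    this.indices = new Int32Array(sab, 0, 2);          // [read, write]
    this.samples = new Float32Array(sab, 8, capacity); // sample storage
  }
  static bytesNeeded(capacity) {
    return 8 + capacity * Float32Array.BYTES_PER_ELEMENT;
  }
  availableToRead() {
    const r = Atomics.load(this.indices, 0);
    const w = Atomics.load(this.indices, 1);
    return (w - r + this.capacity) % this.capacity;
  }
  // Producer side (real-time thread): copy a block in, never wait.
  push(block) {
    if (this.capacity - 1 - this.availableToRead() < block.length) {
      return false; // full: drop rather than block the audio thread
    }
    let w = Atomics.load(this.indices, 1);
    for (let i = 0; i < block.length; i++) {
      this.samples[w] = block[i];
      w = (w + 1) % this.capacity;
    }
    Atomics.store(this.indices, 1, w); // publish after the data is written
    return true;
  }
  // Consumer side (Web Worker): drain up to out.length samples.
  pop(out) {
    const n = Math.min(this.availableToRead(), out.length);
    let r = Atomics.load(this.indices, 0);
    for (let i = 0; i < n; i++) {
      out[i] = this.samples[r];
      r = (r + 1) % this.capacity;
    }
    Atomics.store(this.indices, 0, r);
    return n;
  }
}

// In the worklet (this.ring built from a SharedArrayBuffer posted to it):
//   process(inputs) { const ch = inputs[0][0]; if (ch) this.ring.push(ch); return true; }
// In the worker, over the same SharedArrayBuffer, poll and run the model:
//   const scratch = new Float32Array(512);
//   setInterval(() => {
//     while (ring.availableToRead() >= 512) { ring.pop(scratch); runModel(scratch); }
//   }, 5);
```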

@thedracle

@padenot Awesome, thanks for the very detailed response! It's extremely helpful.

I'm pleased to see the planned resolution. The added latency is unfortunately unavoidable given the constraints of the ML model we are using, and we are willing to accept it.

I will try the first suggestion for now, and I look forward to the new feature for changing the block size.
