Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feedback on AudioWorklets API #18853

Open
mackron opened this issue Feb 25, 2023 · 13 comments
Open

Feedback on AudioWorklets API #18853

mackron opened this issue Feb 25, 2023 · 13 comments

Comments

@mackron
Copy link

mackron commented Feb 25, 2023

With the newly added support for AudioWorklets to Emscripten I've been adding support for it to my audio library. I just had a few things that I thought I'd bring up in case it might be of interest to the maintainers.

Is there no function for suspending an AudioContext? There is a emscripten_resume_audio_context_sync() but there's no emscripten_suspend_audio_context_sync(). Is this an oversight? As it stands now I need to manually write some embedded JavaScript for suspending.

I don't like that the buffer size seems to be hard coded to 128. The MDN documentation says the following (https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API/Using_AudioWorklet#the_input_and_output_lists):

By specification, each block of audio your process() function receives contains 128 frames (that is, 128 samples
for each channel), but it is planned that this value will change in the future, and may in fact vary depending on
circumstances, so you should always check the array's length rather than assuming a particular size.

Personally, I'd like the see a variable be used to carry this value, with the Emscripten documentation only ever referring to said variable rather than explicitly mentioning the 128 magic number. Is the Emscripten implementation currently robust enough to handle cases when the browser might report a value other than 128 as indicated by the MDN comment above? If the Emscripten implementation is indeed hard coded to 128, even if it's specified as such by the WebAudio spec, I would consider this to be very bad practice, and in my opinion needs to be addressed.

This is a bit more of a nit-picky thing, and I'm not sure how feasible this would be in practice, but it would be really nice to have synchronous versions of async functions. My library implements multiple backends across all the major platforms, and the abstraction requires that everything be synchronous, which I think is reasonable and a fairly common scenario. It gets annoying having to link to -sASYNCIFY and doing emscripten_sleep() in a loop while it waits for initialization to complete. What I was hoping for was something like this:

EM_BOOL result = emscripten_start_wasm_audio_worklet_thread_sync(audioContext, pStackBuffer, stackSize);
if (result == EM_FALSE) {
    // Error.
}

And the same with emscripten_create_wasm_audio_worklet_processor_async():

EM_BOOL result = emscripten_create_wasm_audio_worklet_processor_sync(audioContext, &workletProcessorOptions);
if (result == EM_FALSE) {
    // Error.
}

In my opinion this is just so much cleaner than having to fluff around with callbacks and running a polling loop while the calling thread waits for everything to complete.

That's all I had for the moment. Thanks a lot for your work on this AudioWorklets stuff. Overall it seems to work really well!

@juj
Copy link
Collaborator

juj commented Feb 25, 2023

Is there no function for suspending an AudioContext? There is a emscripten_resume_audio_context_sync() but there's no emscripten_suspend_audio_context_sync(). Is this an oversight? As it stands now I need to manually write some embedded JavaScript for suspending.

The resume function was added mainly to enable tests to resume the context. There was no design to add a full-fledged API for all Web Audio functionality over to C side.

That being said, it would make sense to mirror the resume function with a suspend, so this could be expanded.

Note that it is a non-goal to enable writing fully C-based wasm audio worklets code without needing to hand-write any JS code. This is because the node-based Web Audio graph API is so vast that bringing it fully to C side would be a large undertaking. So if you need to drop over to JS side to achieve something, then that is ok. The function emscriptenGetAudioObject() was added for this purpose to enable grabbing the context in JS side.

I don't like that the buffer size seems to be hard coded to 128. The MDN documentation says the following

Note that documentation != specification. I don't know why the documentation would make such a forward-looking statement over the future of the Web Audio specification.

This design issue to track changing the render quantum ( https://www.w3.org/TR/webaudio/#rendering-loop ) is at WebAudio/web-audio-api#2450 , but this progress has stalled for more than a year, so it is unclear if that will land.

If the above feature lands, the quantum size will not change underneath the user, but based on the conversation there, the user must opt-in to choosing the quantum size.

Also the quantum size will be a hint, so it will not necessarily provide the quantum that is requested: https://github.com/WebAudio/web-audio-api/pull/2469/files#diff-5e793325cd2bfc452e268a4aa2f02b4024dd9584bd1db3c2595f61f1ecf7b985R2139

Posted a comment at WebAudio/web-audio-api#2450 (comment) to clarify if the direction is still current.

Is the Emscripten implementation currently robust enough to handle cases when the browser might report a value other than 128 as indicated by the MDN comment above?

This cannot happen, if the Web Audio implementation is up to the spec. When I was implementing this, I did not want to anticipate the API with changes that may not happen. Examples mentioned that one can assume hardcoded 128 render quantum, since that is what Web Audio has been built on top of, hence the same design in the current callback.

However, assuming there is a strong positive consensus in the Web Audio community that the quantum size change will occur, then it will make sense to update the API. Doing a breaking change to this Emscripten presented API side will be ok, since this feature is still so fresh.

It gets annoying having to link to -sASYNCIFY and doing emscripten_sleep() in a loop while it waits for initialization to complete.

Currently Wasm Audio Worklets do not yet support -sASYNCIFY, but adding that compatibility is a good idea.

With that update, we could avoid the need to manually call emscripten_sleep(), by providing _sync() versions of the functions, but those sync versions would still require linking with -sASYNCIFY. I am not sure if that will help in your case?

@mackron
Copy link
Author

mackron commented Feb 25, 2023

Note that it is a non-goal to enable writing fully C-based wasm audio worklets code without needing to hand-write any JS code.

Yep, understood. It just threw me a bit when I saw a resume function but no corresponding suspend function. Just thought I'd mention it in case it was an accidental oversight.

Note that documentation != specification. I don't know why the documentation would make such a forward-looking statement over the future of the Web Audio specification.

Totally understand that you would have based your decisions on the specification. However, I would argue that the MDN is actually making a more responsible statement over the Web Audio specification in this particular case. Of all the audio backends I support in my library, about 14 or so over all the major platforms, Web Audio is the only one I've seen that uses the concept of hard coded specification-defined processing size. All I'm saying is that I don't think the Emscripten API and documentation should be encouraging the use of a hard coded 128 processing size. I think a parameter should be passed into the processing callback, even if it's fixed to 128 in practice, and then have the documentation refer to that parameter.

Thanks for pointing out that Web Audio GitHub issue. I 100% agree with that proposal. I'm honestly surprised that wasn't brought up right from the start when the spec was initially being written.

With that update, we could avoid the need to manually call emscripten_sleep(), by providing _sync() versions of the functions, but those sync versions would still require linking with -sASYNCIFY. I am not sure if that will help in your case?

That would still be a big improvement. Currently I'm doing this whole song and dance which apart from being messy, is just error prone (this is just my initial exploratory implementation - there's likely a cleaner way of doing it):

pDevice->webaudio.isInitialized = MA_FALSE;

emscripten_start_wasm_audio_worklet_thread_async(...);

while (pDevice->webaudio.isInitialized == MA_FALSE) {
    emscripten_sleep(1);
}

I just know that I'll make some change in one of the callbacks and do an early return or something and forget to set the isInitialized flag. If I still need to link with -sASYNCIFY so be it. Not the end of the world. My situation is just a bit of a unique one because my library is designed to be added straight to a project's source tree, so any complications added to the build system have flow on effects for my users. (Just a side a note, an in-code way to specify linking with -sASYNCIFY, similar to MSVC's #pragma comment(linker, ...) would be a great feature.)

@padenot
Copy link

padenot commented Feb 27, 2023

Of all the audio backends I support in my library, about 14 or so over all the major platforms, Web Audio is the only one I've seen that uses the concept of hard coded specification-defined processing size.

[...]

Thanks for pointing out that Web Audio GitHub issue. I 100% agree with that proposal. I'm honestly surprised that wasn't brought up right from the start when the spec was initially being written.

Just a quick historical note as one of the Web Audio API specification editor: I think it's fair to say that we all agree with you, but the Web Audio API spec we inherited back in 2012-2013 hard coded everything to 128, was already shipping in a major browser, and we've been busy actually writing a spec that's a bit more than a user manual, and also doing AudioWorklet along with quite a number of other useful bits since then. You could say that our roadmap, as far as specification work was concerned, was mostly driven by compatibility rather than performance.

When it was clear that AudioWorklet was performant enough, with real real-time threads and all, and as the web platform evolved with WASM, and SIMD, and atomics, etc. we looked into the next important thing we could do to improve performance, and buffer size it was. It's going to land in the spec and in implementation, we just reordered our roadmap and landed audio device selection first because it was easier to implement, very useful, and we didn't want to have something in the spec for ages that nobody had implemented in practice.

Hope this helps clearing up any uncertainties, and we're around at https://github.com/webaudio/web-audio-api/issues if anybody needs anything or has any idea of new APIs or things to modify to make things better (including for things like Emscripten or other advanced WASM usage, we've a few things planned already). And we're all certainly very excited about the recent landing of the big Emscripten branch that makes using AudioWorklet a lot easier!

@padenot
Copy link

padenot commented Feb 27, 2023

With that update, we could avoid the need to manually call emscripten_sleep(), by providing _sync() versions of the functions, but those sync versions would still require linking with -sASYNCIFY. I am not sure if that will help in your case?

That would still be a big improvement. Currently I'm doing this whole song and dance which apart from being messy, is just error prone (this is just my initial exploratory implementation - there's likely a cleaner way of doing it):

pDevice->webaudio.isInitialized = MA_FALSE;

emscripten_start_wasm_audio_worklet_thread_async(...);

while (pDevice->webaudio.isInitialized == MA_FALSE) {
    emscripten_sleep(1);
}

I just know that I'll make some change in one of the callbacks and do an early return or something and forget to set the isInitialized flag. If I still need to link with -sASYNCIFY so be it. Not the end of the world. My situation is just a bit of a unique one because my library is designed to be added straight to a project's source tree, so any complications added to the build system have flow on effects for my users. (Just a side a note, an in-code way to specify linking with -sASYNCIFY, similar to MSVC's #pragma comment(linker, ...) would be a great feature.)

Now on a technical note: it's unlikely that the Web Audio API will ever provide a synchronous constructor for AudioContext and other objects, because it's all implemented with asynchronous message passing, often with lock free primitives, underneath.

But it's perfectly conceivable to write a wrapper that does the synchronous bits for you, in a way that's cleaner. Essentially, you want to block, and then to unblock on the first "statechange" event fired at the AudioContext. This will handle autoplay prevention and regular device initialization (that can be very slow, e.g. with bluetooth devices) correctly.

@mackron
Copy link
Author

mackron commented Feb 27, 2023

Thanks. That'll certainly be a good change. Glad to hear it'll make it in at some point.

Now on a technical note: it's unlikely that the Web Audio API will ever provide a synchronous constructor for AudioContext and other objects, because it's all implemented with asynchronous message passing, often with lock free primitives, underneath.

Understood. And if it's not practical to do a synchronous version at the Emscripten level it's not the end of the world or anything. But it's certainly something I thought worth raising because a lot of audio work in C will be synchronous, and a lot of cross-API abstractions, like my own, would require it so it'd be quite convenient.

@imiskolee
Copy link
Contributor

I am also afraid of the hard 128 samples buffer size, because of buffer size totally equal to audio callback interval time, or each frame max processing time, small buffer more real-time, when we build a DAW like software, sometimes need more user CPU resources.

@padenot
Copy link

padenot commented Aug 10, 2023

This is already specified and talked about above (preview, issue with discussion), and will start being worked on likely next year. After discussion with implementors, there wasn't enough resources this year to implement it.

@rbdannenberg
Copy link

rbdannenberg commented Oct 17, 2023

Note that it is a non-goal to enable writing fully C-based wasm audio worklets code without needing to hand-write any JS code.

Along that line, it seems that currently, the only way to create an audio worklet is

EM_BOOL result = emscripten_create_wasm_audio_worklet_processor_sync(audioContext, &workletProcessorOptions);

which means (I think) you have to have an audioContext on the C++ side, but if the audioContext was created on the JS side, how can we reference it from or pass it to C++? Does anyone know if emscriptenRegisterAudioObject() exists or is documented? Emscripten docs say "you would instead register that context to be visible to WebAssembly by calling the function emscriptenRegisterAudioObject()" but doesn't say how.

What if you had two modules with two classes of Audio Nodes? Are you required to keep them in separate Audio Contexts?

I apologize for cross-posting (#20349 (comment)).

I didn't solve this problem, but found a way to get the audio context back into JavaScript and also to create and run multiple instances of multiple types of audio nodes, working around many (real or perceived -- not sure) emscripten limitations: webaudiowasm-blog19oct2023.

@fwcd
Copy link

fwcd commented Mar 11, 2024

I can only emphasize this:

This is a bit more of a nit-picky thing, and I'm not sure how feasible this would be in practice, but it would be really nice to have synchronous versions of async functions. My library implements multiple backends across all the major platforms, and the abstraction requires that everything be synchronous, which I think is reasonable and a fairly common scenario. It gets annoying having to link to -sASYNCIFY and doing emscripten_sleep() in a loop while it waits for initialization to complete.

It would be indeed be very nice if there was either a way to use operate on audio contexts fully synchronously (even though I understand that it's next to impossible given that the underlying API is asynchronous) or if Asyncify would have support for simultaneous in-flight async operations.

In my anecdotal experience, integrating a Web Audio backend into a Qt app that already has to tiptoe around other async operations (#18412) is definitely a bit of a challenge. There are just so many moving pieces with both Asyncify and threads involved that it becomes hard to debug.

@padenot
Copy link

padenot commented Mar 11, 2024

We'll be able to do this when WebAudio/web-audio-api#2423 will be available. It's always possible to wrap an async API to be sync and the opposite (granted you have threads of course), maybe we should write examples. It's always a bad idea to block the main thread, however.

@Pinteresting
Copy link

Is anyone benchmarking or tried running audio worklets with 128 samples on a large sample of computers?

I want to use audio worklets over script processor nodes to more reliably play audio on potato devices i.e. 10+ year old laptop / phones / newer devices that are overloaded due to other applications running/etc.

However 128 samples seems like it's designed for extreme lower latency over lower CPU usage and will be a glitchy mess for a lot of people.

@padenot
Copy link

padenot commented Mar 12, 2024

The fact that the Web Audio API renders in blocks of 128 frames doesn't mean that the system renders in block of 128 frames. Latency isn't correlated with the render block size.

10 years old computers are fine, 10 years old phone it really depends on the OS and phone maker, it can range from great to terrible.

Implementations are generally running AudioWorkletProcessors on real-time threads. If it glitches because other applications overload the machine, an issue should be filed in the browser vendor's bug tracker (if the machine is otherwise capable of computing the workload in real-time, that is).

@visla-jiaozebo
Copy link

visla-jiaozebo commented May 31, 2024

@rbdannenberg

Does anyone know if emscriptenRegisterAudioObject() exists or is documented?

as from the source code, I think it should be in the javascript side:

let audioCtx = new AudioContext();
let ctx = Module.emscriptenRegisterAudioObject(audioCtx);
Module.someFunctionWithAudioCtx(ctx)

and in the c/c++ side:

void someFunctionWithAudioCtx(EMSCRIPTEN_WEBAUDIO_T ctx) {
  // do something with the audio context
  printf("the audio ctx:%d\n", ctx);
}
EMSCRIPTEN_BINDINGS(AudioCtx) {
  function("someFunctionWithAudioCtx", &someFunctionWithAudioCtx);
}

and needs to be exported with -sEXPORTED_RUNTIME_METHODS=[emscriptenRegisterAudioObject]
after the AudioContext was passed into the native layer, it can be treated as globally, as EMSCRIPTEN_WEBAUDIO_T is just a integer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants