[Feature Request] Low latency output on Windows via ASIO/WASAPI exclusive #682
Comments
I'm curious how much of the latency is a result of using shared mode or non-ASIO output. "Button to audio" latency with random games doesn't say much, since it also includes input latency (the time from physically pressing a button to the OS detecting the input, then to the process detecting the input) and logic/frame latency (the time from the process getting input to processing a new logic frame, and from a logic frame to updating audio state, which can be at different rates), and only then gets to the audio latency.
OpenAL Soft itself will add about 50ms on average, given the default 20ms period size and 60ms buffer. Certain post processors may add a couple more milliseconds (output limiter, UHJ encoder, etc., which will be reported as "Fixed device latency: ..." in the trace log). According to this page, starting with Windows 10 the default audio engine latency is 1.3ms, plus a 10ms default period size which will get written to the buffer for the hardware. So adding that all up, there should be about 51.3ms to 71.3ms if there's no other hidden latency anywhere.
By changing OpenAL Soft's period size and period count properties, it could be reduced to a period size of 10ms and a 20ms buffer, which would make OpenAL Soft average 15ms, making the latency from OpenAL to output about 21.3ms to 31.3ms, although this will have a higher risk of underruns. Before Windows 10, there's an additional 11ms for floating point sample streams and 5ms for integer sample streams. APOs may add additional latency, but there's no information on whether any are used normally.
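The arithmetic in this comment can be reproduced with a rough model. This is only a sketch: it assumes the latency through OpenAL Soft's buffer ranges from (buffer minus one period) up to the full buffer, and that the 1.3ms engine latency and 10ms hardware period are simply added on top. The function and parameter names are hypothetical and are not part of any OpenAL Soft API.

```python
def openal_latency_range_ms(period_ms, buffer_ms):
    """Return (lowest, average, highest) latency through the mixing buffer.

    Assumption: the buffer holds between (buffer - period) and buffer
    worth of audio at any moment.
    """
    lowest = buffer_ms - period_ms
    highest = buffer_ms
    average = (lowest + highest) / 2
    return lowest, average, highest


def total_latency_range_ms(period_ms, buffer_ms, engine_ms=1.3, hw_period_ms=10.0):
    """Add the Windows 10 engine latency and hardware period quoted above."""
    lowest, _, highest = openal_latency_range_ms(period_ms, buffer_ms)
    downstream = engine_ms + hw_period_ms
    return lowest + downstream, highest + downstream


for period, buf in ((20, 60), (10, 20)):
    _, avg, _ = openal_latency_range_ms(period, buf)
    lo, hi = total_latency_range_ms(period, buf)
    print(f"period {period}ms, buffer {buf}ms: "
          f"OpenAL Soft avg {avg:.1f}ms, total {lo:.1f}ms to {hi:.1f}ms")
```

Under these assumptions the default settings give the 50ms average and 51.3ms to 71.3ms total quoted above, and the reduced settings give 15ms and 21.3ms to 31.3ms.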
Well, one apparent difference is it passes a bad period size to |
I have always appreciated ASIO... if only because I had a Xonar sound card, and even for my poor Realtek I had found a *native* driver anyway (also, I think they had made some multiclient driver?). I mean, putting aside that I don't think games are meaningfully hampered by this. Academically speaking, is it worth at least 1ms? Or is it just a relic of another epoch when the Windows mixer was called KMixer? p.s. as far as WDM-KS workarounds go... I believe FlexASIO was the current champ |
Inb4 this is as good as exclusive EDIT: follow up is here |
Not sure that would help too much. That simply forces the audio server/service to use a shorter update period, but the app's buffer size is left unchanged. Unless the app calculates a buffer size based on the device's period size, that would only cause more frequent updates for the same buffer size. And actually for such cases, that would cause slightly higher overall latency since the buffer won't drain as much before doing another update. If the buffer is 40ms total, for example, the default 10ms period size would mean the buffer would have 30ms filled by the time an update occurs, meaning latency as low as 30ms for anything triggered just before the update; whereas if the period size is forced to 2ms (or whatever it sets), the same buffer will have 38ms filled when an update occurs, meaning latency closer to 38ms for anything triggered just before the update. So with the default period size, latency can vary between 30-40ms, whereas with a "low latency" 2ms period size, latency can vary between 38-40ms, a notably higher average and minimum bound. In the case of OpenAL Soft, it uses a multiple of the period size to stay close to its internal 20ms update size (or whatever |
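The 40ms example in this comment can be checked with a few lines. This is a sketch under the same simplification the comment uses: the buffer size stays fixed and one period's worth of audio drains between updates. The function name is made up for illustration.

```python
def latency_bounds_ms(buffer_ms, period_ms):
    """Return (minimum, maximum, average) latency for a fixed buffer size.

    Assumption: the buffer is topped up to full on every update, so it
    never holds less than (buffer - period) worth of audio.
    """
    minimum = buffer_ms - period_ms
    maximum = buffer_ms
    return minimum, maximum, (minimum + maximum) / 2


# Same 40ms buffer, default 10ms period versus a forced 2ms period.
for period in (10, 2):
    lo, hi, avg = latency_bounds_ms(40, period)
    print(f"period {period}ms: {lo}-{hi}ms, average {avg}ms")
```

Shrinking only the period raises the minimum from 30ms to 38ms, which is why the buffer size has to shrink along with the period for latency to actually drop.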
I'm the one who made the modifications a long time ago. It was just a quick hack as a proof of concept, and wasn't really meant to be shared, no pun intended. While not a controlled experiment, I used Wireshark with USBcap to measure the delta between a mouse click and a response in the audio stream. The advantage of this approach is that the DAC's own latency is factored out, but it assumes Wireshark is precise enough to be useful. Using the lowest period size I could in shared mode, it tended towards 30 milliseconds and up, while in exclusive mode, it was typically around 20 milliseconds. |
Following kcat's suggestions here, I was able to compile OpenAL Soft with ASIO output via PortAudio: However, I'm not sure how to force set buffer size to 64 samples (lowest my sound card can handle in native ASIO apps) for the lowest possible latency because even after setting On a side note, adding DSOAL+RightMark3DSound.zip (specifically dsound.dll) breaks EFXShow for some reason, and RightMark3DSound crashes too with this build. |
Currently the way the PortAudio backend works is it opens and configures the output stream during
Probably a dependency loop. |
On a separate note, if you have a Realtek you should try native ASIO instead of ASIO4ALL. |
@kcat Also I tried using the latest commit instead of latest stable version (from 2021) with no luck, and I found a PR that implements ASIO messages and rebuilt the dll with it, but Buffer Size Changed via the ASIO4ALL control panel still gets ignored. e.g. here I set it to request 64 samples (which works with native ASIO apps, at least after a restart) but it wouldn't change from 1920 as the log below shows
I wonder if disabling DirectSound support from PortAudio would get around that issue.
@mirh Sadly, my motherboard's onboard Realtek ALC1150 drivers don't include an ASIO driver, and I've tried the Dell Realtek drivers, even with this installer, but I ran into the same issue, even at 44100hz which some have reported to work more reliably:
It also suffers from other bugs like inputs/outputs randomly disappearing, and some apps reporting a single, high buffer size like 888 or 960. Using Creative's generic ASIO drivers on my X-Fi is even worse so I just use ASIO4ALL which just works™️ 99% of the time. |
I spy with my little eye 👀 aafaf6c
|
Did you try with some older drivers? ALC1150 is at least a decade old (UAD in particular is pretty delicate) |
It can't go less than 128 with an update size of 64. Some samples need to be playing while new samples are being generated, which is accomplished with double-buffering, and 64x2 = 128. Though it looks like it's not going lower than 176 (~3.6ms), which is x2.75. That could be a limit of PortAudio, to ensure there's enough time to call for more audio before underrunning, but OpenAL Soft is only asking for 128-sample latency for double-buffering, and is getting back 176. |
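To put the sample counts above into time units, here is a small sketch. It assumes a 48000 Hz output rate (not stated in this comment, but consistent with the ~3.6ms figure), and the helper names are hypothetical.

```python
def frames_to_ms(frames, sample_rate_hz):
    """Convert a length in sample frames to milliseconds."""
    return frames * 1000.0 / sample_rate_hz


def min_double_buffer_frames(update_frames):
    """Double-buffering needs one period playing while another is generated."""
    return update_frames * 2


SAMPLE_RATE_HZ = 48000  # assumption, see the note above
update = 64             # requested update size in sample frames
reported = 176          # buffer size reported back, per the comment above

requested = min_double_buffer_frames(update)
print(f"requested: {requested} frames ({frames_to_ms(requested, SAMPLE_RATE_HZ):.2f}ms)")
print(f"reported:  {reported} frames ({frames_to_ms(reported, SAMPLE_RATE_HZ):.2f}ms), "
      f"{reported / update}x the update size")
```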
@mirh Any idea if those drivers perform any differently than ASIO4ALL? 🤔
@kcat I noticed there may be a pattern here:
So if my guess and math are right, ActualBuffer = ( Math( Alternatively, If I could set So would it be feasible to lower those limits to compensate for that extra buffering that's added anyway? Also worth noting that I'm still able to use 64 samples even at up to 192000hz (max supported in general) in other ASIO apps without a single crackle. |
|
There are some shaky reports that the new W10 low latency mode may get you 3ms latencies, but it's really freaking annoying how no competent developer seems able to independently get it to work and confirm it. p.s. as for the Realtek ASIO driver, I found mixed opinions: one super positive, one neutral (old version sucks royally while new one is good, but pretty much the same as WASAPI) and another negative. |
<5ms latency in shared mode sounds too good to be true, but then again so did the graphics equivalent (fullscreen optimizations/flip model or whatever it's called), and it really turned out to be a decent middle ground between the performance and latency of exclusive fullscreen, without its inconveniences like not being able to draw regular windows on top of it and non-seamless alt-tabbing, so I wonder if this would be feasible here as well, to reduce inconvenience and extra setup for the end user 🤔 |
There's really no reason why that shouldn't be possible, but to me the main caveat is this "low latency shared mode" apparently requires explicit support from the audio driver. I don't know if typical drivers offer such support (hopefully at least the Microsoft USB Audio drivers and Realtek drivers do, otherwise that's a huge chunk of the market left unaddressed). I've never really looked into this particular feature. |
Commit 246d50d adds a config option for using WASAPI exclusive mode. It's experimental, I can't test it, and I don't know if Windows may have some quirky behavior with the reported period and buffer sizes, but it's there to try out. |
@kcat Neat! It activates exclusive mode, but the audio's not quite right. Tested on Realtek ALC1150 (internal), Creative X-Fi Surround 5.1 (USB) and the PlayStation 5 DualSense controller (USB) because why not.
Test files and logs: OpenAL Soft + DSOAL - WASAPI Exclusive test.zip |
This is normal and expected if the Creative has hardware volume control, but the DualSense doesn't. In any case, there's nothing an application can do about it. |
That is odd. I don't know what requesting such a high period size will do (which gets clamped to 8192). WASAPI seems to report back 10ms (480 samples) for the period size, but 8192 for the buffer size. It may be more helpful to see logs using more normal values (or other low values that are expected to work), as well as logs using normal OpenAL apps rather than DSOAL (if OpenAL Soft is mishandling the buffer or period size, it could influence DSOAL behavior and cause extra issues). |
Yup, I set it to 65536 myself to see how high I'd need to go to stop popping/crackling but yeah I noticed after one point it stopped improving, so here's another test with And here's another one with OpenAL Minerva: Minerva Tests
Results were similar, but worse:
On a side note, I don't remember the log getting flooded with errors when moving the sound emitter position in EFXShow. |
New test with the build from @LAGonauta's PR #1084. Now I'm getting audio on my Realtek onboard, and even without any slowdowns or pops, like the old fork, though the lowest period size I can reach without crashing is 160@48000hz on my Realtek. I remember being able to run 96@44100hz in the old fork. My DualSense and X-Fi USB devices can get down to 144@48000hz, but they still suffer from the same severe audio speed/quality issues even at high period size. IIRC all my devices can get down to 64 samples when using ASIO, so I wonder if WASAPI exclusive is just not able to reach such low buffer/period sizes 🤔 Anyway, I included the logs here: |
The Realtek crash with 144 period size seems to be due to The DualSense crash is because it's an unsupported format, |
@kcat It might be worth pointing out that in addition to Quad config only, the DualSense only supports 16-bit 48000hz |
@kcat That makes sense.
Btw, apparently |
I'm curious how it calculates the latency with DSound. The only place DSOAL reads OpenAL update rate is
The period size should now get clamped to the lowest supported value, with commit ee61ef3. I don't know what's causing the crash when there's an error preventing the device from being reset. Since it doesn't seem to be in OpenAL Soft itself, I don't know where to look. More testing from more people would be nice in either case. |
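The clamping idea can be illustrated with a minimal sketch. This is not the code from the commit. The function name is hypothetical, and the 160-frame minimum and 8192-frame maximum are only example values taken from reports elsewhere in this thread. A real backend would query the limits from the device.

```python
def clamp_period_frames(requested, min_supported, max_supported=8192):
    """Clamp a requested period size (in sample frames) to the supported range.

    Both limits are placeholders for values a real backend would get
    from the audio device.
    """
    return max(min_supported, min(requested, max_supported))


# Asking for an impossibly small or large period falls back to a limit.
print(clamp_period_frames(1, min_supported=160))      # smallest supported
print(clamp_period_frames(65536, min_supported=160))  # largest supported
print(clamp_period_frames(480, min_supported=160))    # already in range
```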
@kcat Clamping works as expected 👌 By the way, would it be a good idea for the period size to get automatically clamped to the lowest supported value by default on WASAPI exclusive when the period size isn't set? Lowest possible latency is probably the main reason why people would use it (besides bypassing system effects). By the way, I noticed something really odd:
[General]
period_size=1
periods=1
[wasapi]
exclusive-mode=true
When using just that config (removed stuff setting frequency/stereo/headphone/HRTF), I noticed that setting the speaker config to 7.1 or Quadraphonic in the Windows sound panel clamps to an even lower value (144) compared to 160 in Stereo and 5.1. 🤔 |
I'd be wary for performance concerns. It's fine if the user wants to play around and find what works for their system, but the defaults should reliably work. Exclusive mode will automatically drop to 2 periods (WASAPI itself forces this for exclusive mode), but the period size probably shouldn't act differently.
It's a limitation of how the system works. The device needs to have samples to play while more samples are being generated. Just like how with graphics, you have one image buffer that's being physically shown on the monitor, while rendering the next image on a separate unseen image buffer. It's a classic double-buffer setup; write to one buffer while the other buffer is being presented to the user, so that when the other buffer is done being presented to the user, the next one is ready and gets swapped in seamlessly. Unless the code is poking physical hardware bits at precisely timed intervals for the DAC (which is typically at the hardware or firmware level, modern OSs don't give the necessary timing and performance guarantees to do it with software), you need a second buffer/period to prepare samples in ahead of time. Regardless, in the case of WASAPI exclusive mode, the client (OpenAL Soft) has no say in the number of periods, it's always 2.
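The double-buffer scheme described above can be sketched as a toy simulation. It is only an illustration of the ping-pong idea and does not reflect how OpenAL Soft or WASAPI are implemented. `generate_period` is a hypothetical callback standing in for the mixer.

```python
def play_double_buffered(generate_period, num_periods, period_frames):
    """Simulate a device playing one buffer while the other is being filled."""
    # Prefill the front buffer so the device has something to play at once.
    buffers = [generate_period(0, period_frames), None]
    front = 0
    played = []
    for n in range(1, num_periods + 1):
        back = 1 - front
        # While the front buffer is "playing", mix the next period into
        # the back buffer.
        buffers[back] = generate_period(n, period_frames)
        played.extend(buffers[front])
        # The front buffer is done, so swap the freshly filled one in.
        front = back
    return played


# Each period is filled with its own index so the playback order is visible.
def label_period(n, frames):
    return [n] * frames


print(play_double_buffered(label_period, num_periods=3, period_frames=2))
```

At any moment two periods exist, one being played and one being prepared, which is why the total buffered audio is twice the period size.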
Interesting, maybe some restrictions on the period sizes the hardware can use (for the number of bytes rather than the number of sample frames). Each period invokes a hardware interrupt, so there's probably limits on how often the interrupt can be fired while processing the hardware buffer. |
@kcat Then I guess we could just direct anyone who wants the lowest possible latency to the configs added in #1094 👌
Does that mean ASIO actually uses 128 sample buffer when it says it's using 64, or does it just have more precise hardware timings that allow single period which WASAPI lacks even in exclusive mode? |
Yes. ASIO uses a double buffering system. When you ask ASIO for a buffer size of N, that refers to the size of a single buffer, and the total buffer size is N*2. This is indeed a bit confusing - ideally ASIO should have called this number something like "periodicity" (which is the term other APIs like WASAPI use), i.e. the application gets a callback every N samples. |
High audio latency (100ms+) is something that has plagued apps/games on Windows for a long time, yet people rarely notice it, let alone measure it; exceptions include Matt Gore, HeSuVi developer and Battle(non)sense.
So I've been wondering if ASIO could be implemented into OpenAL Soft directly, since Crystal Mixer already did something like that, but AFAIK it's only capable of virtualizing the multichannel audio mix.
Alternatively, someone even modified OpenAL Soft to use WASAPI in exclusive mode, which I've tested and confirmed it does make a difference (tho I'm yet to measure it), so I forked it here
Perhaps a flag to switch to exclusive mode in the main branch would be more feasible since it just seems to require a couple line edits (tho it would probably need some improvement so it's not restricted to sample-type=int16 and period_size is automatically set to the lowest supported by the sound card, as well as minimal/no mixahead or any other bottlenecks to ensure lowest possible latency).
Either option should hopefully allow ultra low latency on thousands of games that are at least potentially supported by OpenAL Soft by using the sound card's native ASIO or ASIO4ALL. I think audio would also be bit-perfect (at least on WASAPI exclusive) by bypassing the Windows mixer. So perhaps eventually including both would give people options based on their needs.