[Feature Request] Low latency output on Windows via ASIO/WASAPI exclusive #682

Open
ThreeDeeJay opened this issue Apr 10, 2022 · 32 comments

Comments

@ThreeDeeJay
Contributor

ThreeDeeJay commented Apr 10, 2022

High audio latency (100ms+) has plagued apps/games on Windows for a long time, yet people rarely notice it, let alone measure it the way Matt Gore, the HeSuVi developer and Battle(non)sense have.
image
image

So I've been wondering if ASIO could be implemented into OpenAL Soft directly, since Crystal Mixer already did something like that, but AFAIK it's only capable of virtualizing the multichannel audio mix.

Alternatively, someone even modified OpenAL Soft to use WASAPI in exclusive mode, which I've tested and confirmed it does make a difference (tho I'm yet to measure it), so I forked it here
Perhaps a flag to switch to exclusive mode in the main branch would be more feasible since it just seems to require a couple line edits (tho it would probably need some improvement so it's not restricted to sample-type=int16 and period_size is automatically set to the lowest supported by the sound card, as well as minimal/no mixahead or any other bottlenecks to ensure lowest possible latency).

Either option should hopefully allow ultra low latency on thousands of games that are at least potentially supported by OpenAL Soft by using the sound card's native ASIO or ASIO4ALL. I think audio would also be bit-perfect (at least on WASAPI exclusive) by bypassing the Windows mixer. So perhaps eventually including both would give people options based on their needs.

@kcat
Owner

kcat commented Apr 11, 2022

I'm curious how much of the latency is a result of using shared mode or non-ASIO output. "Button to audio" latency with random games doesn't say much, since it's also including input latency (the time from physically pressing a button to the OS detecting the input, then to the process detecting the input), and logic/frame latency (the time from the process getting input to processing a new logic frame, and from a logic frame to updating audio state, which can be at different rates), and only then getting to the audio latency.

OpenAL Soft itself will add about 50ms on average, given the default 20ms period size and 60ms buffer. Certain post processors may add a couple more milliseconds (output limiter, UHJ encoder, etc, which will be reported as "Fixed device latency: ..." in the trace log).

According to this page, starting with Windows 10 the default audio engine latency is 1.3ms, plus a 10ms default period size which will get written to the buffer for the hardware. So adding that all up, there should be about 51.3ms to 71.3ms if there's no other hidden latency anywhere. By changing OpenAL Soft's period size and period count properties, it could be reduced to a period size of 10ms and a 20ms buffer, which would make OpenAL Soft average 15ms, making the latency from OpenAL to output about 21.3ms to 31.3ms, although with a higher risk of underruns.
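The arithmetic above can be reproduced with a small sketch. It assumes the simple model that a buffer refilled one period at a time holds between `buffer - period` and `buffer` worth of audio, and that the 1.3ms engine latency and 10ms device period are added on top. The function names are made up for illustration and are not part of OpenAL Soft.

```python
ENGINE_MS = 1.3          # Windows 10 audio engine latency quoted above
DEVICE_PERIOD_MS = 10.0  # default device period quoted above


def mixer_latency_range(period_ms, buffer_ms):
    """Return (min, max, average) latency in ms added by the mixer's buffer.

    Assumed model: the buffer is topped up one period at a time, so it holds
    between (buffer - period) and buffer milliseconds of queued audio.
    """
    lowest = buffer_ms - period_ms
    highest = buffer_ms
    return lowest, highest, (lowest + highest) / 2


def total_latency_range(period_ms, buffer_ms):
    """Add the engine latency and device period to the mixer's range."""
    lowest, highest, _ = mixer_latency_range(period_ms, buffer_ms)
    extra = ENGINE_MS + DEVICE_PERIOD_MS
    return round(lowest + extra, 1), round(highest + extra, 1)


# Default settings: 20ms period, 60ms buffer.
print(mixer_latency_range(20, 60), total_latency_range(20, 60))
# Reduced settings: 10ms period, 20ms buffer.
print(mixer_latency_range(10, 20), total_latency_range(10, 20))
```

With the defaults this gives a 50ms average and a 51.3ms to 71.3ms total; with the reduced settings, a 15ms average and 21.3ms to 31.3ms.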

Before Windows 10, there's an additional 11ms for floating point sample streams and 5ms for integer sample streams. APOs may add additional latency, but there's no information about if there's any used normally.

Alternatively, someone even modified OpenAL Soft to use WASAPI in exclusive mode, which I've tested and confirmed it does make a difference (tho I'm yet to measure it), so I forked it here https://github.com/ThreeDeeJay/openal-soft-WASAPI-exclusive/commit/9cd722fc9a80181cc9c86db9a0ec86728dafb7a3

Well, one apparent difference is it passes a bad period size to IAudioClient::Initialize (it passes the same size for the buffer and period size, using the buffer size, when the buffer size should be at least twice the period size), sets incorrect values for the OpenAL device's buffer and period size (sets the period size using the buffer size), and doesn't properly pace updates (whenever the mixer thread wakes up, it processes however many samples WASAPI says are available regardless if it's at the period size yet). It also seems to get the minimum period size before initialization and the buffer size after initialization, but does nothing with them. It's impossible to tell what the device is going to do with regards to buffering/latency.

@mirh

mirh commented Apr 18, 2022

I have always appreciated ASIO... if only because I had a Xonar sound card, and even for my poor realtek I had found a *native* driver anyway (also, I think they had made some multiclient driver?).
But is there really much of a point in 2022 over a "normal" api like IAudioClient3 in exclusive mode? mumble-voip/mumble#1604

I mean, putting aside that I don't think games are meaningfully hampered by this. Academically speaking, is it worth at least 1ms? Or is it just a relic of another epoch when the windows mixer was called KMixer?

p.s. as far as WDM-KS workarounds go.. I believe FlexASIO was the current champ

@mirh

mirh commented Jun 1, 2022

Inb4 this is as good as exclusive
https://github.com/miniant-git/REAL

EDIT: follow up is here

@kcat
Owner

kcat commented Jun 1, 2022

Inb4 this is as good as exclusive
https://github.com/miniant-git/REAL

Not sure that would help too much. That simply forces the audio server/service to use a shorter update period, but the app's buffer size is left unchanged. Unless the app calculates a buffer size based on the device's period size, that would only cause more frequent updates for the same buffer size.

And actually for such cases, that would cause slightly higher overall latency since the buffer won't drain as much before doing another update. If the buffer is 40ms total, for example, the default 10ms period size would mean the buffer would have 30ms filled by the time an update occurs, meaning latency as low as 30ms for anything triggered just before the update; whereas if the period size is forced to 2ms (or whatever it sets), the same buffer will have 38ms filled when an update occurs, meaning latency closer to 38ms for anything triggered just before the update. So with the default period size, latency can vary between 30-40ms, whereas with a "low latency" 2ms period size, latency can vary between 38-40ms, a notably higher average and minimum bound.
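The 30ms versus 38ms figures can be checked with a toy simulation. This is only a sketch under simplifying assumptions: time advances in 1ms ticks, the device drains exactly 1ms of audio per tick, and the app refills as soon as one full period of space is free. `fill_levels_at_update` is a hypothetical helper, not anything from REAL or OpenAL Soft.

```python
def fill_levels_at_update(buffer_ms, period_ms, run_ms=200):
    """Return the set of buffer fill levels (ms) seen right before a refill."""
    fill = buffer_ms  # start with a full buffer
    seen = set()
    for _ in range(run_ms):
        fill -= 1  # the device plays 1ms of audio per tick
        if buffer_ms - fill >= period_ms:  # a whole period of space is free
            seen.add(fill)     # this much audio is still queued ahead
            fill += period_ms  # the app writes one period
    return seen


print(fill_levels_at_update(40, 10))  # default 10ms period
print(fill_levels_at_update(40, 2))   # forced 2ms period
```

In this model a shorter period with an unchanged buffer only raises the minimum amount of queued audio, which is the point made above.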

In the case of OpenAL Soft, it uses a multiple of the period size to stay close to its internal 20ms update size (or whatever period_size is set to), with a total buffer that's 3 times the size (or whatever periods is set to). So latency and update granularity should remain somewhat consistent regardless of what that does. It will just be woken up more often to check if there's enough writable space to do a full update, wasting CPU time. That would allow you to set a smaller period size since it won't be limited to a multiple of the 10ms default, instead a multiple of whatever that sets, but it won't do anything on its own.

@ThreeDeeJay
Contributor Author

Never had luck getting less than 10ms with REAL.
image
I should point out that I didn't revert back to the Microsoft drivers (which is optional anyway) cuz I wouldn't wanna lose 7.1/5.1 in both my internal/USB sound cards.

mirh referenced this issue in LAGonauta/RetroArch Jul 17, 2022
@Enokilis

Well, one apparent difference is it passes a bad period size to IAudioClient::Initialize (it passes the same size for the buffer and period size, using the buffer size, when the buffer size should be at least twice the period size), sets incorrect values for the OpenAL device's buffer and period size (sets the period size using the buffer size), and doesn't properly pace updates (whenever the mixer thread wakes up, it processes however many samples WASAPI says are available regardless if it's at the period size yet). It also seems to get the minimum period size before initialization and the buffer size after initialization, but does nothing with them. It's impossible to tell what the device is going to do with regards to buffering/latency.

I'm the one who made the modifications a long time ago. It was just a quick hack as a proof of concept, and wasn't really meant to be shared, no pun intended.

While not a controlled experiment, I used Wireshark with USBcap to measure the delta between a mouse click and a response in the audio stream. The advantage of this approach is that the DAC's own latency is factored out, but it assumes Wireshark is precise enough to be useful. Using the lowest period size I could in shared mode, it tended towards 30 milliseconds and up, while in exclusive mode, it was typically around 20 milliseconds.
As mentioned before, this is very app-dependent, and some OpenAL game could easily create a delay close to three digits of milliseconds, so exclusive mode is hardly a panacea.

@ThreeDeeJay
Contributor Author

Following kcat's suggestions here, I was able to compile OpenAL Soft with ASIO output via PortAudio:
OpenALSoft+PortAudio+ASIO.zip
sublime_text_45POugxw0M
image sublime_text_7hZrPKb15p
na3tXhQ8DC

However, I'm not sure how to force set buffer size to 64 samples (lowest my sound card can handle in native ASIO apps) for the lowest possible latency because even after setting period_size=64 it keeps resetting to much higher values and the slider in ASIO4ALL gets ignored, so perhaps there's something that I'm missing? 🤔
alsoft_error.txt

On a side note, adding DSOAL+RightMark3DSound.zip (specifically dsound.dll) breaks EFXShow for some reason, and RightMark3DSound crashes too with this build.

@kcat
Owner

kcat commented Jul 20, 2024

However, I'm not sure how to force set buffer size to 64 samples (lowest my sound card can handle in native ASIO apps) for the lowest possible latency because even after setting period_size=64 it keeps resetting to much higher values and the slider in ASIO4ALL gets ignored, so perhaps there's something that I'm missing? 🤔

Currently the way the PortAudio backend works is it opens and configures the output stream during alcOpenDevice, when the actual ALCdevice configuration isn't handled until alcCreateContext (and alcResetDeviceSOFT, etc), so the PortAudio stream gets configured with the default properties. A way to fix this would be to recreate the PortAudio stream in PortPlayback::reset, but that risks failing if the device doesn't like being reopened immediately after closing, or if there's trouble getting it working with a usable format, making the device unusable.

On a side note, adding DSOAL+RightMark3DSound.zip (specifically dsound.dll) breaks EFXShow for some reason, and RightMark3DSound crashes too with this build.

Probably a dependency loop. 0xC0000142 is STATUS_DLL_INIT_FAILED, and since PortAudio can use DSound, having OpenAL Soft load PortAudio, which loads DSound/DSOAL, which loads OpenAL Soft, creates a circular loop which causes the DLL to fail initialization.

@mirh

mirh commented Jul 20, 2024

On a separate note, if you have a realtek you should try native asio instead of asio4all.

@ThreeDeeJay
Contributor Author

ThreeDeeJay commented Jul 21, 2024

@kcat Also I tried using the latest commit instead of latest stable version (from 2021) with no luck, and I found a PR that implements ASIO messages and rebuilt the dll with it, but Buffer Size Changed via the ASIO4ALL control panel still gets ignored. e.g. here I set it to request 64 samples (which works with native ASIO apps, at least after a restart) but it wouldn't change from 1920 as the log below shows
image

Probably a dependency loop. 0xC0000142 is STATUS_DLL_INIT_FAILED, and since PortAudio can use DSound, having OpenAL Soft load PortAudio, which loads DSound/DSOAL, which loads OpenAL Soft, creates a circular loop which causes the DLL to fail initialization.

I wonder if disabling DirectSound support from PortAudio would get around that issue.
It would be interesting to check whether apps/games using at least DirectSound would get low latency via this ASIO route 🤔

On a separate note, if you have a realtek you should try native asio instead of asio4all.

@mirh Sadly, my motherboard's onboard Realtek ALC1150 drivers don't include an ASIO driver, and I've tried the Dell Realtek drivers, even with this installer, but I ran into the same issue, even at 44100hz which some have reported to work more reliably:

[ALSOFT] (II) Created device 02E32960, "OpenAL Soft on Realtek ASIO"
[ALSOFT] (II) Found option frequency = "44100"
[ALSOFT] (II) Found option period_size = "128"
[ALSOFT] (II) Found option stereo-encoding = "hrtf"
[ALSOFT] (II) ALC_MAX_AUXILIARY_SENDS = 2
[ALSOFT] (II) Pre-reset: Stereo, Float32, *44100hz, 128 / 384 buffer
[ALSOFT] (II) Reported stream latency: 0.002979 sec (143.000000 samples)
[ALSOFT] (WW) Failed to set 44100hz, got 48000hz instead
[ALSOFT] (II) Post-reset: Stereo, Float32, 48000hz, 960 / 1920 buffer
[...]
[ALSOFT] (II) Post-start: Stereo, Float32, 48000hz, 960 / 1920 buffer

It also suffers from other bugs like inputs/outputs randomly disappearing and some apps reporting a single, high buffer size like 888 or 960. Using Creative's generic ASIO drivers on my X-Fi is even worse, so I just use ASIO4ALL which just works™️ 99% of the time.

@ThreeDeeJay
Contributor Author

I spy with my little eye 👀 aafaf6c
Good news, now it's reporting much lower buffer size, though not quite the lowest.
I specified period_size=64 and periods=2 but ASIO4ALL still refuses to go below 128 for some reason.
alsoft_error.txt

[ALSOFT] (II) Pre-reset: Stereo, Float32, 48000hz, 64 / 128 buffer
[ALSOFT] (II) Post-reset: Stereo, Float32, 48000hz, 64 / 172 buffer
[ALSOFT] (II) Post-start: Stereo, Float32, 48000hz, 64 / 172 buffer
[ALSOFT] (II) Pre-reset: Stereo, Float32, 48000hz, 64 / 128 buffer
[ALSOFT] (II) Post-reset: Stereo, Float32, 48000hz, 64 / 176 buffer
[ALSOFT] (II) Post-start: Stereo, Float32, 48000hz, 64 / 176 buffer

EFX10ShowWin32_p9LQ83FZ0k
For reference, here's how it should look:
image

@mirh

mirh commented Jul 21, 2024

Did you try with some older drivers? ALC1150 is at least a decade old (UAD in particular is pretty delicate)
Or maybe with the hacked ones on TPU.

@kcat
Owner

kcat commented Jul 22, 2024

Good news, now it's reporting much lower buffer size, though not quite the lowest. I specified period_size=64 and periods=2 but ASIO4ALL still refuses to go below 128 for some reason.
alsoft_error.txt

It can't go less than 128 with an update size of 64. Some samples need to be playing while new samples are being generated, which is accomplished with double-buffering, and 64x2 = 128. Though it looks like it's not going lower than 176 (~3.6ms), which is x2.75. That could be a limit of PortAudio, to ensure there's enough time to call for more audio before underrunning, but OpenAL Soft is only asking for 128-sample latency for double-buffering, and is getting back 176.
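For reference, the sample counts above convert to time as follows. This is a minimal sketch; `frames_to_ms` is just an illustrative helper name.

```python
def frames_to_ms(frames, rate_hz):
    """Convert a length in sample frames to milliseconds at the given rate."""
    return frames * 1000.0 / rate_hz


requested = 64 * 2  # double-buffering with a 64-sample update size
reported = 176      # buffer size reported back in the log

print(f"requested: {requested} samples = {frames_to_ms(requested, 48000):.2f} ms")
print(f"reported:  {reported} samples = {frames_to_ms(reported, 48000):.2f} ms")
print(f"ratio to update size: x{reported / 64}")
```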

@ThreeDeeJay
Contributor Author

Did you try with some older drivers? ALC1150 is at least a decade old (UAD in particular is pretty delicate) Or maybe with the hacked ones on TPU.

@mirh Any idea if those drivers perform any differently than ASIO4ALL? 🤔
Seems a bit tedious and unsafe if it might also require disabling driver signature enforcement to install modified drivers.
I even had to revert drivers R2.83 released earlier this year because 7.1 surround configuration was missing so I went back to R2.82 from like 2017 lol

It can't go less than 128 with an update size of 64. Some samples need to be playing while new samples are being generated, which is accomplished with double-buffering, and 64x2 = 128. Though it looks like it's not going lower than 176 (~3.6ms), which is x2.75. That could be a limit of PortAudio, to ensure there's enough time to call for more audio before underrunning, but OpenAL Soft is only asking for 128-sample latency for double-buffering, and is getting back 176.

@kcat I noticed there may be a pattern here:

  • frequency=44100
    • period_size=64
      • * 2 = 128 double buffer
        • + 44 = 172 actual buffer
          • [ALSOFT] (II) Post-start: Stereo, Float32, 44100hz, 64 / 172 buffer
  • frequency=48000
    • period_size=96
      • * 2 = 192 double buffer
        • + 48 = 240 actual buffer
          • [ALSOFT] (II) Post-start: Stereo, Float32, 48000hz, 96 / 240 buffer
  • frequency=96000
    • period_size=64
      • * 2 = 128 double buffer
        • + 96 = 224 actual buffer
          • [ALSOFT] (II) Post-start: Stereo, Float32, 96000hz, 64 / 224 buffer

So if my guess and math are right, ActualBuffer = (period_size * 2) + (frequency/1000), then given ActualBuffer = 64 and frequency=48000, period_size would need to be 8 but that's way below the acceptable values.

Math

(period_size * 2) + (frequency/1000) = 64
Dividing both sides by 2:
period_size + [(48000/1000)/2] = 32
period_size + (48/2) = 32
period_size + 24 = 32
period_size = 32 - 24
period_size = 8
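The guessed formula can be checked against the three logged results and then solved for period_size. This is a sketch only: the formula is an observed pattern rather than documented PortAudio or ASIO4ALL behaviour, and using integer division for frequency/1000 (so 44100 gives 44) is an assumption that happens to match the 44100hz log line.

```python
# (frequency, period_size, actual buffer) taken from the Post-start log lines above
OBSERVED = [(44100, 64, 172), (48000, 96, 240), (96000, 64, 224)]


def predicted_buffer(period_size, frequency, periods=2):
    """Guessed pattern: periods * period_size plus about 1ms worth of samples."""
    return period_size * periods + frequency // 1000


def period_for_target(target_buffer, frequency, periods=2):
    """Solve the guessed formula for period_size given a target buffer size."""
    return (target_buffer - frequency // 1000) / periods


for frequency, period_size, actual in OBSERVED:
    print(frequency, period_size, predicted_buffer(period_size, frequency), actual)

print(period_for_target(64, 48000))             # 8.0 with periods=2
print(period_for_target(64, 48000, periods=1))  # 16.0 with periods=1
```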

Alternatively, if I could set periods=1, I could just set period_size=16 so that extra + 48 at 48000Hz adds up to 64, but periods=1 isn't acceptable either.

So would it be feasible to lower those limits to compensate for that extra buffering that's added anyway?
I wonder if it'd increase CPU usage significantly tho, at least compared to native ASIO at 64 buffer.

Also worth noting that I'm still able to use 64 samples even at up to 192000hz (max supported in general) in other ASIO apps without a single crackle.
image

@mirh

mirh commented Jul 22, 2024

I don't have any idea, other than that audio vendors wouldn't have needed ASIO in the first place if WDM-KS had been enough.
Anyhow, whatever it's just an audio driver. Even R2.79 has its admirers

@mirh

mirh commented Aug 17, 2024

There is a shaky report that the new W10 low latency mode may get you 3ms latencies, but it's really freaking annoying how no competent developer seems able to independently get it to work and confirm it.

p.s. as for the realtek asio driver, I found mixed opinions: one super positive, one neutral (old version sucks royally while the new one is good, but pretty much the same as WASAPI) and another negative.

@ThreeDeeJay
Contributor Author

ThreeDeeJay commented Aug 17, 2024

<5ms latency in shared mode sounds too good to be true, but then again so did the graphics equivalent (fullscreen optimizations/flip model or whatever it's called) and it really turned out to be a decent middle ground between the performance and latency of exclusive fullscreen, without its inconveniences like not being able to draw regular windows on top of it and non-seamless alt-tabbing, so I wonder if this would be feasible here as well, to reduce inconvenience and extra setup for the end user 🤔

@dechamps

<5ms latency in shared mode sounds too good to be true

There's really no reason why that shouldn't be possible, but to me the main caveat is this "low latency shared mode" apparently requires explicit support from the audio driver. I don't know if typical drivers offer such support (hopefully at least the Microsoft USB Audio drivers and Realtek drivers do, otherwise that's a huge chunk of the market left unaddressed). I've never really looked into this particular feature.

@kcat
Owner

kcat commented Dec 14, 2024

Commit 246d50d adds a config option for using WASAPI exclusive mode. It's experimental, I can't test it, and I don't know if Windows may have some quirky behavior with the reported period and buffer sizes, but it's there to try out.

@ThreeDeeJay
Contributor Author

ThreeDeeJay commented Dec 14, 2024

@kcat Neat! It activates exclusive mode, but the audio's not quite right. Tested on Realtek ALC1150 (internal), Creative X-Fi Surround 5.1 (USB) and the PlayStation 5 DualSense controller (USB) because why not.

  • It requires sample-type=int16 to prevent crashing, like the old fork
  • If I use period_size=144 (lowest value that doesn't crash, though my onboard Realtek requires 160), it sounds kinda robotic/low bitrate and slowed down. And if I use period_size=65536, playback speed and quality seem normal but there are still frequent pops/crackles.
  • I'm getting no audio on my onboard Realtek, which ironically is the only one that works with the old fork where Creative X-Fi USB has no audio and DualSense crashes
  • The system volume doesn't work on my Creative X-Fi USB (not uncommon when using WASAPI exclusive), but weirdly enough, it does work on my DualSense, which by the way only supports quadraphonic configuration (first 2 channels are the regular stereo for headphones/built-in speaker and the last 2 are for haptics/vibration)

Test files and logs: OpenAL Soft + DSOAL - WASAPI Exclusive test.zip
Old fork code for reference: master...ThreeDeeJay:openal-soft:WASAPI-Exclusive

@dechamps

dechamps commented Dec 14, 2024

The system volume doesn't work on my Creative X-Fi USB (not uncommon when using WASAPI exclusive), but weirdly enough, it does work on my DualSense

This is normal and expected if the Creative has hardware volume control, but the DualSense doesn't. In any case, there's nothing an application can do about it.

@kcat
Owner

kcat commented Dec 15, 2024

[ALSOFT] (II) Found option period_size = "65536"
[ALSOFT] (II) Found option periods = "1"

That is odd. I don't know what requesting such a high period size will do (which gets clamped to 8192). WASAPI seems to report back 10ms (480 samples) for the period size, but 8192 for the buffer size. It may be more helpful to see logs using more normal (or other low values that are expected to work) values. As well as logs using normal OpenAL apps, rather than DSOAL (if OpenAL Soft is having issues mishandling the buffer or period size, it could influence DSOAL behavior and cause extra issues).

@ThreeDeeJay
Contributor Author

Yup, I set it to 65536 myself to see how high I'd need to go to stop popping/crackling but yeah I noticed after one point it stopped improving, so here's another test with period_size=960 (default for 48KHz) using OpenAL SDK's EFX10ShowWin32:
EFX10Show - WASAPI Exclusive test.zip

And here's another one with OpenAL Minerva:
OpenAL-Minerva - WASAPI Exclusive test.zip

Minerva Tests
This application is currently running as x86.
Renderer: OpenAL Soft
Total XRam: 64MB
Free XRam: 64MB
Free XRam after loading files: 64MB

Testing 2D Mixing.
Playing stereo source of 22kHz and 16-bit.
Mixing with mono source of 44kHz and 16-bit.
Mixing with stereo source of 22kHz and 16-bit.
Mixing with mono source of 11kHz and 16-bit.
Mixing with mono source of 44kHz and 8-bit.
Mixing with mono source of 44kHz and 16-bit.

Testing Sample Rate Conversion.
Loading mono sample of 11kHz and 16-bit.
Initial playback at normal rate, followed by...
sample rate conv. with factor range of <0.5 - 3.5>.
Loading mono sample of 22kHz and 16-bit.
Initial playback at normal rate, followed by...
sample rate conv. with factor range of <0.5 - 3.5>.

Testing Distance...
for mono file of format 44kHz and 16-bit
-Doppler effect turned off
 with MIN=1.0m and MAX=35.0m

 with MIN=1.0m and MAX=infinity

Testing Doppler...
for mono source of 22kHz and 16-bit
with speed of 36 km/h
with speed of 108km/h
with speed of 36km/h
with speed of 108km/h
with speed of 90km/h and doppler factor of 1.0
with speed of 90km/h and doppler factor of 2.0
with speed of 90km/h and doppler factor of 4.0

Testing Positioning with single source...
mono of 44kHz and 16-bit
-Doppler effect turned off
Source first rotates around the head twice, with 2m radius...
Then rises vertically to the front left of the listener
and falls vertically to the rear right of the listener

Testing multi source positioning...
Red source format: 44kHz 16-bit
Blue source format: 22kHz 8-bit
Green source format: 44kHz 16-bit
Pink source format: 44kHz 8-bit

Testing Radiation...
Inner Cone angle = 90 degrees
Outer Cone angle = 270 degrees
Source position = (0, 0, -2)
Source position = (0, 0, 2)

Testing Latency...

Testing EFX Echo effect...
with mono source of 22kHz and 8-bit...
first without effects
now with effects
and with mono source of 44kHz and 16-bit

Testing EFX Reverb effect...
with mono source of 22kHz and 8-bit...
first without effects
now with effects
and with mono source of 44kHz and 16-bit

Testing EFX EAX Reverb effect...
with mono source of 22kHz and 8-bit...
first without effects
now with effects
now one rotation of the source at a distance of 4m
now one rotation of the source and the effects at a distance of 4m

Testing EFX Occlusion effect...
with mono source of 22kHz and 8-bit...
first fully occluded
now disabling occlusion
and enabling it again

Testing EFX Exclusion effect...
with mono source of 22kHz and 8-bit...
first showing the transition with reverb that is transmited through walls
now showing the transition with reverb that can only go through the aperture

Testing EFX Obstruction effect...
with mono source of 22kHz and 8-bit...

All tests ended.

Results were similar, but worse:

  • Audio slow down more pronounced
  • More frequent and longer pops/crackling (especially with the DualSense, where it's more like audio intermittently getting cut off instead of shorter crackling like with the Creative card)
  • Still no audio from my Realtek device
  • Increasing periods doesn't seem to help
  • Not using sample-type=int16 still crashes

On a side note, I don't remember the log getting flooded with errors when moving the sound emitter position in EFXShow.
If anyone else can confirm on their end I can open a new issue 👀

@ThreeDeeJay
Contributor Author

New test with the build from @LAGonauta's PR #1084

Now I'm getting audio on my Realtek onboard, and even without any slowdowns or pops, like the old fork, though the lowest period size I can reach without crashing is 160@48000hz on my Realtek. I remember being able to run 96@44100hz in the old fork. My DualSense and X-Fi USB devices can get down to 144@48000hz, but they still suffer from the same severe audio speed/quality issues even at high period size. IIRC all my devices can get down to 64 samples when using ASIO, so I wonder if WE is just not able to reach such low buffer/period sizes 🤔

Anyway, I included the logs here:
LAGonautaPR - OpenAL Soft - WASAPI Exclusive test.zip

@kcat
Owner

kcat commented Dec 16, 2024

The Realtek crash with 144 period size seems to be due to IAudioClient::Initialize failing with AUDCLNT_E_BUFFER_SIZE_NOT_ALIGNED. This isn't handled yet, so the reset fails. I haven't been able to dig too deep into it, but for some reason, if resetting fails, a crash happens at some later point, even though it should be able to clean up fine.

The DualSense crash is because it's an unsupported format, IAudioClient::IsFormatSupported returns false and it tries to use the default mixing format as a fallback, which it doesn't like since it's a float format and initialization fails. The X-Fi crash is a little weird... IAudioClient::Initialize fails with AUDCLNT_E_UNSUPPORTED_FORMAT, but IAudioClient::IsFormatSupported succeeded just before, so I don't know what it doesn't like.

@ThreeDeeJay
Contributor Author

@kcat It might be worth pointing out that in addition to Quad config only, the DualSense only supports 16-bit 48000hz.
As for the Creative card, I noticed the drivers didn't actually install for some reason (guess Windows 11 hates PAX drivers?) so I just installed the official version and now everything's working fine: at 44100Hz I can even get down to period_size=133 and in high quality (no longer bit crushed). Sadly I can't find any newer/official Sony drivers for the DualSense, only the generic Windows one.

@ThreeDeeJay
Contributor Author

ThreeDeeJay commented Jan 3, 2025

#1084 (comment)
Possibly the Realtek device can't go lower than 160 (~3.333ms), while the Creative card can go a bit lower. If initializing fails with AUDCLNT_E_BUFFER_SIZE_NOT_ALIGNED, it gets the next supported period size, which would be 160. Values less than 133 are probably triggering a different initialization error (AUDCLNT_E_BUFFER_SIZE_ERROR maybe?), which causes a failure and the delayed crash.
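WASAPI takes buffer durations in 100-nanosecond units (REFERENCE_TIME), so the period sizes discussed here can be converted with a short sketch. The rounding follows the formula Microsoft documents for recovering from AUDCLNT_E_BUFFER_SIZE_NOT_ALIGNED, as far as I recall it; `frames_to_reftime` is a hypothetical helper name, and this is not how OpenAL Soft itself is written.

```python
def frames_to_reftime(frames, sample_rate):
    """Convert sample frames to 100ns units, rounded to the nearest unit."""
    return int(10_000_000.0 / sample_rate * frames + 0.5)


# (frames, sample rate) pairs mentioned in this thread
for frames, rate in [(160, 48000), (144, 48000), (133, 44100)]:
    hns = frames_to_reftime(frames, rate)
    print(f"{frames} frames @ {rate}hz = {hns} x 100ns = {hns / 10000:.3f} ms")
```

By this arithmetic, 160 frames at 48000hz is about 3.333ms, while 144 at 48000hz and 133 at 44100hz both land close to 3ms.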

@kcat That makes sense.
I found a neat program called FreePiano to test a MIDI/computer keyboard with multiple output APIs and even reports latency which seems about right though I'm not sure how accurate it is, and the sample-rate/bit-depth sometimes change unpredictably.

  • WASAPI shared reports 10ms
    image
  • Same for DirectSound, which makes sense since it uses WASAPI shared (Exclusive option makes no difference)
    image
  • With DSOAL using WASAPI exclusive build/cfg, the improvement in latency is subtle but noticeable, even if it's just 6.4ms. I wonder if allowing a single period in OpenAL Soft would be feasible/beneficial 🤔
    image
  • WASAPI exclusive reports 3.3ms, which matches 160 buffer like you mentioned so I guess that's just the lowest the device can go in WASAPI exclusive 🤔
    image
  • ASIO is still able to go even lower as usual
    explorer_pr3x1tSF6A
    It's almost indistinguishable from DSOAL but tbh I doubt ~2ms(?) would be worth the hassle of setting up ASIO (not to mention implementing it in OpenAL Soft).

Btw, apparently sample-type=int16 isn't needed anymore, so I guess the only thing left to do before we can close this issue is to prevent no audio/crash when setting periods too low (perhaps exclusive-mode=true should automatically find the lowest supported period_size if it's not specified already, to simplify setup) and maybe get some more testers so it's considered stable. 👀👌

@kcat
Owner

kcat commented Jan 3, 2025

With DSOAL using WASAPI exclusive build/cfg, the improvement in latency is subtle but noticeable, even if it's just 6.4ms. I wonder if allowing a single period in OpenAL Soft would be feasible/beneficial 🤔

I'm curious how it calculates the latency with DSound. The only place DSOAL reads OpenAL update rate is DSound8OAL::notifyThread, which is just checking source offsets/state to trigger any event handles the app sets for DSound buffers, and doesn't report it to the app. I don't see anything in the source where it calculates the latency either.

Btw, apparently sample-type=int16 isn't needed anymore, so I guess the only thing left to do before we can close this issue is to prevent no audio/crash when setting periods too low (perhaps exclusive-mode=true should automatically find the lowest supported period_size if it's not specified already, to simplify setup) and maybe get some more testers so it's considered stable and this can be closed. 👀👌

The period size should now get clamped to the lowest supported value, with commit ee61ef3. I don't know what's causing the crash when there's an error preventing the device from being reset; since it doesn't seem to be in OpenAL Soft itself, I don't know where to look. More testing from more people would be nice in either case.

@ThreeDeeJay
Contributor Author

@kcat Clamping works as expected 👌
https://www.diffchecker.com/otDkbAIh/

By the way, would it be a good idea for the period size to get automatically clamped to the lowest supported value by default on WASAPI exclusive when the period size isn't set? Lowest possible latency is probably the main reason why people would use it (besides bypassing system effects).
And on that note, could periods=1 ever work now that we have exclusive mode? or is there a fundamental reason/hardware limitation why 2 would still be the lowest acceptable value?

By the way, I noticed something really odd:

[General]
period_size=1
periods=1

[wasapi]
exclusive-mode=true

When using just that config (removed stuff setting frequency/stereo/headphone/HRTF), I noticed that setting the speaker config to 7.1 or Quadraphonic in the Windows sound panel clamps to an even lower value (144) compared to 160 in Stereo and 5.1. 🤔
https://www.diffchecker.com/7lbpLQFO/
https://www.diffchecker.com/NBJQLmxI/
Same with frequency=44100
https://www.diffchecker.com/8dqoFpnm/

@kcat
Owner

kcat commented Jan 4, 2025

> By the way, would it be a good idea for the period size to get automatically clamped to the lowest supported value by default on WASAPI exclusive when the period size isn't set? Lowest possible latency is probably the main reason why people would use it (besides bypassing system effects).

I'd be wary for performance concerns. It's fine if the user wants to play around and find what works for their system, but the defaults should reliably work. Exclusive mode will automatically drop to 2 periods (WASAPI itself forces this for exclusive mode), but the period size probably shouldn't act differently.

> And on that note, could periods=1 ever work now that we have exclusive mode? or is there a fundamental reason/hardware limitation why 2 would still be the lowest acceptable value?

It's a limitation of how the system works. The device needs to have samples to play while more samples are being generated. Just like how with graphics, you have one image buffer that's being physically shown on the monitor, while rendering the next image on a separate unseen image buffer. It's a classic double-buffer setup: write to one buffer while the other buffer is being presented to the user, so that when the other buffer is done being presented, the next one is ready and gets swapped in seamlessly. Unless the code is poking physical hardware bits at precisely timed intervals for the DAC (which is typically done at the hardware or firmware level; modern OSs don't give the necessary timing and performance guarantees to do it in software), you need a second buffer/period to prepare samples in ahead of time.

Regardless, in the case of WASAPI exclusive mode, the client (OpenAL Soft) has no say in the number of periods, it's always 2.
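To make the double-buffer idea concrete, here is a toy model (a simplified, sequential sketch, not how OpenAL Soft or WASAPI are implemented). `play_double_buffered` and `render_period` are hypothetical names; a real backend would do the fill step on a mixer thread while the hardware plays the other buffer.

```python
def play_double_buffered(render_period, num_periods):
    """Return the periods in the order the device would play them."""
    # Prefill the front buffer so the device has something to play at start.
    buffers = [render_period(0), None]
    played = []
    for n in range(num_periods):
        front, back = n % 2, (n + 1) % 2
        # While the device plays `front`, the mixer fills `back` with the next period.
        buffers[back] = render_period(n + 1)
        played.append(buffers[front])
    return played


print(play_double_buffered(lambda n: f"period {n}", 4))
```

With only one buffer there would be nowhere to put `render_period(n + 1)` without overwriting samples the device is still reading, which is the gap the second period covers.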

> When using just that config (removed stuff setting frequency/stereo/headphone/HRTF), I noticed that setting the speaker config to 7.1 or Quadraphonic in the Windows sound panel clamps to an even lower value (144) compared to 160 in Stereo and 5.1. 🤔

Interesting, maybe there are some restrictions on the period sizes the hardware can use (in terms of the number of bytes rather than the number of sample frames). Each period invokes a hardware interrupt, so there are probably limits on how often the interrupt can be fired while processing the hardware buffer.
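One way a byte-based restriction could produce exactly these numbers is sketched below. It is speculative: the 128-byte alignment, the 16-bit sample format, and the 144-frame minimum (3 ms at 48 kHz) are all assumptions chosen for illustration, not values confirmed anywhere in this thread, and `aligned_period_frames` is a hypothetical helper.

```python
def aligned_period_frames(min_frames, channels, bytes_per_sample=2, alignment=128):
    """Smallest frame count >= min_frames whose byte size is a multiple of `alignment`."""
    frame_bytes = channels * bytes_per_sample
    frames = min_frames
    # Step up one frame at a time until the period's byte size is aligned.
    while (frames * frame_bytes) % alignment != 0:
        frames += 1
    return frames


# Assumed: 16-bit samples, 128-byte alignment, 144-frame device minimum.
for name, channels in [("Stereo", 2), ("Quadraphonic", 4), ("5.1", 6), ("7.1", 8)]:
    print(name, aligned_period_frames(144, channels))
```

Under those assumptions the sketch gives 160 frames for Stereo and 5.1, and 144 for Quadraphonic and 7.1, which lines up with the values reported above. That doesn't prove this is the actual cause; a different sample format would change the result.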

@ThreeDeeJay
Contributor Author

> I'd be wary for performance concerns. It's fine if the user wants to play around and find what works for their system, but the defaults should reliably work. Exclusive mode will automatically drop to 2 periods (WASAPI itself forces this for exclusive mode), but the period size probably shouldn't act differently.

@kcat Then I guess we could just direct anyone who wants the lowest possible latency to the configs added in #1094 👌

> It's a limitation of how the system works. The device needs to have samples to play while more samples are being generated. Just like how with graphics, you have one image buffer that's being physically shown on the monitor, while rendering the next image on a separate unseen image buffer. It's a classic double-buffer setup; write to one buffer while the other buffer is being presented to the user, so that when the other buffer is done being presented to the user, the next one is ready and gets swapped in seamlessly. Unless the code is poking physical hardware bits at precisely timed intervals for the DAC (which is typically at the hardware or firmware level, modern OSs don't give the necessary timing and performance guarantees to do it with software), you need a second buffer/period to prepare samples in ahead of time.
>
> Regardless, in the case of WASAPI exclusive mode, the client (OpenAL Soft) has no say in the number of periods, it's always 2.

Does that mean ASIO actually uses a 128-sample buffer when it says it's using 64, or does it just have more precise hardware timings that allow a single period, which WASAPI lacks even in exclusive mode?
@dechamps might know more about it 👀

@dechamps

dechamps commented Jan 8, 2025

> Does that mean ASIO actually uses 128 sample buffer when it says it's using 64

Yes. ASIO uses a double buffering system. When you ask ASIO for a buffer size of N, that refers to the size of a single buffer, and the total buffer size is N*2. This is indeed a bit confusing - ideally ASIO should have called this number something like "periodicity" (which is the term other APIs like WASAPI use), i.e. the application gets a callback every N samples.
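As a quick worked example of that N versus N*2 relationship (a sketch; `asio_buffering` is a hypothetical helper and the 48 kHz sample rate is an assumption, since the thread doesn't state one for the ASIO case):

```python
def asio_buffering(buffer_size_frames, sample_rate):
    """Return (total frames, callback interval in ms, total buffered time in ms)."""
    total_frames = buffer_size_frames * 2  # two buffers of N frames each
    callback_interval_ms = buffer_size_frames * 1000.0 / sample_rate
    total_ms = total_frames * 1000.0 / sample_rate
    return total_frames, callback_interval_ms, total_ms


total, interval, buffered = asio_buffering(64, 48_000)
print(f"{total} frames total, callback every {interval:.2f} ms, {buffered:.2f} ms buffered")
```

So a reported buffer size of 64 means 128 frames are buffered overall, with the application called back once per 64 frames.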
