-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix increased audio latency and increased pitch after fast-forwarding ends #12038
base: master
Are you sure you want to change the base?
Conversation
7bf06aa
to
143bee7
Compare
I'm not a coder, but all the debug commits that are reverted and their reverts should be squashed out. The commits that actually do different things and help with making it easier to review should stay. |
143bee7
to
fe41898
Compare
I've removed the debugging commits, and squashed away the previous versions of my code which used a different atomic structure (keeping it would pollute git logs with WIP code I later replaced). For posterity I've pushed a tag to the full history in my fork (https://github.com/nyanpasu64/dolphin/tree/fix-ffw-audio-latency-4). |
This looks a lot easier to review at a glance. I think Filoppi said they'd review it next week, I'll defer any comments until after that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good, I left some minor comments.
This should be tested on Windows and maybe a video showcasing the old vs new behaviour should be recorded, so we can all judge whether the new behaviour is better.
I worked on audio stretching a lot in the past, and this PR doesn't really affect that, but I do indeed remember there was some audio problems when releasing the fast forward button if stretching wasn't enabled.
I do think dropping samples is a good solution over here, it's likely better than keeping a slightly increased audio speed for ~10 seconds, though the best solution might be to quickly play back all the extra samples within 0.2 or so, which IIRC is what I implemented in my WIP PR #8983 (I'm planning to get back at it in the next couple months).
I'm also not sure about the s_is_throttler_temp_disabled
atomic change, but I doubt it will damage anything.
@@ -88,6 +87,14 @@ unsigned int Mixer::MixerFifo::Mix(short* samples, unsigned int numSamples, | |||
(static_cast<u64>(m_input_sample_rate_divisor) * 1000); | |||
low_watermark = std::min(low_watermark, MAX_SAMPLES / 2); | |||
|
|||
// Throw away extra audio when the writer exits fast forward. | |||
if (should_resync && numLeft > low_watermark) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this branch should be skipped if auto stretching is eanbled? We never really want to throw away samples in that case, we prioritize playing all of them, even if they play at a different "speed".
EDIT: nevermind, this is already skipped is audio stretching is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we rename consider_framelimit
to something else? Though I don't have a better name, since I don't understand the fixed-pitch time-stretching path (and TBH I've forgotten the details of the buffer-resampling path as well).
@@ -371,6 +371,11 @@ void CoreTimingManager::Throttle(const s64 target_cycle) | |||
const TimePoint min_deadline = time - max_fallback; | |||
const TimePoint max_deadline = time + max_fallback; | |||
|
|||
if (speed <= 0.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kind of explained in the PR description already, but could you further explain what it is for?
Also, should we skip this branch if audio stretching is enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously
m_throttle_deadline
was not being updated during fast-forward, causing the emulator to try to render as many frames as possible in zero seconds. And when you released fast-forward, the emulator would continue to fast-forward for an emulated time equal to the physical time you held the key, waiting form_throttle_deadline
to catch up to the physical time. This was only limited when the "System can not to keep up with timings!" fallback would keep the target timebase 100ms before the current time, so the emulator would never fast-forward more than 100ms emulated time after you released Tab.
In my opinion, the previous behavior of fast-forwarding the emulated game for up to 100ms of emulated time after releasing the fast-forward key, was a bug. And you should stop fast-forwarding (begin throttling emulation using the current host time) as soon as you release the fast-forward key, with or without audio stretching enabled. This change accomplishes it.
Though another route to take would be to neither write nor read m_throttle_deadline
while fast-forwarding, and when transitioning from fast-forwarding to regular speed, assign m_throttle_deadline = time
(either before or after m_throttle_deadline += cycles...
) to begin throttling from that time (my current PR's code sleeps until the time
of the final fast-forward plus the cycles
of the current non-fast-forward emulation block). These approaches will have subtly different timing characteristics when ending fast-forward.
- It seems
CoreTimingManager::Advance()
calls throttle before running a computation at a given emulated/real time? I'm still not sure what theAdvance()
doc comment and PowerPC downcount mean.
Reassigning m_throttle_deadline
after ending fast-forward, requires adding a variable to cache whether fast-forward was enabled or not on the previous Throttle call. I'm not sure if the result has more or less "correct" timing behavior, but realistically there's no difference if we advance timing thousands of times per second.
Adding a "was fast forwarding" variable adds code complexity, but the same variable can be used for "making CoreTimingManager inform the mixer when CoreTimingManager itself stops fast-forwarding" (another proposal).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Downcount is a variable the JIT uses to keep track of how many emulated clock cycles are left until Advance needs to be called again. For instance, let's say that Advance is called and it decides that 2000 cycles are left until the next scheduled event. It will then set downcount to 2000 and return control to the JIT. As the JIT executes code, it will decrease the downcount variable little by little, and between each JIT block, if it notices the downcount is 0 or negative, it will call Advance again.
{ | ||
numLeft = m_numLeftI = low_watermark; | ||
indexR = (indexW - 2 * low_watermark) & INDEX_MASK; | ||
ASSERT(numLeft == static_cast<float>(((indexW - indexR) & INDEX_MASK) / 2)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this verifying? A small comment would help in case this assert actually triggered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally the code computes numLeft = ((indexW - indexR) & INDEX_MASK) / 2
. When ending fast-forward and throwing away all but low_watermark
frames of audio, we assign numLeft = low_watermark
then rewrite indexR
to low_watermark
frames (or 2 * that samples) before the end of available audio. This assertion is just to check that I got my math right, and the indexR
I write has the correct number of samples left; I copied the formula from the initial point we calculate float numLeft
. (I'd hope that 2 * low_watermark
is always less than INDEX_MASK
, if not the current code would already break?)
Should I keep this assert or not? I'm not sure how to comment it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd keep it, and add something like "not enough samples to continue"
Source/Core/AudioCommon/Mixer.cpp
Outdated
m_streaming_mixer.Mix(samples, num_samples, true, emulation_speed, timing_variance); | ||
m_wiimote_speaker_mixer.Mix(samples, num_samples, true, emulation_speed, timing_variance); | ||
m_skylander_portal_mixer.Mix(samples, num_samples, true, emulation_speed, timing_variance); | ||
const bool is_ffw = Core::GetIsThrottlerTempDisabled(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remember correctly, variables shouldn't use acronyms. I might be wrong but I think it would be better. ffw
is not clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, directly reading Core::GetIsThrottlerTempDisabled()
here is likely fine, but maybe it could be passed in in a more thread friendly way? No big deal anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually determined when to reset audio on the main thread (though not in CoreTiming which actually handles fast-forwarding), in an earlier version of this code (nyanpasu64@61e7b78).
The existing code updates
s_is_throttler_temp_disabled
on HotkeyScheduler but reads it (withGetIsThrottlerTempDisabled()
) on the CPU and Video threads.
I replaced s_is_throttler_temp_disabled
with an atomic to make it safe to access on multiple threads (which is already happening, with or without my PR merged). But perhaps it's still more reliable for the main thread to set a flag (read by the mixer thread) when the main thread starts and stops fast-forwarding.
- You'll most reliably detect when fast-forwarding starts and stops by making
CoreTimingManager
inform the mixer whenCoreTimingManager
itself stops fast-forwarding. But this is a nasty entanglement of concerns (CoreTimingManager
shouldn't care about audio mixer resyncing?). Should I implement this change or not? - How does other code read
s_is_throttler_temp_disabled
? Right now,VideoConfig.cpp#IsVSyncActive()
callsGetIsThrottlerTempDisabled()
rather than checking whetherCoreTimingManager
has started or stopped fast-forwarding. Should we disable vsync for a frame if the user starts and stops fast-forwarding within one frame, causing a sub-frame shift in emulation phase relative to vsync? This is somewhat theoretical and unrelated to my code changes. I still hope that something like Add closed-loop latency control for Vulkan vsync #12035 can get merged.
I'll do that when I have a chance.
I could record two videos, though editing them into one file will be tricky to figure out and involve a lot of frantic ChatGPT "how to join videos in ffmpeg", and might be harder to compare? (not sure)
Playing back samples faster will result in an increase in audio pitch. I believe that garbled/dropped audio during fast forward, followed by increased pitch after releasing fast-forward, will be more jarring to the user than this PR (which produces garbled/dropped audio during and after fast-forward). I do wonder if we should enable time-squeezing (time-stretching but compressing) during fast forwarding.
something something mdfourier wii (I still haven't gotten around to it :(
clever.
Is this necessary? I don't know. With the current audio mixer on Linux, outside of game startup or Dolphin lagging I didn't observe any noticeable pitch change, since I didn't review your code. But do you handle homebrew which outputs DSP audio at 48khz (the official SDK ran at 32khz)?
|
You don't need to make a single movie file recorded from dolphin. Just record your desktop or something. I'm not sure what you mean by time squeezing (stretching?), but I don't think it's necessary if the user didn't enable audio stretching. I think the adaptive mixer latency was helpful yes, though we can further discuss that in the other PR. |
Previously `m_throttle_deadline` was not being updated during fast-forward, causing the emulator to try to render as many frames as possible in zero seconds. And when you released fast-forward, the emulator would continue to fast-forward for an emulated time equal to the physical time you held the key, waiting for `m_throttle_deadline` to catch up to the physical time. This was only limited when the "System can not to keep up with timings!" fallback would keep the target timebase 100ms before the current time, so the emulator would never fast-forward more than 100ms emulated time after you released Tab. Instead update `m_throttle_deadline` to track wall-clock time, so emulation resumes throttling as usual once fast-forward ends.
Fixes excessive audio being buffered and audio going sharp after fast-forwarding.
fe41898
to
dde6a61
Compare
I've commented the audio skipping code and assert (though differently from your suggestion). Is that sufficient?
Oddly on Windows exclusive WASAPI and Cubeb, audio goes sharp when the game (specifically Super Monkey Ball 2) starts, whereas on Linux (IIRC) PulseAudio it went flat in my earlier testing. I don't know if that's because I had tested on a slightly older base commit in master. I often get lag during an in-game loading screen that reads data from the ISO, because I put my games on a secondary hard drive which apparently spins down during inactivity. Afterwards the audio often goes sharp once Dolphin fast-forwards to catch up.
If we did so, after Dolphin lags and then speeds up to catch up, it will skip audio once. I'm not sure if this is more or less noticeable than a period of increased audio latency, accompanied by slightly sharp sound (depending on the length of the lag spike, either briefly or up to ~8 seconds then slowly returning to normal). I'd say skipping audio is arguably more distracting to the user, unless you only do it upon silence. Honestly I think the best way to prevent audio from piling up is to reduce how long CoreTiming fast-forwards games after a lag spike, but that's a topic for another day. I verified the PR works on Windows as well. I've made some video recordings with resampling ratio debug prints enabled:
The pitch shift is slightly perceptible. When playing Mario Sunshine, the increased audio latency after fast-forwarding was more obvious:
|
When fast-forwarding emulation, the CPU thread fills the audio buffers with sound faster (
Mixer::MixerFifo::PushSamples()
) than sound gets played back (Mixer::MixerFifo::Mix()
). When the buffer fills upPushSamples()
drops incoming samples until enough samples get read to make room for pushing a new buffer.When fast-forwarding finishes, the excessive buffer contents were never cleared. This caused audio to remain playing at an elevated pitch for 10 seconds (assuming a 48 kHz system audio rate), slowly dropping to normal over ~10 more seconds.
In this PR, I changed
Mixer::MixerFifo::Mix()
to detect when fast-forwarding ends, and reset the read pointer to (low_watermark
= target latency) (audio frames = samples not * channels) before the write pointer, and reset the current and smoothed latency values (used to calculate resampling ratio) to match. This causes a pop in the audio, but it's fine because during fast-forward the audio is already garbled. (The modified code should never be run when audio stretching is enabled andconsider_framelimit=false
.)I also had to change
CoreTimingManager::Throttle()
to properly throttle emulation immediately when you released the fast-forward key, so it wouldn't generate more audio after the mixer jumped ahead to catch up. Previouslym_throttle_deadline
was not being updated during fast-forward, causing the emulator to try to render as many frames as possible in zero seconds. And when you released fast-forward, the emulator would continue to fast-forward for an emulated time equal to the physical time you held the key, waiting form_throttle_deadline
to catch up to the physical time. This was only limited when the "System can not to keep up with timings!" fallback would keep the target timebase 100ms before the current time, so the emulator would never fast-forward more than 100ms emulated time after you released Tab.After fast-forwarding finishes, audio resumes playing at the perfect latency (offset is at ±single digits as opposed to 200 previously). When combined with #12035, with vsync enabled, audio may drop in pitch for 10 seconds or so after fast-forward ends, because emulation is being slowed down (generating less audio) because it's generating frames too soon relative to vsync.
Tested on Linux X11.
Concerns
ASSERT(numLeft == static_cast<float>(((indexW - indexR) & INDEX_MASK) / 2))
?m_dma_mixer
(the main game audio synthesized in real time) when fast-forward ends. Ideally you'd resynchronize all of them to gameplay.Ideally all mixers would share a locked resampling offset to prevent them from going out of sync with each other, which could be a problem in DTK-based games with sound effects or user inputs synchronized with audio. But in practice, since the mixers are generated and consumed in sync, I'd hope it's unlikely for them to desynchronize in normal gameplay (and this PR prevents them from desyncing after fast-forwarding). And since the mixers inherently run at different rates and have different durations (in real time), I don't think there's a good way to keep them synced once they buffers overflow or underflow.
Mixer::PushSamples
to missCore::GetIsThrottlerTempDisabled() = true
even thoughCoreTiming
observed it. This is unlikely to occur in practice, sinceAudioCommon::SendAIBuffer()
is supposed to execute at 4khz (0.25ms) whileHotkeyScheduler::Run()
updates hotkey state every 5ms.s_is_throttler_temp_disabled
on HotkeyScheduler but reads it on the CPU and Video threads. This is a data race. My code adds a new read of this flag on the CPU thread.s_is_throttler_temp_disabled
directly inMixer::MixerFifo::Mix()
, avoiding them_resyncReader
song and dance altogether? This also has the positive side effect of making all active mixers resynchronize themselves automatically, whereas right now I only tellm_dma_mixer
toEndFastForward()
.Mixer::PushSamples
observes the change. I'll have to check whether this resets the buffer beforePushSamples()
finishes running too quickly (EDIT: it's ok). (Though even my current code doesn't wait forCoreTimingManager
to see the change, and it still works well in practice.)Mixer::Mix
so we only read the value once, so all mixers either reset together simultaneously or don't reset.Common::Flag
because I'm unfamiliar with usingFlag
, and I think my code is more efficient than the current implementation. If necessary I could replace it with the existing flag code (it'll probably not be noticeably slower, considering we don't resample audio very often).Speeding up atomic flags
I'm actually not sure if you can change
Flag
while keeping it 100% compatible with the current behavior (unlimited number of threads, sequential consistency, etc.). I have some ideas on how it can be more optimized around common use cases (two threads only, so I'm not sure it can safely replace all use cases). It would be interesting to benchmark to see if any change in FPS is detectable, but I don't know how to benchmark Dolphin.Flag
with acquire/release orderings is correct.TestAndClear/Set()
, you can probably perform an acquire read, and onlycompare_exchange_strong
(and return whether the value was flipped) if the value is different from the target.TestAndClear()
, and all other threads only set the value totrue
, you can replacecompare_exchange_strong
with an acquire fence followed by a write (most likely a relaxed write, you can also use release, though I think other threads generally don't care what you were doing before you tested/set the flag).TestAndClear()
and perhapsClear()
to both erroneously indicate they cleared the flag.