
Fix Capture and Record AudioEffect bugs for surround systems #92532

Open · wants to merge 1 commit into base: master

Conversation

@BuzzLord commented May 30, 2024

The AudioEffectCapture and AudioEffectRecord classes do not work properly when there is more than one audio channel active. This PR makes some changes to several AudioEffect classes to fix these bugs. Fixes #91133.

I added set_current_channel to AudioEffectInstance to keep AudioServer::_update_bus_effects clean, since I wanted that functionality for capture and record. It was already part of compressor, and seems like something that could be useful for other effect instances if they needed it. The default implementation is a no-op.

Added an active_channel property to both AudioEffectCapture and AudioEffectRecord. Capture uses it to return early from process so that only the active instance writes data to the buffer.

Record now stores all instances (that are registered) by channel, and when changing the active_channel, it changes the internal current_instance to reflect that.

The active_channel property is exposed in the editor for both classes. It doesn't currently update the available drop-down options to reflect the actual number of channels available in the AudioServer. The set_active_channel functions also don't explicitly check for the validity of the p_channel args, but they do work with invalid inputs (it just disables capture/recording).

Update (2025/01/18): After some discussion, I decided to change how this PR works, removing the active_channel property. The core capture/recording logic has been moved into the respective AudioEffectInstance classes, with the original AudioEffect methods and properties copied to them to maintain the same functionality. The base AudioEffect interfaces are still in place, but now pass through to the channel 0 instance of the effect.

So all channels are available for capture at the same time, accessed via the instances, but continue to work as is by using the base effect, which mirrors the channel 0 instance.
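The pass-through design can be sketched as follows. This is a simplified Python model of the C++ classes, not the actual Godot code; all class and method names here are illustrative.

```python
# Simplified model of the pass-through design: each channel gets its own
# capture instance with its own buffer, and the base effect mirrors the
# channel 0 instance so existing code keeps working unchanged.

class CaptureInstance:
    def __init__(self, channel):
        self.channel = channel
        self.buffer = []  # stands in for the real ring buffer

    def process(self, frames):
        # Each instance only ever sees its own channel's frames.
        self.buffer.extend(frames)

    def get_buffer(self, count):
        out, self.buffer = self.buffer[:count], self.buffer[count:]
        return out


class CaptureEffect:
    def __init__(self, channel_count):
        self.instances = [CaptureInstance(c) for c in range(channel_count)]

    def get_buffer(self, count):
        # The base effect keeps its old interface by delegating to channel 0.
        return self.instances[0].get_buffer(count)
```

All channels remain individually accessible through their instances, while callers that only use the base effect transparently read channel 0.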

@BuzzLord BuzzLord requested review from a team as code owners May 30, 2024 01:09
@Chaosus Chaosus added this to the 4.3 milestone May 30, 2024
@BuzzLord BuzzLord force-pushed the audioeffect-surround-bug branch 2 times, most recently from 3dca84f to c3f8750 Compare May 31, 2024 08:30
@BuzzLord (Author)

Is anyone able to take a look at this? I can appreciate it might be hard to test without an affected audio setup. I have a (crude) change to the Audio Dummy Driver that could help with testing, if the audio setup is too much of an issue. The change just exposes the speaker mode as a configurable parameter when the dummy driver is active in the editor.

@Mickeon (Contributor) commented Jun 21, 2024

Letting the tests run.
There are very few maintainers who are keenly aware of the audio side of Godot, so this may take a while to get looked at.

@akien-mga akien-mga modified the milestones: 4.3, 4.4 Jul 31, 2024
@OpenFormEon

Is anyone able to take a look at this? I can appreciate it might be hard to test without an affected audio setup. I have a (crude) change to the Audio Dummy Driver that could help with testing, if the audio setup is too much of an issue. The change just exposes the speaker mode as a configurable parameter when the dummy driver is active in the editor.

I have a game where the entire gameplay is recording and playing back audio via AudioEffectRecord from players' microphones, and many players have reported that the recording doesn't work. Input is working, because they can hear themselves when the bus isn't muted and AudioEffectSpectrumAnalyzer works just fine. Only the recording is failing. Do you think my game's issue could also be due to the bug you describe? If so, what's the best way to try out your fix in my version of Godot? Apologies, I'm unfamiliar with both Github and how to change the Godot Engine's own code.

@Nihlus commented Aug 24, 2024

I'm also experiencing this - as long as the output is a stereo 2.0 channel, recording works fine, but any HiFi / Surround configuration makes AudioEffectCapture unusable for my use case (speech to text).

@BuzzLord (Author)

I have a game where the entire gameplay is recording and playing back audio via AudioEffectRecord from players' microphones, and many players have reported that the recording doesn't work. Input is working, because they can hear themselves when the bus isn't muted and AudioEffectSpectrumAnalyzer works just fine. Only the recording is failing. Do you think my game's issue could also be due to the bug you describe?

I do think this is the bug. AudioEffectRecord only records from the last channel (because of the bug), which on anything more than a stereo system will be silent (the microphone normally writes its stereo data only to the first channel). This should fix it.

The SpectrumAnalyzer is one of the only effects that is channel aware, which is why it works. You select the effect instance (which is per-channel) with AudioServer.get_bus_effect_instance(bus, effect), but there's a third channel argument that defaults to 0 (the main stereo channel).

If so, what's the best way to try out your fix in my version of Godot? Apologies, I'm unfamiliar with both Github and how to change the Godot Engine's own code.

Unfortunately I'm new here too, and I'm not sure what the best way to test would be. I set up a dev environment to build and test my own releases, but I don't think sharing executables is the way to go. Maybe one of the main contributors can chime in.

I'm also experiencing this - as long as the output is a stereo 2.0 channel, recording works fine, but any HiFi / Surround configuration makes AudioEffectCapture unusable for my use case (speech to text).

If this is the bug, the audio should sound very garbled and static-y. A workaround I use in my own project (in GDScript) is to pull frames from the capture buffer in chunks of 512. Then, before passing that array on, I loop over it and check whether both channels of every frame are exactly equal to 0.0. If so, I skip that chunk. This works because the AudioServer processes 512 frames at a time, per channel, so your good data should be in amongst 512-frame chunks of silence.
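The chunk-filtering workaround can be modeled in plain Python like this. It is a sketch under the assumptions stated above (512-frame mix blocks, upper channels all-zero), not the real AudioEffectCapture API; it fails if an upper channel actually carries audio, as noted later in this thread.

```python
CHUNK = 512  # the AudioServer's hard-coded mix block size

def is_silent(chunk):
    """True if every frame in the chunk is exactly (0.0, 0.0)."""
    return all(l == 0.0 and r == 0.0 for l, r in chunk)

def filter_capture_buffer(frames):
    """Drop the all-zero 512-frame chunks that the silent upper channels
    append to the shared capture buffer, keeping only real data."""
    kept = []
    for start in range(0, len(frames) - CHUNK + 1, CHUNK):
        chunk = frames[start:start + CHUNK]
        if not is_silent(chunk):
            kept.extend(chunk)
    return kept
```

As long as frames are always pulled in aligned power-of-two chunks, the buffer alignment does not desync and the silent chunks line up exactly.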

@BuzzLord BuzzLord force-pushed the audioeffect-surround-bug branch from c3f8750 to b5a57de Compare September 2, 2024 04:21
@BuzzLord (Author) commented Sep 2, 2024

@OpenFormEon I was looking at the Godot Contribution Workflow docs and see that you can download the editor with this PR applied for testing, once the workflow/tests are run on it again, or by using the nightly.link website. Details available here.

@brahma0545

Is there any solution? I have the same problem on Ubuntu 24; the same code runs fine on Windows.

AudioEffectCapture get_buffer() returns only [(0,0), (0,0), (0,0), ..., (0,0)]

@BuzzLord (Author)

Is there any solution? I have the same problem on Ubuntu 24; the same code runs fine on Windows.

AudioEffectCapture get_buffer() returns only [(0,0), (0,0), (0,0), ..., (0,0)]

From the description, I'm not sure this is the same bug. It should be OS agnostic, but maybe the Ubuntu audio driver being used is a surround sound configuration, whereas the Windows system is stereo.

The effect of this bug on AudioEffectCapture is that some of the get_buffer() contents will be good data, but some will be 0.0 frames. If all the frames are empty, that might be another issue. The AudioEffectRecord will return all empty frames, though.

Are you able to pass the audio you're trying to capture out the master audio bus, to confirm it's actually the audio effect that is at fault, rather than the audio device or driver configuration?

Review comments on doc/classes/AudioEffectCapture.xml (outdated, resolved)
Review comments on doc/classes/AudioEffectRecord.xml (outdated, resolved)
@BuzzLord BuzzLord force-pushed the audioeffect-surround-bug branch from b5a57de to 08f92e9 Compare December 5, 2024 01:47
@BuzzLord BuzzLord force-pushed the audioeffect-surround-bug branch from 08f92e9 to 65e7382 Compare December 12, 2024 15:58
@Emaxoso commented Jan 14, 2025

Hi!

Instead of adding the "set_current_channel" method to the effect, wouldn't it be better to pass the current channel to the process method, by changing the method signature like this?

virtual void process(const AudioFrame *p_src_frames, AudioFrame *p_dst_frames, int p_frame_count, int current_channel);

Also, instead of selecting a single channel for the capture effect, wouldn't it be better to move the RingBuffer inside the effect instance?
This way, it would be possible to access the frames of each channel instead of registering a capture effect and enabling it for each available channel.

The methods to retrieve the buffer could be moved to the instance (accessible through AudioServer.get_bus_effect_instance()), and they could also remain on the "total effect," which would handle combining all instance buffers and averaging all frames.

@BuzzLord (Author)

Instead of adding the "set_current_channel" method to the effect, wouldn't it be better to pass the current channel to the process method?

By changing the method signature like this? virtual void process(const AudioFrame *p_src_frames, AudioFrame *p_dst_frames, int p_frame_count, int current_channel);

That could work... I decided on set_current_channel for a few reasons:

  • a set_current_channel was already being used by one effect (compressor), so there was precedent, and an existing place in the AudioServer code where it was set
  • each effect instance is only ever on a single channel that never changes, so setting it at instantiate seemed sensible
  • I didn't want to add extraneous parameters to other existing functions (e.g. process, effect instance instantiate) to minimize the impact on existing code

I felt set_current_channel was the simplest means of achieving that (i.e. effects that don't need it don't have to implement it; nor do they have to update their process or other inherited functions to accommodate a new parameter).
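The "virtual no-op with selective overrides" pattern described above can be sketched like this. This is a Python model of the C++ design for illustration only; the names mirror the PR but are not the real Godot classes.

```python
# Model of the set_current_channel design: the base class provides a no-op
# default, so channel-agnostic effects need no changes, while channel-aware
# effects (capture, record, compressor) override it to record their channel.

class AudioEffectInstance:
    def set_current_channel(self, channel):
        pass  # default is a no-op; most effects never care which channel they're on


class ChannelAwareInstance(AudioEffectInstance):
    def __init__(self):
        self.channel = -1

    def set_current_channel(self, channel):
        self.channel = channel  # channel-aware effects remember their channel


def update_bus_effects(instances):
    # Mirrors what AudioServer::_update_bus_effects does once per channel:
    # each per-channel instance is told which channel it belongs to.
    for channel, inst in enumerate(instances):
        inst.set_current_channel(channel)
```

Because the default does nothing, existing effects compile and run unchanged; only effects that opt in pay any cost.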

Also, instead of selecting a single channel for the capture effect, wouldn't it be better to move the RingBuffer inside the effect instance? This way, it would be possible to access the frames of each channel instead of registering a capture effect and enabling it for each available channel.

The methods to retrieve the buffer could be moved to the instance (accessible through AudioServer.get_bus_effect_instance()), and they could also remain on the "total effect," which would handle combining all instance buffers and averaging all frames.

This was actually something I had thought of after I finished the current implementation, and was hoping someone might bring up. I do think it could be better by moving the buffer into the instance. My initial criticism of the idea was I didn't want to add the burden of storing up to 4x the memory for the buffers... but actually writing it out now, that seems a little silly. The buffers are tiny, relatively speaking, so it's probably not as much of an issue as I initially thought.

I'm not sure about the averaging frames, though. That doesn't make sense to me from an audio processing point of view. I think a better option would be for the main audio effect to just return the channel 0 instance data. So using the effects works as intended for 99% of cases (and existing code works without change), but if you need other channels, you can access them via the get_bus_effect_instance.

This change would also mean I could remove the set_current_channel function from AudioEffectInstance.

I like this solution better than the current one, so I'll implement it on another branch (for now), and see how it turns out. In the meantime, any contributors/reviewers (or anyone else) have thoughts or questions about it?

@Emaxoso commented Jan 14, 2025

As for the "set_current_channel," even if it's no longer used, I would suggest keeping it as a standard so that future effects can generally know which channel they are on.

As for averaging all the ring buffers into a single one, I agree with you. We can start with just channel 0, as long as the individual buffers are accessible—processing the audio in other ways won’t be complicated.

@BuzzLord BuzzLord force-pushed the audioeffect-surround-bug branch from 65e7382 to e2b80a9 Compare January 19, 2025 02:51
@BuzzLord (Author)

Updated the PR. The changes didn't feel significant enough to justify making a different PR, so I just pushed them here.

I removed the active_channel property, and moved the core capture/record logic from the AudioEffects into the AudioEffectInstances. I duplicated the effect methods and properties on the instances, and changed the base effects to pass through to the channel 0 instances. I did keep set_current_channel, and ended up needing it, in order to keep track of the channel 0 instances in the base effect. Because cleanup relies on reference counting, the instance can't hold a reference to the base effect while the effect holds references to the instances (that would create a reference cycle), so I clear the base reference once it's no longer needed.

I rearranged the .cpp files a bit to hopefully give them an easier-to-follow order. I put effect instance methods at the top, starting with process, then other important methods, then other setters/getters, and finally _bind_methods and the constructor/destructor. The base effect methods come after, following the same pattern. It makes the PR kind of messy, but hopefully the long-term benefits outweigh the mess.

I updated the built in documentation for the affected classes, but I'm not particularly happy with it, and would welcome changes or feedback on it.

I changed AudioEffectInstance to now expose set_current_channel in GDExtension/GDScript/C#, to match the existing process and process_silence methods.

@BuzzLord BuzzLord force-pushed the audioeffect-surround-bug branch from e2b80a9 to 30d5222 Compare January 19, 2025 12:14
@BuzzLord BuzzLord force-pushed the audioeffect-surround-bug branch from 30d5222 to 49049f9 Compare January 19, 2025 20:16
@Emaxoso commented Jan 25, 2025

It seems really good to me. What's the process now to merge this fix?

@BuzzLord (Author)

It seems really good to me. What's the process now to merge this fix?

Based on the PR guidelines, I think it just needs review and approval by a maintainer. From the teams page, @ellenhp and @reduz are listed as the audio maintainers, though it might be possible someone else could review and approve it.

I've assumed this bug is niche enough that it hasn't been a priority for the team, and I've actually been able to work around it myself in my own project, but maybe just having a few people draw attention to it (e.g. in the contributors chat) would be enough to get someone to look at it.

@Emaxoso commented Jan 26, 2025

Out of curiosity, what workaround did you use to avoid this bug?

@BuzzLord (Author)

Out of curiosity, what workaround did you use to avoid this bug?

I'm using AudioEffectCapture for mic capture, on its own Capture audio bus. I don't use AudioEffectRecord (and only discovered the issue with it after investigating the other effects to see how widespread the bug was). The mic only captures stereo sound (channel 0), leaving the other channels empty (i.e. all frames are 0.0).

The way AudioEffectCapture manifests the bug is by stacking all channels in series into the effect buffer every time through the mix step loop; i.e. the frames from each channel are appended to the same buffer queue. The AudioServer processes 512 frames at a time (which is hard coded, for now at least), so you can just pull data off the effect buffer in 512 frame chunks, and check if every frame is exactly equal to 0.0. If so, just ignore that chunk. The first channel will have actual data, but the subsequent chunks from the upper channels will have all zeroes (because it is mic data).

As long as you consistently pull 512 frames (or 256, or some smaller power of two), the buffer alignment should not desync, and this will let you filter the empty frames.

This trick doesn't work if you actually have audio in the upper channels, such as when capturing from a more general "SFX" audio bus in a 3D game.

GDScript example:

# In the audio/voice manager:
func _process(_delta: float) -> void:
	while effect.get_frames_available() >= 512:
		var data: PackedVector2Array = effect.get_buffer(512)
		if check_empty_data(data):
			continue
		# ... use data

func check_empty_data(data: PackedVector2Array) -> bool:
	for i in range(data.size()):
		if data[i].x != 0.0 or data[i].y != 0.0:
			return false
	return true

@goatchurchprime (Contributor)

Knowing which channel the AudioEffectXXXInstance is attached to is essential for the effects to know where they stand in relation to the stream. It looks like every AudioEffectXXXInstance has a pointer to its Ref<AudioEffectXXX> base. Why not call this parameter _channel_number instead of current_channel, put it in the AudioEffectInstance base class, and let them all be numbered in the _update_bus_effects() function? (At the moment only AudioEffectCompressorInstance has this parameter, for the purpose of looking up its peer on another bus.)

By the way, since this whole thing is about obtaining the Microphone data (which is always one stereo channel, incompatible with there being four), it's another good reason we should access the Microphone entirely outside of the Audio system: #100508

At the incoming end there are AudioStreamYYY sources which have an AudioStreamYYYPlayback which is a single channel. The AudioStreamMicrophonePlayback has a pointer to its corresponding AudioStreamMicrophone, which maintains a list of AudioStreamMicrophonePlaybacks, which isn't used. But these aren't duplicated by channel like the AudioEffects are.

It seems that the AudioServer::_mix_step() loops through a flat list of AudioStreamPlaybacks and calls mix(buf, pitch_scale, buffer_size), then the bus(es) are found for this AudioStream, and finally we loop through the (up to 4) channels and call _mix_step_for_channel(buf) on each of them with the same buf from that mix() function.

So I believe that there is only one AudioStreamMicrophonePlayback accessing the microphone per stream whose output is duplicated up into the four different streams in an 8 track audio situation. Unless I'm missing something, all 4 tracks should have the same Microphone data so it shouldn't matter which one we capture, as long as we capture only one of them.

@goatchurchprime (Contributor)

I am using your active_channel method in my case (called activeinstantiationid) on goatchurchprime/two-voip-godot-4#44 (comment), since I believe we are only interested in filtering for the first channel (if the input originates from the single-channel microphone). I'm using the instance_ids of the EffectInstances to identify the first one uniquely, and finding it on the first call of EffectInstance._process(), because that loop always runs in order.

You cannot rely on the order of instantiation to help you out, because it's all chaos and out of order with extra constructions and destructions when you drag another effect across the effect you are interested on the bus in the editor.

@BuzzLord (Author)

Knowing which channel the AudioEffectXXXInstance is attached to is essential for the effects to know where they stand in relation to the stream.

In general, I agree. Although there are many effects that don't need that info, since they're channel agnostic; e.g. filters, pitch shift, limiters, etc. I think that's why this issue wasn't identified earlier.

It looks like every AudioEffectXXXInstance has a pointer to its Ref<AudioEffectXXX> base. Why not call this parameter _channel_number instead of current_channel, put it in the AudioEffectInstance base class and let them all be numbered in the _update_bus_effects() function? (At the moment only AudioEffectCompressorInstance has this parameter for the purpose of looking up its peer on another bus.)

That is precisely what I am doing in this PR. I could rename the parameter, though channel_index would be more consistent with other existing names/documentation. I haven't changed it since I wanted to minimize the impact of my changes, and so reused the existing current_channel name from the AudioEffectCompressorInstance.

By the way, since this whole thing is about obtaining the Microphone data (which is always 1 stereo channel which is incompatible with there being 4) and is another good reason we should access the Microphone entirely outside of the Audio system #100508.

Although the microphone was where I discovered the issue, it's not the source of the problem. The bug will manifest when using any audio stream to produce sounds. The problem this PR fixes is to make the Capture and Record audio effects channel aware. I suspect (though haven't looked at the history) that they were written before multi-channel surround sound support was added to the engine, and just weren't tested sufficiently to identify the problems.

This PR will fix those existing effects to work properly, allowing users to capture/record any and all channels that they want (in parallel even). It will also provide audio effect developers the set_current_channel method in audio effect instance, giving the info they need to make their effects channel aware too.

At the incoming end there are AudioStreamYYY sources which have an AudioStreamYYYPlayback which is a single channel. The AudioStreamMicrophonePlayback has a pointer to its corresponding AudioStreamMicrophone, which maintains a list of AudioStreamMicrophonePlaybacks, which isn't used. But these aren't duplicated by channel like the AudioEffects are.

It seems that the AudioServer::_mix_step() loops through a flat list of AudioStreamPlaybacks and calls mix(buf, pitch_scale, buffer_size), then the bus(es) are found for this AudioStream and finally we loop through the (up to 4 channels) and call _mix_step_for_channel(buf) on each of them with the same buf from that mix() function.

So I believe that there is only one AudioStreamMicrophonePlayback accessing the microphone per stream whose output is duplicated up into the four different streams in an 8 track audio situation. Unless I'm missing something, all 4 tracks should have the same Microphone data so it shouldn't matter which one we capture, as long as we capture only one of them.

How the playback streams are mixed into the surround channels is determined by the audio stream player's mix_target. It works by changing the per-channel volumes (in the AudioStreamPlaybackBusDetails), which are passed as a parameter into _mix_step_for_channel.

The default mix_target is Stereo, which copies the stream data into only channel 0. I believe the Surround option will distribute the stream data to all channels (which is what you describe), and Center seems to output the data to channel 1, which is the center/bass channel.
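The routing described above can be modeled with a small Python sketch. This is a simplified illustration of the per-channel volume pattern, not the actual AudioServer code; the exact volume values are placeholders, and only the routing pattern (which channels receive data) reflects the description.

```python
# Simplified model of how mix_target routes a playback buffer into the
# surround channels via per-channel volumes. Volume values are illustrative.

def channel_volumes(mix_target, channel_count):
    if mix_target == "STEREO":
        # Stream data goes only to channel 0 (the front L/R pair).
        return [1.0] + [0.0] * (channel_count - 1)
    if mix_target == "CENTER":
        # Data goes to channel 1 (the center/bass pair on 3.1+ layouts).
        vols = [0.0] * channel_count
        if channel_count > 1:
            vols[1] = 1.0
        return vols
    if mix_target == "SURROUND":
        # Data is distributed to all channels.
        return [1.0] * channel_count
    raise ValueError("unknown mix_target: %s" % mix_target)

def mix_step(buf, mix_target, channels):
    # Accumulate the same source buffer into each channel, scaled by its volume
    # (roughly what _mix_step_for_channel does with the bus details).
    vols = channel_volumes(mix_target, len(channels))
    for vol, channel in zip(vols, channels):
        for i, (l, r) in enumerate(buf):
            cl, cr = channel[i]
            channel[i] = (cl + l * vol, cr + r * vol)
```

Under this model, a Stereo mic stream leaves every channel above 0 at exactly 0.0, which is why the capture buffer fills with silent chunks on surround configurations.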

Unrelated note, I just noticed that AudioEffectStereoEnhance is probably broken for channel 1 (the 3.1 center/bass channel), because the effect is not channel aware either, and seems to assume the audio frame pairs always correspond to left/right, which is not the case for channel 1.

I am using your active_channel method in my case (called activeinstantiationid) on goatchurchprime/two-voip-godot-4#44 (comment), since I believe we are only interested in filtering for the first channel (if the input originates from the single-channel microphone). I'm using the instance_ids of the EffectInstances to identify the first one uniquely, and finding it on the first call of EffectInstance._process(), because that loop always runs in order.

You cannot rely on the order of instantiation to help you out, because it's all chaos and out of order with extra constructions and destructions when you drag another effect across the effect you are interested on the bus in the editor.

I think once this PR is merged, you could change your activeinstantiationid to use the set_current_channel call, which will consistently give you the channel index that the effect instance is being applied to. The instance could then pass it to the base effect in process (e.g. just before base->process()) to modify how the effect handles the frames.

Alternately, you might be able to set your mic audio stream player mix_target to Surround, and just choose one instance_id to pull data from.

@goatchurchprime (Contributor)

Okay, I see that you are calling set_current_channel() as a virtual function in the base AudioEffectInstance, so that it is up to each derived class to decide whether or not to record it. This does make it possible to record it in a GDExtension defined AudioEffect as you noted.

However, I still favour defining this parameter in AudioEffectInstance base class without that call function so that it is available to all AudioEffects. The more we look into it the more that most of the effects would probably benefit by knowing the channel they are on. (You spotted one case with the AudioEffectStereoEnhance, for example.)

Unfortunately the core team claims they do not have the confidence to merge audio system PRs right now, so we may need to make some squeaky wheel sounds to get any attention at the right moment after the 4.4 release.

The big list of Audio issues is here: #76797 We may need to find a way to volunteer to go through them.

Successfully merging this pull request may close these issues.

AudioEffectCapture with surround sound audio artifacts
10 participants