Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for: ogg importer dialog rounds loop offset values to 3 digits, introducing a sampling imprecision that can lead to an audible pop on loop. #87554

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

RustyRoboticsBV
Copy link

@RustyRoboticsBV RustyRoboticsBV commented Jan 24, 2024

Fixes #87427 for ogg and mp3 files, by introducing a toggle that allows the user to enter their loop offsets in either seconds or in samples. AudioStream classes are mostly left unchanged, except that I had to add 2 methods (get_sample_rate and sample_to_seconds) to make this work. Wav stuff was left unchanged.

I also took the liberty to make the importer dialog a little nicer to use, by making the loop offset field a little larger, and making loop-related stuff invisible when looping is disabled.

Disclaimer: I'm a total git newbie, so hopefully I did everything right.

@lyuma
Copy link
Contributor

lyuma commented Jan 24, 2024

Ok I know I am not an audio contributor but I do review some of the asset pipeline changes and I want to understand this change a bit better.

How does this relate to bpm support such as #63265? My fear is making the loop offset be in samples will make it more difficult to calculate the beat syncing which is in units of BPM (seconds)

I'm ok with making this use samples internally to be consistent with wav, but asking the user to type samples is not good UX (and if they use seconds this bug is not fixed)

If it shows seconds, we should not be setting a step on the SpinBox (or EditorSpinSlider), or if we do use a step, it should be within 1/sample rate but this looks ugly too)... it should be possible to pick a value for step size that guarantees no precision loss at common sample rates (for example, 1/100000)

i do not like a boolean that affects the meaning of another value. I would rather deprecate the existing seconds loop? Instead, add a new loop_offset_samples and have an interactive UI for playing or inputing a loop point, with options for samples, seconds or beats (in BPM units)

Generally speaking, I would hope for some way to make this more interactive.

Why is a bad seek offset causing a pop? Being off by one sample shouldn't cause a pop. It must be off by far more than 1 sample... if the issue is that the spin slider is set to 0.001 that is 1/1000 which is 48 samples, that is far too coarse to accurately pinpoint a loop point unless we are able to round to beats.
If so, we could start with a simpler change and use 0.00001 for the spin slider or SpinBox (why is it not using EditorSpinSlider?) and see if that solves the popping issue. Then this change boils down to UX improvement (selecting beats)

The presence of pops when we get this wrong is not ideal. When I seek youtube videos, I don't hear pops. And when I seek haphazardly in DJ software it usually rounds to the nearest beat which perhaps avoids pops too. I wonder if it might be good to make godot more fault tolerant to bad seek points.

@RustyRoboticsBV
Copy link
Author

RustyRoboticsBV commented Jan 25, 2024

Hey, thank you for your reply.

This PR should not interfere with BPM support. It basically only expands the function of the importer's loop offset field, while the AudioStream classes are mostly left unchanged (beyond the two added getter methods). In my view, both beat-based loop points and sample/second-based loop points are desirable features; the former being more useful for songs while the latter is more useful for things like background noise that don't follow a clean beat structure.

And yes, the issue is indeed that the 0.001 step is too coarse. My initial thought was thus of course to simply increase the step value of the seconds field like you suggest, but this introduced some really weird rounding issues on reimport that utterly confounded me (please see issue page #87427, I get the impression you may know more about UI stuff than I do, and I'd really like to address it). I discussed with @bs-mwoerner, and we felt that adding a toggle for sample-based inputs is a good compromise for now.

As a composer I actually generally prefer using samples over seconds for situations where using beats is not possible or desirable, but that's a personal preference thing.

I like your idea of a more interactive unit selection scheme, though maybe this is beyond the scope of this PR? Could this be a good idea for a feature proposal perhaps?

@bs-mwoerner
Copy link
Contributor

First of all: We can and should discuss what the best way is to achieve this, but I am indeed a strong supporter of the idea itself. "Strong supporter" as in my current workflow requires me to edit media in external software to accomodate a limitation of the Godot Editor UI (i.e. a limitation that's not in the Engine itself). Godot is a fantastic piece of software and this is by no means a showstopper, but the workaround feels a bit irksome after some time and there's no doubt Godot could be even better if that workaround would not be necessary.

So, before going into the discussion of how, I'd like to address the why in more detail.

Why do you care as much about where exactly your loop point is?
I see two distinct use cases:

  1. Working with recorded audio or audio provided by a third-party where you guesstimate a suitable loop point by fiddling around with the settings until it sounds right. For this use case, seconds is a good unit and millisecond resolution is probably adequate.
  2. Working with generated audio (digital music, synthesized sound effects) that has been produced to seamlessly loop between two specific positions in the stream. For this use case, seconds is a bit of an awkward unit, because integral stream positions will generally not convert to clean time fractions, and millisecond resolution is not nearly enough to identify a specific point in the stream (when thought of as a sequence of audio samples).

Note that case 1 can transition to case 2 when you want to make your loop cleaner by externally baking a little fade in. Then you do require the loop to line up with the fade you created.

A millisecond covers about 40-50 samples. Can you even hear that difference?
Very much. I recently fixed a bug in the WAV stream (not yet merged I think) where users reported cracks in a loop that should be seamless and the issue was that when subsampling at the end of the loop, the stream interpolated with the first sample after the loop instead of with the sample at the loop start. You might argue that this was an error of less than one sample and it still produced an audible defect in the playback.

The technical reason is that while you can't identify individual samples when listening, discontinuities in the waveform caused by a sudden change in the value of the curve produce an impulse in the signal, which translates to a very short but high energy noise (i.e. a signal containing all frequencies) and can be clearly heard even at such short durations. That's why audio "noise" can be generated by filling a stream with random values. This will produce a series of impulses at varying intensities.

Bonus fact: Digital audio obviously uses numbers to represent the wave form, so it's not actually a curve, it's a series of small stairs, all with a tiny vertical flank to the next sample. This actually produces noise similar to the imperfect loop crack and the intensity of that noise depends on the quantization of the signal. At 16 bits that quantization noise is at about -96 dB, so not really a factor, but for a 1-bit quantized waveform (which surprisingly is enough to understand speech), this quantization noise is very noticeable.

But does looping need to be perfect in the first place?
The thing is that it can be perfect from the source media's point of view and it can be perfect from the engine's point of view. It currently is imperfect purely because the editor does not let you select the correct loop point, which can feel a bit frustrating.

Then how about we add a bit of fading around the loop points to hide the pop? Isn't that what video players do when you seek in the stream?
Please don't do this generally. If the stream has been engineered so that the ends match perfectly, lining them up wrongly and then adding a big blur on top to hide that fact is not an ideal solution. If you have arbitrary audio, on the other hand, such as when seeking in a video stream or looping audio without a dedicated loop point, then there is no "right" way to align the parts and that "blur" is indeed very welcome.

So what about the bpm settings? Can't you use those instead?
In theory, this would work for music that includes no tempo changes and has no lead in, which, to be honest, is probably true for the majority of game music. In practice, however, from my own experience and from looking at the code, I'm not entirely sure what the bpm setting does. It seems to me that it's not related to looping (that's still determined entirely by the "Offset" setting) but rather crops the stream at the end. Unless I misunderstand this feature (which is very possible), this is meant for cases in which you want to loop or stop before the actual end of the file (specified in beats/bpm). It has no effect (positive or negative) on your ability to set the loop point (still specified in milliseconds).

Isn't "samples" a very weird unit? Can't we just allow more digits in the "seconds" input?
From a technical point of view, "samples" is the "native" unit: A stream is a list of samples and each sample in the stream has an index, just like every byte in a file has an index. You can convert this to a time unit by doing some floating point math, but knowing the percise index and having to convert it to some weird floating point number is about as fulfilling as trying to advance 27 bytes in a file with a MoveFilePointer(double distance) function that takes the distance to move as "a fraction of the file size". Still, allowing more digits would indeed solve the problem. If you have a 44.1 kHz file and a loop point at position 40,000, you'd have to be able to enter 0.90703 into the field and this should then produce the correct value when the stream later converts it back to an integral position (39,999 is 0.90701 and 40,001 is 0.90705).

Since this doesn't seem to be a major problem at the moment, what's the workaround?
The workaround is to bake parts of the loop into the source audio by adding samples from the start of the loop to the end of the file until the stars align and the loop point happens to fall on a whole or tenth or so second. Then import that file into Godot by specifying a somewhat clean "seconds" loop point. It's a workable solution as long as you use a tempo where a beat and a millisecond align at some point, but it's still a bit of a hassle considering that the only reason for doing this is just a UI issue.

@RustyRoboticsBV
Copy link
Author

RustyRoboticsBV commented Jan 25, 2024

Ok, in response to the above discussion, I've increased the step value of the loop offset field to 0.00001, which if I understand correctly is enough precision to cover all commonly-used sample rates.

Of course, the aforementioned rounding error on reimport is still there. I suspect it may actually be a deeper-lying problem with how the engine imports floats from resources, but I need to investigate this further - if so, I think I'll make that a separate issue / PR, since it'd be a wider-reaching change.

Still, exposing the issue is probably better than hiding it behind a 0.001 step value. Lowering the step value is a change that we will need to make eventually anyway in order to fully resolve this issue for seconds-based offsets, so I figure we might as well make it now.

@lyuma
Copy link
Contributor

lyuma commented Jan 25, 2024

Floating points have a 23 bit mantissa
2^24/48000/60 is around 5.8, just under 6 minutes.
That means, even in the best case, you will be unable to accurately represent a sample in a track 6 minutes long if we use floating point here.

combined with @bs-mwoerner's comment, this might be good justification for using samples, since an integer has at least 32 bits of precision.

so the real issue I have with this PR is that we are shoehorning two types of data into the float loop_offset which has insufficient precision.

I would suggest changing this to make loop_offset_samples as an int32 or int64 and have some logic to decide which to use...

one suggestion I have is to perhaps deprecate loop_offset (use it if nonzero but keep loop_offset_samples in sync if possible). But that would need wider consensus

@bs-mwoerner
Copy link
Contributor

bs-mwoerner commented Jan 25, 2024

There is a world where we turn this into a huge refactoring that unifies WAV, OGG, and MP3 import but if we prefer to keep this a minimal as possible for now, I think we don't need to change anything with the internal data structure and just need to add a drop down that lets you select between "seconds", "samples" and "beats". Changing this would only change the conversion that is used for the Parse/ToString of the input field, while internally it's always stored as a seconds value (so as to maintain full compatibility). This would still require the change that makes the streams expose their sample rate, but other than that, this wouldn't overthrow a whole lot. We wouldn't even need a persistent loop_use_samples or any other new attributes.

For that to work, though, we need to make sure that the precision of the value actually survives into the audio stream. I think it does, but you get a truncated version when you open the dialog again because the dialog takes its settings not from the resource itself but from the .import file and there float values are serialized via snprintf(buf, 256, "%lg", p_num), which outputs a limited number of digits. If we want to work around that, we may have to move to storing sample counts as an int64 after all, although that would only affect the .import file, can be made backwards compatible, and requires no API changes on the actual resources.

Edit: Oh, I noticed the "Offset" tooltip actually says "Note that if BPM is set, this setting will be ignored.". Hm, that's not what I see happening. I would argue that statement, at least for OGG, is incorrect.

@MJacred
Copy link
Contributor

MJacred commented Feb 2, 2024

Well, the simplest improvement would be to use double consistently. As the audio containers work with double anyway. Though some importers still use float.
I reckon/hope saving a double serializes to a higher precision?!

loop_offset->set_tooltip_text(TTR("Loop offset (from beginning). Note that if BPM is set, this setting will be ignored."));
loop_offset->connect("value_changed", callable_mp(this, &AudioStreamImportSettingsDialog::_settings_changed).unbind(1));
loop_hb->add_child(loop_offset);
loop_use_samples = memnew(CheckBox);
loop_use_samples->set_text(TTR("Use Samples?"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
loop_use_samples->set_text(TTR("Use Samples?"));
loop_use_samples->set_text(TTR("Use Samples"));

Other check boxes here don't use a question mark, nor do they seem to do so elsewhere

@bs-mwoerner
Copy link
Contributor

I reckon/hope saving a double serializes to a higher precision?!

I haven't actually tested this, but from looking at the code, the configuration values for resource importers get stored in a Variant, which internally always uses double for floating point values, so the precision probably isn't lost before the actual conversion to a string during writing out the file very deep down in

String String::num_scientific(double p_num) {

Integers of any kind, on the other hand, get serialized in
String String::num_int64(int64_t p_num, int base, bool capitalize_hex) {

and this seems to keep the full 63 bits + sign precision, which would be enough to precisely address samples in audio files with a play time of less than 6 million years.

Hm, I wonder if .tres files take the same path.

@MJacred
Copy link
Contributor

MJacred commented Feb 2, 2024

@bs-mwoerner: I made a PR to iron out double usage (and remove rounding enforced by the SpinBox): #87860. Let's see if that's enough (for now)

@RustyRoboticsBV
Copy link
Author

I've tested @MJacred's PR (see here). Unfortunately, it doesn't resolve this issue.

I'm going to test @bs-mwoerner's idea and see if I can confirm that the problem occurs in ustring.cpp.

@RustyRoboticsBV
Copy link
Author

RustyRoboticsBV commented Feb 2, 2024

Ok, I've confirmed that double values correctly enter ustring::num_scientific, but have become truncated in the char buf[256] variable that then gets returned. So indeed, the problem lies pretty deep in Godot's internals. There's some compiler-dependent code here too, which makes me a little nervous.

So, I see two ways forward here:

  1. Somehow alter the behaviour of num_scientific and/or its dependencies to serialize doubles with more precision.
  2. Go with an integer-based .import parameter like previously suggested.

I think I prefer the latter (since there's less potential for breaking stuff), but some discussion is definitely needed here. In either case, opening a new issue for the double serialization problem is probably a good idea.

@bs-mwoerner
Copy link
Contributor

I've now had a look at option number 2, whether we could actually use an INT option to bypass the limited precision in the FLOAT serialization. We would still need to expose the sampling rate from the audio streams, but at least we wouldn't have to touch String or the serialization or similar fundamental classes. I've created yet another pull request #87882 that mostly builds upon this one (although I added a third option for entering "Beats"). I've named you as a co-author, I hope that's okay. I'm hoping that this addresses both the input issue and the survive-through-reimport issue and I'd be glad about anyone willing to test it.

@KoBeWi KoBeWi modified the milestones: 4.3, 4.4 Jul 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants