Some audio data seems to trigger a bug in the encoder/decoder #25

Hi,

I just tried this compression and it works fine on hundreds of different files. However, on one specific file the reconstructed sound has audible glitches in it. If I lower the volume of the source file, the glitches no longer appear.

Below are the source file (test.WAV) and the reconstructed file with the glitches (raw PCM, 16-bit signed, mono):

https://kodamo.org/share2cRGh/test.WAV
https://kodamo.org/share2cRGh/reconst.raw

Is it possible that this exact sound triggers a bug in the encode/decode algorithm?

Comments
I can reproduce the problem here. It looks like the predictor weights go into a feedback loop at some point and begin to oscillate wildly. I will investigate how to fix this. Thanks for the test case!
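For readers following along: the predictor in question is QOA's 4-tap sign-sign LMS filter. The following is paraphrased from qoa.h as I understand it (13-bit fixed-point prediction, 1/16 per-sample learning rate); treat the spec and source as authoritative:

```c
#define QOA_LMS_LEN 4

typedef struct {
	int history[QOA_LMS_LEN];
	int weights[QOA_LMS_LEN];
} qoa_lms_t;

/* Predict the next sample as a weighted sum of the last four decoded samples. */
static int qoa_lms_predict(qoa_lms_t *lms) {
	int prediction = 0;
	for (int i = 0; i < QOA_LMS_LEN; i++) {
		prediction += lms->weights[i] * lms->history[i];
	}
	return prediction >> 13;
}

/* Sign-sign update: nudge each weight by residual/16, in the direction
   of the corresponding history sample's sign. */
static void qoa_lms_update(qoa_lms_t *lms, int sample, int residual) {
	int delta = residual >> 4;
	for (int i = 0; i < QOA_LMS_LEN; i++) {
		lms->weights[i] += lms->history[i] < 0 ? -delta : delta;
	}
	for (int i = 0; i < QOA_LMS_LEN - 1; i++) {
		lms->history[i] = lms->history[i + 1];
	}
	lms->history[QOA_LMS_LEN - 1] = sample;
}
```

Note that nothing in this loop clamps the weights, which is why a feedback loop can let them grow without bound.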
It looks like the QOA format itself is broken, causing this bug. The spec computes the weight delta with an arithmetic right shift (`delta = residual >> 4`); I replaced the shift with a plain division by 16 and the clicks are gone. The rest is probably due to rounding errors in the same shifts/divisions. Maybe there should be some damping on the weights to prevent them from growing too large?
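To make the rounding difference concrete: an arithmetic right shift rounds toward negative infinity, while C's integer division rounds toward zero, so the two disagree on negative residuals (shifting a negative value is technically implementation-defined in C, but arithmetic on common compilers). A minimal illustration:

```c
#include <stdio.h>

int main(void) {
	for (int r = -17; r <= 17; r += 17) {
		printf("r = %3d:  r >> 4 = %2d,  r / 16 = %2d\n", r, r >> 4, r / 16);
	}
	return 0;
}

/* Output:
   r = -17:  r >> 4 = -2,  r / 16 = -1
   r =   0:  r >> 4 =  0,  r / 16 =  0
   r =  17:  r >> 4 =  1,  r / 16 =  1  */
```

With `>> 4`, every negative residual pushes the weights by at least 1, even residuals in -15..-1 that a division would treat as zero.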
It is my understanding that LMS (or other) filters can easily become unstable with 3 or more terms. I've seen this in the experiments that led up to QOA, e.g. when precomputing the coefficients for an entire frame with Levinson recursion. This was also one of the reasons that WavPack does multiple passes with shorter terms (1 or 2 weights) instead; a sketch of that structure follows below. I (falsely) assumed this wouldn't be possible if the weights are updated at each sample. Though I'm not convinced that rounding errors or bit shifts (instead of divisions) are to blame. The given test sample sounds quite bad when encoded with QOA, even without the clicks. The prediction is doing an extremely bad job here, and not (only) because of the high frequency; this is not the case with pure sine waves, for instance. I couldn't yet wrap my head around why. In any case, this is very unfortunate. I'm looking for ways to fix this (or work around it) in the encoder, without needing to change the format itself.
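For illustration, the WavPack-style alternative mentioned above is several very short adaptive stages in series, each one predicting the residual left over by the previous stage. This is a hypothetical sketch of that structure, not code from either project:

```c
/* One 1-tap sign-sign LMS stage. */
typedef struct { int weight, history; } lms1_t;

static int lms1_predict(const lms1_t *f) {
	return (f->weight * f->history) >> 13;
}

static void lms1_update(lms1_t *f, int input, int residual) {
	int delta = residual >> 4;
	f->weight += f->history < 0 ? -delta : delta;
	f->history = input;
}

/* Cascade: stage 2 predicts whatever stage 1 failed to predict. */
static int cascade_residual(lms1_t *s1, lms1_t *s2, int sample) {
	int p1 = lms1_predict(s1);
	int r1 = sample - p1;   /* residual after stage 1 */
	int p2 = lms1_predict(s2);
	int r2 = r1 - p2;       /* residual after stage 2 */
	lms1_update(s2, r1, r2);
	lms1_update(s1, sample, r1);
	return r2;              /* this is what would get quantized */
}
```

A single-tap stage can still misadjust, but it has far less room to build the kind of multi-tap resonance seen here, which is presumably why WavPack's cascade stays stable.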
So here's a mildly stupid idea: in this specific case `weights[0]` grows to very large values before the explosion. As an experiment, I tried just resetting the weights at the beginning of a frame whenever `abs(weights[0])` exceeds 16384. Adding this before writing the current weights to the stream results in a clean encoding of the test case:

```c
if (abs(qoa->lms[c].weights[0]) > 16384) {
	qoa->lms[c].weights[0] = 0;
	qoa->lms[c].weights[1] = 0;
	qoa->lms[c].weights[2] = -(1 << 13);
	qoa->lms[c].weights[3] = (1 << 14);
}
```

Of course this is rather inelegant, but I think the approach itself may have some merit. Maybe there's a reliable way to detect these cases, or maybe there's a way to "normalize" the weights at the beginning of each frame? This still relies on the assumption that the feedback loop needs more than 5120 samples (one frame) to explode. I.e. if this whole thing could happen within just one frame, this wouldn't help...
I've been experimenting with a two-pass encode. It's ugly doing two passes, but pSNR does go up by a good amount this way: dark_narrow_road went from 51.66 to 71.44 dB. pSNR isn't a great way to compare audio, though. I've been running the outputs through PEAQ tests and the perceptual quality only seems slightly better, even worse in a few frames. I can also hear clicks on the frame boundaries of the test sample from this issue. Another idea I had was running frame N-1 with weights set to {0,0,-1,2} and using those output weights to initialize frame N. That didn't work so well...
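For reference, pSNR here is the peak signal-to-noise ratio against 16-bit full scale; a minimal helper of my own (not from the QOA repo) to compute it:

```c
#include <math.h>
#include <stddef.h>

/* Peak SNR in dB between an original and a decoded 16-bit signal.
   32767 is the peak value of a signed 16-bit sample. */
double psnr_db(const short *orig, const short *dec, size_t n) {
	double mse = 0.0;
	for (size_t i = 0; i < n; i++) {
		double d = (double)orig[i] - (double)dec[i];
		mse += d * d;
	}
	mse /= (double)n;
	return 10.0 * log10((32767.0 * 32767.0) / mse);
}
```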
Pre-computing the weights for each frame using Levinson recursion also fixes this test case; the textbook recursion is sketched below. Similar to your approach, it slightly improves some of the test samples, while others get worse. I'm not sure if I want to go down this road... Here's an experimental branch: https://github.com/phoboslab/qoa/tree/precompute_weights
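For anyone unfamiliar, this is the textbook Levinson-Durbin recursion for deriving LPC coefficients from a frame's autocorrelation; a generic sketch, not the code from the precompute_weights branch:

```c
/* Levinson-Durbin: given autocorrelation r[0..p] of a frame, compute LPC
   coefficients a[1..p] such that x[n] ~ sum_j a[j] * x[n-j]. */
static void levinson_durbin(const double *r, int p, double *a) {
	double err = r[0];
	for (int i = 1; i <= p; i++) {
		if (err <= 0.0) break;              /* degenerate/silent frame */
		double acc = r[i];
		for (int j = 1; j < i; j++) acc -= a[j] * r[i - j];
		double k = acc / err;               /* reflection coefficient */
		a[i] = k;
		for (int j = 1; j <= i / 2; j++) {  /* symmetric in-place update */
			double tmp = a[j];
			a[j] -= k * a[i - j];
			if (j != i - j) a[i - j] -= k * tmp;
		}
		err *= 1.0 - k * k;
	}
}
```

For a valid autocorrelation sequence the reflection coefficients satisfy |k| <= 1, which keeps the resulting filter stable; that guarantee is exactly what the per-sample sign-sign update lacks.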
Whatever you do to the weights can only happen at the start of a frame, and the spec forces a 1/16 weight update per sample. By the time you can adjust, you already have up to 5120 bad samples. The only things an encoder can change within a frame are the scale factors/residuals, which are calculated from the original samples. Maybe there's some preprocessing we can do there, similar to noise shaping but for the weights? To bring the weights down mid-frame, generate an anti-signal from the negative weights and mix that in with the samples?
Yeah, something is certainly wonky with the LMS calculations. I've been trying to implement the decoder in pure Python as a learning experiment, and this is one place where I'm getting huge values that will not fit in the data type. I'm guessing C relies on modular arithmetic and it still works somehow (but causes occasional pops?). I can report back with the sample file and the exact position where I saw it happen, if needed.
The pre-computed weights branch fixes the test case but breaks many other sounds with clicks (maybe that's still the right route but needs tweaking?). With the original code I notice the noise growing just before the click, so I also think that dampening somewhere in the code, like bdwjn suggested, should help (sorry, I haven't had time to study the QOA format yet). I made a quick program to generate test sounds in a controlled way and check how the encoder behaves. With a pure sine there are no problems, but if I add a 2nd harmonic the issue shows up: with two harmonics the noise grows over time without exploding, and if I increase the frequency without touching the other parameters (freq = i*0.3 instead of i*0.2), it explodes. The generator was roughly:

```c
FILE *test = fopen("test.raw", "wb");
for (unsigned i = 0; i < NBSAMPLES; i++) {
	/* fundamental plus a 2nd harmonic; the amplitude and the 0.5
	   harmonic ratio here are illustrative placeholders */
	short s = (short)(AMPLITUDE * (sin(0.2 * i) + 0.5 * sin(0.4 * i)));
	fwrite(&s, sizeof(s), 1, test);
}
```
I made some plots of the weights for those files.

freq_i0.2.wav: https://i.imgur.com/L6kJu4v.png

You can see how it starts to get unstable a few wavelengths before finally exploding. The simplest way of fixing this would be (if the spec allowed it) to adaptively lower the learning rate. Here's freq_i0.3.wav with a µ of 1/64 instead of 1/16: https://i.imgur.com/66c7eIG.png
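In terms of the reference update code, a lower µ is just a bigger shift, and an adaptive µ would carry the shift in the filter state. A hypothetical sketch (`mu_shift` is an assumed extra field, not part of the format):

```c
/* The spec's fixed learning rate, mu = 1/16: */
int delta = residual >> 4;

/* mu = 1/64, as used for the plot above, would be:
       int delta = residual >> 6;                     */

/* A hypothetical adaptive variant, with the shift as filter state:
       int delta = residual >> lms->mu_shift;   // e.g. 4..6         */
```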
I like the idea in general. We may even be able to increase the learning rate for some cases and improve the quality. If that's the case (and the adaptation logic is simple enough) it would be worth a spec update. My main question is: How would that work, exactly? What does it adapt to?
True. But the question remains: is it possible to force the encoder into a bad state within fewer than 5120 samples? Maybe adjusting at the start of a frame is sufficient? FWIW: I generated some more WAV files using Kilklang26's approach and have yet to encounter one that isn't "fixed" by resetting the weights at the beginning of a frame if `abs(weights[0]) > 16384`.
I'm currently running the encoder through my local music library. This will take a while to complete, but even after the first 6000 tracks it has identified a few where the LMS goes haywire (though it always catches itself immediately after). It also flagged some false positives, where `weights[0]` only briefly crosses the threshold. So the culprit is always(?) a high-pitched tone: sustained for too long, and it explodes.
I used QOA on samples of instruments I recorded, which is why it exhibits the problem far more than real music: there are a lot of steady tones, and some of them are high pitched. What do I need to change in the code to do your "weights[0] > 16384" change? I can try running it on my sample collection and report the results.
That totally makes sense, and it's absolutely a use case that QOA should be able to support. So far, checking for a large `weights[0]` has been sufficient to detect the problem. For the tests that I'm running, I just put

```c
if (abs(qoa->lms[c].weights[0]) > 16384) {
	printf("LMS_ERROR (%d)\n", qoa->lms[c].weights[0]);
}
```

at qoa.h, line 445, and look for LMS_ERROR in the output. For the workaround/"fix" I had:

```c
if (abs(qoa->lms[c].weights[0]) > 16384) {
	qoa->lms[c].weights[0] = 0;
	qoa->lms[c].weights[1] = 0;
	qoa->lms[c].weights[2] = -(1 << 13);
	qoa->lms[c].weights[3] = (1 << 14);
}
```
I have just pushed a workaround to master. I found that 1) the total power of the weights is a good indicator that the LMS is about to explode, and 2) resetting the weights to zero in that case prevents it. This still introduces audible artifacts when the weights reset. It prevents the LMS from exploding, but it's far from perfect :/
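The shape of that check, assuming "total power" means the sum of squared weights (the threshold here is a placeholder, not the value from the actual commit):

```c
#include <string.h>

/* Reset the LMS if the weights look like they're about to blow up. */
static void qoa_lms_guard(qoa_lms_t *lms) {
	long long power = 0;
	for (int i = 0; i < QOA_LMS_LEN; i++) {
		power += (long long)lms->weights[i] * lms->weights[i];
	}
	if (power > WEIGHT_POWER_LIMIT) {  /* placeholder threshold */
		memset(lms->weights, 0, sizeof(lms->weights));
	}
}
```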
The above commit introduces a better method to prevent the weights from growing too large: the weights are summed after each encoded sample and added to the error "rank" for the current scale factor. In problem cases, a scale factor that keeps the weights lower is chosen. This still introduces noise in the problem cases, but it's more uniform and not as abrupt as resetting all weights to zero.
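A rough sketch of that idea: when the encoder searches the 16 scale factors for a slice, it ranks each candidate not only by reconstruction error but also by how large it leaves the weights. The helper name, the surrounding variables, and the error/penalty balance here are assumptions, not the repo's actual code:

```c
/* Hypothetical ranking loop over QOA's 16 scale factors for one slice. */
long long best_rank = -1;
int best_scalefactor = 0;

for (int sf = 0; sf < 16; sf++) {
	qoa_lms_t trial = *lms;  /* trial copy, so failed candidates don't stick */
	long long error = qoa_try_scalefactor(&trial, slice, sf);  /* assumed helper */

	/* Penalize candidates that leave the weights large. */
	long long penalty = 0;
	for (int i = 0; i < QOA_LMS_LEN; i++) {
		penalty += (long long)trial.weights[i] * trial.weights[i];
	}

	long long rank = error + penalty;  /* the real weighting is a tuning choice */
	if (best_rank < 0 || rank < best_rank) {
		best_rank = rank;
		best_scalefactor = sf;
	}
}
```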