Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add missing Rubber Band padding, preventing it from eating the initial transient #11120

Merged
merged 13 commits into from
Dec 21, 2022
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
90 changes: 61 additions & 29 deletions src/engine/bufferscalers/enginebufferscalerubberband.cpp
Original file line number Diff line number Diff line change
@@ -1,7 +1,5 @@
#include "engine/bufferscalers/enginebufferscalerubberband.h"

#include <rubberband/RubberBandStretcher.h>

#include <QtDebug>

#include "control/controlobject.h"
Expand All @@ -19,7 +17,7 @@ namespace {
// TODO (XXX): this should be removed. It is only needed to work around
// a Rubberband 1.3 bug.
// This is the default increment from RubberBand 1.8.1.
size_t kRubberBandBlockSize = 256;
constexpr size_t kRubberBandBlockSize = 256;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you able to update the comment?
Our oldest supported version is 1.8.2 from Ubuntu Focal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in, completely remove the block size limit?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.
I have double checked, the bug has been fixed in Rubberband 1.7
breakfastquay/rubberband@c26dc1d

By the way, Rubberband will never request more than getPreferredStartPad() * 2
https://github.com/breakfastquay/rubberband/blob/de56cd114a678003dfef17abbd74ebd9203964eb/src/finer/R3Stretcher.cpp#L565
Maybe we can use this info to improve our buffers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted most of adb034b in 9b8e82a. It Works On My System ™️ with librubberband 3.1.2.


#define RUBBERBANDV3 (RUBBERBAND_API_MAJOR_VERSION >= 2 && RUBBERBAND_API_MINOR_VERSION >= 7)

Expand All @@ -28,22 +26,16 @@ size_t kRubberBandBlockSize = 256;
EngineBufferScaleRubberBand::EngineBufferScaleRubberBand(
ReadAheadManager* pReadAheadManager)
: m_pReadAheadManager(pReadAheadManager),
m_buffer_back(SampleUtil::alloc(MAX_BUFFER_LEN)),
m_buffers{mixxx::SampleBuffer(MAX_BUFFER_LEN), mixxx::SampleBuffer(MAX_BUFFER_LEN)},
m_bufferPtrs{m_buffers[0].data(), m_buffers[1].data()},
m_interleavedReadBuffer(MAX_BUFFER_LEN),
m_bBackwards(false),
m_useEngineFiner(false) {
m_retrieve_buffer[0] = SampleUtil::alloc(MAX_BUFFER_LEN);
m_retrieve_buffer[1] = SampleUtil::alloc(MAX_BUFFER_LEN);
// Initialize the internal buffers to prevent re-allocations
// in the real-time thread.
onSampleRateChanged();
}

EngineBufferScaleRubberBand::~EngineBufferScaleRubberBand() {
SampleUtil::free(m_buffer_back);
SampleUtil::free(m_retrieve_buffer[0]);
SampleUtil::free(m_retrieve_buffer[1]);
}

void EngineBufferScaleRubberBand::setScaleParameters(double base_rate,
double* pTempoRatio,
double* pPitchRatio) {
Expand Down Expand Up @@ -142,7 +134,7 @@ void EngineBufferScaleRubberBand::clear() {
VERIFY_OR_DEBUG_ASSERT(m_pRubberBand) {
return;
}
m_pRubberBand->reset();
reset();
}

SINT EngineBufferScaleRubberBand::retrieveAndDeinterleave(
Expand All @@ -151,22 +143,25 @@ SINT EngineBufferScaleRubberBand::retrieveAndDeinterleave(
SINT frames_available = m_pRubberBand->available();
SINT frames_to_read = math_min(frames_available, frames);
SINT received_frames = static_cast<SINT>(m_pRubberBand->retrieve(
m_retrieve_buffer, frames_to_read));
m_bufferPtrs.data(), frames_to_read));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to read here frames_to_read + frame_offset;
Even though the buffer is more than big enough, it would be correct to check for its size.
We may consider to use a way smaller buffer than MAX_BUFFER_LEN b.t.w.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, did you miss to push a commit?

The function expects frames and we need to discard frame_offset from the retrieved buffer. So we need to retrieve frames + frame_offset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 755a4a4.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linked code makes that the function returns less frames than it might could have read. So we should not clamp the value to frames , but to frames + frame_offset in line 132

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we should not. I can't check what m_pRubberBand->available() returns here, but logically it would return 1024 if it's doing overlap-add with four times overlap. At a 1.0 pitch ratio/when the time stretcher should not be making any adjustments, we need to throw away half a window's worth of samples, which is 2048 in this case. So in that situation the first two calls to this function must read and return 0 samples or else we've been reading the processed padding that we're trying to throw away here.

The linked code makes that the function returns less frames than it might could have read.

It can only read a small window of output at a time. Presumably this is 1/4th of the FFT window size. So 1024 samples. Which means that the first two calls to this function (each producing 1024 samples of garbage output) should be thrown away.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a bit debug code to shows the issue after a seek:
qDebug() << "padding = " << m_remainingPaddingInOutput << "frames = " << frames << "available = " << frames_available;

debug [Engine] padding  =  1025 frames  =  256 available  =  0
debug [Engine] padding  =  1025 frames  =  256 available  =  512
debug [Engine] padding  =  769 frames  =  256 available  =  768                 
debug [Engine] padding  =  513 frames  =  256 available  =  1024
debug [Engine] padding  =  257 frames  =  256 available  =  1280
debug [Engine] padding  =  1 frames  =  256 available  =  1536
debug [Engine] padding  =  0 frames  =  1 available  =  1792
debug [Engine] padding  =  0 frames  =  256 available  =  1791
debug [Engine] padding  =  0 frames  =  256 available  =  1535
debug [Engine] padding  =  0 frames  =  256 available  =  1279

In the second call, we read only 256 frames from the available 512. The available frames are stacked up before they are reduced again. We need to read all available frames in that case. This needs to be fixed.

Copy link
Contributor Author

@robbert-vdh robbert-vdh Dec 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah I see what you mean now! Fixed in d3aa210.

Somewhat related, would there be any interest in exposing the Rubberband R3 + short window option? I added it for myself here: robbert-vdh@af570f4 Its performance is in line with R2 (and also no huge spikes when searching), and quality wise it's more or less a sidegrade to R2. It doesn't have R2's time domain transient preservation, but it also doesn't introduce spurious transients and it does still sound better than R2 without transient preservation. Hmm maybe not. It usually does better than R2, but it sometimes also transforms kickdrums into smeary messes.


DEBUG_ASSERT(received_frames <= static_cast<ssize_t>(m_buffers[0].size()));

SampleUtil::interleaveBuffer(pBuffer,
m_retrieve_buffer[0],
m_retrieve_buffer[1],
received_frames);
m_buffers[0].data(),
m_buffers[1].data(),
received_frames);
return received_frames;
}

void EngineBufferScaleRubberBand::deinterleaveAndProcess(
const CSAMPLE* pBuffer, SINT frames, bool flush) {
DEBUG_ASSERT(frames <= static_cast<ssize_t>(m_buffers[0].size()));
Copy link
Member

@JoergAtGithub JoergAtGithub Dec 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I try to compile this locally on Windows with VS2022, it fails with an error, that identifier ssize_t is undefined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I didn't realize ssize_t isn't from the standard library, and it compiles fine with MSVC on the CI. Probably through some transitive dependency. I guess ptrdiff_t would be the portable alternative that comes closest to ssize_t.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Afaik its part of the C standard library. I guess you could also #include <cstddef>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't appear to be. Libusb and the other libraries Mixxx uses that also use ssize_t seem to explicitly define it on Windows.


SampleUtil::deinterleaveBuffer(
m_retrieve_buffer[0], m_retrieve_buffer[1], pBuffer, frames);
m_buffers[0].data(), m_buffers[1].data(), pBuffer, frames);

m_pRubberBand->process(m_retrieve_buffer,
m_pRubberBand->process(m_bufferPtrs.data(),
frames,
flush);
}
Expand Down Expand Up @@ -199,10 +194,10 @@ double EngineBufferScaleRubberBand::scaleBuffer(
read += getOutputSignal().frames2samples(received_frames);

if (break_out_after_retrieve_and_reset_rubberband) {
//qDebug() << "break_out_after_retrieve_and_reset_rubberband";
// qDebug() << "break_out_after_retrieve_and_reset_rubberband";
// If we break out early then we have flushed RubberBand and need to
// reset it.
m_pRubberBand->reset();
reset();
break;
}

Expand All @@ -221,26 +216,26 @@ double EngineBufferScaleRubberBand::scaleBuffer(
iLenFramesRequired = kRubberBandBlockSize;
}
}
//qDebug() << "iLenFramesRequired" << iLenFramesRequired;
// qDebug() << "iLenFramesRequired" << iLenFramesRequired;

if (remaining_frames > 0 && iLenFramesRequired > 0) {
SINT iAvailSamples = m_pReadAheadManager->getNextSamples(
// The value doesn't matter here. All that matters is we
// are going forward or backward.
(m_bBackwards ? -1.0 : 1.0) * m_dBaseRate * m_dTempoRatio,
m_buffer_back,
getOutputSignal().frames2samples(iLenFramesRequired));
// The value doesn't matter here. All that matters is we
// are going forward or backward.
(m_bBackwards ? -1.0 : 1.0) * m_dBaseRate * m_dTempoRatio,
m_interleavedReadBuffer.data(),
getOutputSignal().frames2samples(iLenFramesRequired));
SINT iAvailFrames = getOutputSignal().samples2frames(iAvailSamples);

if (iAvailFrames > 0) {
last_read_failed = false;
deinterleaveAndProcess(m_buffer_back, iAvailFrames, false);
deinterleaveAndProcess(m_interleavedReadBuffer.data(), iAvailFrames, false);
} else {
if (last_read_failed) {
// Flush and break out after the next retrieval. If we are
// at EOF this serves to get the last samples out of
// RubberBand.
deinterleaveAndProcess(m_buffer_back, 0, true);
deinterleaveAndProcess(m_interleavedReadBuffer.data(), 0, true);
break_out_after_retrieve_and_reset_rubberband = true;
}
last_read_failed = true;
Expand Down Expand Up @@ -286,3 +281,40 @@ int EngineBufferScaleRubberBand::runningEngineVersion() {
return 2;
#endif
}

void EngineBufferScaleRubberBand::reset() {
m_pRubberBand->reset();

// As mentioned in the docs (https://breakfastquay.com/rubberband/code-doc/)
// and FAQ (https://breakfastquay.com/rubberband/integration.html#faqs), you
// need to run some silent samples through the time stretching engine first
// before using it. Otherwise it will eat add a short fade-in, destroying
// the initial transient.
#if RUBBERBANDV3
size_t remaining_padding = m_pRubberBand->getPreferredStartPad();
#else
// This _should_ be equal to the latency in older Rubber Band versions:
// https://github.com/breakfastquay/rubberband/blob/c5f99d5ff2cba2f4f1def6c38c7843bbb9ac7a78/main/main.cpp#L652
size_t remaining_padding = m_pRubberBand->getLatency();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input and output padding is different by the playback rate.

The interesting commit is this:
breakfastquay/rubberband@72654b0

So we need to do here something similar:
int toPad = int(round(toDrop * frequencyshift));

Copy link
Contributor Author

@robbert-vdh robbert-vdh Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah good catch. Looking at the changes to the actual time stretchers it seems like that's how you get the new getStartPad() behavior in terms of getLatency(). For the R3 time stretcher the getStartDelay() value is different from the old getLatency() value, so I'll emulate that and then push the changes.

Copy link
Contributor Author

@robbert-vdh robbert-vdh Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or well maybe the value was incorrect in that screenshot? Looking at https://github.com/breakfastquay/rubberband/blob/v1.8.2/src/StretcherImpl.cpp#L826-L831 getLatency() should indeed be identical to getStartDelay() (like the documentation suggests). Never mind, now I'm confusing myself. For the R2 algorithm they're the same, with the R3 algorithm they're different but I'm not sure if that was intentional. And that would only affect beta versions of librubberband anyways so it's not a concern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

#endif
std::fill_n(m_buffers[0].span().begin(), kRubberBandBlockSize, 0.0f);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment that this filling zeros is the correct solution here, because we always fade in from zero after seek to avoid click sounds. This means that the first half of the buffer needs to be zero anyway.

I have verified it with a recording in Audacity that this looks good.

My fist assumption to use here real samples is correct but they have zero gain = 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the idea of the zero padding is that if the STFT process is working correctly, then those samples will not have any impact on the rest of the output. See my comment above. I added a link to the comment in the code.

std::fill_n(m_buffers[1].span().begin(), kRubberBandBlockSize, 0.0f);
while (remaining_padding > 0) {
const size_t pad_samples = std::min<size_t>(remaining_padding, kRubberBandBlockSize);
m_pRubberBand->process(m_bufferPtrs.data(), pad_samples, false);

remaining_padding -= pad_samples;
}

#if RUBBERBANDV3
size_t padding_to_drop = m_pRubberBand->getStartDelay();
#else
size_t padding_to_drop = m_pRubberBand->getLatency();
#endif
while (padding_to_drop > 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to call while m_pRubberBand->available() before this loop.
In this regards, I am not sure if we can pull out all padding immediately, or if it needs to be stay in until we have provided the real samples of the same size.

At least in the rubberband main, this is done not immediately after filling in the padding.

I am afraid dropping at this point is not correct, because we likely need to keep the rubber band buffers filled with at least the startDelay() to perform windowing around the current sample.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also also concerned how big the internal buffer of rubberband is, since I see here memory allocation:
https://github.com/breakfastquay/rubberband/blob/9001b5459c1c424a71be0829c8988a25a9edf490/src/faster/StretcherProcess.cpp#L487
Which will likely block our real time thread.

The buffer is initial m_sWindowSize * 2;
4096 @ 48 kHz

The default start pads is m_aWindowSize / 2;
1024 @ 48 kHz

https://github.com/breakfastquay/rubberband/blob/52fc576500931afb5a7374a241c5f386b14a5e29/src/faster/R2Stretcher.cpp#L131

All this makes kind of sense. To retrieve 1 sample, we need half of a window samples before the sample and half a window silence after it.

In case of Mixxx, and after seeking within a track, it is not true that these samples are zero. So I think it is required to use the real samples before the start instead of zero.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The confusing point is that we also have the available() function. In theory there must not bee any output available before not receive more than a whole m_aWindowSize. Is this the case?

I think the initial getSamplesRequired() return m_aWindowSize.
It would be interesting how many samples are available after passing the requested amount.
If it is 0 or m_aWindowSize / 2?
In the later case it is clear why a padding of m_aWindowSize / 2 is required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also also concerned how big the internal buffer of rubberband is, since I see here memory allocation:
https://github.com/breakfastquay/rubberband/blob/9001b5459c1c424a71be0829c8988a25a9edf490/src/faster/StretcherProcess.cpp#L487
Which will likely block our real time thread.

The padding is half of the window size, so that should definitely fit in Rubber Band's buffers. And thanks to the m_pRubberBand->setTimeRatio(2.0) call the buffer should be able to fit at least two of those windows before reallocating.

In case of Mixxx, and after seeking within a track, it is not true that these samples are zero. So I think it is required to use the real samples before the start instead of zero.

That was part of my experiments in solving #11125, but it's a bit more difficult than it seems. Especially because the read ahead manager is shared by multiple things and reading data through it invokes a bunch of side effects. Older samples also may not be cached (I've noticed this causing issues when beat jumping somewhere for the first time that weren't present when not reading past samples). I have not yet had time to dig into this further, so if you have suggestions for how to compensate for the latency here without messing with the rest of the deck's state (so switching back and forth between linear interpolation and Rubber Band time stretching should be seamless and not shift any timings around) then that would be great.

The confusing point is that we also have the available() function. In theory there must not bee any output available before not receive more than a whole m_aWindowSize. Is this the case?

Yes, like I mentioned before, they're using an STFT (Short-time Fourier Transform). It's an overlap-add process where you a window function to a chunk of the input data, take the discrete Fourier transform of that, process the frequency domain signal, take the inverse discrete Fourier transform of that, apply another window function, and then add that to the output buffer, partially overlapping the previous output. The reason why the required padding is half of the input buffer is because most window functions used for these things are raised cosine windows, like the Hann/Hanning function I linked below. Because you apply (= multiple the chunk of audio with) the window twice, once before the DFT, and once after the IDFT, you've effectively applied the square of the window function to the signal if you didn't do anything to the signal in the frequency domain. And if you try adding up squared Hann functions with a quarter period difference between them, you'll notice that after a period the resulting function forms a flat line. That explains why the required padding is half a window, and why you get that smooth fade in if you don't add the padding.

Some wikipedia inks if you're curious about how and why this all works:

https://en.wikipedia.org/wiki/Short-time_Fourier_transform
https://en.wikipedia.org/wiki/Overlap%E2%80%93add_method
https://en.wikipedia.org/wiki/Hann_function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But yeah I guess you are right, we shouldn't be reading and dropping samples immediately. We should be doing the same thing the rubberband cli is doing, and process the audio as normal but drop the first getStartDelay() samples of it. That also aligns with the STFT primer I posted above heh.

const size_t drop_samples = std::min<size_t>(padding_to_drop, kRubberBandBlockSize);
m_pRubberBand->retrieve(m_bufferPtrs.data(), drop_samples);

padding_to_drop -= drop_samples;
}
}
35 changes: 27 additions & 8 deletions src/engine/bufferscalers/enginebufferscalerubberband.h
Original file line number Diff line number Diff line change
@@ -1,21 +1,27 @@
#pragma once

#include <rubberband/RubberBandStretcher.h>

#include <array>

#include "engine/bufferscalers/enginebufferscale.h"
#include "util/memory.h"

namespace RubberBand {
class RubberBandStretcher;
} // namespace RubberBand
#include "util/samplebuffer.h"

class ReadAheadManager;

// Uses librubberband to scale audio. This class is not thread safe.
class EngineBufferScaleRubberBand : public EngineBufferScale {
class EngineBufferScaleRubberBand final : public EngineBufferScale {
Q_OBJECT
public:
explicit EngineBufferScaleRubberBand(
ReadAheadManager* pReadAheadManager);
~EngineBufferScaleRubberBand() override;

EngineBufferScaleRubberBand(const EngineBufferScaleRubberBand&) = delete;
EngineBufferScaleRubberBand& operator=(const EngineBufferScaleRubberBand&) = delete;

EngineBufferScaleRubberBand(EngineBufferScaleRubberBand&&) = delete;
EngineBufferScaleRubberBand& operator=(EngineBufferScaleRubberBand&&) = delete;

// Let EngineBuffer know if engine v3 is available
static bool isEngineFinerAvailable();
Expand All @@ -39,6 +45,10 @@ class EngineBufferScaleRubberBand : public EngineBufferScale {
void onSampleRateChanged() override;

int runningEngineVersion();
/// Reset the rubberband instance and run the prerequisite amount of padding
/// through it. This should be used instead of calling
/// `m_pRubberBand->reset()` directly.
void reset();

void deinterleaveAndProcess(const CSAMPLE* pBuffer, SINT frames, bool flush);
SINT retrieveAndDeinterleave(CSAMPLE* pBuffer, SINT frames);
Expand All @@ -48,8 +58,17 @@ class EngineBufferScaleRubberBand : public EngineBufferScale {

std::unique_ptr<RubberBand::RubberBandStretcher> m_pRubberBand;

CSAMPLE* m_retrieve_buffer[2];
CSAMPLE* m_buffer_back;
/// The audio buffers samples used to send audio to Rubber Band and to
/// receive processed audio from Rubber Band. This is needed because Mixxx
/// uses interleaved buffers in most other places.
std::array<mixxx::SampleBuffer, 2> m_buffers;
/// These point to the buffers in `m_buffers`. They can be defined here
/// since this object cannot be moved or copied.
std::array<float*, 2> m_bufferPtrs;

/// Contains interleaved samples read from `m_pReadAheadManager`. These need
/// to be deinterleaved before they can be passed to Rubber Band.
mixxx::SampleBuffer m_interleavedReadBuffer;

// Holds the playback direction
bool m_bBackwards;
Expand Down