Merge pull request #92 from flucoma/MelBands-reference
MelBands reference (RST and SC examples)
tedmoore authored Mar 3, 2022
2 parents d10b4fd + b7b7a29 commit 45b00fd
Showing 4 changed files with 193 additions and 193 deletions.
52 changes: 23 additions & 29 deletions doc/BufMelBands.rst
@@ -3,87 +3,81 @@
:sc-categories: Libraries>FluidDecomposition
:sc-related: Guides/FluidCorpusManipulationToolkit, Classes/FluidBufMFCC
:see-also: MelBands, BufPitch, BufLoudness, BufMFCC, BufSpectralShape, BufStats
:description: A spectral shape descriptor where the amplitude is given for a number of equally spread perceptual bands.
:description: Magnitudes for a number of perceptually-evenly spaced bands.
:discussion:
The spread is based on the Mel scale (https://en.wikipedia.org/wiki/Mel_scale) which is one of the first attempt to mimic pitch perception scientifically. This implementation allows to select the range and number of bands dynamically.

The process will return a single multichannel buffer of ``numBands`` per input channel. Each frame represents a value, which is every hopSize.
:fluid-obj:`BufMelBands` returns a Mel-Frequency Spectrum made up of the user-defined ``numBands``. The Mel-Frequency Spectrum is a histogram of FFT bins bundled according to their relationship to the Mel scale (https://en.wikipedia.org/wiki/Mel_scale), which represents frequency space logarithmically, mimicking how humans perceive pitch distance. The name "Mel" derives from the word "melody". The Hz-to-Mel conversion used by :fluid-obj:`BufMelBands` is ``mel = 1127.01048 * log(hz / 700.0 + 1.0)``, where ``log`` is the natural logarithm.

This implementation allows the range and number of bands to be selected dynamically. The ``numBands`` MelBands will be perceptually equally distributed between ``minFreq`` and ``maxFreq``.

When using a high value for ``numBands``, you may end up with empty channels (filled with zeros) in the MelBands output. This is because there is not enough information in the FFT analysis to properly calculate values for every MelBand. Increasing the ``fftSize`` will ensure you have values for all the MelBands.
When using a high value for ``numBands``, you may end up with empty channels (filled with zeros) in the MelBands output. This is because there is not enough information in the FFT analysis to properly calculate values for every MelBand. Increasing the ``fftSize`` will ensure you have values for all the MelBands.

Visit https://learn.flucoma.org/reference/melbands to learn more.

:process: This is the method that calls for the spectral shape descriptors to be calculated on a given source buffer.
:output: Nothing, as the destination buffer is declared in the function call.
:process: This is the method that calls for the analysis to be calculated on a given source buffer.

:output: Nothing, as the ``features`` buffer is declared in the function call.

:control source:

The index of the buffer to use as the source material to be described through the various descriptors. The different channels of multichannel buffers will be processing sequentially.
The index of the buffer to use as the source material to be analysed. The different channels of multichannel buffers will be processed sequentially.

:control startFrame:

Where in the srcBuf should the process start, in sample.
Where in the ``source`` to begin the analysis, in samples. The default is 0.

:control numFrames:

How many frames should be processed.
How many frames should be analysed, in samples. The default of -1 indicates to analyse to the end of the buffer.

:control startChan:

For multichannel srcBuf, which channel should be processed first.
For a multichannel ``source``, which channel to begin analysis from. The default is 0.

:control numChans:

For multichannel srcBuf, how many channel should be processed.
For a multichannel ``source``, how many channels should be processed, starting from ``startChan`` and counting up. The default of -1 indicates to analyse through the last channel in the ``source``.

:control features:

The destination buffer for the STRONG::numBands:: amplitudes describing the spectral shape.
The buffer to write the MelBands magnitudes into.

:control numBands:

The number of bands that will be perceptually equally distributed between STRONG::minFreq:: and STRONG::maxFreq::. It will decide how many channels are produce per channel of the source.
The number of bands that will be returned. This determines how many channels are in the ``features`` buffer (``numBands`` * ``numChans``). The default is 40.

:control minFreq:

The lower boundary of the lowest band of the model, in Hz.
The lower bound of the frequency band to use in analysis, in Hz. The default is 20.

:control maxFreq:

The highest boundary of the highest band of the model, in Hz.

:control maxNumBands:

The maximum number of Mel bands that can be modelled. This sets the number of channels of the output, and therefore cannot be modulated.
The upper bound of the frequency band to use in analysis, in Hz. The default is 20000.

:control normalize:

This flag enables the scaling of the output to preserve the energy of the window. It is on (1) by default.
This flag indicates whether to use normalized triangle filters, which account for the number of FFT magnitudes used to calculate the MelBands. When normalization is off (``normalize`` = 0) the higher MelBands tend to be disproportionately large because they are summing more FFT magnitudes. The default is to have normalization on (``normalize`` = 1).

:control scale:

This flag sets the scaling of the output value. It is either linear (0, by default) or in dB (1).
This flag sets the scaling of the output value. It is either linear (0, by default) or in dB (1).

:control windowSize:

The window size. As spectral description relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty

:control hopSize:

The window hop size. As spectral description relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts.
The window hop size. As this analysis relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The default of -1 sets the hop size to half of ``windowSize`` (overlap of 2).

:control fftSize:

The inner FFT/IFFT size. It should be at least 4 samples long, at least the size of the window, and a power of 2. Making it larger allows an oversampling of the spectral precision.
The inner FFT/IFFT size. It should be at least 4 samples long, at least the size of the window, and a power of 2. Making it larger allows an oversampling of the spectral precision. The default of -1 sets the FFT size to ``windowSize``.

:control padding:

Controls the zero-padding added to either end of the source buffer or segment. Possible values are 0 (no padding), 1 (default, half the window size), or 2 (window size - hop size). Padding ensures that all input samples are completely analysed: with no padding, the first analysis window starts at time 0, and the samples at either end will be tapered by the STFT windowing function. Mode 1 has the effect of centering the first sample in the analysis window and ensuring that the very start and end of the segment are accounted for in the analysis. Mode 2 can be useful when the overlap factor (window size / hop size) is greater than 2, to ensure that the input samples at either end of the segment are covered by the same number of analysis frames as the rest of the analysed material.

:control maxFFTSize:

How large can the FFT be, by allocating memory at instantiation time. This cannot be modulated.
Controls the zero-padding added to either end of the source buffer or segment. Possible values are 0 (no padding), 1 (default, half the window size), or 2 (window size - hop size). Padding ensures that all input samples are completely analysed: with no padding, the first analysis window starts at time 0, and the samples at either end will be tapered by the STFT windowing function. Mode 1 has the effect of centring the first sample in the analysis window and ensuring that the very start and end of the segment are accounted for in the analysis. Mode 2 can be useful when the overlap factor (window size / hop size) is greater than 2, to ensure that the input samples at either end of the segment are covered by the same number of analysis frames as the rest of the analysed material.

:control action:

A Function to be evaluated once the offline process has finished and all Buffer's instance variables have been updated on the client side. The function will be passed [features] as an argument.

25 changes: 12 additions & 13 deletions doc/MelBands.rst
@@ -1,43 +1,43 @@
:digest: A Perceptually Spread Spectral Contour Descriptor in Real-Time
:digest: A Perceptually Spread Spectral Contour Descriptor
:species: descriptor
:sc-categories: Libraries>FluidDecomposition
:sc-related: Guides/FluidCorpusManipulationToolkit, Classes/FluidMFCC
:see-also: BufMelBands, Pitch, Loudness, MFCC, SpectralShape
:description: Amplitude for a number of equally spread perceptual bands.
:description: Magnitudes for a number of perceptually-evenly spaced bands.
:discussion:
The spread is based on the Mel scale (https://en.wikipedia.org/wiki/Mel_scale) which was one of the first attempts to mimic pitch perception scientifically. This implementation allows to select the range and number of bands dynamically.

The process will return a multichannel control steam of size maxNumBands, which will be repeated if no change happens within the algorithm, i.e. when the hopSize is larger than the signal vector size.
:fluid-obj:`MelBands` returns a Mel-Frequency Spectrum made up of the user-defined ``numBands``. The Mel-Frequency Spectrum is a histogram of FFT bins bundled according to their relationship to the Mel scale (https://en.wikipedia.org/wiki/Mel_scale), which represents frequency space logarithmically, mimicking how humans perceive pitch distance. The name "Mel" derives from the word "melody". The Hz-to-Mel conversion used by :fluid-obj:`MelBands` is ``mel = 1127.01048 * log(hz / 700.0 + 1.0)``, where ``log`` is the natural logarithm. This implementation allows the range and number of bands to be selected dynamically.

When using a high value for ``numBands``, you may end up with empty channels (filled with zeros) in the MelBands output. This is because there is not enough information in the FFT analysis to properly calculate values for every MelBand. Increasing the ``fftSize`` will ensure you have values for all the MelBands.

Visit https://learn.flucoma.org/reference/melbands to learn more.

:process: The audio rate in, control rate out version of the object.
:output: A KR signal of maxNumBands channels, giving the measure amplitudes for each band. The latency is windowSize.

:output: A KR signal of ``maxNumBands`` channels, giving the measured magnitudes for each band. The latency is ``windowSize``.

:control in:

The audio to be processed.

:control numBands:

The number of bands that will be perceptually equally distributed between minFreq and maxFreq. It is limited by the maxNumBands parameter. When the number is smaller than the maximum, the output is zero-padded.
The number of bands that will be perceptually equally distributed between ``minFreq`` and ``maxFreq``. It is limited by the maxNumBands parameter. When the number is smaller than the maximum, the output is zero-padded.

:control minFreq:

The lower boundary of the lowest band of the model, in Hz.
The lower bound of the frequency band to use in analysis, in Hz. The default is 20.

:control maxFreq:

The highest boundary of the highest band of the model, in Hz.
The upper bound of the frequency band to use in analysis, in Hz. The default is 20000.

:control maxNumBands:

The maximum number of Mel bands that can be modelled. This sets the number of channels of the output, and therefore cannot be modulated.
The maximum number of Mel bands that can be modelled. This sets the number of channels of the output, and therefore cannot be modulated. The default is 120.

:control normalize:

This flag enables the scaling of the output to preserve the energy of the window. It is on (1) by default.
This flag indicates whether to use normalized triangle filters, which account for the number of FFT magnitudes used to calculate the MelBands. When normalization is off (``normalize`` = 0) the higher MelBands tend to be disproportionately large because they are summing more FFT magnitudes. The default is to have normalization on (``normalize`` = 1).

:control scale:

@@ -57,5 +57,4 @@

:control maxFFTSize:

How large can the FFT be, by allocating memory at instantiation time. This cannot be modulated.

How large the FFT can be, by allocating memory at instantiation time. This cannot be modulated.
84 changes: 49 additions & 35 deletions example-code/sc/BufMelBands.scd
@@ -1,50 +1,64 @@

STRONG::Use a buffer of MelBands to drive a bank of oscillators::
code::
// create some buffers
(
b = Buffer.read(s,FluidFilesPath("Nicol-LoopE-M.wav"));
c = Buffer.new(s);
~bells = Buffer.readChannel(s,FluidFilesPath("Tremblay-CF-ChurchBells.wav"),channels:[0]);
~melBands = Buffer(s);
~numBands = 100;
)

// run the process with basic parameters
// listen to the original
~bells.play;

// analyse
FluidBufMelBands.processBlocking(s,~bells,features:~melBands,numBands:~numBands,action:{"done".postln});

// playback
(
Routine{
t = Main.elapsedTime;
FluidBufMelBands.process(s, b, features: c, numBands:10).wait;
(Main.elapsedTime - t).postln;
}.play
x = {
arg rate = 0.1, freqMul = 1, freqAdd = 0;
var phs = Phasor.kr(0,rate,0,BufFrames.ir(~melBands));
var melBands = BufRd.kr(~numBands,~melBands,phs,1,4);
var lowMel = 1127.01048 * ((20/700) + 1).log; // convert from hz to mels
var highMel = 1127.01048 * ((20000/700) + 1).log; // convert from hz to mels
var rangeMel = highMel - lowMel;
var stepMel = rangeMel / (~numBands+1);
var freqMel = Array.fill(~numBands,{arg i; (stepMel * (i+1)) + lowMel});
var freqHz = ((freqMel/ 1127.01048).exp - 1) * 700; // convert from mel to hz
var sig = SinOsc.ar((freqHz * freqMul) + freqAdd,0,melBands);
Splay.ar(sig) * 24.dbamp;
}.play;
)

// listen to the source and look at the buffer
b.play;
c.plot
::

STRONG::A stereo buffer example.::
CODE::
// manipulate the oscillator bank
x.set(\rate,0.3);
x.set(\rate,0.04);
x.set(\freqMul,0.5);
x.set(\freqAdd,-2000);

// load two very different files
::
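STRONG::Check the Hz-to-Mel conversion in the language::

The oscillator bank above converts between Hz and mels inline. The sketch below pulls that arithmetic into two helper functions so the numbers can be inspected without booting the server. The names ~hzToMel, ~melToHz, ~nBands, ~lowMelCheck, ~stepMelCheck and ~centresHz are illustrative only and are not part of FluCoMa; the band spacing is assumed to match the one used in the oscillator bank example.
code::
(
// hypothetical helpers (not FluCoMa API), using the conversion given in the reference text
~hzToMel = { arg hz; 1127.01048 * ((hz / 700) + 1).log };
~melToHz = { arg mel; ((mel / 1127.01048).exp - 1) * 700 };

// round trip: the posted value should be very close to 440
~melToHz.(~hzToMel.(440)).postln;

// centre frequencies in Hz for a small number of bands between 20 Hz and 20000 Hz,
// assuming the same evenly spaced mel steps as the oscillator bank above
~nBands = 10;
~lowMelCheck = ~hzToMel.(20);
~stepMelCheck = (~hzToMel.(20000) - ~lowMelCheck) / (~nBands + 1);
~centresHz = Array.fill(~nBands, { arg i; ~melToHz.((~stepMelCheck * (i + 1)) + ~lowMelCheck) });
~centresHz.round(0.1).postln;
)
::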
STRONG::Look at the MelBands in FluidWaveform (as "features")::
code::
// create some buffers
(
b = Buffer.read(s,FluidFilesPath("Tremblay-SA-UprightPianoPedalWide.wav"));
c = Buffer.read(s,FluidFilesPath("Tremblay-AaS-AcousticStrums-M.wav"));
~src = Buffer.read(s,FluidFilesPath("Nicol-LoopE-M.wav"));
~melBands = Buffer.new(s);
)

// composite one on left one on right as test signals
FluidBufCompose.process(s, c, numFrames:b.numFrames, startFrame:555000,destStartChan:1, destination:b)
b.play

// create a buffer as destinations
c = Buffer.new(s);
// run the process with basic parameters
FluidBufMelBands.processBlocking(s,~src,features:~melBands,action:{"done".postln});

//run the process on them
// look at the mel bands as feature curves (a bit messy...)
FluidWaveform(~src,featuresBuffer:~melBands,bounds:Rect(0,0,1600,400),stackFeatures:true,normalizeFeaturesIndependently:false);
::
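STRONG::Check the size of the features buffer::

A minimal sketch, assuming the server is booted and the FluCoMa example files are installed. According to the reference text, the action function is passed the features buffer once its instance variables have been updated, so the frame and channel counts can be posted from there. For a mono source the channel count would be expected to equal numBands.
code::
// create some buffers
(
~src = Buffer.read(s,FluidFilesPath("Nicol-LoopE-M.wav"));
~melBands = Buffer.new(s);
)

// analyse, then post the dimensions of the result
(
FluidBufMelBands.processBlocking(s,~src,features:~melBands,numBands:40,action:{
	arg features;
	"frames: %, channels: %".format(features.numFrames,features.numChannels).postln;
});
)
::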
STRONG::Do a higher resolution analysis and plot it as an image in FluidWaveform::
code::
// create some buffers
(
Routine{
t = Main.elapsedTime;
FluidBufMelBands.process(s, b, features: c, numBands:10).wait;
(Main.elapsedTime - t).postln;
}.play
~src = Buffer.read(s,FluidFilesPath("Nicol-LoopE-M.wav"));
~melBands = Buffer.new(s);
)

// look at the buffer: 10 bands for left, then 10 bands for right
c.plot(separately:true)
::
FluidBufMelBands.processBlocking(s,~src,features:~melBands,numBands:400,fftSize:4096,action:{"done".postln});

FluidWaveform(imageBuffer:~melBands,bounds:Rect(0,0,1600,400),imageColorScheme:1,imageColorScaling:1);
::
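STRONG::Estimate why a larger fftSize helps with many bands::

The analysis above pairs a high numBands with a larger fftSize. The language-side sketch below gives a rough sense of why: it compares the spacing of FFT bins with the approximate width of the lowest Mel band. It makes several assumptions that are not confirmed by the reference text: a sample rate of 44100 Hz, band edges spaced evenly in mels as in the oscillator bank example, and each triangle filter spanning two mel steps. When a band is narrower than the bin spacing it may not contain any FFT bin, which would leave that channel filled with zeros.
code::
(
// estimate only: no analysis is run and no server is needed
var hzToMel = { arg hz; 1127.01048 * ((hz / 700) + 1).log };
var melToHz = { arg mel; ((mel / 1127.01048).exp - 1) * 700 };
var sampleRate = 44100; // assumed sample rate
var numBands = 400, minFreq = 20, maxFreq = 20000;
var lowMel = hzToMel.(minFreq);
var stepMel = (hzToMel.(maxFreq) - lowMel) / (numBands + 1);
// assumed filter layout: the lowest band spans two mel steps up from minFreq
var lowestBandWidthHz = melToHz.(lowMel + (2 * stepMel)) - minFreq;
[1024, 4096, 16384].do{
	arg fftSize;
	"fftSize %: bin spacing % Hz, lowest band about % Hz wide".format(
		fftSize,
		(sampleRate / fftSize).round(0.01),
		lowestBandWidthHz.round(0.01)
	).postln;
};
)
::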