Introduction
Audio data on microcontrollers is usually represented as a stream of signed integers that oscillate around 0. This audio format is usually called PCM or RAW.
To specify the format of an audio stream we need to know:
- how many bits are used to represent the integer (e.g. 16 bits = int16_t, 24 bits = int24_t or 32 bits = int32_t)
- how many channels are available (e.g. 2 channels with left and right (=stereo) data or 1 for mono)
- the sample rate (e.g. 44100 Hz)
The number of bits defines the supported value range: 16-bit values lie between -32768 and 32767. A typical stream of audio data on microcontrollers uses 16 bits and 2 channels, with the left and right samples alternating (interleaved) in the stream.
In our library the format is represented with the AudioInfo class.
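To make the three parameters more tangible, here is a minimal sketch in plain C++ that derives the frame size, the data rate and the value range from them. The struct `PcmFormat` and the helper functions are hypothetical names used only for illustration; they are not the library's AudioInfo class, and the sketch assumes tightly packed samples (bits / 8 bytes each).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a format description (NOT the library's AudioInfo).
struct PcmFormat {
  uint32_t sample_rate;      // frames per second, e.g. 44100
  uint16_t channels;         // 1 = mono, 2 = stereo
  uint16_t bits_per_sample;  // 16, 24 or 32
};

// Bytes needed for one frame (one sample for every channel),
// assuming tightly packed samples.
inline size_t frameBytes(const PcmFormat& f) {
  return static_cast<size_t>(f.channels) * (f.bits_per_sample / 8);
}

// Bytes produced (or consumed) per second of audio.
inline size_t bytesPerSecond(const PcmFormat& f) {
  return frameBytes(f) * f.sample_rate;
}

// Smallest and largest value of a signed integer with the given bit width.
inline int64_t minValue(uint16_t bits) { return -(int64_t{1} << (bits - 1)); }
inline int64_t maxValue(uint16_t bits) { return (int64_t{1} << (bits - 1)) - 1; }
```

For 44100 Hz, 2 channels and 16 bits, one frame takes 4 bytes, so the stream carries 176400 bytes per second.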
Please note that on regular desktop computers you often also find floats that are scaled between -1.0 and 1.0. Since floating point operations are expensive, this is usually avoided on microcontrollers. The processing of 8-bit numbers is also not supported by most of the classes, but you can easily read and write floats or signed or unsigned 8-bit numbers by using a Codec.
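The relation between the float representation and 16-bit integers can be sketched as follows. This is only an illustration in plain C++ under the assumption of a simple linear scaling with clipping; the function names are hypothetical and this is not necessarily how the library's Codecs do the conversion.

```cpp
#include <cmath>
#include <cstdint>

// Converts a float sample in the range -1.0 .. 1.0 to a 16-bit sample.
// Values outside of the range are clipped.
inline int16_t floatToInt16(float x) {
  if (x > 1.0f) x = 1.0f;
  if (x < -1.0f) x = -1.0f;
  return static_cast<int16_t>(std::lround(x * 32767.0f));
}

// Converts a 16-bit sample back to a float in the range -1.0 .. 1.0.
inline float int16ToFloat(int16_t s) {
  return static_cast<float>(s) / 32768.0f;
}
```

Scaling by 32767 in one direction and dividing by 32768 in the other is one common convention; it keeps both conversions inside the valid range at the cost of a tiny asymmetry.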
Recommendation: Set the bits_per_sample to 16 for maximum processing speed!
Logging is important to figure out what the program is actually doing. I suggest that you start with the log level Info.
AudioLogger::instance().begin(Serial, AudioLogger::Info);
Recommendation: If you need to see more details you can switch to the log level Debug. Once things work, you should switch to Warning or Error to avoid any negative impact on the audio quality.
Both the input and the output of audio are handled via Arduino streams, so you can process them in exactly the same way as files, Serial etc. However, we should avoid the single-byte operations and use the methods that work on a block of memory. The most important operations are:
- int availableForWrite ()
- size_t write (const uint8_t *buffer, size_t size)
- int available()
- size_t readBytes(uint8_t *buffer, size_t size)
It is also important that the formats of the input and output streams match. You can determine the default settings of a stream by calling the defaultConfig() method, and you start a stream by calling begin(), passing the configuration as argument.
Recommendation: Keep the bits per sample at 16 for maximum processing speed!
Here is a sample sketch that sets up the input from a SineWaveGenerator and writes the output to a CsvOutput.
#include "AudioTools.h"
AudioInfo info(44100, 2, 16);
SineWaveGenerator<int16_t> sineWave(32000); // subclass of SoundGenerator with max amplitude of 32000
GeneratedSoundStream<int16_t> in(sineWave); // Stream generated from sine wave
CsvOutput<int16_t> out(Serial);
// Arduino Setup
void setup(void) {
// Open Serial
Serial.begin(115200);
AudioLogger::instance().begin(Serial, AudioLogger::Info);
// Define CSV Output
auto config = out.defaultConfig();
config.copyFrom(info);
out.begin(config);
// Setup sine wave
in.begin(info);
sineWave.begin(info, N_B4); // frequency of note B4
}
- Instead of calling sineWave.begin(info, N_B4); you could also call sineWave.setFrequency(N_B4); Please note that N_B4 is the frequency of note B4 which is 493.88f; so instead of N_B4 you could directly give the value 493.88.
- SineWaveGenerator is just one of many implementations. Try to replace it with a noise generator: did you know about the different colors of noise?
- Try to replace the CsvOutput with another Audio Sink class.
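To illustrate what a sine generator conceptually produces, here is a small sketch in plain C++ that fills an interleaved 16-bit buffer with a sine wave. The function `fillSine` is a hypothetical helper written for this page, not part of the library, and it makes no claim about how SineWaveGenerator is implemented internally.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Fills an interleaved buffer with `frames` frames of a sine wave.
// Every channel of a frame receives the same value.
inline void fillSine(int16_t* out, size_t frames, int channels,
                     float frequency, int sample_rate, float amplitude) {
  const double kTwoPi = 6.283185307179586;
  for (size_t i = 0; i < frames; ++i) {
    // phase of the sine wave at frame i
    double phase = kTwoPi * frequency * static_cast<double>(i) / sample_rate;
    int16_t value = static_cast<int16_t>(std::lround(amplitude * std::sin(phase)));
    for (int c = 0; c < channels; ++c) {
      out[i * channels + c] = value;
    }
  }
}
```

Calling `fillSine(buffer, 256, 2, 493.88f, 44100, 32000.0f)` would fill 256 stereo frames with the note B4 at the amplitude used in the sketch above.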
So far we have set up an audio source and an audio target and all we need to do is to copy the data in the loop.
uint8_t buffer[1024];
void loop() {
size_t read = in.readBytes(buffer, 1024);
out.write(buffer, read);
}
There is one slight complication, however: for most Stream implementations a write is not blocking, so it is not guaranteed that all bytes could be processed. You therefore need to implement some logic that re-writes the unprocessed bytes. To simplify things we can use the StreamCopy class, which will take care of this:
StreamCopy copier(out, in);
void loop() {
copier.copy();
}
The generated sound data will be displayed as csv on the serial monitor. The Arduino Serial Plotter is the perfect tool to visualize the result.
I also suggest that you study the available methods of the StreamCopy class: you will discover that instead of calling copy in individual small steps you can call copyAll, or you can specify the time in milliseconds with copyMs.
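If you are curious what the logic for re-writing unprocessed bytes could look like when you do it yourself, here is a minimal sketch in plain C++. `ChunkedSink` is a hypothetical test sink that accepts only a limited number of bytes per call, and `writeAll` is a hypothetical helper; neither is part of the library, and StreamCopy may well solve this differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sink that accepts at most `max_chunk` bytes per call,
// mimicking a non-blocking write().
struct ChunkedSink {
  size_t max_chunk;
  std::vector<uint8_t> received;

  size_t write(const uint8_t* data, size_t len) {
    size_t n = len < max_chunk ? len : max_chunk;
    received.insert(received.end(), data, data + n);
    return n;
  }
};

// Calls write() repeatedly until all bytes were accepted, or until the sink
// refused to take any data `max_retries` times in a row.
// Returns the number of bytes that were actually written.
template <typename Sink>
size_t writeAll(Sink& sink, const uint8_t* data, size_t len,
                int max_retries = 100) {
  size_t written = 0;
  int retries = 0;
  while (written < len && retries < max_retries) {
    size_t n = sink.write(data + written, len - written);
    if (n == 0) {
      ++retries;  // nothing accepted: try again
    } else {
      retries = 0;
      written += n;
    }
  }
  return written;
}
```

On a real microcontroller you would probably add a short wait or yield between retries; the retry limit only prevents the loop from hanging forever when the sink never accepts any data.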
The proposed loop is quite lean and only contains a copy call. It is critical that the audio data is provided fast enough! Adding longer delays can cause the Audio Sink to run out of data. This is not critical in the CsvOutput example above, but if you output e.g. to I2S any buffer underflow will be audible!
Therefore consider the following for your logic in the loop:
- only call methods that are processed fast
- you can optimize the processing by increasing the copy buffer and/or the I2S buffers, or by calling copier.copyN(number) instead.
- never add any (long) delay() calls in the loop!
If your sketch does not allow you to follow this advice, then you can just move the copy() to a separate task!
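To get a feeling for how tight the timing is, you can calculate how many milliseconds of audio a buffer holds. The following plain C++ helper is a hypothetical illustration (the name `bufferMillis` is not from the library) and assumes tightly packed PCM samples.

```cpp
#include <cstddef>
#include <cstdint>

// Milliseconds of audio that are contained in `bytes` of PCM data.
inline double bufferMillis(size_t bytes, uint32_t sample_rate,
                           uint16_t channels, uint16_t bits_per_sample) {
  size_t frame_bytes = static_cast<size_t>(channels) * (bits_per_sample / 8);
  double frames = static_cast<double>(bytes) / frame_bytes;
  return 1000.0 * frames / sample_rate;
}
```

A 1024 byte buffer at 44100 Hz, 2 channels and 16 bits holds 256 frames, which is only about 5.8 ms of audio. Any additional buffering in the sink gives you more headroom, but a long delay in the loop can easily exceed it.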
Some people are confused why the data used by the streams is defined as uint8_t. This has been done, so that we can process any data type! If you know that you have PCM data with bits_per_sample of 16 you can simply cast the data to the proper type if you want to access it as array:
int16_t samples[512];
void loop() {
size_t bytes_read = in.readBytes((uint8_t*) samples, 512 * sizeof(int16_t));
size_t samples_read = bytes_read/sizeof(int16_t);
}
Or
uint8_t buffer[1024];
int16_t *samples = (int16_t*) buffer;
void loop() {
size_t bytes_read = in.readBytes(buffer, 1024);
size_t samples_read = bytes_read/sizeof(int16_t);
}
You can e.g. access the first sample with samples[0], so
for(int i=0; i<samples_read; i++){
Serial.println(samples[i]);
}
is printing all received samples. Accessing data this way however is usually not necessary because there are better ways to process the data, as we will see in the next chapter.
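If you do access the samples directly, remember that with 2 channels the values alternate between left and right. As an illustration, the following plain C++ sketch determines the largest absolute value of a single channel; `channelPeak` is a hypothetical helper and not a function of the library.

```cpp
#include <cstddef>
#include <cstdint>

// Largest absolute sample value of one channel in an interleaved buffer.
// `sample_count` is the total number of samples (over all channels),
// `channel` is 0 for left and 1 for right in a stereo stream.
inline int32_t channelPeak(const int16_t* samples, size_t sample_count,
                           int channels, int channel) {
  int32_t peak = 0;
  for (size_t i = static_cast<size_t>(channel); i < sample_count;
       i += static_cast<size_t>(channels)) {
    int32_t value = samples[i];  // widen first, so that -32768 can be negated
    if (value < 0) value = -value;
    if (value > peak) peak = value;
  }
  return peak;
}
```

With the variables from the snippets above you could call `channelPeak(samples, samples_read, 2, 0)` to get the peak of the left channel.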
You can easily change the signal by chaining transformation stream classes:
#include "AudioTools.h"
AudioInfo info(44100, 2, 16);
SineWaveGenerator<int16_t> sineWave(32000); // subclass of SoundGenerator with max amplitude of 32000
GeneratedSoundStream<int16_t> in(sineWave); // Stream generated from sine wave
CsvStream<int16_t> csv(Serial);
VolumeStream vol(csv);
StreamCopy copier(vol, in);
// Arduino Setup
void setup(void) {
// Open Serial
Serial.begin(115200);
AudioLogger::instance().begin(Serial, AudioLogger::Info);
// Define CSV Output
auto config = csv.defaultConfig();
config.copyFrom(info);
csv.begin(config);
// setup volume
auto config_vol = vol.defaultConfig();
config_vol.copyFrom(info);
config_vol.volume = 0.5; // half the volume
vol.begin(config_vol);
// Setup sine wave
in.begin(info);
sineWave.begin(info, N_B4);
}
void loop() {
copier.copy();
}
In the example above we copy the audio data to the volume control, which forwards the output to the CsvStream after adjusting the volume (=amplitude) of the signal.
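Conceptually, adjusting the volume means multiplying every sample by a factor. The following plain C++ sketch shows the idea for 16-bit samples; `applyVolume` is a hypothetical helper for illustration only and is not meant to describe the actual implementation of VolumeStream.

```cpp
#include <cstddef>
#include <cstdint>

// Scales all samples in place by `volume` (e.g. 0.5 = half the amplitude).
// Results that do not fit into 16 bits are clipped.
inline void applyVolume(int16_t* samples, size_t count, float volume) {
  for (size_t i = 0; i < count; ++i) {
    float scaled = samples[i] * volume;
    if (scaled > 32767.0f) scaled = 32767.0f;
    if (scaled < -32768.0f) scaled = -32768.0f;
    samples[i] = static_cast<int16_t>(scaled);
  }
}
```

Since the factor is applied to each sample independently, the same function works for mono and for interleaved stereo data.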
You can build your own sketch by replacing the input (source) and/or the output (sink) with an alternative implementation. But before you start with any complicated combination I suggest that you test first
- a new input together with the CsvStream
- a new output together with the GeneratedSoundStream
As “Audio Sources” we will have e.g.:
- Digital Microphones – I2SStream
- Analog Microphones – AnalogAudioStream
- Files on the Internet – URLStream
- Streaming Internet Radios - ICYStream
- Generated Sound – GeneratedSoundStream
- Encoded Audio - EncodedAudioStream
- Mobile Phone A2DP Bluetooth – A2DPStream
- Binary Data in Flash Memory – MemoryStream
- Audio generated by STK Framework - STKStream
- Desktop Integration - PortAudioStream, MiniAudioStream, StdioStream
- A Timer based Source - TimerCallbackAudioStream
- ESP32 Lyrat/AudioKit - AudioKitStream
- Input using FIR, IIR Filters - FilteredStream
- Tensorflow Lite - TfLiteAudioStream
- Converting Streams
- Communication - ESPNowStream, UDPStream
- Any other Arduino Classes implementing Streams: SD, Ethernet etc
As “Audio Sinks” we will have e.g.:
- external DAC – I2SStream
- Analog output e.g. to an Amplifier – AnalogAudioStream
- Output using PWM – PWMAudioOutput
- Output to SPDIF/TOSLINK - SPDIFOutput
- Encoded Audio - EncodedAudioStream
- Bluetooth Speakers – A2DPStream
- Serial to display the data as CSV – CsvOutput
- Serial to display the data as hex dump - HexDumpOutput
- Encoding and Decoding of Audio - EncodedAudioStream
- Desktop Integration - PortAudioStream, MiniAudioStream, StdioStream
- ID3 Metadata for MP3 - MetaDataID3
- A Timer based Sink - TimerCallbackAudioStream
- ESP32 Lyrat/AudioKit - AudioKitStream
- VS1053 Codec Module - VS1053Stream
- Callback integration e.g. with ESP8266Audio - AudioOutputWithCallback
- Output using FIR, IIR Filters - FilteredStream
- Determine the Volume - VolumeOutput
- Split the Output to different Destinations - MultiOutput
- 3 Band Equalizer - Equilizer3Bands
- FFT - AudioRealFFT and AudioKissFFT
- Tensorflow Lite - TfLiteAudioStream
- Converting Streams
- Communication - ESPNowStream, UDPStream
- Multiuser-Webserver for PCM Output - AudioWAVServerEx
- Any other Arduino Classes implementing Streams: SD, Ethernet etc
Happy Coding...
- ESP32 Audio Tutorial with lots of examples from Andreas Spiess
- ESP32 Arduino with arduino audio tools - Using Filters from NA5Y
- HackerBox-0079-Audio-DSP