A Comprehensive Audio Analysis Pipeline
MIRAGE is a Python-based pipeline designed for researchers to analyze, modify, and compare audio files, with a particular focus on music. It uses Demucs for stem separation, Librosa for feature extraction, and SQL-based storage for high-level data organization and retrieval. MIRAGE is versatile, handling tasks that range from extracting features and separating stems to converting songs into formula-based representations and reconstructing audio from them.
MIRAGE stores song information and extracted features in a SQLite database for easy access and management.
id | artist_name | song_name |
---|---|---|
1 | Artist Name | Song Title |
Each song entry is unique by artist name and song title. MIRAGE checks for duplicate entries before adding a new song to the database.
song_id | stem | feature_name | feature_values |
---|---|---|---|
1 | vocals | mfccs | -123.45, -120.67, ... |
1 | vocals | chroma | 0.12, 0.34, ... |
1 | drums | mfccs | -130.56, -125.78, ... |
... | ... | ... | ... |
Each row corresponds to a specific feature (such as `mfccs` or `mel_spectrogram`) of a particular stem (e.g., vocals, bass, or drums). Feature values are stored as serialized arrays to enable efficient data access and comparison.
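The two tables above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not MIRAGE's actual schema: the column types, the `UNIQUE` constraint used for the duplicate check, and the JSON serialization of feature arrays are all assumptions.

```python
import json
import sqlite3

def init_db(path=":memory:"):
    """Create the songs and features tables shown above (illustrative schema)."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS songs (
            id          INTEGER PRIMARY KEY AUTOINCREMENT,
            artist_name TEXT NOT NULL,
            song_name   TEXT NOT NULL,
            UNIQUE (artist_name, song_name)
        );
        CREATE TABLE IF NOT EXISTS features (
            song_id        INTEGER REFERENCES songs(id),
            stem           TEXT,
            feature_name   TEXT,
            feature_values TEXT  -- serialized array (JSON here)
        );
    """)
    return conn

def add_song(conn, artist, title):
    """Insert a song unless the (artist, title) pair already exists; return its id."""
    conn.execute(
        "INSERT OR IGNORE INTO songs (artist_name, song_name) VALUES (?, ?)",
        (artist, title),
    )
    row = conn.execute(
        "SELECT id FROM songs WHERE artist_name = ? AND song_name = ?",
        (artist, title),
    ).fetchone()
    return row[0]

def add_feature(conn, song_id, stem, name, values):
    """Store one feature row, serializing the array as JSON text."""
    conn.execute(
        "INSERT INTO features VALUES (?, ?, ?, ?)",
        (song_id, stem, name, json.dumps(list(values))),
    )
```

The `INSERT OR IGNORE` plus the `UNIQUE` constraint is one way to realize the duplicate check by artist name and song title described above.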
For each stem, MIRAGE extracts the following features:
- MFCCs (Mel-Frequency Cepstral Coefficients): 20 coefficients to capture spectral characteristics.
- Chroma: 12 chroma bins representing the 12 semitones in an octave.
- Spectral Contrast: 7 coefficients capturing contrast between peaks and valleys in each sub-band.
- Tonnetz: A 6-dimensional representation of tonal centroid features.
Additional features such as the mel spectrogram are available and stored as needed. The dimensionality of each feature type follows the standard default parameters used in `librosa`.
Upon running the pipeline, the user is prompted to select one of the following options:
Add a New Song
- User Inputs: Artist name, song title, and file path.
- Process:
- The audio file is loaded, normalized, and saved in stereo format.
- MIRAGE extracts selected features (MFCCs, chroma, mel spectrogram) for both the full song and individual stems (vocals, drums, bass, other).
- MIRAGE checks for duplicate songs based on artist name, song title, and feature similarity. If the song is nearly identical to an existing entry, it performs stem separation only, without adding a duplicate record.
- Output:
- Original audio and stems saved in organized directories.
- OSCR metric (original and reconstructed song similarity) calculated and displayed.
- Song data, stem features, and computed metrics saved to the database.
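The load-and-normalize step in the process above might look like the following sketch. In the pipeline the samples would come from a call such as `librosa.load(path, mono=False)`; the peak-normalization strategy and mono-to-stereo duplication shown here are assumptions:

```python
import numpy as np

def normalize_stereo(y):
    """Peak-normalize audio and ensure a stereo (2, n_samples) layout."""
    y = np.atleast_2d(np.asarray(y, dtype=np.float32))
    if y.shape[0] == 1:
        # mono input: duplicate the channel to produce stereo
        y = np.vstack([y, y])
    peak = np.max(np.abs(y))
    # scale so the loudest sample sits at full scale; leave silence untouched
    return y / peak if peak > 0 else y
```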
View Database
Displays all songs and features currently stored in the database. This option helps researchers quickly access the stored audio files and their associated features.
Compare Two Songs
- User Prompts:
- User selects two songs from the database.
- Process:
- For each selected feature (mel spectrogram, MFCCs, chroma, spectral contrast, and tonnetz), MIRAGE calculates the cosine similarity between the corresponding stems.
- Output:
- Cosine similarity scores for each feature are displayed, allowing users to quantify similarities between two songs at various levels of the audio structure.
Reconstruct Audio from Stems
- User Prompts:
- Select a song to reconstruct.
- If the stems are unavailable, the user is prompted to split them.
- User selects which stems (e.g., vocals, drums, bass) to combine into a new audio file.
- Process:
- MIRAGE combines the selected stems and saves the reconstructed audio in a custom directory.
- Output:
- A combined audio file is saved, allowing researchers to analyze specific portions of the song or create new audio samples.
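Combining the selected stems can be sketched as a sample-wise sum followed by re-normalization. The zero-padding of shorter stems and the peak-normalization are assumptions, not necessarily what MIRAGE does:

```python
import numpy as np

def merge_stems(stems):
    """Sum selected stems sample-wise, padding shorter stems with silence."""
    n = max(s.shape[-1] for s in stems)
    mix = np.zeros(n)
    for s in stems:
        mix[: s.shape[-1]] += s  # shorter stems contribute silence at the tail
    peak = np.max(np.abs(mix))
    # re-normalize so the summed mix does not clip
    return mix / peak if peak > 0 else mix
```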
Convert a Song to Formulas
- User Prompts:
- Select a song to convert.
- Process:
- MIRAGE performs a Short-Time Fourier Transform (STFT) on the audio file to represent each time slice as a formula based on frequency, magnitude, and phase.
- The resulting equations are stored in a text file.
- Output:
- A directory containing the song's formula text file. This file serves as an analytical representation of the song's structure and allows researchers to synthesize the song later.
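One time slice of the STFT can be rendered as a sum-of-sinusoids formula built from each bin's frequency, magnitude, and phase. The text layout below is hypothetical; the actual formula format is defined by MIRAGE:

```python
import numpy as np

def frame_to_formula(frame, sr):
    """Turn one time slice into a sum-of-cosines formula string (illustrative format)."""
    spec = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    mags, phases = np.abs(spec), np.angle(spec)
    # keep only bins with non-negligible energy
    terms = [
        f"{m:.4f}*cos(2*pi*{f:.1f}*t + {p:.4f})"
        for f, m, p in zip(freqs, mags, phases)
        if m > 1e-6
    ]
    return " + ".join(terms)
```

Writing one such line per STFT frame yields a plain-text file that captures magnitude and phase per frequency bin, from which the waveform can later be resynthesized.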
Synthesize a Song from a Formula File
- User Prompts:
- Select a formula text file to synthesize.
- Process:
- MIRAGE uses the saved formula to recreate the song by reconstructing the original waveform.
- Output:
- A synthesized version of the song is saved in the designated directory.
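Synthesis inverts the formula step: each stored magnitude/phase pair is recombined into a complex spectrum and inverse-transformed back into samples. A minimal per-frame sketch (windowing and overlap-add across frames are omitted):

```python
import numpy as np

def synthesize_frame(mags, phases, n):
    """Rebuild one time slice from stored magnitudes and phases."""
    spec = np.asarray(mags) * np.exp(1j * np.asarray(phases))
    # inverse real FFT recovers the original n-sample frame
    return np.fft.irfft(spec, n=n)
```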
Output Directory Structure
- output/original: Contains original versions of all added songs.
- output/stems: Stores the separated stems (vocals, drums, bass, other) for each song.
- output/reconstructed: Contains audio reconstructed from the saved stems.
- output/formulas: Stores formula-based representations of songs as text files.
- output/synthesized: Contains synthesized versions of songs recreated from formulas.
- output/custom merged stems: Stores custom reconstructions where the user has combined specific stems.
This folder uses the `rvc_python` package to provide a command-line tool for running inference with pre-trained retrieval-based voice conversion (RVC) models on input audio files. Specific steps for running rvc_programmatic can be found in {root_directory}/research/rvc_programmatic/README.md
audio_splitter is a command-line tool that segments audio files into roughly even-length segments for use as training data for the RVC model. Segments are only roughly even because the audio is preprocessed to discard silent chunks, and the remaining non-silent chunks vary in length. Chunks are not combined across silences to equalize segment length, because audio on either side of a silence can differ greatly and would make poor training data. Steps for running this command-line tool can be found in {root_directory}/audio_splitter/README.md
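The silence-discarding step can be sketched as follows. The frame size and amplitude threshold are illustrative, not audio_splitter's actual defaults, and the real tool works on files rather than in-memory arrays:

```python
import numpy as np

def nonsilent_intervals(y, sr, frame_ms=25, threshold=1e-3):
    """Return (start, end) sample intervals of non-silent audio.

    Intervals are never merged across silence, so their lengths vary,
    which is why the resulting training segments are only roughly even.
    """
    frame = max(1, int(sr * frame_ms / 1000))
    # mark each fixed-size frame as active if its peak exceeds the threshold
    active = [
        np.max(np.abs(y[i:i + frame])) > threshold
        for i in range(0, len(y), frame)
    ]
    intervals, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i * frame            # entering a non-silent run
        elif not a and start is not None:
            intervals.append((start, i * frame))  # leaving it
            start = None
    if start is not None:
        intervals.append((start, len(y)))
    return intervals
```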
- Main Folder: Google Drive Link
- Documentation: MIRAGE Documentation
- XAI Project Thought & Task List: Access this on the iPhone Notes app (if shared).