This repository contains code for estimating features of mixed sounds using the pre-computed features of individual sound sources. The code accompanies this paper:
- Jon Gillick, Carmine-Emanuele Cella, and David Bamman, "Estimating Unobserved Audio Features for Target-Based Orchestration", ISMIR 2019.
- We re-ran the FFT prediction experiments presented in Figure 3 of the paper, this time training for longer and using 200,000 datapoints instead of 7,500. With these changes, the predictions are substantially improved; updated plots are posted in the results folder of this repository.
This code was developed using the OrchDB dataset of individual instrument samples. Predicting how individual signals combine when mixed is particularly useful when searching for ways to automatically orchestrate or layer sounds together.
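For intuition, here is a tiny illustration (not taken from this repository) of why mixture features need to be estimated rather than simply summed: the magnitude spectrum of a mix depends on phase interactions between the sources.

```python
# Toy illustration (not from this repo): the magnitude spectrum of a mixture
# is not the sum of the sources' magnitude spectra, because phase matters.
import numpy as np

sr = 16000
t = np.arange(sr) / sr
a = np.sin(2 * np.pi * 440.0 * t)              # 440 Hz tone
b = np.sin(2 * np.pi * 440.0 * t + np.pi)      # same tone, out of phase

spec_a = np.abs(np.fft.rfft(a))
spec_b = np.abs(np.fft.rfft(b))
spec_mix = np.abs(np.fft.rfft(a + b))          # the two tones nearly cancel

print(np.allclose(spec_mix, spec_a + spec_b))  # False: summing features is a poor proxy
```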
To reproduce the experiments in the paper:
- Download the OrchDB dataset.
- Generate datasets by randomly combining individual notes with generate_datasets.py. The script generates mixtures for a few different fixed values of M (2, 3, 6, 12, 20), where M is the number of notes in each mixture; you can change these values in the script. Features (energy-weighted FFT or MFCC) are precomputed for each note in the data. To use other features, replace the ones defined in features.py.
python generate_datasets.py --db_path=OrchDB/OrchDB_flat --generated_dataset_path=generated_data --num_train_datapoints=20000 --num_parallel_processes=<your_number_of_parallel_processes>
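If it helps to see the shape of this step, here is a rough, hypothetical sketch of what one generated datapoint looks like: pick M notes at random, mix their waveforms, and compute a feature vector for each note and for the mixture. The helper names and the stand-in feature below are placeholders, not the repository's code; the real feature definitions live in features.py.

```python
# Hypothetical sketch of one generated datapoint; the real logic and feature
# definitions are in generate_datasets.py and features.py.
import numpy as np

def toy_feature(y, n_fft=2048):
    """Stand-in feature: log-magnitude spectrum of the first n_fft samples.
    The repository uses energy-weighted FFT or MFCC features instead."""
    return np.log1p(np.abs(np.fft.rfft(y[:n_fft])))

def make_datapoint(notes, M, rng):
    """Mix M randomly chosen notes and return (per-note features, mixture feature)."""
    chosen = [notes[i] for i in rng.choice(len(notes), size=M, replace=False)]
    length = min(len(y) for y in chosen)
    mixture = np.sum([y[:length] for y in chosen], axis=0)
    note_feats = np.stack([toy_feature(y[:length]) for y in chosen])
    return note_feats, toy_feature(mixture)

rng = np.random.default_rng(0)
notes = [rng.standard_normal(44100) for _ in range(50)]  # placeholders for OrchDB notes
note_feats, mixture_feat = make_datapoint(notes, M=3, rng=rng)
```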
- Train and save models for prediction using train.py.
python train.py --data_path=generated_data --output_path=saved_models
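As a rough illustration of what this step does (the actual model architecture is defined in train.py and may differ), training amounts to regressing mixture features from the features of the individual notes. A hedged scikit-learn sketch with placeholder data and a hypothetical output filename:

```python
# Hedged sketch of the training step; the real model lives in train.py and the
# shapes below (3 notes x 128-dim features) are illustrative placeholders.
import os
import numpy as np
import joblib
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_train = rng.standard_normal((1000, 3 * 128))   # concatenated per-note features
y_train = rng.standard_normal((1000, 128))       # mixture features to predict

model = MLPRegressor(hidden_layer_sizes=(512,), max_iter=200)
model.fit(X_train, y_train)

os.makedirs("saved_models", exist_ok=True)
joblib.dump(model, os.path.join("saved_models", "toy_mixture_regressor.joblib"))
```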
- You can generate the plots and results from the paper with the Analysis notebook. We've improved the results for the FFT prediction task quite a bit beyond those reported in the paper by generating more data in the dataset generation step above.
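If you want a quick sanity check outside the notebook, something like the following (with placeholder data and the hypothetical filename from the training sketch above) plots a predicted mixture feature against a ground-truth one:

```python
# Placeholder comparison plot; the Analysis notebook contains the real
# evaluation and figure code used for the paper.
import numpy as np
import joblib
import matplotlib.pyplot as plt

model = joblib.load("saved_models/toy_mixture_regressor.joblib")  # hypothetical file
x = np.random.default_rng(1).standard_normal((1, 3 * 128))        # stand-in test input
y_true = np.random.default_rng(2).standard_normal(128)            # stand-in ground truth

y_pred = model.predict(x)[0]

plt.plot(y_true, label="ground truth")
plt.plot(y_pred, label="predicted")
plt.xlabel("feature dimension")
plt.legend()
plt.savefig("example_prediction.png")
```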