Deep Learning For Monaural Source Separation
https://sites.google.com/site/deeplearningsourceseparation/
Let me know if you have any questions: Po-Sen Huang ([email protected])
- This package is adapted from rnn-speech-denoising. Reference: https://github.com/amaas/rnn-speech-denoising
- The software depends on Mark Schmidt's minFunc package for convex optimization. Reference: http://www.di.ens.fr/~mschmidt/Software/minFunc.html
- Additionally, we have included Mark Hasegawa-Johnson's HTK read and write functions, which are used to handle the MFCC files. Reference: http://www.isle.illinois.edu/sst/software/
- We use HTK (the HCopy tool) to compute features (MFCC, log-mel). Reference: http://htk.eng.cam.ac.uk/
- We use signal processing functions from LabROSA. Reference: http://labrosa.ee.columbia.edu/
- We use the BSS Eval toolbox, versions 2.0 and 3.0, for evaluation. Reference: http://bass-db.gforge.inria.fr/bss_eval/
- We use the MIR-1K dataset for the singing voice separation task. Reference: https://sites.google.com/site/unvoicedsoundseparation/mir-1k
MIR-1K experiment:
- training: codes/mir1k/train_mir1k_demo.m
- testing: codes/mir1k/run_test_single_mode.m
- trained model: http://www.ifp.illinois.edu/~huang146/DNN_separation/model_400.mat -> put the model in codes/mir1k/model_demo
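The model placement step can also be scripted. The following is a minimal Python sketch, not part of the package: the helper names `model_destination` and `fetch_model` are hypothetical, and it assumes the script is run with `base_dir` pointing at the directory that contains `codes/`.

```python
import os
import urllib.request

# URL and target folder are taken from the list above.
MODEL_URL = "http://www.ifp.illinois.edu/~huang146/DNN_separation/model_400.mat"
MODEL_DIR = os.path.join("codes", "mir1k", "model_demo")


def model_destination(base_dir="."):
    """Return the path where the trained model should be placed (hypothetical helper)."""
    filename = MODEL_URL.rsplit("/", 1)[-1]  # "model_400.mat"
    return os.path.join(base_dir, MODEL_DIR, filename)


def fetch_model(base_dir="."):
    """Download the model into codes/mir1k/model_demo unless the file already exists."""
    dest = model_destination(base_dir)
    if not os.path.exists(dest):
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        urllib.request.urlretrieve(MODEL_URL, dest)
    return dest
```

Calling `fetch_model("/path/to/this/package")` downloads the file once; later calls return the existing path without touching the network.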
TIMIT experiment:
- codes/timit/train_timit_demo.m
(set baseDir to the directory that contains this README file)
Your data:
- To try the code on your own data, follow the MIR-1K setup: put your data into codes/mir1k/Wavfile accordingly.
- Look at the unit test parameters at the bottom of codes/mir1k/train_mir1k_demo.m.
- Tune the parameters and check the results.
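Copying your own recordings into place can be sketched as follows. This is a hedged Python illustration, not part of the package: `stage_wavs` and its `subdir` argument are hypothetical, and the subfolder layout expected under codes/mir1k/Wavfile is an assumption that should be checked against the MIR-1K setup shipped with the code.

```python
import shutil
from pathlib import Path


def stage_wavs(src_dir, repo_dir=".", subdir=""):
    """Copy every .wav file in src_dir into codes/mir1k/Wavfile[/subdir].

    subdir is a hypothetical knob for whatever split folders the
    MIR-1K setup uses; leave it empty to copy into Wavfile itself.
    """
    dest = Path(repo_dir) / "codes" / "mir1k" / "Wavfile" / subdir
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    # The pattern matches lowercase ".wav" only on case-sensitive filesystems.
    for wav in sorted(Path(src_dir).glob("*.wav")):
        shutil.copy2(wav, dest / wav.name)
        copied.append(wav.name)
    return copied
```

The function returns the names of the copied files, which makes it easy to confirm that nothing was skipped before running the training script.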
To do:
- Add more unit tests, comments, and a TIMIT example.
P.-S. Huang, M. Kim, M. Hasegawa-Johnson, P. Smaragdis, "Singing-Voice Separation From Monaural Recordings Using Deep Recurrent Neural Networks," in International Society for Music Information Retrieval Conference (ISMIR), 2014.
P.-S. Huang, M. Kim, M. Hasegawa-Johnson, P. Smaragdis, "Deep Learning for Monaural Speech Separation," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014.