This repository contains multiple notebooks exploring the dataset from the Freesound General-Purpose Audio Tagging Challenge, which is available for download on Kaggle: https://www.kaggle.com/c/freesound-audio-tagging/overview
We have created a model for classifying the audio files in this dataset into 41 different categories. After exporting this model, we experimented with transfer learning, using our own pretrained model on a new dataset: the UrbanSound dataset. More information about this dataset and the download can be found here: https://urbansounddataset.weebly.com/urbansound.html
Our methods are mostly built on the fastai library and the techniques taught in the fastai course.
Librosa is used extensively to process and manage the audio files and to create image representations of them, so that they can be classified as images.
Machine learning with audio can be applied in many situations, from simple uses such as an application that identifies bird sounds to more complex tasks like audio transcription. The model we have created may not directly solve a specific problem, but it gives a good indication of what is possible with audio classification. As mentioned, we also performed transfer learning with our model, which gave better results on the new dataset than starting from the resnet34 provided by fastai. Many of the labels in the freesound-audio-tagging dataset were everyday sounds, including human-produced noises such as laughing or burping. Since the model now recognizes these sounds, it could be helpful when training a model to transcribe spoken sentences or words where background noise or other intrusive sounds are present.
We have also created an application from our model that lets you upload an audio file as .wav or .mp3 and returns a classification from the 41 classes. For this we used Voilà as a Jupyter server extension together with ipywidgets. These libraries let us turn the notebook into an application that hides all the code cells. From there it is possible to use Binder to deploy it for free. We ran into some problems connecting it to the GitHub repository and deploying, but it should be feasible with a little more time. Since the application is not currently hosted anywhere, you will have to run Voilà directly from the notebook.
Just run these two lines in the notebook:
```
!pip install voila
!jupyter serverextension enable voila --sys-prefix
```
Then change the URL from

`http://localhost:8888/notebooks/Documents/V22/DAT255/DAT255-project/nbs/Application.ipynb`

to

`http://localhost:8888/voila/render/Documents/V22/DAT255/DAT255-project/nbs/Application.ipynb`
Then you should see something like this:
We also experimented with adding a built-in recorder, so that you can choose either to upload a file or to record audio directly in the application and classify that. We made a working version in Application_V2.ipynb; however, it did not look very good, so we decided to include both notebooks.
There is also a notebook with the same code as the application but with added documentation: Application_doc.ipynb.
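The upload-and-classify interface described above can be sketched roughly as follows, assuming ipywidgets 8. `classify()` is a placeholder for the exported learner's prediction call, and "Bark" is a hypothetical example label.

```python
import ipywidgets as widgets

def classify(file_bytes):
    """Placeholder: in the notebook this would pass the uploaded
    bytes to the exported fastai learner (learn.predict)."""
    return "Bark"  # hypothetical example label

uploader = widgets.FileUpload(accept=".wav,.mp3", multiple=False)
output = widgets.Output()

def on_upload(change):
    output.clear_output()
    with output:
        for item in uploader.value:  # ipywidgets 8: a tuple of uploaded files
            print("Predicted:", classify(item["content"]))

uploader.observe(on_upload, names="value")
# In the notebook, display(uploader, output) shows the interface;
# Voilà then renders the widgets without the code cells.
```

Voilà hides the code and serves only the widget output, which is what turns the notebook into a stand-alone application.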