
DAT255-project

Table of contents

Notebooks
0-Downloading-and-Exploration.ipynb
1-Free-Sound-Classifier.ipynb
2-Transfer-Learning-Data-Collection.ipynb
3-Transfer-Learning-Urban-Sound-Classifier.ipynb
4-Free-Sound-Classifier-App.ipynb

Description

This repository contains multiple notebooks exploring the dataset from the Freesound General-Purpose Audio Tagging Challenge, which is available for download on Kaggle: https://www.kaggle.com/c/freesound-audio-tagging/overview

What we have created is a model for classifying the audio files from the dataset into its 41 different categories. After exporting this model, we experimented with transfer learning and used our own pretrained model on a new dataset: the UrbanSound dataset. More information about the dataset and a download link can be found here: https://urbansounddataset.weebly.com/urbansound.html

Our methods are largely built on the fastai library and the techniques taught in the fastai course.
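
As a rough sketch of that workflow (not the exact code from the notebooks; the paths, architecture choice, and epoch counts are assumptions), training on spectrogram images, exporting the model, and reusing its convolutional body for UrbanSound could look like this:

```python
from fastai.vision.all import *

# Assumed layout: one folder per class containing spectrogram PNGs (paths are hypothetical).
freesound = ImageDataLoaders.from_folder(
    Path('data/freesound_spectrograms'), valid_pct=0.2, item_tfms=Resize(224))

# Train a resnet34-based classifier on the 41 Freesound categories and export it.
learn = cnn_learner(freesound, resnet34, metrics=accuracy)
learn.fine_tune(5)
learn.export('freesound_classifier.pkl')

# Transfer learning: build a learner for the UrbanSound classes and copy over
# the convolutional body from the Freesound model before fine-tuning.
urban = ImageDataLoaders.from_folder(
    Path('data/urbansound_spectrograms'), valid_pct=0.2, item_tfms=Resize(224))
old_learn = load_learner('freesound_classifier.pkl')
new_learn = cnn_learner(urban, resnet34, metrics=accuracy)
new_learn.model[0].load_state_dict(old_learn.model[0].state_dict())
new_learn.fine_tune(5)
```

The spectrogram images assumed here are generated with librosa, as described below.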

Librosa is used extensively to process and manage the audio files and to create image representations (spectrograms) of them, so that they can be classified as images.
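
The exact preprocessing lives in the notebooks; as a minimal sketch (file paths and spectrogram parameters are illustrative assumptions), converting one clip to a mel-spectrogram image with librosa could look like this:

```python
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

def audio_to_spectrogram(wav_path, out_path, sr=22050, n_mels=128):
    """Save a mel-spectrogram image of an audio file (illustrative parameters)."""
    y, sr = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)  # convert power to decibels

    fig, ax = plt.subplots(figsize=(3, 3))
    librosa.display.specshow(mel_db, sr=sr, ax=ax)
    ax.axis('off')  # keep only the spectrogram itself, no axes or ticks
    fig.savefig(out_path, bbox_inches='tight', pad_inches=0)
    plt.close(fig)

# Hypothetical input file and class folder:
audio_to_spectrogram('data/audio_train/example.wav',
                     'data/freesound_spectrograms/Bark/example.png')
```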

Motivation

Machine learning with audio can be used in many different situations, from simple applications such as predicting bird sounds to more complex tasks like audio transcription. The model we have created may not solve a specific problem directly, but it gives a good indication of what is possible with audio classification. As mentioned, we also applied transfer learning with our model, which gave a better result on the new dataset than starting from the standard pretrained resnet34 from fastai. Many of the labels in the freesound-audio-tagging dataset were everyday sounds, including human-produced noises such as laughing or burping. Since the model now recognizes these sounds, it could potentially be helpful when training a model to transcribe spoken sentences or words where background noise or other intrusive sounds are present.

Application

We have also created an application from our model that lets you upload an audio file (.wav or .mp3) and returns a classification from the 41 classes. To do this we used Voilà as a Jupyter server extension together with ipywidgets. These libraries let us serve the notebook as an application with all code cells hidden. From there it is possible to deploy it for free with Binder. We ran into some problems connecting it to the GitHub repository for deployment, but it should be feasible with a little more time. Since the application is not currently hosted anywhere, you will have to run Voilà directly from the notebook.
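
The widget wiring is roughly as follows. This is a simplified sketch rather than the exact code of the application notebook; the exported model name and the audio_to_spectrogram helper are the hypothetical ones from the sketches above.

```python
from fastai.vision.all import *
import ipywidgets as widgets
from IPython.display import display

learn = load_learner('freesound_classifier.pkl')  # assumed export name

upload = widgets.FileUpload(accept='.wav,.mp3', multiple=False)
result = widgets.Label()
classify = widgets.Button(description='Classify')

def on_classify(_):
    # ipywidgets 7.x API; in 8.x, upload.value is a tuple of dicts instead.
    data = next(iter(upload.value.values()))['content']
    Path('uploaded.wav').write_bytes(data)
    # Convert the clip to a spectrogram and classify it as an image.
    audio_to_spectrogram('uploaded.wav', 'uploaded.png')
    pred, _, probs = learn.predict(PILImage.create('uploaded.png'))
    result.value = f'Prediction: {pred} ({probs.max():.2%})'

classify.on_click(on_classify)
display(widgets.VBox([upload, classify, result]))
```

Rendering such a cell through Voilà serves only the widgets, not the code.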

Just run these two lines in the notebook:

!pip install voila

!jupyter serverextension enable voila --sys-prefix

Then change the URL from

http://localhost:8888/notebooks/Documents/V22/DAT255/DAT255-project/nbs/Application.ipynb

to

http://localhost:8888/voila/render/Documents/V22/DAT255/DAT255-project/nbs/Application.ipynb

Then you should see something like this:

[Screenshot: the file-upload application rendered with Voilà]

Application with built in recording

We also experimented with adding a built-in recorder, so that you can choose either to upload a file or to record audio directly in the application and classify that. We made a working version in Application_V2.ipynb; however, it did not look very good, so we decided to include both notebooks.

[Screenshot: the application with built-in recording]

There is also a notebook, Application_doc.ipynb, with the same code as the application but with some documentation added.


Contributors: @oyvindgrutle and @Jethuestad
