0.4. Data modeling
This section gives an overview of all the scripts in the Chapter 4: Data modeling folder.
Term | Definition |
---|---|
features | numerical representations that describe an object. |
machine learning | the process of training a machine to learn useful patterns from data. |
classification model | a model that separates samples into classes (e.g. male or female); a task with this goal is known as a classification problem. |
regression model | a model whose output is a numerical value in a range (e.g. often between 0 and 1) that measures some correlation with a variable; a task with this goal is known as a regression problem. |
deep learning models | models that are trained using neural networks. |
unsupervised learning | learning in which machines do not require labels (e.g. they only need features). |
supervised learning | learning in which machines require labels (e.g. male or female as separate feature arrays). |
training set | the data fed to machines in the form of feature arrays; algorithms compress the patterns in these feature arrays into models. |
testing set | data that is left out during training so that the accuracy of the model can be calculated on samples it has not seen (e.g. using cross-validation techniques). |
validation set | data that is left out during training to tune hyperparameters (often used in deep learning modeling). |
label | a tag on a featurized audio sample (e.g. male or female) to aid in supervised learning. |
cross-validation | a way of assessing the performance of ML models (e.g. in terms of accuracy) by repeatedly training on part of the data and testing on the held-out remainder. |
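
The training, validation, and testing sets and the idea of cross-validation from the table can be sketched in plain Python. This is a minimal illustration, not code from the repository: the function names `split_dataset` and `k_fold_indices`, the 60/20/20 split, and the dummy samples are all assumptions made for the example.

```python
import random


def split_dataset(samples, test_fraction=0.2, validation_fraction=0.2, seed=0):
    """Shuffle samples and split them into training, validation, and testing sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * validation_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test


def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Hypothetical labeled samples: (feature array, label) pairs
samples = [([float(i), float(i % 3)], i % 2) for i in range(100)]

train, validation, test = split_dataset(samples)
print(len(train), len(validation), len(test))  # 60 20 20

# Each fold holds out a different fifth of the training data for testing
folds = list(k_fold_indices(len(train), k=5))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

In practice a library routine (for example from scikit-learn) would usually replace these helpers; they are written out here only to show what each set is used for.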
make_playlist.py (from CLI)
cd ~
cd voicebook/chapter_4_modeling/youtube_scrape
python3 make_playlist.py
what is the name of this playlist?
what is the playlist id or URL?
… [‘n’ to stop making playlist]
download_playlist.py (from CLI)
python3 download_playlist.py
what is the name of the playlist to download?
… downloads playlist to /playlist folder
label_samples.py (from CLI)
python3 label_samples.py
what is the master label (e.g. stressed)?
stressed
sample number: 0
what is the URL of the video?
https://www.youtube.com/watch?v=47HLiAxHgdo
how long is the audio sample in seconds? (e.g. 20)
20
what are the stop and start times of the video (e.g. 0:13-0:33)
0:05-0:25
is this person stressed? 1 for yes, 0 for no
1
is this person a child (c, <13) or adolescent (d, 13-18) or adult (a, >18 <70) or elderly (e, >70)?
a
is this person male (m) or female (f)?
m
does this person have an American (a) or foreign (f) accent?
a
what is the audio quality? (1 - poor, 2 - moderate, 3 - good quality, 4 - high quality)3
is the environment indoors (i) or outdoors (o)?i
sample number: 1
what is the URL of the video?
... Entering [‘’] here ends the script and outputs an Excel sheet.
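The start and stop times typed into the session above (e.g. 0:05-0:25) have to be turned into seconds before a clip can be cut. How label_samples.py does this is not shown here, so the following is only a sketch; `parse_timestamp` and `parse_time_range` are hypothetical helper names.

```python
def parse_timestamp(stamp):
    """Convert a 'M:SS' or 'H:MM:SS' string into a number of seconds."""
    seconds = 0
    for part in stamp.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_time_range(time_range):
    """Convert a 'start-stop' string such as '0:05-0:25' into (start, stop) seconds."""
    start_text, stop_text = time_range.split("-")
    start, stop = parse_timestamp(start_text), parse_timestamp(stop_text)
    if stop <= start:
        raise ValueError("stop time must be later than start time")
    return start, stop


start, stop = parse_time_range("0:05-0:25")
print(start, stop, stop - start)  # 5 25 20
```

The 20-second difference matches the sample length entered in the session.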
y_scrape.py (from CLI)
Run script in terminal...
python3 y_scrape.py
Get file name to parse
what is the file name?
Stressed_1.xlsx
All the files are then downloaded (Pafy module) and converted to .wav format with FFmpeg ...
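The conversion step can be sketched by calling FFmpeg from Python. The exact command that y_scrape.py runs is not shown in this section, so everything below is an assumption for illustration: the helper names, the mono 16 kHz output, and the file name `sample.m4a` are hypothetical. The flags themselves (`-y`, `-ss`, `-i`, `-t`, `-ac`, `-ar`) are standard FFmpeg options.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(source, target, start=None, duration=None, sample_rate=16000):
    """Build an FFmpeg command that converts (and optionally trims) an audio file."""
    command = ["ffmpeg", "-y"]                 # -y: overwrite the output file if it exists
    if start is not None:
        command += ["-ss", str(start)]         # seek to the start time (seconds)
    command += ["-i", str(source)]             # input file
    if duration is not None:
        command += ["-t", str(duration)]       # keep only this many seconds
    command += ["-ac", "1", "-ar", str(sample_rate), str(target)]  # mono, fixed sample rate
    return command


def convert_to_wav(source, start=None, duration=None):
    """Run FFmpeg to write a .wav file next to the source file."""
    source = Path(source)
    target = source.with_suffix(".wav")
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_ffmpeg_command(source, target, start, duration), check=True)
    return target


# Only build and print the command here; convert_to_wav() would actually run it
command = build_ffmpeg_command("sample.m4a", "sample.wav", start=5, duration=20)
print(" ".join(command))
# ffmpeg -y -ss 5 -i sample.m4a -t 20 -ac 1 -ar 16000 sample.wav
```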
train_audioclassify.py (from CLI)
cd ~
cd voicebook/chapter_4_modeling
python3 train_audioclassify.py
# insert number of classes and class names
how many classes are you training?2
what is the folder name for class 1?schizophrenia
what is the folder name for class 2?controls
# now all the classes will featurize
SCHIZOPHRENIA - featurizing snipped38_start_2_end_22.wav
making 0.wav
[-4.51487917e+02 1.32250653e+02 -6.48964827e+02 -2.16927909e+02...
9.57062705e-04 4.54699943e-02 -5.85259705e-02 5.74577384e-02]
...
Decision tree accuracy (+/-) 0.20779263167344933
0.5733333333333334
Gaussian NB accuracy (+/-) 0.1305543735171076
0.7866666666666667
SKlearn classifier accuracy (+/-) 0.039999999999999994
0.48
Adaboost classifier accuracy (+/-) 0.22666666666666668
0.6366666666666667
Gradient boosting accuracy (+/-) 0.1319090595827292
0.6599999999999999
Logistic regression accuracy (+/-) 0.07557189365836424
0.7366666666666667
Hard voting accuracy (+/-) 0.2341889076033373
0.6766666666666666
K Nearest Neighbors accuracy (+/-) 0.12666666666666668
0.5633333333333332
Random forest accuracy (+/-) 0.2758824226207808
0.7333333333333333
svm accuracy (+/-) 0.13556466271775172
0.7533333333333333
most accurate classifier is Gaussian NB with audio features (mfcc coefficients).
saving classifier to disk.
Summarizing session…
GaussianNB(priors=None)
['gaussian-nb', 0.7866666666666667, 0.1305543735171076]
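The session above trains several classifiers, reports an accuracy and a spread for each, and keeps the most accurate one. A comparable loop can be sketched with scikit-learn. This is not the code of train_audioclassify.py: the synthetic features stand in for real audio features, only three classifiers are compared, and treating the "(+/-)" value as the standard deviation across cross-validation folds is an assumption.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for featurized audio from two classes (hypothetical data)
features, labels = make_classification(n_samples=150, n_features=20, random_state=0)

classifiers = {
    "decision-tree": DecisionTreeClassifier(random_state=0),
    "gaussian-nb": GaussianNB(),
    "logistic-regression": LogisticRegression(max_iter=1000),
}

# Score each classifier with 5-fold cross-validation
results = {}
for name, classifier in classifiers.items():
    scores = cross_val_score(classifier, features, labels, cv=5)
    results[name] = (scores.mean(), scores.std())
    print(f"{name} accuracy {scores.mean():.3f} (+/-) {scores.std():.3f}")

# Keep the classifier with the highest mean accuracy
best_name = max(results, key=lambda name: results[name][0])
print("most accurate classifier is", best_name)

# Refit the winner on all data and serialize it, as one would before saving to disk
best_model = classifiers[best_name].fit(features, labels)
serialized = pickle.dumps(best_model)
```

Writing `serialized` to a `.pickle` file and reading it back with `pickle.loads` would mirror the save and load steps of the scripts.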
load_audioclassify.py (from CLI)
python3 load_audioclassify.py
This results in an output:
{"filename": "348.wav", "filetype": "audio file", "class": ["controls"], "model": ["schizophrenia_controls_sc_audio.pickle"], "model accuracies": [0.7866666666666667], "model deviations": [0.1305543735171076], "model types": ["gaussian-nb"], "features": [[-322.9664360980726, 59.53868288968913, -462.5294083924505, -166.3993076206564, 131.38738649438437, 52.44671783868567, -33.74398658437562, 227.8102207133376, 9.52738149362727, 28.505927165579884, -90.65927286414657, 71.52976680142815, 9.73530102063688, 25.62432182324615, -66.02663398503707, 73.87513246074612, -1.596002360610912, 22.81632350096357, -87.30807566263049, 41.72876898633217, 0.8865486997595385, 17.735652130525168, -65.99456073539176, 52.43567091641821, -14.286216477070838, 14.128449781073533, -59.836804831757654, 18.175026917411316, -9.131276510645463, 13.701302570519355, -57.44541029310883, 25.74622598177111, -4.545971824836885, 10.899138142787697, -42.116927063121395, 29.536967420470695, -3.4558647963609186, 10.31513522815575, -36.17230935229129, 26.551369428146693, -3.6667095757279236, 10.079488079876286, -33.78123311320836, 26.14112294381864, 5.366060779304841, 8.570956061981061, -19.248854886451802, 38.20513572569962, -5.458667628428172, 7.490745204714798, -31.338790159786562, 12.539046082339311, 0.024288590342538358, 10.584946850085212, -34.52340818393254, 38.15078289969128, -0.156898762979172, 11.158828455811786, -34.10403400345244, 30.973152153233336, 0.020648845552068328, 5.827064754672902, -22.052042500906857, 16.81872640844321, -0.06170338085832314, 5.229174923928, -14.518978383592026, 14.845857302315114, -0.04962607796690964, 4.5211806494022735, -14.998074177634704, 12.378100326632655, 0.07415513595268168, 3.724070455888158, -9.939566189661432, 10.85577098792062, -0.017072005372266726, 2.7463908847692204, -6.600475000502117, 6.524786791283427, -0.02310274039018664, 2.7092557498939636, -7.467322311111723, 7.481090337383571, 0.04464197716713606, 2.198722832501255, -6.88438775831641, 
7.844106037059699, 0.045382707259550105, 2.0580935158253872, -6.638462605186588, 5.991186816663746, -0.013702557713332408, 1.9496130791163644, -6.458246324901151, 5.7716202748695, -0.007340250450717803, 1.6409103586116958, -5.380714141939734, 5.539025057788075, 0.011411587050311969, 1.3949062816882583, -4.390308824019425, 4.13132941219398, -291.2947346432915, 49.04737058565422, -381.6816501283554, -222.9638855557117, 158.3460978309033, 23.15415034729552, 99.62697329203677, 189.70121020164896, 7.287058326977949, 30.77474443760493, -38.71222832828984, 56.208286170618955, -1.0950341842073796, 21.3498811006992, -34.41685805740065, 31.926254848624147, -9.172025861857653, 11.511454213511039, -29.874153138705573, 8.203596981294625, 2.6663941698626865, 6.753684660513026, -5.9061357505887955, 19.305474480034082, -14.088225581455214, 17.47630600064678, -49.8886801840349, 8.935818425975743, -13.521963272886959, 8.25999525518404, -24.851695100203774, -0.11752456737790722, -12.762992506945213, 8.598616338770906, -29.72115313687536, 0.05275012294025435, -4.531403069755177, 11.8713757531457, -24.376936764599744, 12.207624665298002, -2.6914750628989266, 14.673164819510685, -22.308447521294887, 17.767626038347583, 11.80700932417913, 10.516802160193405, -13.092759032892214, 24.963056992755536, -10.390953114902164, 6.1887066403103965, -20.39253124562046, 2.7941268719402848, -4.41480192601625, 7.0550461587501, -15.045545852884578, 5.8468320221431656, 0.22555437964894862, 4.881477566532211, -4.990490269946867, 8.519079155558249, 3.4745028409138827, 2.7045163211953187, 0.5391937155699558, 8.988399905874912, 0.45051536549204274, 4.824805683998831, -4.424922867740668, 9.67554223394205, -0.8502687288362012, 3.1351941328536777, -4.844124443962841, 4.754766492721427, 0.870140923131266, 1.1137966493666094, -1.4131441258277446, 2.418345086057676, 2.4254793474500635, 1.2058772715931956, 0.5825294849801214, 4.536777131050609, 0.10251353649615984, 1.51146113365032, -1.4592806547585204, 
3.291502702928505, -1.075428938064348, 1.0559521971759946, -2.4408814841825865, 1.12308565480587, -0.3002420005778045, 2.4751693616737347, -3.6333810904861688, 3.34737386167248, 0.17805269515377548, 3.7250267108754236, -5.189309157660288, 5.579262003298437, 0.24091712079378458, 2.451817967640338, -5.215650064568107, 2.3865116769275567, 0.003640041240486553, 1.4235044885102617, -2.379919268715038, 1.5581599658532437]], "count": 0, "errorcount": 0}
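Because the output is JSON, it can be read back with the standard library. The sketch below uses the record shown above, with the feature array shortened to its first four values to keep the example small; the field names are taken from that output.

```python
import json

# The output of load_audioclassify.py, with "features" shortened for readability
output = '''{"filename": "348.wav", "filetype": "audio file", "class": ["controls"],
"model": ["schizophrenia_controls_sc_audio.pickle"],
"model accuracies": [0.7866666666666667], "model deviations": [0.1305543735171076],
"model types": ["gaussian-nb"],
"features": [[-322.9664360980726, 59.53868288968913, -462.5294083924505, -166.3993076206564]],
"count": 0, "errorcount": 0}'''

record = json.loads(output)

# The list-valued fields line up by position: one entry per model that was applied
for model, predicted, accuracy, deviation, model_type in zip(
    record["model"],
    record["class"],
    record["model accuracies"],
    record["model deviations"],
    record["model types"],
):
    print(f"{record['filename']}: {predicted} "
          f"({model_type}, accuracy {accuracy:.2f} +/- {deviation:.2f}, model {model})")
```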
## 4.5 - Regression models
train_audioregression.py (from CLI)
cd ~
cd voicebook/chapter_4_modeling/
python3 train_audioregression.py
what is the name of the file in /data directory you would like to analyze?
africanamerican_controls.json
RESULTS:

model type | R^2 score | Mean Absolute Errors |
---|---|---|
linear regression | -1.672 | 0.656 |
ridge regression | 0.047 | 0.367 |
LASSO | 0.426 | 0.273 |
elastic net | 0.483 | 0.255 |
Least angle regression (LARS) | 0.065 | 0.478 |
LARS lasso | -0.025 | 0.502 |
orthogonal matching pursuit (OMP) | -0.032 | 0.39 |
logistic regression | -0.019 | 0.253 |
stochastic gradient descent (SGD) | -0.153 | 0.41 |
perceptron | -7.297 | 1.131 |
passive-agressive algorithm | 0.316 | 0.329 |
RANSAC | 0.316 | 0.329 |
Theil-Sen | -1.672 | 0.674 |
huber regression | -0.582 | 0.49 |
polynomial (linear regression) | -0.582 | 0.49 |

logistic regression has the lowest mean absolute error (0.25252525252525254)
saving file to disk (africanamerican_controls_regression.pickle)...
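Several models in the results above have a negative R^2 score. That is possible because R^2 compares a model's squared error with that of always predicting the mean of the targets, so a model that does worse than the mean scores below zero. The sketch below computes both reported metrics from their standard definitions; the 0/1 targets and the two sets of predictions are made-up values for illustration, not data from the script.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical targets (two classes encoded as 0 and 1) and predictions
y_true = [0, 0, 1, 1]
good_predictions = [0.1, 0.2, 0.8, 0.9]   # close to the targets
bad_predictions = [1, 1, 0, 0]            # worse than predicting the mean (0.5)

print(round(r2_score(y_true, good_predictions), 3),
      round(mean_absolute_error(y_true, good_predictions), 3))
print(r2_score(y_true, bad_predictions),
      mean_absolute_error(y_true, bad_predictions))  # -3.0 1.0
```

A lower mean absolute error is better, while a higher R^2 is better, which is why the two columns can rank the models differently.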
### loading regression models
load_audioregression.py
python3 load_audioregression.py 1.0 controls
## Deep learning models
keras_mlp.py
```python3
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Build a small multilayer perceptron for binary classification
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
data = np.random.random((1000, 100))
labels = np.random.randint(2, size=(1000, 1))

# Train the model, iterating on the data in batches of 32 samples
model.fit(data, labels, epochs=10, batch_size=32)
```
train_audiokeras.py
cd ~
cd voicebook/chapter_4_modeling
python3 train_audiokeras.py
folder name 1
africanamerican
folder name 2
controls
...
[[1.]]
Epoch 1/20
149/149 [==============================] - 0s 2ms/step - loss: 3.8728 - acc: 0.3423
Epoch 2/20
149/149 [==============================] - 0s 29us/step - loss: 0.3178 - acc: 0.3624
Epoch 3/20
149/149 [==============================] - 0s 26us/step - loss: -0.0068 - acc: 0.4228
...
final acc: 50.34%
...
Saved africanamerican_controls_dl_audio.json model to disk
summarizing data...
testing loaded model
'Loaded model from disk'
[[1.]]
train_audioTPOT.py
cd ~
cd voicebook/chapter_4_modeling/
python3 train_audioTPOT.py
classification (c) or regression (r) problem?
c
what is the name of class 1?
africanamerican
what is the name of class 2?
controls
Generation 1 - Current best internal CV score: 0.9056433904259992
Generation 2 - Current best internal CV score: 0.9100878348704435
Generation 3 - Current best internal CV score: 0.9100878348704435
Generation 4 - Current best internal CV score: 0.9100878348704435
Generation 5 - Current best internal CV score: 0.9191787439613526
Best pipeline: LogisticRegression(LogisticRegression(MinMaxScaler(StandardScaler(input_matrix)), C=1.0, dual=False, penalty=l1), C=5.0, dual=True, penalty=l2)
saving classifier to disk
Loading TPOT classification models: load_audioTPOT.py
Jims-MBP:~ jimschwoebel$ cd voicebook/chapter_4_modeling
Jims-MBP:chapter_4_modeling jimschwoebel$ python3 load_audiotpot.py
making 0.wav
making 1.wav
making 2.wav
...
making 36.wav
making 37.wav
making 38.wav
controls
cd ~
cd voicebook/chapter_4_modeling/
python3 train_audioTPOT.py
classification (c) or regression (r) problem?
r
what is the name of class 1?
africanamerican
what is the name of class 2?
Controls
Generation 1 - Current best internal CV score: -0.06707070707070706
Generation 2 - Current best internal CV score: -0.06707070707070706
Generation 3 - Current best internal CV score: -0.06707070707070706
Generation 4 - Current best internal CV score: -0.062207740346188735
Generation 5 - Current best internal CV score: -0.062207740346188735
Best pipeline: KNeighborsRegressor(input_matrix, n_neighbors=4, p=1, weights=distance)
saving classifier to disk
Loading TPOT regression models: load_audioTPOT.py
Jims-MBP:~ jimschwoebel$ cd voicebook/chapter_4_modeling
Jims-MBP:chapter_4_modeling jimschwoebel$ python3 load_audiotpot.py
making 0.wav
making 1.wav
making 2.wav
...
making 36.wav
making 37.wav
making 38.wav
controls
controls
Obtaining training data
- Common Voice Dataset
- Google Audioset
- NeuroLex Disease Dataset
Data labeling
- Pandas
- Xlsxwriter
- Pytube
Featurization
- SpeechRecognition
- Librosa
- PyAudioAnalysis
- Spacy
- NLTK
- Gensim
Building machine learning classifiers
- Numpy
- Scikit-learn
Building regression models
- Statsmodels
- Scikit-learn
Deep learning
- Keras
- Tensorflow
- Deep learning book
- Udacity class
AutoML
- Autokeras
- TPOT
- Devol
- Clarifai
- H2O.ai
- DataRobot
- Google Cloud ML engine
- Microsoft Azure ML