0.4. Data modeling
This section documents all the scripts in the chapter_4_modeling folder.
Term | Definition |
---|---|
features | numerical representations that describe an object. |
machine learning | the process of teaching a machine to do something useful by learning patterns from data. |
classification model | a model that separates samples into classes (e.g. male or female); this is known as a classification problem. |
regression model | a model that predicts a continuous numerical value (e.g. often between 0 and 1) rather than a class; this is known as a regression problem. |
deep learning models | models that are trained using a neural network. |
unsupervised learning | if machines do not require labels (e.g. they just need features), this is known as an unsupervised learning problem. |
supervised learning | if machines require labels (e.g. male or female as separate feature arrays), this is known as a supervised learning problem. |
training set | the data, in the form of feature arrays, that is fed to a machine so that an algorithm can compress the patterns in it into a model. |
testing set | data that is left out during training so that accuracy can be calculated on unseen samples (e.g. using cross-validation techniques). |
validation set | data that is left out during training to tune hyperparameters (often used in deep learning modeling). |
label | a tag on a featurized audio sample (e.g. male or female) to aid in supervised learning. |
cross-validation | a technique for assessing the performance of ML models (e.g. in terms of accuracy) by repeatedly training on one part of the data and testing on the held-out remainder. |
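The glossary terms can be tied together in a short sketch. It assumes scikit-learn and NumPy are installed and uses randomly generated feature arrays as stand-ins for featurized audio; the array shapes, the split sizes, and the choice of GaussianNB are illustrative assumptions, not taken from the chapter's scripts.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Features: one numerical array per audio sample (random stand-ins here).
features = rng.normal(size=(100, 13))
# Labels: one tag per sample (0 = "female", 1 = "male"), needed for supervised learning.
labels = np.array([0, 1] * 50)

# Testing set: 20 samples held out until the very end.
X_rest, X_test, y_rest, y_test = train_test_split(
    features, labels, test_size=20, random_state=0, stratify=labels
)
# Validation set: 20 more samples held out for tuning hyperparameters.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=20, random_state=0, stratify=y_rest
)

# Training set: the 60 remaining samples are what the model learns from.
model = GaussianNB().fit(X_train, y_train)
print("validation accuracy:", model.score(X_val, y_val))
print("testing accuracy:", model.score(X_test, y_test))
```

Because the features here are random noise, both accuracies should hover around chance; with real featurized audio the same split logic applies.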
make_playlist.py (from CLI)
cd ~
cd voicebook/chapter_4_modeling/youtube_scrape
python3 make_playlist.py
what is the name of this playlist?
what is the playlist id or URL?
… [‘n’ to stop making playlist]
download_playlist.py (from CLI)
python3 download_playlist.py
what is the name of the playlist to download?
… downloads playlist to /playlist folder
label_samples.py (from CLI)
python3 label_samples.py
what is the master label (e.g. stressed)?
stressed
sample number: 0
what is the URL of the video?
https://www.youtube.com/watch?v=47HLiAxHgdo
how long is the audio sample in seconds? (e.g. 20)
20
what are the stop and start times of the video (e.g. 0:13-0:33)
0:05-0:25
is this person stressed? 1 for yes, 0 for no
1
is this person a child (c, <13) or adolescent (d, 13-18) or adult (a, >18 <70) or elderly (e, >70)?
a
is this person male (m) or female (f)?
m
does this person have an American (a) or foreign (f) accent?
a
what is the audio quality? (1 - poor, 2 - moderate, 3 - good quality, 4 - high quality)3
is the environment indoors (i) or outdoors (o)?i
sample number: 1
what is the URL of the video?
... Entering an empty response [‘’] here ends the script and outputs an Excel sheet.
y_scrape.py (from CLI)
Run script in terminal...
python3 y_scrape.py
Get file name to parse
what is the file name?
Stressed_1.xlsx
All the files are then downloaded with the Pafy module and converted to .wav format with FFmpeg ...
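As a rough illustration of the conversion step, the sketch below turns a labeled time range such as 0:05-0:25 into an FFmpeg command that cuts that span and writes a .wav file. The helper names (to_seconds, ffmpeg_clip_command) and the file names are hypothetical and are not taken from y_scrape.py; only the standard FFmpeg flags -y, -i, -ss, and -t are used, and the command is built but not executed.

```python
import shlex


def to_seconds(stamp):
    """Convert 'M:SS' (or 'H:MM:SS') into a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def ffmpeg_clip_command(source, target, time_range):
    """Build an FFmpeg command that cuts time_range out of source into target."""
    start, end = (to_seconds(s) for s in time_range.split("-"))
    if end <= start:
        raise ValueError("end time must be after start time")
    # -ss sets the start offset, -t the duration; the .wav extension selects the format.
    return ["ffmpeg", "-y", "-i", source,
            "-ss", str(start), "-t", str(end - start), target]


# Hypothetical file names, for illustration only.
command = ffmpeg_clip_command("download.m4a", "clip.wav", "0:05-0:25")
print(shlex.join(command))
# To actually run it (requires FFmpeg on the PATH):
#   import subprocess; subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems when file names contain spaces.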
train_audioclassify.py (from CLI)
cd ~
cd voicebook/chapter_4_modeling
python3 train_audioclassify.py
# insert number of classes and class names
how many classes are you training?2
what is the folder name for class 1?schizophrenia
what is the folder name for class 2?controls
# now all the classes will featurize
SCHIZOPHRENIA - featurizing snipped38_start_2_end_22.wav
making 0.wav
[-4.51487917e+02 1.32250653e+02 -6.48964827e+02 -2.16927909e+02...
9.57062705e-04 4.54699943e-02 -5.85259705e-02 5.74577384e-02]
...
Decision tree accuracy (+/-) 0.20779263167344933
0.5733333333333334
Gaussian NB accuracy (+/-) 0.1305543735171076
0.7866666666666667
SKlearn classifier accuracy (+/-) 0.039999999999999994
0.48
Adaboost classifier accuracy (+/-) 0.22666666666666668
0.6366666666666667
Gradient boosting accuracy (+/-) 0.1319090595827292
0.6599999999999999
Logistic regression accuracy (+/-) 0.07557189365836424
0.7366666666666667
Hard voting accuracy (+/-) 0.2341889076033373
0.6766666666666666
K Nearest Neighbors accuracy (+/-) 0.12666666666666668
0.5633333333333332
Random forest accuracy (+/-) 0.2758824226207808
0.7333333333333333
svm accuracy (+/-) 0.13556466271775172
0.7533333333333333
most accurate classifier is Gaussian NB with audio features (mfcc coefficients).
saving classifier to disk.
Summarizing session…
GaussianNB(priors=None)
['gaussian-nb', 0.7866666666666667, 0.1305543735171076]
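The transcript above reports, for each classifier, a mean accuracy and a (+/-) spread, and then keeps the most accurate model. A minimal sketch of that kind of comparison is shown here, assuming scikit-learn; the synthetic data, the three candidate models, and the 5-fold setting are assumptions for illustration and may differ from what train_audioclassify.py actually does.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Two synthetic classes whose feature means differ slightly.
X = np.vstack([rng.normal(0.0, 1.0, (50, 13)), rng.normal(1.0, 1.0, (50, 13))])
y = np.array([0] * 50 + [1] * 50)

candidates = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "gaussian-nb": GaussianNB(),
    "logistic regression": LogisticRegression(),
}

results = {}
for name, clf in candidates.items():
    # One accuracy per fold; report the mean and the standard deviation.
    scores = cross_val_score(clf, X, y, cv=5)
    results[name] = (scores.mean(), scores.std())
    print(f"{name} accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")

# Keep the classifier with the highest mean cross-validated accuracy.
best_name = max(results, key=lambda name: results[name][0])
print("most accurate classifier is", best_name)
```

With only a few dozen samples per class, the spread across folds can be large, so small differences between mean accuracies should be read with caution.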
load_audioclassify.py (from CLI)
python3 load_audioclassify.py
This results in an output:
{"filename": "348.wav", "filetype": "audio file", "class": ["controls"], "model": ["schizophrenia_controls_sc_audio.pickle"], "model accuracies": [0.7866666666666667], "model deviations": [0.1305543735171076], "model types": ["gaussian-nb"], "features": [[-322.9664360980726, 59.53868288968913, -462.5294083924505, -166.3993076206564, 131.38738649438437, 52.44671783868567, -33.74398658437562, 227.8102207133376, 9.52738149362727, 28.505927165579884, -90.65927286414657, 71.52976680142815, 9.73530102063688, 25.62432182324615, -66.02663398503707, 73.87513246074612, -1.596002360610912, 22.81632350096357, -87.30807566263049, 41.72876898633217, 0.8865486997595385, 17.735652130525168, -65.99456073539176, 52.43567091641821, -14.286216477070838, 14.128449781073533, -59.836804831757654, 18.175026917411316, -9.131276510645463, 13.701302570519355, -57.44541029310883, 25.74622598177111, -4.545971824836885, 10.899138142787697, -42.116927063121395, 29.536967420470695, -3.4558647963609186, 10.31513522815575, -36.17230935229129, 26.551369428146693, -3.6667095757279236, 10.079488079876286, -33.78123311320836, 26.14112294381864, 5.366060779304841, 8.570956061981061, -19.248854886451802, 38.20513572569962, -5.458667628428172, 7.490745204714798, -31.338790159786562, 12.539046082339311, 0.024288590342538358, 10.584946850085212, -34.52340818393254, 38.15078289969128, -0.156898762979172, 11.158828455811786, -34.10403400345244, 30.973152153233336, 0.020648845552068328, 5.827064754672902, -22.052042500906857, 16.81872640844321, -0.06170338085832314, 5.229174923928, -14.518978383592026, 14.845857302315114, -0.04962607796690964, 4.5211806494022735, -14.998074177634704, 12.378100326632655, 0.07415513595268168, 3.724070455888158, -9.939566189661432, 10.85577098792062, -0.017072005372266726, 2.7463908847692204, -6.600475000502117, 6.524786791283427, -0.02310274039018664, 2.7092557498939636, -7.467322311111723, 7.481090337383571, 0.04464197716713606, 2.198722832501255, -6.88438775831641, 
7.844106037059699, 0.045382707259550105, 2.0580935158253872, -6.638462605186588, 5.991186816663746, -0.013702557713332408, 1.9496130791163644, -6.458246324901151, 5.7716202748695, -0.007340250450717803, 1.6409103586116958, -5.380714141939734, 5.539025057788075, 0.011411587050311969, 1.3949062816882583, -4.390308824019425, 4.13132941219398, -291.2947346432915, 49.04737058565422, -381.6816501283554, -222.9638855557117, 158.3460978309033, 23.15415034729552, 99.62697329203677, 189.70121020164896, 7.287058326977949, 30.77474443760493, -38.71222832828984, 56.208286170618955, -1.0950341842073796, 21.3498811006992, -34.41685805740065, 31.926254848624147, -9.172025861857653, 11.511454213511039, -29.874153138705573, 8.203596981294625, 2.6663941698626865, 6.753684660513026, -5.9061357505887955, 19.305474480034082, -14.088225581455214, 17.47630600064678, -49.8886801840349, 8.935818425975743, -13.521963272886959, 8.25999525518404, -24.851695100203774, -0.11752456737790722, -12.762992506945213, 8.598616338770906, -29.72115313687536, 0.05275012294025435, -4.531403069755177, 11.8713757531457, -24.376936764599744, 12.207624665298002, -2.6914750628989266, 14.673164819510685, -22.308447521294887, 17.767626038347583, 11.80700932417913, 10.516802160193405, -13.092759032892214, 24.963056992755536, -10.390953114902164, 6.1887066403103965, -20.39253124562046, 2.7941268719402848, -4.41480192601625, 7.0550461587501, -15.045545852884578, 5.8468320221431656, 0.22555437964894862, 4.881477566532211, -4.990490269946867, 8.519079155558249, 3.4745028409138827, 2.7045163211953187, 0.5391937155699558, 8.988399905874912, 0.45051536549204274, 4.824805683998831, -4.424922867740668, 9.67554223394205, -0.8502687288362012, 3.1351941328536777, -4.844124443962841, 4.754766492721427, 0.870140923131266, 1.1137966493666094, -1.4131441258277446, 2.418345086057676, 2.4254793474500635, 1.2058772715931956, 0.5825294849801214, 4.536777131050609, 0.10251353649615984, 1.51146113365032, -1.4592806547585204, 
3.291502702928505, -1.075428938064348, 1.0559521971759946, -2.4408814841825865, 1.12308565480587, -0.3002420005778045, 2.4751693616737347, -3.6333810904861688, 3.34737386167248, 0.17805269515377548, 3.7250267108754236, -5.189309157660288, 5.579262003298437, 0.24091712079378458, 2.451817967640338, -5.215650064568107, 2.3865116769275567, 0.003640041240486553, 1.4235044885102617, -2.379919268715038, 1.5581599658532437]], "count": 0, "errorcount": 0}
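The output above comes from loading a pickled classifier, predicting a class for a new feature array, and writing a JSON record. The following sketch shows that round trip with scikit-learn and the standard library; the file names, the reduced set of JSON keys, and the synthetic features are illustrative assumptions rather than the exact behavior of load_audioclassify.py.

```python
import json
import os
import pickle
import tempfile

import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Train a stand-in model on synthetic features with string class labels.
X = np.vstack([rng.normal(0.0, 1.0, (30, 13)), rng.normal(3.0, 1.0, (30, 13))])
y = np.array(["controls"] * 30 + ["schizophrenia"] * 30)

# Save the model to disk (hypothetical file name in a temporary folder).
model_path = os.path.join(tempfile.mkdtemp(), "example_audio.pickle")
with open(model_path, "wb") as f:
    pickle.dump(GaussianNB().fit(X, y), f)

# Later: load the model and classify the features of a new sample.
with open(model_path, "rb") as f:
    loaded = pickle.load(f)

new_features = rng.normal(0.0, 1.0, (1, 13))
record = {
    "filename": "example.wav",
    "class": loaded.predict(new_features).tolist(),
    "model": [os.path.basename(model_path)],
    "features": new_features.tolist(),
}
print(json.dumps(record)[:100])
```

Only unpickle model files you created or trust, since loading a pickle can execute arbitrary code.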
train_audioregression.py (from CLI)
cd ~
cd voicebook/chapter_4_modeling/
python3 train_audioregression.py
what is the name of the file in /data directory you would like to analyze?
africanamerican_controls.json
RESULTS:
+-------------------------------------------+-----------+----------------------+
| model type | R^2 score | Mean Absolute Errors |
+-------------------------------------------+-----------+----------------------+
| linear regression | -1.672 | 0.656 |
+-------------------------------------------+-----------+----------------------+
| ridge regression | 0.047 | 0.367 |
+-------------------------------------------+-----------+----------------------+
| LASSO | 0.426 | 0.273 |
+-------------------------------------------+-----------+----------------------+
| elastic net | 0.483 | 0.255 |
+-------------------------------------------+-----------+----------------------+
| Least angle regression (LARS) | 0.065 | 0.478 |
+-------------------------------------------+-----------+----------------------+
| LARS lasso | -0.025 | 0.502 |
+-------------------------------------------+-----------+----------------------+
| orthogonal matching pursuit (OMP) | -0.032 | 0.39 |
+-------------------------------------------+-----------+----------------------+
| logistic regression | -0.019 | 0.253 |
+-------------------------------------------+-----------+----------------------+
| stochastic gradient descent (SGD) | -0.153 | 0.41 |
+-------------------------------------------+-----------+----------------------+
| perceptron | -7.297 | 1.131 |
+-------------------------------------------+-----------+----------------------+
| passive-agressive algorithm | 0.316 | 0.329 |
+-------------------------------------------+-----------+----------------------+
| RANSAC | 0.316 | 0.329 |
+-------------------------------------------+-----------+----------------------+
| Theil-Sen | -1.672 | 0.674 |
+-------------------------------------------+-----------+----------------------+
| huber regression | -0.582 | 0.49 |
+-------------------------------------------+-----------+----------------------+
| polynomial (linear regression) | -0.582 | 0.49 |
+-------------------------------------------+-----------+----------------------+
logistic regression has the lowest mean absolute error (0.25252525252525254)
saving file to disk (africanamerican_controls_regression.pickle)...
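To help read the table above, this sketch computes the same two metrics, R^2 score and mean absolute error, for a few regressors with scikit-learn. The synthetic data and the three models are assumptions for illustration. It also shows why negative R^2 values appear: always predicting the mean of the test targets scores 0, so a negative score means the model did worse than that constant guess.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic regression problem: the target depends on the first feature only.
X = rng.normal(size=(200, 10))
y = 0.5 * X[:, 0] + rng.normal(scale=0.1, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "linear regression": LinearRegression(),
    "ridge regression": Ridge(alpha=1.0),
    "LASSO": Lasso(alpha=0.1),
}

rows = []
for name, model in models.items():
    predictions = model.fit(X_train, y_train).predict(X_test)
    rows.append((name, r2_score(y_test, predictions),
                 mean_absolute_error(y_test, predictions)))
    print(f"{name:20s} R^2 = {rows[-1][1]:6.3f}  MAE = {rows[-1][2]:.3f}")

# Select the model with the lowest mean absolute error.
best = min(rows, key=lambda row: row[2])
print(best[0], "has the lowest mean absolute error")

# Baseline: a constant prediction equal to the test mean has an R^2 of 0.
baseline = np.full_like(y_test, y_test.mean())
baseline_r2 = r2_score(y_test, baseline)
```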
load_audioregression.py
python3 load_audioregression.py
1.0
controls
keras_mlp.py
from keras.models import Sequential
from keras.layers import Dense
import numpy as np

# Build a small multilayer perceptron: 100 inputs -> 32 hidden units -> 1 output
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
data = np.random.random((1000, 100))
labels = np.random.randint(2, size=(1000, 1))

# Train the model, iterating on the data in batches of 32 samples
model.fit(data, labels, epochs=10, batch_size=32)
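To make explicit what the two Dense layers in keras_mlp.py compute, here is a NumPy-only sketch of a forward pass through a network of the same shape (100 inputs, 32 ReLU units, 1 sigmoid output), together with the binary cross-entropy loss and accuracy. The weights are random and untrained, so this illustrates the arithmetic only; the names W1, b1, W2, and b2 are illustrative, and Keras's own initialization and numerics differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dummy data with the same shapes as in keras_mlp.py.
data = rng.random((1000, 100))
labels = rng.integers(0, 2, size=(1000, 1))

# Randomly initialized (untrained) weights and biases for the two layers.
W1, b1 = rng.normal(scale=0.1, size=(100, 32)), np.zeros(32)
W2, b2 = rng.normal(scale=0.1, size=(32, 1)), np.zeros(1)

# Dense(32, activation='relu'): affine transform followed by max(0, x).
hidden = np.maximum(0.0, data @ W1 + b1)
# Dense(1, activation='sigmoid'): squashes the output into (0, 1).
probs = 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))

# Binary cross-entropy, clipped to avoid log(0).
clipped = np.clip(probs, 1e-7, 1 - 1e-7)
loss = -np.mean(labels * np.log(clipped) + (1 - labels) * np.log(1 - clipped))

# Accuracy: fraction of samples where the thresholded output matches the label.
accuracy = np.mean((probs > 0.5) == labels)

# Trainable parameters: 100*32 + 32 + 32*1 + 1
n_params = W1.size + b1.size + W2.size + b2.size
print(loss, accuracy, n_params)
```

Training, which model.fit performs in the Keras script, consists of adjusting these weights to reduce the loss.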
train_audiokeras.py
cd ~
cd voicebook/chapter_4_modeling
python3 train_audiokeras.py
folder name 1
africanamerican
folder name 2
controls
...
[[1.]]
Epoch 1/20
149/149 [==============================] - 0s 2ms/step - loss: 3.8728 - acc: 0.3423
Epoch 2/20
149/149 [==============================] - 0s 29us/step - loss: 0.3178 - acc: 0.3624
Epoch 3/20
149/149 [==============================] - 0s 26us/step - loss: -0.0068 - acc: 0.4228
...
final acc: 50.34%
...
Saved africanamerican_controls_dl_audio.json model to disk
summarizing data...
testing loaded model
'Loaded model from disk'
[[1.]]
train_audioTPOT.py
cd ~
cd voicebook/chapter_4_modeling/
python3 train_audioTPOT.py
classification (c) or regression (r) problem?
c
what is the name of class 1?
africanamerican
what is the name of class 2?
controls
Generation 1 - Current best internal CV score: 0.9056433904259992
Generation 2 - Current best internal CV score: 0.9100878348704435
Generation 3 - Current best internal CV score: 0.9100878348704435
Generation 4 - Current best internal CV score: 0.9100878348704435
Generation 5 - Current best internal CV score: 0.9191787439613526
Best pipeline: LogisticRegression(LogisticRegression(MinMaxScaler(StandardScaler(input_matrix)), C=1.0, dual=False, penalty=l1), C=5.0, dual=True, penalty=l2)
saving classifier to disk
Loading TPOT classification models: load_audioTPOT.py
Jims-MBP:~ jimschwoebel$ cd voicebook/chapter_4_modeling
Jims-MBP:chapter_4_modeling jimschwoebel$ python3 load_audiotpot.py
making 0.wav
making 1.wav
making 2.wav
...
making 36.wav
making 37.wav
making 38.wav
controls
train_audioTPOT.py (from CLI)
cd ~
cd voicebook/chapter_4_modeling/
python3 train_audioTPOT.py
classification (c) or regression (r) problem?
r
what is the name of class 1?
africanamerican
what is the name of class 2?
Controls
Generation 1 - Current best internal CV score: -0.06707070707070706
Generation 2 - Current best internal CV score: -0.06707070707070706
Generation 3 - Current best internal CV score: -0.06707070707070706
Generation 4 - Current best internal CV score: -0.062207740346188735
Generation 5 - Current best internal CV score: -0.062207740346188735
Best pipeline: KNeighborsRegressor(input_matrix, n_neighbors=4, p=1, weights=distance)
saving classifier to disk
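The best pipeline reported above can be written directly in scikit-learn, which may help when reading TPOT's output. The sketch below builds a KNeighborsRegressor with the same hyperparameters and scores it with negative mean squared error, where values closer to zero are better. The negative scores in the transcript are consistent with that metric, which is TPOT's documented default for regression, though the script may configure it differently. The synthetic data and the 5-fold setting are assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Two synthetic groups encoded as numeric targets 0.0 and 1.0.
X = np.vstack([rng.normal(0.0, 1.0, (40, 13)), rng.normal(1.0, 1.0, (40, 13))])
y = np.array([0.0] * 40 + [1.0] * 40)

# Shuffle so that every cross-validation fold contains both groups.
order = rng.permutation(len(y))
X, y = X[order], y[order]

# Same hyperparameters as the pipeline in the transcript:
# 4 neighbors, Manhattan distance (p=1), closer neighbors weighted more.
model = KNeighborsRegressor(n_neighbors=4, p=1, weights="distance")

# Scores are negated errors, so they are always <= 0.
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
print("CV score:", scores.mean())
```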
Loading TPOT regression models: load_audioTPOT.py
Jims-MBP:~ jimschwoebel$ cd voicebook/chapter_4_modeling
Jims-MBP:chapter_4_modeling jimschwoebel$ python3 load_audiotpot.py
making 0.wav
making 1.wav
making 2.wav
...
making 36.wav
making 37.wav
making 38.wav
controls
controls
If you are interested in reading more about any of these topics, check out the documentation below.
Datasets
Data labeling
Featurization
Classification models
Regression models
Deep learning
AutoML