5. Command Line Interface
Allie has a rich command-line interface (CLI) that exposes many of the API functions. This section of the wiki shows how to use it.
To follow along with these examples, quickly seed some data (51 male files / 51 female files):
cd allie
cd datasets
python3 seed_test.py
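To confirm that seeding worked, you can count the files in each class folder. This is a minimal sketch, assuming the seed script places the audio under train_dir/males and train_dir/females (the folder names used in the later examples). The helper name count_files is hypothetical and not part of Allie.

```python
from pathlib import Path


def count_files(folder, suffix=".wav"):
    """Count files in `folder` whose extension matches `suffix` (case-insensitive)."""
    folder = Path(folder)
    if not folder.is_dir():
        # A missing folder simply counts as zero files.
        return 0
    return sum(
        1 for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )


if __name__ == "__main__":
    # Assumed layout: run from the allie repository root after seeding.
    for name in ("males", "females"):
        print(name, count_files(Path("train_dir") / name))
```

If the seed step succeeded, each folder should report the 51 files mentioned above.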
To get started, explore the Allie CLI commands by typing:
cd ~
cd allie
python3 allie.py -h
This should print the options you can use to call Allie's APIs from the command line:
Usage: allie.py [options]
Options:
-h, --help show this help message and exit
--c=command, --command=command
the target command (annotate API = 'annotate',
augmentation API = 'augment', cleaning API = 'clean',
datasets API = 'data', features API = 'features',
model prediction API = 'predict', preprocessing API =
'transform', model training API = 'train', testing
API = 'test', visualize API = 'visualize',
list/change default settings = 'settings')
--p=problemtype, --problemtype=problemtype
specify the problem type ('c' = classification or 'r'
= regression)
--s=sampletype, --sampletype=sampletype
specify the type files that you'd like to operate on
(e.g. 'audio', 'text', 'image', 'video', 'csv')
--n=common_name, --name=common_name
specify the common name for the model (e.g. 'gender'
for a male/female problem)
--i=class_, --class=class_
specify the class that you wish to annotate (e.g.
'male')
--t1=tdir1, --tdir1=tdir1
the directory in the ./train_dir that represent a
folder of files that the transform API will operate
upon (e.g. 'males')
--t2=tdir2, --tdir2=tdir2
the directory in the ./train_dir that represent a
folder of files that the transform API will operate
upon (e.g. 'females')
--d=dir, --dir=dir an array of the target directory (or directories) that
contains sample files for the annotation API,
prediction API, features API, augmentation API, and
cleaning API (e.g.
'/Users/jim/desktop/allie/train_dir/teens/')
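The layout of this help text resembles what Python's standard optparse module prints. The sketch below shows how a subset of these options could be declared and parsed. It is an illustration only, not Allie's actual source: the function name build_parser is hypothetical, and collecting repeated --dir flags into a list with action="append" is an assumption based on the option being described as an array.

```python
from optparse import OptionParser


def build_parser():
    """Build a parser that mirrors a subset of the options in the help text."""
    parser = OptionParser()
    parser.add_option("--c", "--command", dest="command", metavar="command",
                      help="the target command (e.g. 'annotate', 'augment', 'clean')")
    parser.add_option("--p", "--problemtype", dest="problemtype", metavar="problemtype",
                      help="'c' = classification or 'r' = regression")
    parser.add_option("--s", "--sampletype", dest="sampletype", metavar="sampletype",
                      help="e.g. 'audio', 'text', 'image', 'video', 'csv'")
    parser.add_option("--i", "--class", dest="class_", metavar="class_",
                      help="the class to annotate (e.g. 'male')")
    # Assumption: each repeated --dir flag is appended to a list.
    parser.add_option("--d", "--dir", dest="dir", metavar="dir", action="append",
                      help="target directory; may be given more than once")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
options, _ = build_parser().parse_args([
    "--command", "augment", "--sampletype", "audio",
    "--dir", "train_dir/males", "--dir", "train_dir/females",
])
print(options.command, options.sampletype, options.dir)
```

Under this assumption, passing --dir twice yields a two-element list, which matches how the augment, clean, and features examples pass two directories.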
You can annotate a folder of audio files as a classification problem, here with the label male, using a command like this:
python3 allie.py --command annotate --sampletype audio --problemtype c --class male --dir /Users/jim/desktop/allie/train_dir/males
It will then play back each audio file and prompt you to label it for the specified class:
0%| | 0/51 [00:00<?, ?it/s]playing file... 16.WAV
16.wav:
File Size: 137k Bit Rate: 256k
Encoding: Signed PCM
Channels: 1 @ 16-bit
Samplerate: 16000Hz
Replaygain: off
Duration: 00:00:04.29
In:100% 00:00:04.29 [00:00:00.00] Out:189k [ | ] Hd:5.9 Clip:0
Done.
MALE label 1 (yes) or 0 (no)?
yes
error annotating, annotating again...
error - file 16.wav not recognized
2%|▊ | 1/51 [00:07<06:09, 7.39s/it]playing file... 17.WAV
17.wav:
File Size: 229k Bit Rate: 256k
Encoding: Signed PCM
Channels: 1 @ 16-bit
Samplerate: 16000Hz
Replaygain: off
Duration: 00:00:07.17
In:100% 00:00:07.17 [00:00:00.00] Out:316k [ | ] Clip:0
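In the transcript above, answering the prompt with the word yes produced an annotation error, which suggests the prompt expects the digit 1 or 0. The following sketch shows one way such an answer could be validated before it is stored. The function parse_binary_label is hypothetical and is not taken from Allie's code.

```python
def parse_binary_label(answer):
    """Return 1 or 0 for a valid answer, or None if the answer is not recognized."""
    cleaned = answer.strip()
    if cleaned in ("1", "0"):
        return int(cleaned)
    # Anything else (including the words 'yes' and 'no') is treated as invalid,
    # so the caller can ask the question again.
    return None


for reply in ("1", " 0 ", "yes"):
    print(repr(reply), "->", parse_binary_label(reply))
```

Returning None instead of raising keeps the calling loop simple: it can re-prompt until a valid label comes back.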
To switch to a regression problem, change the problemtype to r and set the class (--i or --class) to a regression target (e.g. age):
python3 allie.py --command annotate --sampletype audio --problemtype r --i age --dir /Users/jim/desktop/allie/train_dir/males
This similarly allows you to annotate for regression problems:
0%| | 0/51 [00:00<?, ?it/s]playing file... 16.WAV
16.wav:
File Size: 137k Bit Rate: 256k
Encoding: Signed PCM
Channels: 1 @ 16-bit
Samplerate: 16000Hz
Replaygain: off
Duration: 00:00:04.29
In:100% 00:00:04.29 [00:00:00.00] Out:189k [ | ] Hd:5.9 Clip:0
Done.
AGE value?
50
[{'age': {'value': 50.0, 'datetime': '2020-08-07 12:22:06.180569', 'filetype': 'audio', 'file': '16.wav', 'problemtype': 'r', 'annotate_dir': '/Users/jim/desktop/allie/train_dir/males'}}]
2%|▊ | 1/51 [00:11<09:37, 11.55s/it]playing file... 17.WAV
17.wav:
File Size: 229k Bit Rate: 256k
Encoding: Signed PCM
Channels: 1 @ 16-bit
Samplerate: 16000Hz
Replaygain: off
Duration: 00:00:07.17
In:100% 00:00:07.17 [00:00:00.00] Out:316k [ | ] Clip:0
Done.
AGE value?
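Each answer is printed as a small record keyed by the class name, as in the age entry above. The sketch below builds a record with the same keys as that printed entry. It is only an illustration of the data shape: the function make_annotation is hypothetical, and converting regression values to float is an assumption based on the 50.0 shown in the log.

```python
from datetime import datetime


def make_annotation(label, value, filename, sampletype, problemtype, annotate_dir):
    """Build an annotation record shaped like the entry printed by the CLI."""
    if problemtype == "r":
        # Assumption: regression answers are stored as floats (the log shows 50.0).
        value = float(value)
    return {
        label: {
            "value": value,
            "datetime": str(datetime.now()),
            "filetype": sampletype,
            "file": filename,
            "problemtype": problemtype,
            "annotate_dir": annotate_dir,
        }
    }


record = make_annotation("age", "50", "16.wav", "audio", "r",
                         "/Users/jim/desktop/allie/train_dir/males")
print([record])
```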
You can augment data like this via the default_augmentation settings:
python3 allie.py --command augment --sampletype audio --dir /Users/jim/desktop/allie/train_dir/males --dir /Users/jim/desktop/allie/train_dir/females
You now have an augmented set of files in both directories:
males: 0%| | 0/52 [00:00<?, ?it/s](87495,)
males: 2%|▋ | 1/52 [00:00<00:46, 1.09it/s](88906,)
males: 4%|█▍ | 2/52 [00:01<00:34, 1.44it/s](94551,)
males: 6%|██▏ | 3/52 [00:01<00:26, 1.87it/s](90317,)
males: 8%|██▊ | 4/52 [00:01<00:20, 2.38it/s](90317,)
males: 10%|███▌ | 5/52 [00:01<00:16, 2.79it/s](158055,)
males: 12%|████▎ | 6/52 [00:02<00:16, 2.73it/s](114308,)
males: 13%|████▉ | 7/52 [00:02<00:15, 2.82it/s](104429,)
males: 15%|█████▋ | 8/52 [00:02<00:14, 2.98it/s](104429,)
males: 17%|██████▍ | 9/52 [00:02<00:12, 3.37it/s](129831,)
males: 19%|██████▉ | 10/52 [00:03<00:12, 3.38it/s](228615,)
males: 21%|███████▌ | 11/52 [00:03<00:13, 3.08it/s](103018,)
males: 23%|████████▎ | 12/52 [00:03<00:13, 2.96it/s](101607,)
males: 25%|█████████ | 13/52 [00:04<00:12, 3.09it/s](87495,)
males: 27%|█████████▋ | 14/52 [00:04<00:10, 3.54it/s](94551,)
males: 29%|██████████▍ | 15/52 [00:04<00:09, 3.75it/s](129831,)
males: 31%|███████████ | 16/52 [00:04<00:09, 3.81it/s](91728,)
males: 33%|███████████▊ | 17/52 [00:05<00:08, 4.30it/s](198980,)
males: 35%|████████████▍ | 18/52 [00:05<00:08, 3.85it/s](143943,)
males: 37%|█████████████▏ | 19/52 [00:05<00:08, 3.76it/s](124186,)
males: 38%|█████████████▊ | 20/52 [00:05<00:08, 3.82it/s](114308,)
males: 40%|██████████████▌ | 21/52 [00:06<00:07, 3.93it/s](107252,)
males: 42%|███████████████▏ | 22/52 [00:06<00:07, 4.25it/s](97373,)
males: 44%|███████████████▉ | 23/52 [00:06<00:06, 4.62it/s](541901,)
males: 46%|████████████████▌ | 24/52 [00:07<00:12, 2.28it/s](203213,)
males: 48%|█████████████████▎ | 25/52 [00:07<00:11, 2.39it/s](214503,)
males: 50%|██████████████████ | 26/52 [00:08<00:09, 2.61it/s](94551,)
males: 52%|██████████████████▋ | 27/52 [00:08<00:08, 3.08it/s](111485,)
males: 54%|███████████████████▍ | 28/52 [00:08<00:07, 3.29it/s](303408,)
males: 56%|████████████████████ | 29/52 [00:09<00:08, 2.71it/s](155232,)
males: 58%|████████████████████▊ | 30/52 [00:09<00:07, 3.02it/s](94551,)
males: 60%|█████████████████████▍ | 31/52 [00:09<00:06, 3.45it/s](90317,)
males: 62%|██████████████████████▏ | 32/52 [00:09<00:05, 3.84it/s](117130,)
males: 63%|██████████████████████▊ | 33/52 [00:09<00:04, 4.05it/s](128420,)
males: 65%|███████████████████████▌ | 34/52 [00:10<00:04, 3.94it/s](115719,)
males: 67%|████████████████████████▏ | 35/52 [00:10<00:04, 3.97it/s](134064,)
males: 69%|████████████████████████▉ | 36/52 [00:10<00:04, 3.70it/s](152410,)
males: 71%|█████████████████████████▌ | 37/52 [00:10<00:03, 3.79it/s](145354,)
males: 73%|██████████████████████████▎ | 38/52 [00:11<00:03, 3.64it/s](90317,)
males: 75%|███████████████████████████ | 39/52 [00:11<00:03, 3.82it/s](108663,)
males: 77%|███████████████████████████▋ | 40/52 [00:11<00:02, 4.13it/s](119952,)
males: 79%|████████████████████████████▍ | 41/52 [00:11<00:02, 4.02it/s](108663,)
males: 81%|█████████████████████████████ | 42/52 [00:12<00:02, 4.35it/s](115719,)
males: 83%|█████████████████████████████▊ | 43/52 [00:12<00:02, 4.26it/s](124186,)
males: 85%|██████████████████████████████▍ | 44/52 [00:12<00:01, 4.51it/s](94551,)
males: 87%|███████████████████████████████▏ | 45/52 [00:12<00:01, 4.72it/s](136887,)
males: 88%|███████████████████████████████▊ | 46/52 [00:13<00:01, 4.54it/s](136887,)
males: 90%|████████████████████████████████▌ | 47/52 [00:13<00:01, 4.41it/s](121364,)
males: 92%|█████████████████████████████████▏ | 48/52 [00:13<00:00, 4.64it/s](403604,)
males: 94%|█████████████████████████████████▉ | 49/52 [00:14<00:01, 2.06it/s](94551,)
males: 96%|██████████████████████████████████▌ | 50/52 [00:15<00:00, 2.09it/s](396548,)
males: 100%|████████████████████████████████████| 52/52 [00:16<00:00, 3.15it/s]
females: 0%| | 0/51 [00:00<?, ?it/s](208858,)
females: 2%|▋ | 1/51 [00:01<00:50, 1.01s/it](224381,)
females: 4%|█▎ | 2/51 [00:01<00:39, 1.23it/s](90317,)
females: 6%|██ | 3/51 [00:01<00:30, 1.59it/s](156644,)
females: 8%|██▋ | 4/51 [00:01<00:24, 1.89it/s](598349,)
females: 10%|███▍ | 5/51 [00:02<00:30, 1.50it/s](93140,)
females: 12%|████ | 6/51 [00:03<00:23, 1.89it/s](248372,)
females: 14%|████▊ | 7/51 [00:03<00:21, 2.02it/s](129831,)
females: 16%|█████▍ | 8/51 [00:03<00:18, 2.37it/s](196157,)
females: 18%|██████▏ | 9/51 [00:04<00:17, 2.47it/s](213092,)
females: 20%|██████▋ | 10/51 [00:04<00:15, 2.60it/s](107252,)
females: 22%|███████▎ | 11/51 [00:04<00:12, 3.12it/s](104429,)
females: 24%|████████ | 12/51 [00:04<00:11, 3.45it/s](129831,)
females: 25%|████████▋ | 13/51 [00:05<00:09, 3.82it/s](118541,)
females: 27%|█████████▎ | 14/51 [00:05<00:09, 4.07it/s](98784,)
females: 29%|██████████ | 15/51 [00:05<00:08, 4.19it/s](103018,)
females: 31%|██████████▋ | 16/51 [00:05<00:08, 4.22it/s](90317,)
females: 33%|███████████▎ | 17/51 [00:05<00:07, 4.31it/s](249783,)
females: 35%|████████████ | 18/51 [00:06<00:08, 3.85it/s](124186,)
females: 37%|████████████▋ | 19/51 [00:06<00:08, 3.88it/s](324576,)
females: 39%|█████████████▎ | 20/51 [00:06<00:09, 3.13it/s](143943,)
females: 41%|██████████████ | 21/51 [00:07<00:09, 3.22it/s](93140,)
females: 43%|██████████████▋ | 22/51 [00:07<00:07, 3.73it/s](153821,)
females: 45%|███████████████▎ | 23/51 [00:07<00:07, 3.62it/s](156644,)
females: 47%|████████████████ | 24/51 [00:07<00:07, 3.60it/s](321754,)
females: 49%|████████████████▋ | 25/51 [00:08<00:08, 3.06it/s](589882,)
females: 51%|█████████████████▎ | 26/51 [00:09<00:12, 2.01it/s](242727,)
females: 53%|██████████████████ | 27/51 [00:09<00:11, 2.09it/s](93140,)
females: 55%|██████████████████▋ | 28/51 [00:09<00:09, 2.55it/s](104429,)
females: 57%|███████████████████▎ | 29/51 [00:10<00:07, 2.90it/s](235671,)
females: 59%|████████████████████ | 30/51 [00:10<00:07, 2.86it/s](101607,)
females: 61%|████████████████████▋ | 31/51 [00:10<00:06, 3.30it/s](87495,)
females: 63%|█████████████████████▎ | 32/51 [00:10<00:05, 3.80it/s](101607,)
females: 65%|██████████████████████ | 33/51 [00:11<00:04, 4.22it/s](122775,)
females: 67%|██████████████████████▋ | 34/51 [00:11<00:04, 4.23it/s](101607,)
females: 69%|███████████████████████▎ | 35/51 [00:11<00:03, 4.50it/s](91728,)
females: 71%|████████████████████████ | 36/51 [00:11<00:03, 4.59it/s](98784,)
females: 73%|████████████████████████▋ | 37/51 [00:11<00:03, 4.62it/s](87495,)
females: 75%|█████████████████████████▎ | 38/51 [00:12<00:02, 5.01it/s](166522,)
females: 76%|██████████████████████████ | 39/51 [00:12<00:02, 4.08it/s](134064,)
females: 78%|██████████████████████████▋ | 40/51 [00:12<00:03, 3.43it/s](118541,)
females: 80%|███████████████████████████▎ | 41/51 [00:13<00:03, 3.26it/s](149588,)
females: 82%|████████████████████████████ | 42/51 [00:13<00:02, 3.37it/s](206036,)
females: 84%|████████████████████████████▋ | 43/51 [00:13<00:02, 3.28it/s](87495,)
females: 86%|█████████████████████████████▎ | 44/51 [00:13<00:01, 3.76it/s](211680,)
females: 88%|██████████████████████████████ | 45/51 [00:14<00:01, 3.45it/s](325988,)
females: 90%|██████████████████████████████▋ | 46/51 [00:14<00:01, 2.55it/s](241316,)
females: 92%|███████████████████████████████▎ | 47/51 [00:15<00:01, 2.71it/s](87495,)
females: 94%|████████████████████████████████ | 48/51 [00:15<00:00, 3.20it/s](90317,)
females: 96%|████████████████████████████████▋ | 49/51 [00:15<00:00, 3.63it/s](90317,)
females: 98%|█████████████████████████████████▎| 50/51 [00:15<00:00, 4.10it/s](101607,)
females: 100%|██████████████████████████████████| 51/51 [00:15<00:00, 3.20it/s]
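When the same command has to run over several folders, it can help to assemble the argument list in a script rather than typing it by hand. The sketch below builds the argument list for allie.py using only the long option names from the help text. The helper build_allie_command is hypothetical, and the relative directory paths are placeholders.

```python
import shlex


def build_allie_command(command, sampletype=None, problemtype=None,
                        class_=None, dirs=()):
    """Return the argv list for one allie.py call, one --dir flag per directory."""
    argv = ["python3", "allie.py", "--command", command]
    if sampletype:
        argv += ["--sampletype", sampletype]
    if problemtype:
        argv += ["--problemtype", problemtype]
    if class_:
        argv += ["--class", class_]
    for directory in dirs:
        argv += ["--dir", directory]
    return argv


argv = build_allie_command(
    "augment", sampletype="audio",
    dirs=["train_dir/males", "train_dir/females"],  # placeholder paths
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(part) for part in argv))
```

The resulting list can be passed to subprocess.run(argv, check=True) from the allie directory to execute the command.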
You can clean data in the same way, using the default cleaning settings:
python3 allie.py --command clean --sampletype audio --dir /Users/jim/desktop/allie/train_dir/males --dir /Users/jim/desktop/allie/train_dir/females
You now have a set of cleaned files in both directories.
males: 0%| | 0/102 [00:00<?, ?it/s]ffmpeg version 4.3 Copyright (c) 2000-2020 the FFmpeg developers
built with Apple clang version 11.0.3 (clang-1103.0.32.62)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.3_1 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-videotoolbox --disable-libjack --disable-indev=jack
libavutil 56. 51.100 / 56. 51.100
libavcodec 58. 91.100 / 58. 91.100
libavformat 58. 45.100 / 58. 45.100
libavdevice 58. 10.100 / 58. 10.100
libavfilter 7. 85.100 / 7. 85.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 7.100 / 5. 7.100
libswresample 3. 7.100 / 3. 7.100
libpostproc 55. 7.100 / 55. 7.100
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from '6ab789e1-5994-4796-b4ab-63ff9f20cf09.wav':
Metadata:
encoder : Lavf58.29.100
Duration: 00:00:04.10, bitrate: 256 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to '6ab789e1-5994-4796-b4ab-63ff9f20cf09_cleaned.wav':
Metadata:
ISFT : Lavf58.45.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Metadata:
encoder : Lavc58.91.100 pcm_s16le
size= 128kB time=00:00:04.09 bitrate= 256.2kbits/s speed=2.32e+03x
video:0kB audio:128kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.059509%
ffmpeg version 4.3 Copyright (c) 2000-2020 the FFmpeg developers
built with Apple clang version 11.0.3 (clang-1103.0.32.62)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.3_1 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-videotoolbox --disable-libjack --disable-indev=jack
libavutil 56. 51.100 / 56. 51.100
libavcodec 58. 91.100 / 58. 91.100
libavformat 58. 45.100 / 58. 45.100
libavdevice 58. 10.100 / 58. 10.100
libavfilter 7. 85.100 / 7. 85.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 7.100 / 5. 7.100
libswresample 3. 7.100 / 3. 7.100
libpostproc 55. 7.100 / 55. 7.100
Guessed Channel Layout for Input Stream #0.0 : 5.0
Input #0, wav, from '8caa6f96-04e8-48ee-8817-8b9da97734b2.wav':
Duration: 00:00:08.00, bitrate: 1764 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, 5.0, s16, 1764 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to '8caa6f96-04e8-48ee-8817-8b9da97734b2_cleaned.wav':
Metadata:
ISFT : Lavf58.45.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Metadata:
encoder : Lavc58.91.100 pcm_s16le
size= 250kB time=00:00:08.00 bitrate= 256.1kbits/s speed= 276x
video:0kB audio:250kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.030469%
males: 2%|▋ | 2/102 [00:00<00:09, 10.18it/s]ffmpeg version 4.3 Copyright (c) 2000-2020 the FFmpeg developers
built with Apple clang version 11.0.3 (clang-1103.0.32.62)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.3_1 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-videotoolbox --disable-libjack --disable-indev=jack
libavutil 56. 51.100 / 56. 51.100
libavcodec 58. 91.100 / 58. 91.100
libavformat 58. 45.100 / 58. 45.100
libavdevice 58. 10.100 / 58. 10.100
libavfilter 7. 85.100 / 7. 85.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 7.100 / 5. 7.100
libswresample 3. 7.100 / 3. 7.100
libpostproc 55. 7.100 / 55. 7.100
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from '49b96c77-5971-409a-9cb0-a2abfd9b1f37.wav':
Metadata:
encoder : Lavf58.29.100
Duration: 00:00:04.74, bitrate: 256 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to '49b96c77-5971-409a-9cb0-a2abfd9b1f37_cleaned.wav':
Metadata:
ISFT : Lavf58.45.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Metadata:
encoder : Lavc58.91.100 pcm_s16le
size= 148kB time=00:00:04.73 bitrate= 256.1kbits/s speed=3.48e+03x
video:0kB audio:148kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.051467%
ffmpeg version 4.3 Copyright (c) 2000-2020 the FFmpeg developers
built with Apple clang version 11.0.3 (clang-1103.0.32.62)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.3_1 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-videotoolbox --disable-libjack --disable-indev=jack
libavutil 56. 51.100 / 56. 51.100
libavcodec 58. 91.100 / 58. 91.100
libavformat 58. 45.100 / 58. 45.100
libavdevice 58. 10.100 / 58. 10.100
libavfilter 7. 85.100 / 7. 85.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 7.100 / 5. 7.100
libswresample 3. 7.100 / 3. 7.100
libpostproc 55. 7.100 / 55. 7.100
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from '06499054-2859-4861-88d8-841fbaec0365.wav':
Metadata:
encoder : Lavf58.29.100
Duration: 00:00:06.21, bitrate: 256 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to '06499054-2859-4861-88d8-841fbaec0365_cleaned.wav':
Metadata:
ISFT : Lavf58.45.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Metadata:
encoder : Lavc58.91.100 pcm_s16le
size= 194kB time=00:00:06.20 bitrate= 256.1kbits/s speed=4.28e+03x
video:0kB audio:194kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.039264%
males: 4%|█▍ | 4/102 [00:00<00:08, 11.29it/s]ffmpeg version 4.3 Copyright (c) 2000-2020 the FFmpeg developers
built with Apple clang version 11.0.3 (clang-1103.0.32.62)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.3_1 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-videotoolbox --disable-libjack --disable-indev=jack
...
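The ffmpeg log above shows each input being written to a new file with a _cleaned suffix as 16000 Hz mono audio. The exact arguments Allie passes to ffmpeg are not shown, so the sketch below is only an approximation of such a conversion step, using the standard ffmpeg options -ar (sample rate) and -ac (channel count). The function build_clean_command is hypothetical.

```python
from pathlib import Path


def build_clean_command(wav_path, sample_rate=16000, channels=1):
    """Return an ffmpeg argv list that resamples `wav_path` to `<name>_cleaned.wav`."""
    src = Path(wav_path)
    # Keep the original extension and add the _cleaned suffix seen in the log.
    dst = src.with_name(src.stem + "_cleaned" + src.suffix)
    return [
        "ffmpeg", "-i", str(src),
        "-ar", str(sample_rate),   # output sample rate in Hz
        "-ac", str(channels),      # number of output channels (1 = mono)
        str(dst),
    ]


print(build_clean_command("clip.wav"))
```

Passing the returned list to subprocess.run would perform the conversion, provided ffmpeg is installed and on the PATH.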
You can call the Allie datasets API with this command:
python3 allie.py --command data
You can then download a dataset by following the instructions on its website. Download procedures differ from dataset to dataset, so for now Allie only directs you to the relevant website and leaves the download itself to you. Future versions of Allie are intended to support downloading these datasets directly through an API.
/usr/local/lib/python3.7/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
what dataset would you like to download? (1-audio, 2-text, 3-image, 4-video, 5-csv)
1
found 34 datasets...
----------------------------
here are the available AUDIO datasets
----------------------------
TIMIT dataset
Parkinson's speech dataset
ISOLET Data Set
AudioSet
Multimodal EmotionLines Dataset (MELD)
Free Spoken Digit Dataset
Speech Accent Archive
2000 HUB5 English
Emotional Voice dataset - Nature
LJ Speech
VoxForge
Million Song Dataset
Free Music Archive
Common Voice
Spoken Commands dataset
Bird audio detection challenge
Environmental audio dataset
Urban Sound Dataset
Ted-LIUM
Noisy Dataset
Librispeech
Emotional Voices Database
CMU Wilderness
Arabic Speech Corpus
Flickr Audio Caption
CHIME
Tatoeba
Freesound dataset
Spoken Wikipeida Corpora
Karoldvl-ESC
Zero Resource Speech Challenge
Speech Commands Dataset
Persian Consonant Vowel Combination (PCVC) Speech Dataset
VoxCeleb
what audio dataset would you like to download?
Speech Commmands
found dataset: Speech Commands Dataset
-speech-commands-dataset.html) - The dataset (1.4 GB) has 65,000 one-second long utterances of 30 short words, by thousands of different people, contributed by members of the public through the AIY website.
just confirming, do you want to download the Speech Commands Dataset dataset? (Y - yes, N - no)
yes
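Note that the misspelled input Speech Commmands still resolved to Speech Commands Dataset. The warning at the top of the log shows that the fuzzywuzzy package is involved. The sketch below illustrates the same idea with difflib from the Python standard library as a stand-in, so it is not the matching code Allie uses. The function best_match and the shortened candidate list are for illustration only.

```python
from difflib import SequenceMatcher


def best_match(query, candidates):
    """Return the candidate whose lowercase text is most similar to `query`."""
    def score(name):
        return SequenceMatcher(None, query.lower(), name.lower()).ratio()
    return max(candidates, key=score)


# A few names taken from the audio dataset list above.
datasets = [
    "Speech Accent Archive",
    "Spoken Commands dataset",
    "Speech Commands Dataset",
    "LJ Speech",
    "Librispeech",
]
print(best_match("Speech Commmands", datasets))
```

Comparing lowercase strings makes the lookup tolerant of capitalization as well as small typos.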
You can featurize data in the same way as augmentation and cleaning:
python3 allie.py --command features --sampletype audio --dir /Users/jim/desktop/allie/train_dir/males --dir /Users/jim/desktop/allie/train_dir/females
This featurizes both folders with the default_audio_features specified in settings.json.
males: 0%| | 0/102 [00:00<?, ?it/s]deepspeech_dict transcribing: 17ebdf90-b6dc-4940-85c3-055e3f0c5e9a_cleaned.wav
--2020-08-07 12:29:42-- https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/deepspeech-0.7.0-models.pbmm
Resolving github.com (github.com)... 140.82.113.3
Connecting to github.com (github.com)|140.82.113.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github-production-release-asset-2e65be.s3.amazonaws.com/60273704/db3b3f80-84bd-11ea-93d7-1ddb76a21efe?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200807%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200807T162942Z&X-Amz-Expires=300&X-Amz-Signature=75a04415e8839d00e611a7414d420fc4a1a465de88a93e80e9417ee7e55c4325&X-Amz-SignedHeaders=host&actor_id=0&repo_id=60273704&response-content-disposition=attachment%3B%20filename%3Ddeepspeech-0.7.0-models.pbmm&response-content-type=application%2Foctet-stream [following]
--2020-08-07 12:29:42-- https://github-production-release-asset-2e65be.s3.amazonaws.com/60273704/db3b3f80-84bd-11ea-93d7-1ddb76a21efe?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200807%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200807T162942Z&X-Amz-Expires=300&X-Amz-Signature=75a04415e8839d00e611a7414d420fc4a1a465de88a93e80e9417ee7e55c4325&X-Amz-SignedHeaders=host&actor_id=0&repo_id=60273704&response-content-disposition=attachment%3B%20filename%3Ddeepspeech-0.7.0-models.pbmm&response-content-type=application%2Foctet-stream
Resolving github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)... 52.216.204.83
Connecting to github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)|52.216.204.83|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 188916323 (180M) [application/octet-stream]
Saving to: ‘deepspeech-0.7.0-models.pbmm’
deepspeech-0.7.0-mo 100%[===================>] 180.16M 6.88MB/s in 18s
2020-08-07 12:30:00 (9.94 MB/s) - ‘deepspeech-0.7.0-models.pbmm’ saved [188916323/188916323]
--2020-08-07 12:30:00-- https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/deepspeech-0.7.0-models.scorer
Resolving github.com (github.com)... 140.82.112.3
Connecting to github.com (github.com)|140.82.112.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github-production-release-asset-2e65be.s3.amazonaws.com/60273704/49dcc500-84df-11ea-9cb6-ec1d98c50dd4?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200807%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200807T163001Z&X-Amz-Expires=300&X-Amz-Signature=ed079c1be3b63caf76b2daf1ad6d62537d0e4a6aa856c2428995993484bd2872&X-Amz-SignedHeaders=host&actor_id=0&repo_id=60273704&response-content-disposition=attachment%3B%20filename%3Ddeepspeech-0.7.0-models.scorer&response-content-type=application%2Foctet-stream [following]
--2020-08-07 12:30:01-- https://github-production-release-asset-2e65be.s3.amazonaws.com/60273704/49dcc500-84df-11ea-9cb6-ec1d98c50dd4?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20200807%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200807T163001Z&X-Amz-Expires=300&X-Amz-Signature=ed079c1be3b63caf76b2daf1ad6d62537d0e4a6aa856c2428995993484bd2872&X-Amz-SignedHeaders=host&actor_id=0&repo_id=60273704&response-content-disposition=attachment%3B%20filename%3Ddeepspeech-0.7.0-models.scorer&response-content-type=application%2Foctet-stream
Resolving github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)... 52.216.236.67
Connecting to github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)|52.216.236.67|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 953363776 (909M) [application/octet-stream]
Saving to: ‘deepspeech-0.7.0-models.scorer’
deepspeech-0.7.0-mo 100%[===================>] 909.20M 10.4MB/s in 88s
2020-08-07 12:31:29 (10.3 MB/s) - ‘deepspeech-0.7.0-models.scorer’ saved [953363776/953363776]
ffmpeg version 4.3 Copyright (c) 2000-2020 the FFmpeg developers
built with Apple clang version 11.0.3 (clang-1103.0.32.62)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.3_1 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-videotoolbox --disable-libjack --disable-indev=jack
libavutil 56. 51.100 / 56. 51.100
libavcodec 58. 91.100 / 58. 91.100
libavformat 58. 45.100 / 58. 45.100
libavdevice 58. 10.100 / 58. 10.100
libavfilter 7. 85.100 / 7. 85.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 7.100 / 5. 7.100
libswresample 3. 7.100 / 3. 7.100
libpostproc 55. 7.100 / 55. 7.100
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from '17ebdf90-b6dc-4940-85c3-055e3f0c5e9a_cleaned.wav':
Metadata:
encoder : Lavf58.45.100
Duration: 00:00:02.00, bitrate: 256 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to '17ebdf90-b6dc-4940-85c3-055e3f0c5e9a_cleaned_newaudio.wav':
Metadata:
ISFT : Lavf58.45.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Metadata:
encoder : Lavc58.91.100 pcm_s16le
size= 63kB time=00:00:02.00 bitrate= 256.3kbits/s speed=1.24e+03x
video:0kB audio:62kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.121875%
deepspeech --model /Users/jim/Desktop/allie/allie/features/audio_features/helpers/deepspeech-0.7.0-models.pbmm --scorer /Users/jim/Desktop/allie/allie/features/audio_features/helpers/deepspeech-0.7.0-models.scorer --audio "17ebdf90-b6dc-4940-85c3-055e3f0c5e9a_cleaned_newaudio.wav" >> "17ebdf90-b6dc-4940-85c3-055e3f0c5e9a_cleaned.txt"
Loading model from file /Users/jim/Desktop/allie/allie/features/audio_features/helpers/deepspeech-0.7.0-models.pbmm
TensorFlow: v1.15.0-24-gceb46aae58
DeepSpeech: v0.7.4-0-gfcd9563f
2020-08-07 12:31:30.122614: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.0155s.
Loading scorer from files /Users/jim/Desktop/allie/allie/features/audio_features/helpers/deepspeech-0.7.0-models.scorer
Loaded scorer in 0.00188s.
Running inference.
Inference took 2.470s for 2.000s audio file.
DEEPSPEECH_DICT
-->
librosa featurizing: 17ebdf90-b6dc-4940-85c3-055e3f0c5e9a_cleaned.wav
/usr/local/lib/python3.7/site-packages/librosa/beat.py:306: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
hop_length=hop_length))
[15.0, 44.8, 27.914631766632116, 82.0, 3.0, 52.0, 143.5546875, 1.236462812624379, 0.7251315935053164, 3.334862198133343, 0.0, 1.0859751751040547, 1.0, 0.0, 1.0, 1.0, 1.0, 0.9021154304582399, 0.011871022692166161, 0.9248579351103645, 0.8845252162754503, 0.9007761885347338, 0.8005566025338086, 0.02351835274803656, 0.846290326519876, 0.7666702190624551, 0.7974848758054224, 0.765299804936602, 0.028837831871710528, 0.8210615152089579, 0.723447224850167, 0.7616938878984344, 0.7718090402633874, 0.030151369356669896, 0.8289804789632119, 0.7266534111141562, 0.7686879868973036, 0.7936400196140749, 0.031036953487073464, 0.8507660198788851, 0.7451751937203362, 0.7913818436015906, 0.774629021009383, 0.03215688854479966, 0.8344764767038448, 0.7251595846469461, 0.7719273208794816, 0.7428815548766532, 0.035011430707789774, 0.8084967674473139, 0.6894574657533005, 0.7397077230156677, 0.7471335383622011, 0.03463006342656048, 0.811789409800113, 0.6939570618543236, 0.7441413516783196, 0.7610523695125583, 0.033862478442252354, 0.8235134515207413, 0.7081080052043068, 0.7585638476404928, 0.789820226591492, 0.03384767249624557, 0.8500833679586846, 0.7343945280617632, 0.7885333494859879, 0.8059179015451146, 0.03157630618281619, 0.8615948245740482, 0.7535417225113215, 0.8050269828120753, 0.7935417840439638, 0.031145337061902156, 0.8492406377587695, 0.7427011806455776, 0.7922497709713059, -381.8200653053766, 23.009903107621383, -321.89837910119815, -441.6259904054828, -379.3488122081372, 149.04439804929382, 15.356164419049225, 172.85739766597234, 110.28800952925451, 153.08285883743733, -32.84528000164682, 13.009141709326732, -2.154076829327365, -64.91296682470796, -30.99198128144861, 40.70623621978138, 17.548974836043755, 73.18507958780387, 8.337746078102892, 40.63827945609428, -52.86069958238985, 13.478379908189092, -27.553997955729045, -87.5715612441206, -51.58811003068236, 31.42738418944771, 6.795009930398713, 49.46758858300626, 18.299603573231376, 31.6997571738992, -35.82303959204243, 
8.486268198834747, -14.57639089253998, -55.40606622608898, -34.90037102114016, 15.955209103884254, 8.934103499373093, 44.50048758909077, -5.494667426263748, 15.650980212978268, -17.338170873356056, 5.727612678025376, -3.1374263092378176, -33.480526476176806, -17.446097772263684, -0.4383376230378039, 6.2672128421452875, 15.720566519612913, -14.330033302127145, -0.244906066419725, -1.467000393875816, 6.138683208427911, 15.463481385114175, -15.812384056333133, -2.0209526024605786, -6.972125329645311, 4.816668688995419, 6.338229281172403, -17.349015718809397, -6.496008401327131, 6.688298302126343, 6.351559382372022, 20.66368904480788, -9.92049214526477, 7.446377744032864, -3.146738423029468e-05, 1.0080761356243433e-05, -1.2325730935412251e-05, -6.256804392224112e-05, -3.133272391000924e-05, 0.25608241330935894, 0.07833864054721744, 0.5017299271817217, 0.1071248558508767, 0.2536783052990988, 1698.1068839811428, 247.34317762284775, 2332.782952940378, 1298.7956436686768, 1652.8623945849333, 1916.493769636128, 217.42031398082, 2404.6618216796505, 1537.6071015613695, 1882.6348073434424, 15.195013434264473, 4.030761691034897, 26.309048213128285, 5.981068616687288, 15.426375628186392, 0.0004132288449909538, 0.000528702512383461, 0.004117688629776239, 7.970369915710762e-05, 0.0002881725667975843, 3901.964911099138, 915.5047430674098, 5792.431640625, 2196.38671875, 3757.5439453125, 0.0726977819683908, 0.013766841258384812, 0.11962890625, 0.03857421875, 0.0712890625, 0.009517103433609009, 0.0026407463010400534, 0.01786264032125473, 0.004661164246499538, 0.009199550375342369]
males: 1%|▎ | 1/102 [01:51<3:08:13, 111.82s/it]deepspeech_dict transcribing: d2a57cd6-f757-435d-9768-cac1667f79e1_cleaned.wav
ffmpeg version 4.3 Copyright (c) 2000-2020 the FFmpeg developers
built with Apple clang version 11.0.3 (clang-1103.0.32.62)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.3_1 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-videotoolbox --disable-libjack --disable-indev=jack
libavutil 56. 51.100 / 56. 51.100
libavcodec 58. 91.100 / 58. 91.100
libavformat 58. 45.100 / 58. 45.100
libavdevice 58. 10.100 / 58. 10.100
libavfilter 7. 85.100 / 7. 85.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 7.100 / 5. 7.100
libswresample 3. 7.100 / 3. 7.100
libpostproc 55. 7.100 / 55. 7.100
Guessed Channel Layout for Input Stream #0.0 : mono
Input #0, wav, from 'd2a57cd6-f757-435d-9768-cac1667f79e1_cleaned.wav':
Metadata:
encoder : Lavf58.45.100
Duration: 00:00:08.00, bitrate: 256 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to 'd2a57cd6-f757-435d-9768-cac1667f79e1_cleaned_newaudio.wav':
Metadata:
ISFT : Lavf58.45.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s
Metadata:
encoder : Lavc58.91.100 pcm_s16le
size= 250kB time=00:00:08.00 bitrate= 256.1kbits/s speed=1.37e+03x
video:0kB audio:250kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.030469%
deepspeech --model /Users/jim/Desktop/allie/allie/features/audio_features/helpers/deepspeech-0.7.0-models.pbmm --scorer /Users/jim/Desktop/allie/allie/features/audio_features/helpers/deepspeech-0.7.0-models.scorer --audio "d2a57cd6-f757-435d-9768-cac1667f79e1_cleaned_newaudio.wav" >> "d2a57cd6-f757-435d-9768-cac1667f79e1_cleaned.txt"
Loading model from file /Users/jim/Desktop/allie/allie/features/audio_features/helpers/deepspeech-0.7.0-models.pbmm
TensorFlow: v1.15.0-24-gceb46aae58
DeepSpeech: v0.7.4-0-gfcd9563f
2020-08-07 12:31:34.542205: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.0235s.
Loading scorer from files /Users/jim/Desktop/allie/allie/features/audio_features/helpers/deepspeech-0.7.0-models.scorer
Loaded scorer in 0.000517s.
Running inference.
...
You can train machine learning models quickly with:
python3 allie.py --command train
This walks you through a series of prompts and then trains a model from your answers. Because the folders have already been featurized, the modeling process will be faster.
is this a classification (c) or regression (r) problem?
c
what problem are you solving? (1-audio, 2-text, 3-image, 4-video, 5-csv)
1
OK cool, we got you modeling audio files
how many classes would you like to model? (2 available)
2
these are the available classes:
['females', 'males']
what is class #1
males
what is class #2
females
what is the 1-word common name for the problem you are working on? (e.g. gender for male/female classification)
gender
-----------------------------------
LOADING MODULES
-----------------------------------
Requirement already satisfied: scikit-learn==0.22.2.post1 in /usr/local/lib/python3.7/site-packages (0.22.2.post1)
Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.7/site-packages (from scikit-learn==0.22.2.post1) (0.15.1)
Requirement already satisfied: scipy>=0.17.0 in /usr/local/lib/python3.7/site-packages (from scikit-learn==0.22.2.post1) (1.4.1)
Requirement already satisfied: numpy>=1.11.0 in /usr/local/lib/python3.7/site-packages (from scikit-learn==0.22.2.post1) (1.18.4)
WARNING: You are using pip version 20.2; however, version 20.2.1 is available.
You should consider upgrading via the '/usr/local/opt/python/bin/python3.7 -m pip install --upgrade pip' command.
-----------------------------------
______ _____ ___ _____ _ _______ _____ ___________ _ _ _____
| ___| ___|/ _ \_ _| | | | ___ \_ _|___ /_ _| \ | | __ \
| |_ | |__ / /_\ \| | | | | | |_/ / | | / / | | | \| | | \/
| _| | __|| _ || | | | | | / | | / / | | | . ` | | __
| | | |___| | | || | | |_| | |\ \ _| |_./ /____| |_| |\ | |_\ \
\_| \____/\_| |_/\_/ \___/\_| \_|\___/\_____/\___/\_| \_/\____/
______ ___ _____ ___
| _ \/ _ \_ _/ _ \
| | | / /_\ \| |/ /_\ \
| | | | _ || || _ |
| |/ /| | | || || | | |
|___/ \_| |_/\_/\_| |_/
-----------------------------------
-----------------------------------
FEATURIZING MALES
-----------------------------------
males: 0%| | 0/51 [00:00<?, ?it/s]librosa featurizing: 38.wav
...
... [skipping a lot of output in terminal]
...
WARNING: You are using pip version 20.2; however, version 20.2.1 is available.
You should consider upgrading via the '/usr/local/opt/python/bin/python3.7 -m pip install --upgrade pip' command.
Warning: xgboost.XGBClassifier is not available and will not be used by TPOT.
Generation 1 - Current best internal CV score: 0.8882352941176471
Generation 2 - Current best internal CV score: 0.8882352941176471
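The prompts in the training session above can be answered ahead of time when scripting a run. The following is a minimal sketch, not part of Allie: it assumes the prompts are read from standard input in the order shown above (this may differ between versions), and `build_train_answers` and `run_train` are hypothetical helper names.

```python
import subprocess


def build_train_answers(problem_type, sample_type, classes, common_name):
    """Return newline-separated answers in the order the prompts appear above."""
    lines = [problem_type, str(sample_type), str(len(classes))]
    lines.extend(classes)      # one answer per "what is class #N" prompt
    lines.append(common_name)  # 1-word common name, e.g. "gender"
    return "\n".join(lines) + "\n"


def run_train(allie_dir, answers):
    """Run the train command from allie_dir, feeding answers on stdin.

    Assumption: Allie reads its prompts from stdin; this is not confirmed here.
    """
    return subprocess.run(
        ["python3", "allie.py", "--command", "train"],
        cwd=allie_dir, input=answers, text=True, check=False,
    )


# Answers for the gender example: classification, audio, 2 classes.
answers = build_train_answers("c", 1, ["males", "females"], "gender")
```

Calling `run_train("allie", answers)` would then start the same session without manual typing, provided the prompt order matches.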
To make predictions, you need a model that you have trained with Allie in the ./models/[sampletype]_models directory. For example, an audio model that is detecting gender may be in this tree structure. Since we have already trained the gender model above, we just need to put a sample file in ./load_dir to make a prediction (which can be found here).
Now call the CLI:
python3 allie.py --command predict
A gender prediction is now made.
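Placing the sample file in load_dir can also be scripted. This is a small sketch under the assumption that load_dir sits directly inside the Allie directory, as the ./load_dir path above suggests; `stage_sample` is a hypothetical helper, not an Allie function.

```python
import shutil
from pathlib import Path


def stage_sample(sample_path, allie_dir):
    """Copy one sample file into <allie_dir>/load_dir and return the new path."""
    load_dir = Path(allie_dir) / "load_dir"
    load_dir.mkdir(parents=True, exist_ok=True)  # create load_dir if it is missing
    return Path(shutil.copy(sample_path, load_dir))
```

After staging a file this way, the predict command is run from the Allie directory as usual.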
You can make a transformer to reduce or select features with:
python3 allie.py --command transform --tdir1 males --tdir2 females
Here, 'males' and 'females' are the two directories in the train_dir that are used to fit the transformer.
This builds the transformer from the defaults set in the settings.json file.
For more information about Allie's preprocessing capabilities, see this link.
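Before running the transform command, it can help to confirm that both class folders exist. The sketch below assumes train_dir sits directly inside the Allie directory; `transform_command` is a hypothetical helper that only builds the argument list using the flags shown above.

```python
from pathlib import Path


def transform_command(allie_dir, tdir1, tdir2):
    """Build the transform command after checking both folders exist in train_dir."""
    train_dir = Path(allie_dir) / "train_dir"
    missing = [d for d in (tdir1, tdir2) if not (train_dir / d).is_dir()]
    if missing:
        raise FileNotFoundError(f"not found in {train_dir}: {missing}")
    return ["python3", "allie.py", "--command", "transform",
            "--tdir1", tdir1, "--tdir2", tdir2]
```

The returned list can be passed to `subprocess.run` with `cwd` set to the Allie directory.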
You can run unit tests with:
python3 allie.py --command test
You can visualize multi-class problems that have featurized folders with:
python3 allie.py --command visualize
This takes you through a series of prompts to set the classes and structure a visualization session; the results are written to the 'visualization_session' folder.
You can change Allie's settings with:
python3 allie.py --command settings
This opens a series of prompts that let you view the existing settings or specify new ones, as stored in the settings.json file.
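The same kind of change could be made without the prompts by editing the file directly. This is a minimal sketch that assumes settings.json is a flat JSON object of setting names and values; `update_setting` is a hypothetical helper, not part of Allie.

```python
import json
from pathlib import Path


def update_setting(settings_path, key, value):
    """Set one existing key in a settings.json-style file and return the updated dict."""
    path = Path(settings_path)
    settings = json.loads(path.read_text())
    if key not in settings:
        # Refuse to add unknown keys so that typos do not go unnoticed.
        raise KeyError(f"unknown setting: {key}")
    settings[key] = value
    path.write_text(json.dumps(settings, indent=2))
    return settings
```

Calling `update_setting("settings.json", "visualize_data", True)` from the Allie directory would flip that flag, assuming the file contains that key.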
For example, you may want to turn off the transcribe_video setting by setting it to False:
{'version': '1.0.0', 'augment_data': False, 'balance_data': True, 'clean_data': False, 'create_csv': True, 'default_audio_augmenters': ['augment_tsaug'], 'default_audio_cleaners': ['clean_mono16hz'], 'default_audio_features': ['librosa_features'], 'default_audio_transcriber': ['deepspeech_dict'], 'default_csv_augmenters': ['augment_ctgan_regression'], 'default_csv_cleaners': ['clean_csv'], 'default_csv_features': ['csv_features_regression'], 'default_csv_transcriber': ['raw text'], 'default_dimensionality_reducer': ['pca'], 'default_feature_selector': ['rfe'], 'default_image_augmenters': ['augment_imaug'], 'default_image_cleaners': ['clean_greyscale'], 'default_image_features': ['image_features'], 'default_image_transcriber': ['tesseract'], 'default_outlier_detector': ['isolationforest'], 'default_scaler': ['standard_scaler'], 'default_text_augmenters': ['augment_textacy'], 'default_text_cleaners': ['remove_duplicates'], 'default_text_features': ['nltk_features'], 'default_text_transcriber': ['raw text'], 'default_training_script': ['tpot'], 'default_video_augmenters': ['augment_vidaug'], 'default_video_cleaners': ['remove_duplicates'], 'default_video_features': ['video_features'], 'default_video_transcriber': ['tesseract (averaged over frames)'], 'dimension_number': 2, 'feature_number': 20, 'model_compress': False, 'reduce_dimensions': False, 'remove_outliers': True, 'scale_features': True, 'select_features': True, 'test_size': 0.1, 'transcribe_audio': True, 'transcribe_csv': True, 'transcribe_image': True, 'transcribe_text': True, 'transcribe_video': True, 'transcribe_videos': True, 'visualize_data': False}
Would you like to change any of these settings? Yes (-y) or No (-n)
y
What setting would you like to change?
transcribe_video
What setting would you like to set here?
False
<class 'bool'>