2.3. Augmenting datasets

Jim Schwoebel edited this page Aug 3, 2020 · 9 revisions

Augmentation

This part of Allie's skills relates to data augmentation.

Data augmentation expands the training dataset to improve a machine learning model's performance and its ability to generalize. For example, you may want to shift, flip, adjust the brightness of, or zoom in on images so that models perform better in the noisy environments typical of real-world use. Data augmentation is especially useful when you have little data, as it can greatly expand the amount of training data available for machine learning.
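To make the image transformations mentioned above concrete, here is a minimal, dependency-free sketch. It is not how Allie implements image augmentation; the function names and the list-of-rows image representation are assumptions chosen only for illustration.

```python
# A grayscale image is modelled as a list of rows of 0-255 integers.
# This representation is an assumption made to keep the sketch dependency-free.

def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def shift_right(image, pixels, fill=0):
    """Shift every row right by `pixels`, padding the left edge with `fill`."""
    if pixels <= 0:
        return [row[:] for row in image]
    return [([fill] * pixels + row)[:len(row)] for row in image]


def adjust_brightness(image, factor):
    """Scale pixel values by `factor`, clamping results to the 0-255 range."""
    return [[max(0, min(255, round(p * factor))) for p in row] for row in image]


image = [
    [0, 50, 100],
    [150, 200, 250],
]

# Each transformation yields one extra training example from the same source image.
augmented = [
    flip_horizontal(image),
    shift_right(image, 1),
    adjust_brightness(image, 1.2),
]
```

In practice an image library would handle these operations; the point is only that each transform produces a new, slightly different copy of an existing example.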

A typical augmentation scheme is to augment 50% of the data and leave the rest unchanged. This is the approach that was taken with the Tacotron2 architecture.
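The 50% scheme could be sketched as follows. This is an illustrative helper, not part of Allie: the name `split_for_augmentation`, the `fraction` and `seed` parameters, and the example file names are all assumptions.

```python
import random


def split_for_augmentation(files, fraction=0.5, seed=None):
    """Return (to_augment, untouched), selecting about `fraction` of the files at random."""
    rng = random.Random(seed)
    shuffled = list(files)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    cutoff = round(len(shuffled) * fraction)
    return shuffled[:cutoff], shuffled[cutoff:]


# Hypothetical file names, used only to demonstrate the split.
files = [f"clip_{i}.wav" for i in range(10)]
to_augment, untouched = split_for_augmentation(files, seed=0)
```

Passing a fixed `seed` makes the selection reproducible, which helps when comparing models trained with and without augmentation.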

You can read more about data augmentation here.

Getting started

To augment an entire folder of a certain file type (e.g. audio files of .WAV format), you can run:

cd ~ 
cd allie/augmentation/audio_augmentation
python3 augment.py /Users/jimschwoebel/allie/load_dir

The code above augments all the audio files in the folder path using the default_augmenter specified in the settings.json file (e.g. 'augment_tsaug').
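If you want to check which augmenter is configured before running the script, a small sketch like the one below may help. The key name `default_augmenter` is taken from the wording above and is an assumption; the actual key in your settings.json may differ. The demonstration file and its value are placeholders.

```python
import json
import os
import tempfile


def read_default_augmenter(settings_path, key="default_augmenter"):
    """Return the value stored under `key` in a JSON settings file, or None if absent.

    The default key name is an assumption, not a confirmed Allie setting.
    """
    with open(settings_path) as f:
        settings = json.load(f)
    return settings.get(key)


# Demonstration with a throwaway settings file holding a placeholder value.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "settings.json")
    with open(demo_path, "w") as f:
        json.dump({"default_augmenter": "example_augmenter"}, f)
    demo_value = read_default_augmenter(demo_path)
    missing_value = read_default_augmenter(demo_path, key="not_a_real_key")
```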

Note that you can extend this to any of the augmentation types. The table below shows how to call each augmenter. You must be in the proper folder (e.g. ./allie/augmentation/audio_augmentation for audio files, ./allie/augmentation/image_augmentation for image files, etc.) for the scripts to work properly.

| Data type | Supported formats | Call to augment a folder | Current directory must be |
| --- | --- | --- | --- |
| audio files | .MP3 / .WAV | python3 augment.py [folderpath] | ./allie/augmentation/audio_augmentation |
| text files | .TXT | python3 augment.py [folderpath] | ./allie/augmentation/text_augmentation |
| image files | .PNG | python3 augment.py [folderpath] | ./allie/augmentation/image_augmentation |
| video files | .MP4 | python3 augment.py [folderpath] | ./allie/augmentation/video_augmentation |
| csv files | .CSV | python3 augment.py [folderpath] | ./allie/augmentation/csv_augmentation |
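Because the command is the same for every data type and only the working directory changes, the table above can be captured in a small lookup. The sketch below is not part of Allie: `build_augment_command`, `AUGMENTATION_DIRS`, and the example paths are hypothetical, and the data type is inferred from a sample file's extension.

```python
import os
import sys

# Mapping taken from the table above: file extension -> augmentation folder.
AUGMENTATION_DIRS = {
    ".mp3": "audio_augmentation",
    ".wav": "audio_augmentation",
    ".txt": "text_augmentation",
    ".png": "image_augmentation",
    ".mp4": "video_augmentation",
    ".csv": "csv_augmentation",
}


def build_augment_command(allie_root, sample_file, folderpath):
    """Return (working_directory, argv) for augmenting `folderpath`.

    The data type is inferred from the extension of `sample_file`.
    """
    ext = os.path.splitext(sample_file)[1].lower()
    if ext not in AUGMENTATION_DIRS:
        raise ValueError(f"unsupported file type: {ext!r}")
    cwd = os.path.join(allie_root, "augmentation", AUGMENTATION_DIRS[ext])
    # sys.executable stands in for `python3` so the same interpreter is reused.
    return cwd, [sys.executable, "augment.py", folderpath]


# Hypothetical paths, for illustration only.
cwd, argv = build_augment_command("allie", "speech.WAV", "/path/to/load_dir")
# To actually run it: subprocess.run(argv, cwd=cwd, check=True)
```

Running the script from `cwd` rather than passing a full script path mirrors the requirement that the current directory match the data type.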

Implemented

References