1. Getting started

Jim Schwoebel edited this page Jul 9, 2019 · 20 revisions

Installing dependencies

First, clone the repository:

git clone --recurse-submodules -j8 git@github.com:jim-schwoebel/allie.git
cd allie 

Set up a virtual environment (to ensure consistent behavior across operating systems):

python3 -m pip install --user virtualenv
python3 -m venv env
source env/bin/activate
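
To confirm that the environment was actually activated before installing anything, a small check along these lines may help. This is a minimal sketch rather than part of the repository: `in_virtualenv` is a hypothetical helper name, and it relies only on the standard library's `sys.prefix` and `sys.base_prefix` attributes.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    # Older releases of the virtualenv tool set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

If this reports that no virtual environment is active, re-run the `source env/bin/activate` step in the same shell session before continuing.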

Now install required dependencies:

python3 setup.py

Now run the unit tests to make sure everything works:

cd tests
python3 test.py

Note that the tests above take roughly 5-10 minutes to complete. They verify that you can featurize, model, and load model files (to make predictions) with your default featurizers and modeling techniques.
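
Because the test run is long, it can be convenient to launch it from a script that records the exit code and elapsed time. The following is a rough sketch, not something shipped with the repository: `run_and_time` is a hypothetical helper, and it assumes that `test.py` exits with a non-zero status when a test fails, which this page does not confirm.

```python
import subprocess
import sys
import time


def run_and_time(command, cwd=None):
    """Run a command, wait for it to finish, and return (exit_code, seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(command, cwd=cwd)
    elapsed = time.perf_counter() - start
    return completed.returncode, elapsed


# Example usage from the repository root (assumed layout: ./tests/test.py):
#   code, seconds = run_and_time([sys.executable, "test.py"], cwd="tests")
#   print(f"exit code {code} after {seconds / 60:.1f} minutes")
```

Using `sys.executable` keeps the run inside the same interpreter, and therefore the same virtual environment, as the calling script.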

Navigating folders

The table below describes the folder structure of this repository. Use these descriptions as a guide to quickly get started with featurizing and modeling data samples.

| folder name | description of folder |
| --- | --- |
| datasets | an elaborate list of open source datasets that can be used for curating and augmenting datasets. |
| features | a list of audio, text, image, video, and csv featurization scripts (these can be specified in the settings.json files). |
| load_dir | a directory where you can place audio, text, image, video, or .CSV files and make model predictions using models from the ./models directory. |
| models | for loading/storing machine learning models and making model predictions for files placed in the load_dir. |
| production | a folder for outputting production-ready repositories via the YAML.py script. |
| tests | for running local tests and making sure everything works as expected. |
| train_dir | a directory where you can place folders of audio, text, image, video, or .CSV files and train machine learning models with the model.py script in the ./training/ directory. |
| training | for training machine learning models via specified model training scripts. |
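
After cloning, the folder layout described above can be sanity-checked with a few lines of Python. This is only an illustrative sketch: `EXPECTED_FOLDERS` and `missing_folders` are hypothetical names, and the folder list is copied from the table rather than read from the repository itself.

```python
from pathlib import Path

# Folder names taken from the table above (an assumption about the checkout).
EXPECTED_FOLDERS = [
    "datasets",
    "features",
    "load_dir",
    "models",
    "production",
    "tests",
    "train_dir",
    "training",
]


def missing_folders(repo_root):
    """Return the expected folder names that are not directories under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Run from the repository root, e.g. right after `cd allie`.
    missing = missing_folders(".")
    print("missing folders:", missing if missing else "none")
```

A missing folder most likely points to an incomplete clone (for example, submodules that were not fetched), so re-running the clone command with `--recurse-submodules` is a reasonable first thing to try.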

Setting defaults