1. Getting started

Jim Schwoebel edited this page Aug 5, 2020 · 20 revisions

Mac or Linux

First, clone the repository:

git clone git@github.com:jim-schwoebel/allie.git
cd allie 

Set up a virtual environment (to ensure consistent behavior across operating systems):

python3 -m pip install --user virtualenv
python3 -m venv env
source env/bin/activate
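
Before installing dependencies, it can help to confirm that the environment is actually active. The sketch below is not part of Allie; it relies only on the standard library, and the helper name `in_virtualenv` is made up for illustration. It compares `sys.prefix` with `sys.base_prefix`, which differ when Python runs inside an environment created by `python3 -m venv`.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base_prefix


if __name__ == "__main__":
    state = "active" if in_virtualenv() else "not active"
    print(f"virtual environment is {state} (sys.prefix = {sys.prefix})")
```

If this reports "not active", run `source env/bin/activate` again in the same shell before continuing.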

Now install required dependencies and perform unit tests to make sure everything works:

python3 setup.py

Note that the installation process and unit tests above take roughly 10-15 minutes to complete. They verify that you can featurize, model, and load model files (to make predictions) with the default featurizers and modeling techniques. It may be best to go grab lunch or coffee while waiting. :-)

After everything is done, you can use the Allie CLI by typing in:

python3 allie.py -h

This should print the available options, including the command to use for each API:

Usage: allie.py [options]

Options:
  -h, --help            show this help message and exit
  --c=command, --command=command
                        the target command (annotate API = 'annotate',
                        augmentation API = 'augment',  cleaning API = 'clean',
                        datasets API = 'data',  features API = 'features',
                        model prediction API = 'predict',  preprocessing API =
                        'transform',  model training API = 'train',  testing
                        API = 'test',  visualize API = 'visualize')
  --p=problemtype, --problemtype=problemtype
                        specify the problem type ('c' = classification or 'r'
                        = regression)
  --s=sampletype, --sampletype=sampletype
                        specify the type files that you'd like to operate on
                        (e.g. 'audio', 'text', 'image', 'video', 'csv')
  --n=common_name, --name=common_name
                        specify the common name for the model (e.g. 'gender'
                        for a male/female problem)
  --i=class_, --class=class_
                        specify the class that you wish to annotate for (e.g.
                        'male')
  --a=ldir, --adir=ldir
                        the directory full of files to annotate (e.g.
                        '/Users/jim/desktop/allie/train_dir/males/')
  --l=ldir, --ldir=ldir
                        the directory full of files to make model predictions;
                        if not here will default to ./load_dir (e.g.
                        '/Users/jim/desktop/allie/load_dir/newfiles/')
  --t1=tdir1, --tdir1=tdir1
                        the directory in the ./train_dir that represent the
                        folders of files that the transform API will operate
                        upon (e.g. 'males')
  --t2=tdir2, --tdir2=tdir2
                        the directory in the ./train_dir that represent the
                        folders of files that the transform API will operate
                        upon (e.g. 'females')
  --d1=dir1, --dir1=dir1
                        the target directory that contains sample files for
                        the features API, augmentation API, and cleaning API
                        (e.g. '/Users/jim/desktop/allie/train_dir/teens/').
  --d2=dir2, --dir2=dir2
                        the target directory that contains sample files for
                        the features API, augmentation API, and cleaning API
                        (e.g. '/Users/jim/desktop/allie/train_dir/twenties/').
  --d3=dir3, --dir3=dir3
                        the target directory that contains sample files for
                        the features API, augmentation API, and cleaning API
                        (e.g. '/Users/jim/desktop/allie/train_dir/thirties/').
  --d4=dir4, --dir4=dir4
                        the target directory that contains sample files for
                        the features API, augmentation API, and cleaning API
                        (e.g. '/Users/jim/desktop/allie/train_dir/fourties/').
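
To show how these options combine, here is a minimal sketch that assembles an allie.py command line from the long option names listed above. The helper `build_allie_command` is hypothetical and not part of Allie. The command and option names are copied from the help text, but which options each command actually requires is an assumption, so treat the example invocation as illustrative only.

```python
import shlex

# Command and option names copied from the help output above.
COMMANDS = {"annotate", "augment", "clean", "data", "features",
            "predict", "transform", "train", "test", "visualize"}
OPTIONS = {"problemtype", "sampletype", "name", "class", "adir", "ldir",
           "tdir1", "tdir2", "dir1", "dir2", "dir3", "dir4"}


def build_allie_command(command, options):
    """Return an argv list for allie.py (hypothetical helper, not Allie's API)."""
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command!r}")
    argv = ["python3", "allie.py", f"--command={command}"]
    for name, value in options.items():
        if name not in OPTIONS:
            raise ValueError(f"unknown option: {name!r}")
        # Use the --option=value form shown in the help text.
        argv.append(f"--{name}={value}")
    return argv


# Illustrative only: the exact options the transform API needs are an assumption.
argv = build_allie_command("transform", {
    "problemtype": "c",
    "sampletype": "audio",
    "name": "gender",
    "tdir1": "males",
    "tdir2": "females",
})
print(shlex.join(argv))
```

The list form can be passed directly to `subprocess.run(argv)` from the repository root, which avoids shell quoting problems with directory names that contain spaces.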

Windows

recommended installation (Docker)

You can run Allie in a Docker container fairly easily (the container is roughly 10-11 GB and runs on top of Ubuntu Linux):

git clone --recurse-submodules -j8 git@github.com:jim-schwoebel/allie.git
cd allie 
docker build -t allie_image .
docker run -it --entrypoint=/bin/bash allie_image
cd ..

You will then have shell access to the Docker container and Allie's folder structure. You can then run the tests with:

cd tests
python3 test.py

Note that from inside the container you can quickly download datasets from AWS buckets and train machine learning models.

alternative

Note that many Python libraries are incompatible with Windows, so I encourage you to instead run Allie in a Docker container with Ubuntu or on Windows Subsystem for Linux.
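
As a rough illustration of that recommendation, the sketch below maps the operating system reported by Python's standard `platform.system()` to the setup route described on this page. The function name `recommended_setup` and its return labels are invented for this example and are not part of Allie.

```python
import platform


def recommended_setup(system_name: str) -> str:
    """Map an OS name to the install route suggested on this page."""
    if system_name == "Windows":
        # Native Windows: many dependencies are incompatible, so prefer
        # Docker or Windows Subsystem for Linux.
        return "docker-or-wsl"
    if system_name in ("Linux", "Darwin"):
        # Linux and macOS ("Darwin") can use the virtual environment setup.
        return "native-venv"
    return "unknown"


if __name__ == "__main__":
    print(recommended_setup(platform.system()))
```

Inside Windows Subsystem for Linux, `platform.system()` reports `Linux`, so the Mac or Linux instructions apply there.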

If you still want to try to use Allie with Windows, you can do so below.

First, install various dependencies:

Now clone Allie and run the setup.py script:

git clone --recurse-submodules -j8 git@github.com:jim-schwoebel/allie.git
cd allie
git checkout windows
python3 -m pip install --user virtualenv
python3 -m venv env
python3 setup.py

Note that some functionality (e.g. featurization and modeling scripts) is limited on Windows due to a lack of compatibility.