Overview

Grid.ai can train hundreds of machine learning models on the cloud from your laptop with zero code changes. In this example, we will run a model on a laptop, then run the same unmodified model on the cloud. On the cloud, we will run a hyperparameter sweep as 8 parallel experiments, so the sweep can finish up to 8x faster than running the experiments one after another. Running on spot instances reduces the cost of the run by about 70%.

We will use the familiar FashionMNIST dataset. Grid.ai was built by the creators of PyTorch Lightning, but it is agnostic to machine learning frameworks and third-party tools, so its benefits are available with other frameworks and tools as well. To demonstrate this point, we will NOT use PyTorch Lightning's early stopping. Instead, we will use Optuna pruning for early stopping. We will track progress by viewing PyTorch Lightning's Tensorboard logs in Grid.ai's Tensorboard interface.
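To make the idea of pruning concrete, here is a minimal pure-Python sketch of a median-style stopping rule. It is an illustrative stand-in only: it is not Optuna's implementation and not the code in pytorch_lightning_simple.py. The function name `should_prune` and the accuracy numbers are hypothetical.

```python
from statistics import median


def should_prune(history, step, value):
    """Return True when `value` at `step` is below the median of earlier trials.

    history: one list per earlier trial, holding its validation accuracy per epoch.
    """
    peers = [trial[step] for trial in history if len(trial) > step]
    if not peers:  # nothing to compare against yet, so keep training
        return False
    return value < median(peers)


# Hypothetical validation accuracies from two finished trials.
finished = [[0.70, 0.78, 0.82], [0.65, 0.74, 0.80]]

print(should_prune(finished, step=1, value=0.60))  # True: below the peers' median
print(should_prune(finished, step=1, value=0.79))  # False: above the peers' median
```

A trial that is stopped this way frees its compute early, which is why enabling pruning can shorten an experiment.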

Grid.ai launches the experiments in parallel using a grid search strategy. The Grid.ai hyperparameter sweep controls batchsize, epochs, and pruning (whether Optuna pruning is active or not). Within each experiment, Optuna controls the number of layers, the hidden units in each layer, and the dropout rate. The following combinations result in 2 x 2 x 2 = 8 parallel experiments:

batchsize=[32,128]
epochs=[5,10]
pruning=[0,1]
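The count of 8 follows from taking the Cartesian product of the three lists above. The sketch below enumerates the combinations with the standard library. The helper name `expand_grid` is hypothetical, and the printed order is illustrative; it need not match the experiment numbering that Grid.ai assigns.

```python
from itertools import product

# Sweep values copied from the list above.
sweep = {
    "batchsize": [32, 128],
    "epochs": [5, 10],
    "pruning": [0, 1],
}


def expand_grid(space):
    """Return one dict of flag values per experiment (Cartesian product)."""
    names = list(space)
    return [dict(zip(names, values)) for values in product(*(space[n] for n in names))]


experiments = expand_grid(sweep)
for i, params in enumerate(experiments):
    flags = " ".join(f"--{name} {value}" for name, value in params.items())
    print(f"combination {i}: python pytorch_lightning_simple.py {flags}")
```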

A single Grid.ai CLI command initiates the experiment.

grid run --use_spot pytorch_lightning_simple.py --datadir grid:fashionmnist:1 --pruning="[0,1]"  --batchsize="[32,128]" --epochs="[5,10]"

Step-by-Step Instructions

These instructions assume access to a laptop with bash and conda. If your local environment is restricted, start a Grid.ai Session, open Jupyter, and click Terminal instead.

Local python environment setup

# initialize conda for bash
conda init bash # then exit the shell and open a new one
# create conda env
conda create --name gridai python=3.8
conda activate gridai
# install packages
pip install lightning-grid
pip install optuna
pip install pytorch_lightning
pip install torchvision
# login to grid
grid login --username <username> --key <grid api key>

Run locally

# retrieve the model
git clone https://github.com/robert-s-lee/grid-optuna
cd grid-optuna
mkdir data
# Run without Optuna pruning (takes a while)
python pytorch_lightning_simple.py --datadir ./data
# Run with Optuna pruning (takes a while)
python pytorch_lightning_simple.py --datadir ./data --pruning 1
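The same flags are used locally and in the Grid.ai sweep, which is what lets the script run unmodified in both places. As a rough sketch of how a script could expose these flags, the example below uses `argparse`. The flag names come from the commands in this walkthrough, but the defaults, types, and help strings are assumptions and may differ from the real pytorch_lightning_simple.py.

```python
import argparse


def build_parser():
    """Hypothetical flag layout; defaults are assumed, not taken from the repo."""
    parser = argparse.ArgumentParser(description="Sketch of the sweepable flags")
    parser.add_argument("--datadir", default="./data", help="directory holding the dataset")
    parser.add_argument("--batchsize", type=int, default=128, help="mini-batch size")
    parser.add_argument("--epochs", type=int, default=10, help="training epochs per trial")
    parser.add_argument("--pruning", type=int, default=0, choices=[0, 1],
                        help="1 enables Optuna pruning, 0 disables it")
    return parser


# Equivalent to: python pytorch_lightning_simple.py --datadir ./data --pruning 1
args = build_parser().parse_args(["--datadir", "./data", "--pruning", "1"])
print(args)
```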

Prepare Grid.ai Datastore

Set up a Grid.ai Datastore so that the FashionMNIST data is not downloaded on each run. Note the Version number that is created; typically this will be 1.

grid datastore create --source data --name fashionmnist
grid datastore list # repeat until the Status column shows `Succeeded`
watch -n 10 grid datastore list  # or refresh the list every 10 seconds
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Credential Id ┃              Name ┃ Version ┃     Size ┃          Created ┃    Status ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ cc-qdfdk      │      fashionmnist │       1 │ 141.6 MB │ 2021-06-16 15:13 │ Succeeded │
└───────────────┴───────────────────┴─────────┴──────────┴──────────────────┴───────────┘
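The `--datadir grid:fashionmnist:1` argument used in this walkthrough appears to combine the Name and Version columns shown above in a `grid:<name>:<version>` pattern. The sketch below splits such a string into its parts. The pattern is inferred from this example only, and `DatastoreRef` and `parse_datastore_ref` are hypothetical names, not part of the Grid.ai CLI.

```python
from typing import NamedTuple


class DatastoreRef(NamedTuple):
    name: str
    version: int


def parse_datastore_ref(value: str) -> DatastoreRef:
    """Split an assumed 'grid:<name>:<version>' string into name and version."""
    parts = value.split(":")
    if len(parts) != 3 or parts[0] != "grid":
        raise ValueError(f"expected grid:<name>:<version>, got {value!r}")
    return DatastoreRef(name=parts[1], version=int(parts[2]))


ref = parse_datastore_ref("grid:fashionmnist:1")
print(ref.name, ref.version)  # fashionmnist 1
```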

Run on Grid

Option 1: with a Datastore, so that FashionMNIST is not downloaded again (use your own datastore or a shared one)

grid run --use_spot pytorch_lightning_simple.py --datadir grid:fashionmnist:1 --pruning="[0,1]"  --batchsize="[32,128]" --epochs="[5,10]"

Option 2: without a Datastore, so the command can be shared freely without creating a datastore first

grid run --use_spot pytorch_lightning_simple.py --pruning="[0,1]"  --batchsize="[32,128]" --epochs="[5,10]"

Either command prints output like the following (abbreviated):

Run submitted!
`grid status` to list all runs
`grid status smart-dragon-43` to see all experiments for this run

grid status smart-dragon-43 shows experiments running in parallel

% grid status smart-dragon-43
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┓
┃ Experiment           ┃                     Command ┃  Status ┃    Duration ┃                  datadir ┃ pruning ┃ batchsize ┃ epochs ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━┩
│ smart-dragon-43-exp7 │ pytorch_lightning_simple.py │ running │ 0d-00:07:24 │ /datastores/fashionmnist │       1 │        32 │     10 │
│ smart-dragon-43-exp6 │ pytorch_lightning_simple.py │ running │ 0d-00:07:27 │ /datastores/fashionmnist │       1 │        32 │      5 │
│ smart-dragon-43-exp5 │ pytorch_lightning_simple.py │ running │ 0d-00:07:14 │ /datastores/fashionmnist │       1 │       128 │      5 │
│ smart-dragon-43-exp4 │ pytorch_lightning_simple.py │ pending │ 0d-00:12:52 │ /datastores/fashionmnist │       0 │       128 │      5 │
│ smart-dragon-43-exp3 │ pytorch_lightning_simple.py │ running │ 0d-00:07:13 │ /datastores/fashionmnist │       0 │        32 │     10 │
│ smart-dragon-43-exp2 │ pytorch_lightning_simple.py │ running │ 0d-00:07:03 │ /datastores/fashionmnist │       0 │       128 │     10 │
│ smart-dragon-43-exp1 │ pytorch_lightning_simple.py │ running │ 0d-00:07:02 │ /datastores/fashionmnist │       1 │       128 │     10 │
│ smart-dragon-43-exp0 │ pytorch_lightning_simple.py │ pending │ 0d-00:12:52 │ /datastores/fashionmnist │       0 │        32 │      5 │
└──────────────────────┴─────────────────────────────┴─────────┴─────────────┴──────────────────────────┴─────────┴───────────┴────────┘

grid logs smart-dragon-43-exp0 shows logs from that experiment

grid logs smart-dragon-43-exp0

Simpler variations to run

grid run --use_spot pytorch_lightning_simple.py
grid run --use_spot pytorch_lightning_simple.py --datadir grid:fashionmnist:1

Use the Grid.ai WebUI for Tensorboard graphs

Example of on-demand pricing (top at $0.09) and spot pricing (bottom at $0.03)
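The two prices in the caption above give a quick sanity check on the spot saving quoted in the overview. The short calculation below is a sketch; the helper name `savings_percent` is hypothetical, and because the caption does not state the billing unit, the figures are treated as plain dollar amounts.

```python
on_demand = 0.09  # USD, on-demand figure from the pricing example
spot = 0.03       # USD, spot figure from the pricing example


def savings_percent(baseline, discounted):
    """Percentage saved when paying `discounted` instead of `baseline`."""
    return 100.0 * (baseline - discounted) / baseline


print(f"spot saves about {savings_percent(on_demand, spot):.0f}%")  # about 67%
```

For this particular run the saving works out to roughly two thirds, in line with the approximate 70% figure.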

Example metric from the Grid.ai WebUI

Example metric from Tensorboard
