robert-s-lee/mnist-hydra-grid

Grid Hydra

MNIST running on Grid.ai using Hydra and PyTorch Vision

We will show simple steps to take ML code from a laptop to a scaled-out hyperparameter sweep on a public cloud. Grid.ai allows this without any change to the ML code.

Develop Locally

  • Set up the local environment

These steps follow the documentation for Grid.ai Virtual Environment, PyTorch Hydra, and Grid.ai requirements.txt. Conda installation instructions can be found here.

# Grid.ai Virtual Environment
conda create --yes --name hydra python=3.8
conda activate hydra
pip install lightning-grid --upgrade
# from PyTorch Hydra
git clone https://github.com/robert-s-lee/mnist-hydra-grid
cd mnist-hydra-grid
pip install -r requirements.txt
  • Run the experiment
python mnist-hydra-01.py 

Testing on Grid.ai

Simply replace python with grid run. A couple of variations are shown below.

# run the model
grid run --localdir --dependency_file requirements.txt mnist-hydra-01.py

# save the checkpoint file
grid run --localdir --dependency_file requirements.txt mnist-hydra-01.py save_model=True

# save the checkpoint file, override mnistconf.yaml epochs setting
grid run --localdir --dependency_file requirements.txt mnist-hydra-01.py save_model=True epochs=3

# alternate config file
grid run --localdir --dependency_file requirements.txt mnist-hydra-01.py save_model=True epochs=3 --config-name mnistconf.yaml 
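
The key=value arguments in the commands above override settings from the config file. The following is a rough, standard-library-only sketch of that idea, not Hydra's actual parser. The function name `apply_overrides` and the default values are hypothetical; the real defaults live in mnistconf.yaml.

```python
import ast


def apply_overrides(defaults, overrides):
    """Return a copy of `defaults` with key=value overrides applied."""
    config = dict(defaults)
    for item in overrides:
        key, _, raw = item.partition("=")
        try:
            # Interpret Python-style literals such as 3 or True.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw  # Fall back to a plain string.
        config[key] = value
    return config


# Hypothetical defaults for illustration only.
defaults = {"epochs": 14, "batch_size": 64, "save_model": False}
config = apply_overrides(defaults, ["save_model=True", "epochs=3"])
print(config)
```

Settings that are not overridden (here `batch_size`) keep their config-file values, which is why the ML code itself does not need to change between runs.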
  • Parallel run on a single server
grid run --localdir --dependency_file requirements.txt mnist-hydra-01.py epochs=3,10,15 --multirun

grid run --localdir --dependency_file requirements.txt mnist-hydra-01.py epochs=3,10,15 batch_size=32,64 --multirun

The log for the epochs-only sweep shows Hydra's Joblib launcher starting three jobs, one per epochs value:

[experiment] [2021-11-05T01:21:12.092751+00:00] [2021-11-05 01:21:12,092][HYDRA] Joblib.Parallel(n_jobs=-1,backend=loky,prefer=processes,require=None,verbose=0,timeout=None,pre_dispatch=2*n_jobs,batch_size=auto,temp_folder=None,max_nbytes=None,mmap_mode=r) is launching 3 jobs
[experiment] [2021-11-05T01:21:12.092792+00:00] [2021-11-05 01:21:12,092][HYDRA] Launching jobs, sweep output dir : multirun/2021-11-05/01-21-11
[experiment] [2021-11-05T01:21:12.092799+00:00] [2021-11-05 01:21:12,092][HYDRA]        #0 : epochs=3
[experiment] [2021-11-05T01:21:12.092804+00:00] [2021-11-05 01:21:12,092][HYDRA]        #1 : epochs=10
[experiment] [2021-11-05T01:21:12.092808+00:00] [2021-11-05 01:21:12,092][HYDRA]        #2 : epochs=15
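
With --multirun, comma-separated values are expanded into one job per combination. The sketch below illustrates that expansion with a Cartesian product, assuming Hydra's default sweeper behaves this way; `expand_sweep` is a hypothetical helper, not part of Hydra or Grid.ai.

```python
from itertools import product


def expand_sweep(overrides):
    """Expand key=v1,v2,... overrides into one override list per job."""
    keys = []
    choices = []
    for item in overrides:
        key, _, raw = item.partition("=")
        keys.append(key)
        choices.append(raw.split(","))
    # One job for every combination of the swept values.
    return [
        [f"{k}={v}" for k, v in zip(keys, combo)]
        for combo in product(*choices)
    ]


jobs = expand_sweep(["epochs=3,10,15", "batch_size=32,64"])
for i, job in enumerate(jobs):
    print(f"#{i} : {' '.join(job)}")
```

Under this assumption, sweeping three epochs values and two batch sizes yields six jobs, while the epochs-only sweep yields the three jobs shown in the log.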

Automate using GitHub Actions

unittest.yml runs the equivalent of the following commands:

  • grid run --localdir --dependency_file requirements.txt mnist-hydra-01.py
  • grid run --localdir --dependency_file requirements.txt mnist-hydra-01.py save_model=True epochs=3

NOTE: Remember to add GRIDAI_USERNAME and GRIDAI_KEY repo secrets if forking this repo.
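
A forked repo's workflow fails if these secrets are absent, so a quick pre-flight check can help. The sketch below assumes the workflow exposes the secrets as environment variables with the same names; `missing_secrets` is a hypothetical helper and is not part of this repository.

```python
import os

REQUIRED_SECRETS = ("GRIDAI_USERNAME", "GRIDAI_KEY")


def missing_secrets(env=None):
    """Return the names of required secrets that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name)]


# Example with a hypothetical environment that only sets the username.
print(missing_secrets({"GRIDAI_USERNAME": "someone"}))
```

Calling `missing_secrets()` with no argument checks the current process environment instead of the example dictionary.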
