MNIST running on Grid.ai using Hydra and PyTorch Vision
This guide shows the steps to take ML code from a laptop to a scaled-out hyperparameter sweep on a public cloud. Grid.ai allows this without any change to the ML code.
- Set up the local environment
These steps follow the documentation for the Grid.ai Virtual Environment, PyTorch Hydra, and Grid.ai requirements.txt. Installation instructions for conda can be found in the conda documentation.
# Grid.ai Virtual Environment
conda create --yes --name hydra python=3.8
conda activate hydra
pip install lightning-grid --upgrade
# from PyTorch Hydra
git clone https://github.com/robert-s-lee/mnist-hydra-grid
cd mnist-hydra-grid
pip install -r requirements.txt
- Run the experiment
python mnist-hydra-01.py
To run the same script on Grid.ai, simply change python to grid run. A few variations are shown below.
# run the model
grid run --localdir --dependency_file requirements.txt mnist-hydra-01.py
# save the checkpoint file
grid run --localdir --dependency_file requirements.txt mnist-hydra-01.py save_model=True
# save the checkpoint file, override mnistconf.yaml epochs setting
grid run --localdir --dependency_file requirements.txt mnist-hydra-01.py save_model=True epochs=3
# alternate config file
grid run --localdir --dependency_file requirements.txt mnist-hydra-01.py save_model=True epochs=3 --config-name mnistconf.yaml
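As a rough illustration of what the key=value arguments above do, the sketch below overlays command-line overrides on a set of defaults. It is not Hydra's implementation, and Hydra's own override grammar is richer than this. The default values and the helper name apply_overrides are hypothetical; only the keys epochs, batch_size, and save_model come from the commands in this README.

```python
import ast

# Hypothetical defaults standing in for mnistconf.yaml; the real file may differ.
DEFAULTS = {"epochs": 14, "batch_size": 64, "save_model": False}


def apply_overrides(defaults, overrides):
    """Return a copy of defaults with key=value overrides applied."""
    config = dict(defaults)
    for item in overrides:
        key, _, raw = item.partition("=")
        # This sketch rejects keys that are not present in the defaults.
        if key not in config:
            raise KeyError(f"unknown config key: {key}")
        try:
            # Interpret literals such as 3 or True; fall back to a plain string.
            config[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            config[key] = raw
    return config


if __name__ == "__main__":
    print(apply_overrides(DEFAULTS, ["save_model=True", "epochs=3"]))
```

The defaults are copied rather than modified, so each run starts from the same base configuration and only the named keys change.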
- Parallel run on a single server
# sweep over three epochs values (3 jobs)
grid run --localdir --dependency_file requirements.txt mnist-hydra-01.py epochs=3,10,15 --multirun
# sweep over epochs and batch_size (3 x 2 = 6 combinations)
grid run --localdir --dependency_file requirements.txt mnist-hydra-01.py epochs=3,10,15 batch_size=32,64 --multirun
The log from the epochs-only sweep shows Hydra's Joblib launcher starting three jobs:
[experiment] [2021-11-05T01:21:12.092751+00:00] [2021-11-05 01:21:12,092][HYDRA] Joblib.Parallel(n_jobs=-1,backend=loky,prefer=processes,require=None,verbose=0,timeout=None,pre_dispatch=2*n_jobs,batch_size=auto,temp_folder=None,max_nbytes=None,mmap_mode=r) is launching 3 jobs
[experiment] [2021-11-05T01:21:12.092792+00:00] [2021-11-05 01:21:12,092][HYDRA] Launching jobs, sweep output dir : multirun/2021-11-05/01-21-11
[experiment] [2021-11-05T01:21:12.092799+00:00] [2021-11-05 01:21:12,092][HYDRA] #0 : epochs=3
[experiment] [2021-11-05T01:21:12.092804+00:00] [2021-11-05 01:21:12,092][HYDRA] #1 : epochs=10
[experiment] [2021-11-05T01:21:12.092808+00:00] [2021-11-05 01:21:12,092][HYDRA] #2 : epochs=15
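The way a multirun sweep grows with each comma-separated parameter can be sketched with a Cartesian product. This is a minimal stand-alone illustration, not Hydra's code. The helper name expand_sweep is hypothetical, and the job ordering it prints is an assumption rather than something taken from a Grid.ai log.

```python
from itertools import product


def expand_sweep(overrides):
    """Expand comma-separated key=value overrides into one override list per job."""
    keys, choices = [], []
    for item in overrides:
        key, _, values = item.partition("=")
        keys.append(key)
        choices.append(values.split(","))
    # Cartesian product: every value of each key is paired with every value of the others.
    return [
        [f"{k}={v}" for k, v in zip(keys, combo)]
        for combo in product(*choices)
    ]


if __name__ == "__main__":
    jobs = expand_sweep(["epochs=3,10,15", "batch_size=32,64"])
    for i, job in enumerate(jobs):
        print(f"#{i} : {' '.join(job)}")
```

With epochs=3,10,15 alone this yields three jobs, matching the log above. Adding batch_size=32,64 doubles that to six, so the job count multiplies with every swept parameter.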
unittest.yml runs the equivalent of the following commands:
grid run --localdir --dependency_file requirements.txt mnist-hydra-01.py
grid run --localdir --dependency_file requirements.txt mnist-hydra-01.py save_model=True epochs=3
NOTE: If you fork this repo, remember to add GRIDAI_USERNAME and GRIDAI_KEY as repo secrets.