SGDR is implemented as presented in this paper and built on top of this repo; the VGG (with batch normalization) definition is borrowed from here.
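For orientation, the SGDR learning-rate rule (cosine annealing with warm restarts) can be sketched in a few lines. This is a minimal illustration, not the code used in this repo: the function names `sgdr_lr` and `is_snapshot_epoch` and all default values (`lr_max`, `lr_min`, `cycle_len`, `cycle_mult`) are assumptions chosen for the example.

```python
import math


def sgdr_lr(epoch, lr_max=0.05, lr_min=0.0, cycle_len=10, cycle_mult=2):
    """Cosine-annealed learning rate with warm restarts.

    All default values are illustrative assumptions, not this repo's settings.
    """
    t_cur, t_i = epoch, cycle_len
    # Walk forward through the cycles until `epoch` falls inside one.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= cycle_mult
    # Within a cycle the rate decays from lr_max towards lr_min along a half cosine.
    cos_term = 1.0 + math.cos(math.pi * t_cur / t_i)
    return lr_min + 0.5 * (lr_max - lr_min) * cos_term


def is_snapshot_epoch(epoch, cycle_len=10, cycle_mult=2):
    """True on the last epoch of a cycle, where an iterate could be stored."""
    t_cur, t_i = epoch + 1, cycle_len
    while t_cur > t_i:
        t_cur -= t_i
        t_i *= cycle_mult
    return t_cur == t_i


if __name__ == "__main__":
    for epoch in range(30):
        flag = " <- snapshot" if is_snapshot_epoch(epoch) else ""
        print(f"epoch {epoch:2d}  lr {sgdr_lr(epoch):.5f}{flag}")
```

With these assumed defaults the cycles last 10 and 20 epochs, so the rate restarts at epochs 10 and 30 and snapshots fall at epochs 9 and 29. PyTorch also ships `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`, which implements the same schedule family.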
After storing iterates from SGDR, we run the mode connectivity procedure on a pair of snapshots. Next, we obtain and evaluate models on the plane defined by the two snapshots and the curve connecting them.
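The grid of models on such a plane can be sketched as follows. This is a hedged illustration under stated assumptions: weights are treated as flattened NumPy vectors, the third point `w_mid` stands in for a point on the connecting curve (for example its bend), and the helper names `plane_basis` and `grid_weights` as well as the `n` and `margin` parameters are hypothetical, not taken from `store_grid_models.py`.

```python
import numpy as np


def plane_basis(w_a, w_b, w_mid):
    """Orthonormal basis (u, v) of the plane through three weight vectors."""
    u = w_b - w_a
    len_u = np.linalg.norm(u)
    u = u / len_u
    v = w_mid - w_a
    v = v - np.dot(v, u) * u  # Gram-Schmidt: remove the component along u
    len_v = np.linalg.norm(v)
    v = v / len_v
    return u, v, len_u, len_v


def grid_weights(w_a, w_b, w_mid, n=5, margin=0.2):
    """Yield (i, j, weights) for an n x n grid of points on the plane.

    Coordinates are scaled so that (0, 0) is w_a and (1, 0) is w_b;
    `margin` extends the grid slightly beyond the three anchor points.
    """
    u, v, len_u, len_v = plane_basis(w_a, w_b, w_mid)
    coords = np.linspace(-margin, 1.0 + margin, n)
    for i, a in enumerate(coords):
        for j, b in enumerate(coords):
            yield i, j, w_a + a * len_u * u + b * len_v * v
```

Each yielded vector would then be loaded back into a network and evaluated, giving a 2D map of loss or accuracy around the two snapshots.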
Warmup: In a slightly independent direction, we apply CCA to activations from iterates of LB (large-batch, with and without warmup) and SB (small-batch) training to understand the differences in their training dynamics. After training ResNet18 under these 3 setups, we store the activations of the initialization model and of the model after 200 iterations (equal to the warmup length). Finally, we apply CCA to these 3 pairs of activation sets.
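To make the CCA comparison concrete, here is a minimal sketch of plain CCA between two activation matrices. It is not the implementation in `cca_core.py`; it uses the standard QR-plus-SVD formulation, and the function names and the `(n_samples, n_features)` layout are assumptions made for this example.

```python
import numpy as np


def cca_correlations(acts_a, acts_b):
    """Canonical correlations between two (n_samples, n_features) matrices.

    Assumes n_samples > n_features and full column rank in both inputs;
    rank-deficient activations would need dimensionality reduction first.
    """
    # Centre each feature so that correlations, not raw inner products, are compared.
    a = acts_a - acts_a.mean(axis=0, keepdims=True)
    b = acts_b - acts_b.mean(axis=0, keepdims=True)
    # Orthonormal bases for the two activation subspaces.
    q_a, _ = np.linalg.qr(a)
    q_b, _ = np.linalg.qr(b)
    # Singular values of Q_a^T Q_b are the canonical correlations.
    rho = np.linalg.svd(q_a.T @ q_b, compute_uv=False)
    return np.clip(rho, 0.0, 1.0)


def mean_cca_similarity(acts_a, acts_b):
    """Average canonical correlation, a single similarity score in [0, 1]."""
    return float(cca_correlations(acts_a, acts_b).mean())
```

Applied to the setup above, this score would be computed once per training regime, each time between the activations at initialization and those after 200 iterations. A score near 1 indicates that the layer's representation changed little over that window.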
.
├── decoupled_backprop
├── main.py # main file training resnet in decoupled fashion
├── utils.py
├── resnet.py
├── vgg.py
├── find_curve.py
├── store_grid_models.py
├── eval_grid_models.py
├── store_resnet18_acts.py
├── cca_core.py
├── dft_ccas.py
└── compute_dft_cca.py
![alt text](https://github.com/epfml/msc-akhilesh-gotmare/blob/master/decoupled_backprop/val_acc_comparison.jpg)
Anaconda, PyTorch