Improved fine-tuning, ConvNeXt support, improved training speed of GHNs #7
Training times
The implementation of some steps in the forward pass of GHNs is improved, speeding up GHN training without altering their overall behavior.
Speed is measured on an NVIDIA A100-40GB GPU in seconds per training iteration on ImageNet (averaged over the first 50 iterations). 4xA100 GPUs are used for meta-batch size (bm) = 8. Measurements can be noisy because other users may be sharing computational resources of the same cluster node.
(`--amp` argument in the code)

Fine-tuning and ConvNeXt support
According to the report (Pretraining a Neural Network before Knowing Its Architecture), which shows improved fine-tuning results, the following arguments are added to the code: `--opt`, `--init`, `--imsize`, `--beta`, `--layer`, as well as the file `ppuda/utils/init.py` with initialization functions. The argument `--val` is also added to enable evaluation on the validation data rather than the test data during training.

For example, to obtain the fine-tuning results of GHN-orth for ResNet-50 according to the report:

```
python experiments/sgd/train_net.py --val --split predefined --arch 0 --epochs 300 -d cifar10 --n_shots 100 --lr 0.01 --wd 0.01 --ckpt ./checkpoints/ghn2_imagenet.pt --opt sgd --init orth --imsize 32 --beta 3e-5 --layer 37
```
For ConvNeXt-Base:

```
python experiments/sgd/train_net.py --val --arch convnext_base -b 48 --epochs 300 -d cifar10 --n_shots 100 --lr 0.001 --wd 0.1 --ckpt ./checkpoints/ghn2_imagenet.pt --opt adamw --init orth --imsize 32 --beta 3e-5 --layer 94
```

Multiple warnings will be printed that some layers (layer_scale) of ConvNeXt are not supported by GHNs; this is intended.
A simple example to try parameter prediction for ConvNeXt is to run:

```
python examples/torch_models.py cifar10 convnext_base
```
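The skip-with-warning behavior can be pictured with a toy sketch (the names and logic below are hypothetical stand-ins, not the actual ppuda code): parameters whose names match unsupported layer types are skipped with a warning and keep their default initialization.

```python
import warnings

UNSUPPORTED = ('layer_scale',)  # e.g. ConvNeXt's per-channel scaling params

def predict_supported(param_names):
    """Toy stand-in for GHN parameter prediction: returns the names a GHN
    would predict, warning about (and skipping) unsupported parameters."""
    predicted = []
    for name in param_names:
        if any(key in name for key in UNSUPPORTED):
            warnings.warn(f'{name} is not supported by the GHN and keeps '
                          'its default initialization')
            continue
        predicted.append(name)
    return predicted
```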
Code correctness
To make sure that the evaluation results (classification accuracies of predicted parameters) reported in the paper match those obtained with this PR, the GHNs were evaluated on selected architectures and the same results were obtained (see the table below).
To further confirm the correctness of the updated code, the training loss and top-1 accuracy of training GHN-2 on CIFAR-10 for 3 epochs are reported in the table below. The command used in this benchmark is:

```
python experiments/train_ghn.py -m 8 -n -v 50 --ln
```

These results can be noisy because of several factors, such as random batches, initialization of the GHN, etc.
Other
The Python script `experiments/train_ghn_stable.py` is added to automatically resume training GHNs from the last saved checkpoint (if any) when the run fails for some reason (e.g. OOM, NaN loss, etc.).
Now, instead of running

```
python experiments/train_ghn.py -m 8 -n -v 50 --ln
```

one can use

```
python experiments/train_ghn_stable.py experiments/train_ghn.py -m 8 -n -v 50 --ln
```
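A minimal sketch of the auto-resume idea behind such a wrapper (the function name, retry limit, and `--ckpt` handling here are assumptions for illustration, not the actual `train_ghn_stable.py`):

```python
import subprocess
import sys
from pathlib import Path

def run_with_resume(cmd, ckpt_path, max_retries=100):
    """Re-launch `cmd` until it exits cleanly (e.g. after OOM or NaN-loss
    crashes), resuming from the last saved checkpoint when one exists."""
    for attempt in range(max_retries):
        args = list(cmd)
        if Path(ckpt_path).exists():
            args += ['--ckpt', str(ckpt_path)]  # resume from last checkpoint
        if subprocess.run(args).returncode == 0:
            return attempt  # number of restarts that were needed
    raise RuntimeError(f'training failed after {max_retries} attempts')

if __name__ == '__main__':
    # Trivial stand-in for the real training command.
    print(run_with_resume([sys.executable, '-c', 'pass'], 'checkpoint.pt'))
```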