Saving torchvision checkpoints based on staged recipe phase #1499

Open · wants to merge 2 commits into main

Conversation

@corey-nm (Contributor) commented Mar 30, 2023

  • Renames BaseManager.phase -> BaseManager.phase_at_end_of
  • Clarifies its behavior
  • Integrates saving checkpoints based on recipe phase into torchvision (see the sketch below)
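
Roughly, the integration looks like this (a hand-written, self-contained sketch, not the exact code in this PR; the function names, signatures, and stub bodies below are illustrative stand-ins, not the sparseml API):

from typing import Dict, Optional

def train_one_epoch(epoch: int) -> float:
    # Placeholder for the real torchvision training epoch; returns top-1 acc.
    return 0.0

def phase_at_end_of(epoch: int) -> Optional[str]:
    # Placeholder for BaseManager.phase_at_end_of; see the Test Plan below
    # for the phases it returns on a staged recipe.
    return "dense"

def save_checkpoint(name: str, epoch: int) -> None:
    print(f"saved {name} at epoch {epoch}")  # placeholder for torch.save(...)

best_acc: Dict[str, float] = {}
for epoch in range(15):
    acc = train_one_epoch(epoch)
    phase = phase_at_end_of(epoch)
    save_checkpoint("last.pth", epoch)      # generic last is always refreshed
    if phase is not None:                   # skip epochs mid-way through a stage
        save_checkpoint(f"last_{phase}.pth", epoch)
        if acc > best_acc.get(phase, float("-inf")):
            best_acc[phase] = acc
            save_checkpoint(f"best_{phase}.pth", epoch)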

Test Plan

Ran the following recipe:

version: 1.1.0

training_modifiers:
  - !EpochRangeModifier
    start_epoch: 0
    end_epoch: 15

  - !SetLearningRateModifier
    start_epoch: 0.0
    learning_rate: 0.001

pruning_modifiers:
  - !GMPruningModifier
    init_sparsity: 0.05
    final_sparsity: 0.85
    start_epoch: 5.0
    end_epoch: 10.0
    update_frequency: 1.0
    params: ["re:.*conv..weight*"]

quantization_modifiers:
  - !QuantizationModifier
    start_epoch: 11.0
    freeze_bn_stats_epoch: 12.0
    disable_quantization_observer_epoch: 13.0

Which generated the following directory after running (a sketch of the file-writing logic follows the list):

- best_dense.pth (best_dense.txt contains epoch 3)
- best_pruned_quantized.pth (best_pruned_quantized.txt contains epoch 13)
- best_pruned.pth (best_pruned.txt contains epoch 10)
- last_dense.pth (last_dense.txt contains epoch 4)
- last_pruned_quantized.pth (last_pruned_quantized.txt contains epoch 14)
- last_pruned.pth (last_pruned.txt contains epoch 10)
- last.pth (last.txt contains epoch 14)
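
A minimal sketch of how these files could be written (a hypothetical helper, not the code in this PR; the .txt sidecar simply records which epoch the saved weights came from):

import os
import torch

def save_phase_checkpoint(model, epoch, phase, is_best, output_dir):
    # The generic last checkpoint is always refreshed; phase-specific files
    # are only written while the recipe is in a stable phase (phase is not None).
    names = ["last"]
    if phase is not None:
        names.append(f"last_{phase}")
        if is_best:
            names.append(f"best_{phase}")
    for name in names:
        torch.save(model.state_dict(), os.path.join(output_dir, name + ".pth"))
        with open(os.path.join(output_dir, name + ".txt"), "w") as f:
            f.write(f"epoch: {epoch}\n")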

And the following output:

sparseml.image_classification.train --recipe resnet18-pq.yaml --dataset-path ~/.cache/nm_datasets/imagenette/imagenette-320/ --arch-key resnet18 --output-dir ./runs
INFO:sparseml.pytorch.torchvision.train:Finished epoch 0 in phase dense
INFO:sparseml.pytorch.torchvision.train:Finished epoch 1 in phase dense
INFO:sparseml.pytorch.torchvision.train:Finished epoch 2 in phase dense
INFO:sparseml.pytorch.torchvision.train:Finished epoch 3 in phase dense
INFO:sparseml.pytorch.torchvision.train:Finished epoch 4 in phase dense
INFO:sparseml.pytorch.torchvision.train:Finished epoch 5 in phase None
INFO:sparseml.pytorch.torchvision.train:Finished epoch 6 in phase None
INFO:sparseml.pytorch.torchvision.train:Finished epoch 7 in phase None
INFO:sparseml.pytorch.torchvision.train:Finished epoch 8 in phase None
INFO:sparseml.pytorch.torchvision.train:Finished epoch 9 in phase None
INFO:sparseml.pytorch.torchvision.train:Finished epoch 10 in phase pruned
INFO:sparseml.pytorch.torchvision.train:Finished epoch 11 in phase pruned_quantized
INFO:sparseml.pytorch.torchvision.train:Finished epoch 12 in phase pruned_quantized
INFO:sparseml.pytorch.torchvision.train:Finished epoch 13 in phase pruned_quantized
INFO:sparseml.pytorch.torchvision.train:Finished epoch 14 in phase pruned_quantized

Note that the following transitions are correct (a stand-in sketch that reproduces them follows this list):

  • epochs 0-4 end in phase dense
  • the start epoch for pruning is 5, so at the END of epoch 5 pruning is in progress -> phase is None
  • the end epoch for pruning is 10, so at the END of epoch 10 pruning is complete -> phase is pruned
  • the start epoch for quantization is 11, so at the END of epoch 11 quantization has been applied -> phase is pruned_quantized
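
These transitions can be reproduced with a simplified stand-in for BaseManager.phase_at_end_of (the boundary epochs are hard-coded from the recipe above; the real implementation derives them from the recipe's modifiers):

def phase_at_end_of(epoch, pruning_start=5, pruning_end=10, quant_start=11):
    if epoch < pruning_start:
        return "dense"             # epochs 0-4
    if epoch < pruning_end:
        return None                # epochs 5-9: pruning still in progress
    if epoch < quant_start:
        return "pruned"            # epoch 10
    return "pruned_quantized"      # epochs 11-14

assert [phase_at_end_of(e) for e in (4, 5, 10, 11)] == [
    "dense", None, "pruned", "pruned_quantized"
]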

@dbogunowicz (Contributor) left a comment:
LGTM, just curious, what was the motivation for the change?

@corey-nm (Contributor, Author) replied:

> LGTM, just curious, what was the motivation for the change?

This saving scheme was chosen as part of standardizing our integrations. It also makes it clearer whether a checkpoint is dense, pruned, or quantized. Previously, best.pt could contain any of the versions; notably, it could still be a dense model.
