Saving torchvision checkpoints based on staged recipe phase #1499

Open · wants to merge 2 commits into main

Conversation

@corey-nm (Contributor) commented Mar 30, 2023

  • Renames BaseManager.phase -> BaseManager.phase_at_end_of
  • Clarifies its behavior
  • Integrates saving checkpoints based on recipe phase into torchvision (see the sketch below)
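
Roughly, the integration looks like this (a hand-written, self-contained sketch, not the exact code in this PR; the function names, signatures, and stub bodies below are illustrative stand-ins, not the sparseml API):

from typing import Dict, Optional

def train_one_epoch(epoch: int) -> float:
    # Placeholder for the real torchvision training epoch; returns top-1 acc.
    return 0.0

def phase_at_end_of(epoch: int) -> Optional[str]:
    # Placeholder for BaseManager.phase_at_end_of; see the Test Plan below
    # for the phases it returns on a staged recipe.
    return "dense"

def save_checkpoint(name: str, epoch: int) -> None:
    print(f"saved {name} at epoch {epoch}")  # placeholder for torch.save(...)

best_acc: Dict[str, float] = {}
for epoch in range(15):
    acc = train_one_epoch(epoch)
    phase = phase_at_end_of(epoch)
    save_checkpoint("last.pth", epoch)      # generic last is always refreshed
    if phase is not None:                   # skip epochs mid-way through a stage
        save_checkpoint(f"last_{phase}.pth", epoch)
        if acc > best_acc.get(phase, float("-inf")):
            best_acc[phase] = acc
            save_checkpoint(f"best_{phase}.pth", epoch)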

Test Plan

Ran the following recipe:

version: 1.1.0

training_modifiers:
  - !EpochRangeModifier
    start_epoch: 0
    end_epoch: 15

  - !SetLearningRateModifier
    start_epoch: 0.0
    learning_rate: 0.001

pruning_modifiers:
  - !GMPruningModifier
    init_sparsity: 0.05
    final_sparsity: 0.85
    start_epoch: 5.0
    end_epoch: 10.0
    update_frequency: 1.0
    params: ["re:.*conv..weight*"]

quantization_modifiers:
  - !QuantizationModifier
    start_epoch: 11.0
    freeze_bn_stats_epoch: 12.0
    disable_quantization_observer_epoch: 13.0

Which generated the following directory after running (a sketch of the file-writing logic follows the list):

- best_dense.pth (best_dense.txt contains epoch 3)
- best_pruned_quantized.pth (best_pruned_quantized.txt contains epoch 13)
- best_pruned.pth (best_pruned.txt contains epoch 10)
- last_dense.pth (last_dense.txt contains epoch 4)
- last_pruned_quantized.pth (last_pruned_quantized.txt contains epoch 14)
- last_pruned.pth (last_pruned.txt contains epoch 10)
- last.pth (last.txt contains epoch 14)
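
A minimal sketch of how these files could be written (a hypothetical helper, not the code in this PR; the .txt sidecar simply records which epoch the saved weights came from):

import os
import torch

def save_phase_checkpoint(model, epoch, phase, is_best, output_dir):
    # The generic last checkpoint is always refreshed; phase-specific files
    # are only written while the recipe is in a stable phase (phase is not None).
    names = ["last"]
    if phase is not None:
        names.append(f"last_{phase}")
        if is_best:
            names.append(f"best_{phase}")
    for name in names:
        torch.save(model.state_dict(), os.path.join(output_dir, name + ".pth"))
        with open(os.path.join(output_dir, name + ".txt"), "w") as f:
            f.write(f"epoch: {epoch}\n")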

And the following output:

sparseml.image_classification.train --recipe resnet18-pq.yaml --dataset-path ~/.cache/nm_datasets/imagenette/imagenette-320/ --arch-key resnet18 --output-dir ./runs
INFO:sparseml.pytorch.torchvision.train:Finished epoch 0 in phase dense
INFO:sparseml.pytorch.torchvision.train:Finished epoch 1 in phase dense
INFO:sparseml.pytorch.torchvision.train:Finished epoch 2 in phase dense
INFO:sparseml.pytorch.torchvision.train:Finished epoch 3 in phase dense
INFO:sparseml.pytorch.torchvision.train:Finished epoch 4 in phase dense
INFO:sparseml.pytorch.torchvision.train:Finished epoch 5 in phase None
INFO:sparseml.pytorch.torchvision.train:Finished epoch 6 in phase None
INFO:sparseml.pytorch.torchvision.train:Finished epoch 7 in phase None
INFO:sparseml.pytorch.torchvision.train:Finished epoch 8 in phase None
INFO:sparseml.pytorch.torchvision.train:Finished epoch 9 in phase None
INFO:sparseml.pytorch.torchvision.train:Finished epoch 10 in phase pruned
INFO:sparseml.pytorch.torchvision.train:Finished epoch 11 in phase pruned_quantized
INFO:sparseml.pytorch.torchvision.train:Finished epoch 12 in phase pruned_quantized
INFO:sparseml.pytorch.torchvision.train:Finished epoch 13 in phase pruned_quantized
INFO:sparseml.pytorch.torchvision.train:Finished epoch 14 in phase pruned_quantized

Note that the following transitions are correct (a stand-in sketch that reproduces them follows this list):

  • epochs 0-4 end in phase dense
  • the start epoch for pruning is 5, so at the END of epoch 5 pruning is in progress -> phase is None
  • the end epoch for pruning is 10, so at the END of epoch 10 pruning is complete -> phase is pruned
  • the start epoch for quantization is 11, so at the END of epoch 11 quantization has been applied -> phase is pruned_quantized
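
These transitions can be reproduced with a simplified stand-in for BaseManager.phase_at_end_of (the boundary epochs are hard-coded from the recipe above; the real implementation derives them from the recipe's modifiers):

def phase_at_end_of(epoch, pruning_start=5, pruning_end=10, quant_start=11):
    if epoch < pruning_start:
        return "dense"             # epochs 0-4
    if epoch < pruning_end:
        return None                # epochs 5-9: pruning still in progress
    if epoch < quant_start:
        return "pruned"            # epoch 10
    return "pruned_quantized"      # epochs 11-14

assert [phase_at_end_of(e) for e in (4, 5, 10, 11)] == [
    "dense", None, "pruned", "pruned_quantized"
]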

@dbogunowicz (Contributor) left a comment:
LGTM, just curious, what was the motivation for the change?

@corey-nm (Contributor, Author) replied:

> LGTM, just curious, what was the motivation for the change?

This saving scheme was chosen as part of standardizing our integrations. It also makes it clearer whether a checkpoint is dense, pruned, or quantized. Previously, best.pt could contain any of the versions; notably, it could still be a dense model.
