Resume Training Issue #5505

Closed
yan-roo opened this issue Jul 2, 2021 · 5 comments
@yan-roo

yan-roo commented Jul 2, 2021

Hello,

I used the following command to resume training from epoch 76:
bash tools/dist_train.sh configs 8 --resume-from epoch_76.pth

However, training resumed from epoch 64 instead:

2021-07-02 10:24:28,325 - mmdet - INFO - load checkpoint from work_dirs/model/epoch_76.pth
2021-07-02 10:24:28,325 - mmdet - INFO - Use load_from_local loader
2021-07-02 10:24:30,810 - mmdet - INFO - resumed epoch 64, iter 156416
2021-07-02 10:24:30,811 - mmdet - INFO - Start running, host: u8485326@ppsox5ctr1624351441275-pnh75, work_dir: /home/u8485326/mmdetection/work_dirs/model
2021-07-02 10:24:30,811 - mmdet - INFO - workflow: [('train', 1)], max: 280 epochs

Any suggestions?

Thanks!
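
The mismatch reported above can be spotted automatically by comparing the epoch in the checkpoint filename with the epoch in the "resumed epoch" log line. The following is a minimal sketch, assuming the log format shown in this thread; `find_resume_mismatch` is a hypothetical helper and not part of mmdet or mmcv.

```python
import re

# Hypothetical helper, not part of mmdet or mmcv: it only reads log text.
# The patterns assume the log format quoted in this thread.
LOAD_RE = re.compile(r"load checkpoint from \S*epoch_(\d+)\.pth")
RESUME_RE = re.compile(r"resumed epoch (\d+), iter (\d+)")


def find_resume_mismatch(log_text):
    """Return (filename_epoch, resumed_epoch) if they differ, else None."""
    load = LOAD_RE.search(log_text)
    resume = RESUME_RE.search(log_text)
    if load is None or resume is None:
        return None  # the log does not contain both lines
    filename_epoch = int(load.group(1))
    resumed_epoch = int(resume.group(1))
    if filename_epoch != resumed_epoch:
        return filename_epoch, resumed_epoch
    return None


log = """\
2021-07-02 10:24:28,325 - mmdet - INFO - load checkpoint from work_dirs/model/epoch_76.pth
2021-07-02 10:24:30,810 - mmdet - INFO - resumed epoch 64, iter 156416
"""
print(find_resume_mismatch(log))  # (76, 64)
```

The check relies only on the filename convention `epoch_N.pth`, so it would not catch a mismatch for a checkpoint that has been renamed.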

@jshilong
Collaborator

jshilong commented Jul 2, 2021

Is there any modification to the code?

jshilong added the bug label on Jul 2, 2021
@yan-roo
Author

yan-roo commented Jul 2, 2021

@jshilong
I only modified the config file and the backbone.

@jack-luoluo

I have the same problem:

2021-07-04 13:17:18,745 - mmdet - INFO - load checkpoint from /content/drive/MyDrive/project_0702/test/work_dirs/yolov3_d53_mstrain-608_273e_coco/epoch_91.pth
2021-07-04 13:17:18,745 - mmdet - INFO - Use load_from_local loader
2021-07-04 13:17:19,439 - mmdet - INFO - resumed epoch 42, iter 101934
2021-07-04 13:17:19,440 - mmdet - INFO - Start running, host: root@84359d615290, work_dir: /content/drive/My Drive/project_0702/test/work_dirs/yolov3_d53_mstrain-608_273e_coco
2021-07-04 13:17:19,441 - mmdet - INFO - Hooks will be executed in the following order:

@jshilong

@Klawens

Klawens commented Jul 4, 2021

Same here.

2021-07-04 22:58:33,189 - mmdet - INFO - load checkpoint from fold0/epoch_19.pth
2021-07-04 22:58:33,189 - mmdet - INFO - Use load_from_local loader
2021-07-04 22:58:35,149 - mmdet - INFO - resumed epoch 7, iter 22652
2021-07-04 22:58:35,150 - mmdet - INFO - Start running, host: lsc@3090, work_dir: /home/lsc/visdrone/mmdetection/fold0
2021-07-04 22:58:35,150 - mmdet - INFO - workflow: [('train', 1)], max: 24 epochs
2021-07-04 22:59:00,154 - mmdet - INFO - Epoch [8][50/3236] lr: 2.500e-03, eta: 7:37:20, time: 0.499, data_time: 0.052, memory: 7429, loss_rpn_cls: 0.0257, loss_rpn_bbox: 0.0415, s0.loss_cls: 0.3324, s0.acc: 88.0918, s0.loss_bbox: 0.1875, s1.loss_cls: 0.1547, s1.acc: 88.8967, s1.loss_bbox: 0.1939, s2.loss_cls: 0.0772, s2.acc: 89.0294, s2.loss_bbox: 0.1006, loss: 1.1135
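
As a quick arithmetic check on the log above, the resumed iteration count can be divided by the iterations per epoch. This sketch assumes that the `3236` in `[50/3236]` is the number of iterations per epoch; `epochs_completed` is a hypothetical helper, not an mmdet or mmcv function.

```python
def epochs_completed(resumed_iter, iters_per_epoch):
    """Split an iteration count into whole epochs and leftover iterations."""
    return divmod(resumed_iter, iters_per_epoch)


# Values taken from the log above: "resumed epoch 7, iter 22652"
# and "Epoch [8][50/3236]".
print(epochs_completed(22652, 3236))  # (7, 0)
```

Under that assumption, the resumed iteration count corresponds to exactly 7 full epochs, so the epoch and iteration counters agree with each other and both disagree with the `epoch_19.pth` filename.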

@hhaAndroid
Collaborator

hhaAndroid commented Jul 5, 2021

I closed this issue because it has been resolved by open-mmlab/mmcv#1108
