Fix learning rate gap on resume #9468
Conversation
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #9468       +/-   ##
===========================================
+ Coverage   37.99%   76.67%    +38.67%
===========================================
  Files         121      120         -1
  Lines       15277    15175       -102
===========================================
+ Hits         5805    11635      +5830
+ Misses       9472     3540      -5932

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
@glenn-jocher this PR fixes the LR issue when resuming. I tested training locally, interrupting it at epoch 5/10 and then resuming. I also tested timed training with:

```bash
yolo detect train data=runs/data/coco.yaml time=0.01
yolo detect train data=runs/data/coco.yaml time=0.02
yolo detect train data=runs/data/coco.yaml time=0.05
```

For quick verification, I manually set the train set to …
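For reference, the equivalent resume flow in the Python API looks roughly like this (the checkpoint path is illustrative and depends on your run directory):

```python
from ultralytics import YOLO

# Load the last checkpoint of the interrupted run and resume training
model = YOLO("runs/detect/train/weights/last.pt")
model.train(resume=True)
```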
@glenn-jocher please take a look and check whether I broke anything. Thanks!
@Laughing-q it would be super nice if we could get rid of all that timed-training logic, but I think I had to include it for a reason. One test for timed training is that we need to run with both less time and more time than the default epoch count allows. I'll run some tests on this.
Oh, I think I remember now: we need to update the LR scheduler to hit lrf on … If we don't update the LR scheduler, timed training won't be optimal, as the scheduler will remain fixed to hit lrf on the default … So timed training does 3 things at the end of each epoch: …
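To make the scheduler update concrete, here is a minimal sketch of the idea (a hypothetical `rebuild_scheduler` helper, not the trainer's actual code; the linear lambda mirrors the Ultralytics default, and `epochs`/`lrf` follow the Ultralytics argument names):

```python
from torch.optim.lr_scheduler import LambdaLR

def rebuild_scheduler(optimizer, epochs, lrf, current_epoch):
    # Linear decay from 1.0 at epoch 0 down to lrf at the final epoch.
    # When timed training re-estimates `epochs`, rebuilding with the new
    # value keeps the schedule on track to hit lrf at the new end point.
    lf = lambda x: max(1 - x / epochs, 0) * (1.0 - lrf) + lrf
    scheduler = LambdaLR(optimizer, lr_lambda=lf)
    scheduler.last_epoch = current_epoch - 1  # realign with training progress
    return scheduler
```

Calling something like this whenever the time-based epoch estimate changes would keep the final LR consistent regardless of how the time budget shifts.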
Signed-off-by: Glenn Jocher <[email protected]>
@Laughing-q ok, I moved the scheduler.step() line to the beginning of the train loop instead of the end. The new LR will always be one step ahead of the previous one, but I think this is fine. Resume looks good now. I Ctrl+C'd a run 3 times and resumed 3 times here: [screenshots of the three interrupted and resumed training runs] What do you think?
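The reordering is easiest to see in a stripped-down skeleton (illustrative only, not the actual diff; the optimizer, lambda, and epoch numbers are stand-ins):

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import LambdaLR

param = torch.zeros(1, requires_grad=True)
optimizer = SGD([param], lr=0.01)
scheduler = LambdaLR(optimizer, lr_lambda=lambda e: max(1 - e / 10, 0) * 0.99 + 0.01)

start_epoch, epochs = 5, 10             # pretend we resumed at epoch 5
scheduler.last_epoch = start_epoch - 1  # counter restored from the checkpoint

for epoch in range(start_epoch, epochs):
    scheduler.step()  # moved from the end of the loop to the beginning
    # ... one epoch of forward/backward/optimizer.step() would run here ...
    print(epoch, optimizer.param_groups[0]["lr"])
```

Because the step happens before the epoch trains, a resumed run derives its LR directly from the restored counter, leaving no one-epoch gap.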
@glenn-jocher Looks good to me!
Nice!! |
Co-authored-by: Lakshantha Dissanayake <[email protected]>
Co-authored-by: RizwanMunawar <[email protected]>
Co-authored-by: Glenn Jocher <[email protected]>
Co-authored-by: UltralyticsAssistant <[email protected]>
Co-authored-by: gs80140 <[email protected]>
ultralytics 8.1.42
attempt to fix lr issue when resuming
ultralytics 8.1.42
learning-rate resume fix
@Laughing-q PR merged!
Signed-off-by: Glenn Jocher <[email protected]>
Co-authored-by: UltralyticsAssistant <[email protected]>
Co-authored-by: Glenn Jocher <[email protected]>
Co-authored-by: EunChan Kim <[email protected]>
Co-authored-by: Lakshantha Dissanayake <[email protected]>
Co-authored-by: RizwanMunawar <[email protected]>
Co-authored-by: gs80140 <[email protected]>
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions

🌟 Summary
Improvements in training scheduling and more informative logging.

📊 Key Changes
EarlyStopping: updated stop message (…) for better clarity on training halts due to lack of improvement.

🎯 Purpose & Impact
These changes aim to simplify the training loop for better performance, more precise time management, and clearer communication of important training events, all of which contribute to a more efficient and user-friendly training experience.
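For illustration, an early-stopping check that logs this kind of informative message might look like the following simplified sketch (not the library's exact implementation):

```python
class EarlyStopping:
    """Stop training when fitness has not improved for `patience` epochs."""

    def __init__(self, patience=50):
        self.patience = patience
        self.best_fitness = 0.0
        self.best_epoch = 0

    def __call__(self, epoch, fitness):
        # Track the best fitness seen so far and the epoch it occurred at
        if fitness >= self.best_fitness:
            self.best_fitness, self.best_epoch = fitness, epoch
        stop = (epoch - self.best_epoch) >= self.patience
        if stop:
            print(
                f"Stopping training early: no improvement observed in the last "
                f"{self.patience} epochs. Best results were at epoch {self.best_epoch}."
            )
        return stop
```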