What hyperparams do I need to tune when I want to continue a previous training? #9257
@haimat 👋 Hello! Thanks for asking about resuming training. YOLOv5 🚀 Learning Rate (LR) schedulers follow predefined LR curves for the fixed number of epochs defined at training start, so you may not change the epoch count once training has begun.

If your training was interrupted for any reason you may continue where you left off using the `--resume` argument. If your training fully completed, you can start a new training from any model using the `--weights` argument. Your options are:

Resume Single-GPU
You may not change settings when resuming, and no additional arguments other than `--resume` may be passed:

```bash
python train.py --resume                  # automatically find latest checkpoint (searches yolov5/ directory)
python train.py --resume path/to/last.pt  # specify resume checkpoint
```

Resume Multi-GPU
Multi-GPU DDP trainings must be resumed with the same GPUs and DDP command, i.e. assuming 8 GPUs:

```bash
python -m torch.distributed.run --nproc_per_node 8 train.py --resume                  # resume latest checkpoint
python -m torch.distributed.run --nproc_per_node 8 train.py --resume path/to/last.pt  # specify resume checkpoint
```

Start from Pretrained
If you would like to start training from a fully trained model, use the `--weights` argument:

```bash
python train.py --weights path/to/best.pt  # start from pretrained model
```

Good luck 🍀 and let us know if you have any other questions!
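For context, a minimal sketch of what a `last.pt` checkpoint contains and why `--resume` can pick the schedule back up; the key names are from recent YOLOv5 versions and may differ in yours:

```python
import torch

# Minimal sketch: inspect a YOLOv5 checkpoint to see the training state
# that --resume restores. Run from inside the yolov5 repo so the pickled
# model classes can be resolved; "path/to/last.pt" is a placeholder.
ckpt = torch.load("path/to/last.pt", map_location="cpu")

print(ckpt.get("epoch"))         # last completed epoch (-1 once training finished)
print(ckpt.get("best_fitness"))  # best fitness metric seen so far
print("optimizer" in ckpt)       # optimizer state, needed to continue the LR schedule
```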
@glenn-jocher Thanks, but this does not answer my question. I know about what you wrote, but what I don't know is how exactly the hyperparams influence the LR. As described in my use case, my question is: what hyperparams do I need to modify, and in which way, if I want to do a 2nd training using the `best.pt` from a previous run?
@haimat you don't need to modify anything; you can start a second training on any dataset from previously trained weights on any other dataset. You can choose to experiment with hyperparameter variations, but of course I can't advise on this — the experimentation is on you. If you want an automated way of evolving hyperparameters, see our Hyperparameter Evolution tutorial. If you're just asking how to modify the LR, these values are in `yolov5/data/hyps/hyp.scratch-low.yaml`, lines 6 to 7 (commit 63ecce6).
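For reference, the two lines referenced above define the initial and final LR in the default hyperparameter file (values as of that commit; check your local copy, since defaults can change between releases):

```yaml
lr0: 0.01  # initial learning rate (SGD=1E-2, Adam=1E-3)
lrf: 0.01  # final OneCycleLR learning rate (lr0 * lrf)
```

To continue from `best.pt` with a gentler LR, one option is to copy the full hyp file, lower `lr0` (e.g. to 0.001), and pass the copy via `--hyp`, e.g. `python train.py --weights path/to/best.pt --hyp path/to/hyp.custom.yaml`; the custom filename here is a placeholder, and the copy must keep all the other keys intact.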
Good luck 🍀 and let us know if you have any other questions!
@glenn-jocher Hi Glenn, thanks for your response. In particular I would be interested to know how the first few parameters in that file influence training. I see their comments, but they are very brief. Is there some more documentation on them anywhere?
👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!
@haimat Yes, the hyperparameters you provided play crucial roles in the training process. Briefly, they control the optimizer's learning-rate schedule, momentum, and warmup behavior. For more details and advanced guidance on these hyperparameters and their effects on training, you can refer to our documentation for YOLOv5. I hope this provides a clearer understanding of how these hyperparameters influence the training process. Let me know if you have any more questions!
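To make the LR discussion concrete, here is a minimal sketch of the cosine schedule YOLOv5 builds from `lr0` and `lrf` (mirroring the `one_cycle()` helper in `utils/general.py`; the exact form may differ between versions). The LR starts at `lr0` and decays to `lr0 * lrf` on the final epoch, which is why restarting a finished run begins the curve again at the top:

```python
import math

# Minimal sketch of YOLOv5's cosine ("one cycle") LR curve, assuming
# lf(x) = ((1 - cos(x * pi / epochs)) / 2) * (lrf - 1) + 1 as in
# utils/general.py. Values below are the hyp.scratch-low.yaml defaults.
lr0, lrf, epochs = 0.01, 0.01, 500

def lr_at(epoch: int) -> float:
    """LR at a given epoch: cosine decay from lr0 down to lr0 * lrf."""
    return lr0 * (((1 - math.cos(epoch * math.pi / epochs)) / 2) * (lrf - 1) + 1)

for e in (0, 100, 250, 499):
    print(f"epoch {e:3d}: lr = {lr_at(e):.5f}")
```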
Search before asking
Question
Sometimes I want to continue training using the `best.pt` model from a previous YOLOv5 training run. However, every time I do so, after only 2 or 3 epochs in the new training the model performance drops quite a bit, often nearly down to below 0.1, even though it had been 0.5 in `best.pt` from the previous training.

I assume that is because the learning rate is too high. But that way I lose nearly all the training work stored in `best.pt`, which is obviously not what I want. So I guess I need to tweak the hyperparams for the second training.

Could you please advise what hyperparams in particular I would need to tweak, and in which direction (up or down), when I want to fine-tune a model, i.e. continue from the `best.pt` file from a previous training session?

Additional
As an example, let's have a look at the following training performance, showing the mAP value of my model during 500 epochs:

[training curve image: mAP over 500 epochs]

Looking at the linear trend line, it seems the mAP performance of this model can be improved even further, let's say for another 500 training epochs. However, every time I continue training from `best.pt` of the training shown above, within the first 3-5 epochs or so mAP drops down to 0.05 or so, and then it takes some hundreds more epochs to climb back up. In the end, after 500 training epochs, I am close to where I was at the end of the first training.

Thus I am basically starting again from the start and losing many, many training epochs. So how can I start from the good mAP value of the first training run and continue from there?