Training not storing best model #460
Comments
I don't think mmcv and mmaction2 support this for now, but I implemented a quick version (not tested yet); please check here. Don't forget to set
|
In the current version, the best one can do is to set the save interval to 1 and manually clean up bad checkpoints periodically. It would be good to implement |
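A minimal sketch of the manual cleanup described above, not part of mmcv or mmaction2. It assumes checkpoints are written to the work directory as `epoch_<N>.pth`, and that you have already read the best epoch number from your validation log; `prune_checkpoints`, `work_dir` and `keep_epochs` are hypothetical names chosen for illustration.

```python
import re
from pathlib import Path

# Assumed filename pattern: epoch_<N>.pth (adjust if your files are named differently).
CKPT_PATTERN = re.compile(r"^epoch_(\d+)\.pth$")


def prune_checkpoints(work_dir, keep_epochs, dry_run=True):
    """Delete epoch_<N>.pth files whose epoch is not in keep_epochs.

    Returns the sorted list of epochs that were (or, in a dry run, would be) removed.
    """
    keep = set(keep_epochs)
    removed = []
    for path in Path(work_dir).iterdir():
        match = CKPT_PATTERN.match(path.name)
        if match is None:
            continue  # leave logs, configs and any other files untouched
        epoch = int(match.group(1))
        if epoch not in keep:
            removed.append(epoch)
            if not dry_run:
                path.unlink()
    return sorted(removed)
```

Calling `prune_checkpoints("work_dirs/my_run", keep_epochs=[2])` only reports what would be deleted; pass `dry_run=False` once the list looks right. The directory name here is just an example.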
Thank you. For now I will use the @innerlee solution. I would really like to use the @irvingzhang0512 solution; could it be tested before I adopt it? Thank you once again. |
I will test the related code and create a PR on Monday. |
By the way, if you want to set the maximum number of checkpoints to keep, you can set |
Assume I keep a maximum of 10 checkpoints and train for 100 epochs. If I get the best validation accuracy at epoch 20, I don't think the current mmaction2 will keep that epoch. I will go with innerlee's approach and delete everything except the best manually for now. Thank you all. I am closing the issue. |
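The concern above can be checked with a small simulation. This is a sketch that assumes the limit works as a plain keep-the-last-N policy (the oldest checkpoint is deleted whenever the limit is exceeded); it does not call mmcv or mmaction2, and `surviving_epochs` is a hypothetical helper.

```python
from collections import deque


def surviving_epochs(total_epochs, interval, max_keep):
    """Epochs whose checkpoints remain after training under a keep-last-N policy."""
    kept = deque(maxlen=max_keep)  # the oldest entry is dropped automatically
    for epoch in range(1, total_epochs + 1):
        if epoch % interval == 0:  # a checkpoint is written at this epoch
            kept.append(epoch)
    return list(kept)


remaining = surviving_epochs(total_epochs=100, interval=1, max_keep=10)
print(remaining)        # epochs 91 through 100
print(20 in remaining)  # False: the epoch-20 checkpoint has been rotated out
```

Under that assumption, a best epoch that occurs early in training is lost regardless of how good it was, which is why the limit alone does not answer the original question.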
@VJatla Actually, I've tested #464 and you can
#464 is closed because eval hook will be refactored by #395. Hopefully #395 could fix your issue. |
Hello,
I am trying to use mmaction2 to train on my custom dataset. I am able to train i3d, slowfast, slowonly and TSN.
Due to limited hard drive space I am not storing all the epochs. For example, I created a checkpoint file every 3 epochs and the best model is at epoch 2, so the epoch 2 checkpoint is not created. Is there anything I can do to store the best epoch's checkpoint even though I write checkpoints only every 3 epochs (checkpoint_config = dict(interval=3))? Please let me know if this is possible.
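For illustration, the behaviour being asked for can be sketched independently of the periodic interval: after each validation run, write a separate "best" file only when the score improves. This is a generic sketch, not the mmcv or mmaction2 API. `BestCheckpointKeeper` is a hypothetical class, and `save_fn` is assumed to be any callable that writes the current model weights to the path it receives.

```python
from pathlib import Path


class BestCheckpointKeeper:
    """Write a single 'best' checkpoint whenever the validation score improves."""

    def __init__(self, work_dir, best_name="best.pth"):
        self.best_path = Path(work_dir) / best_name
        self.best_score = float("-inf")
        self.best_epoch = None

    def update(self, epoch, score, save_fn):
        """Return True if this epoch became the new best (higher score is better)."""
        if score <= self.best_score:
            return False
        save_fn(self.best_path)  # overwrite the previous best file
        self.best_score = score
        self.best_epoch = epoch
        return True
```

Because only one extra file is ever kept, disk usage stays bounded while the periodic checkpoints continue on their own schedule.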