Inconsistent validation dataloader reloading when using `reload_dataloaders_every_n_epochs` and `check_val_every_n_epoch` flags #10948
Comments
Hey @adamviola, thanks for raising this issue. Would you be interested in contributing a fix?
Sure, I can give it a go.
When working on the fix, I ran into unexpected behavior when both the Currently, the validation dataloader is reloaded for every validation check in epochs that are multiples of For example, suppose When I would expect the validation dataloader to only reload once every
Hey @adamviola, Yes, the val data loader shouldn't be reloaded during the train dataloader epoch; otherwise the metrics won't make sense. Therefore,
@adamviola thank you for contributing this fix!
Should it reload at 3 or 2? I think it should reload only at 2, and at epoch 3 it should use the recently loaded dataloaders.
If In the case you describe, we would reload the validation dataloader at epochs 0, 2, 4, 6, 8, 10, etc., but we check val at epochs 2, 5, 8, 11, etc. The validation dataloader created at epoch 6 is never used because it is reloaded at epoch 8 before validating at epoch 8.
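The schedule described in the last comment can be checked with a small, library-free sketch. It assumes zero-based epochs, a reload at every epoch that is a multiple of the reload interval, and a validation run whenever `(epoch + 1)` is a multiple of the validation interval (this matches the epoch lists quoted above). The function name `unused_reloads` is hypothetical and is not part of Lightning.

```python
def unused_reloads(reload_every, check_val_every, max_epochs):
    """Return (reload_epochs, val_epochs, unused) for a fixed reload schedule.

    Assumption: reloads happen at epochs divisible by `reload_every`,
    regardless of whether validation runs in that epoch.
    """
    reload_epochs = [e for e in range(max_epochs) if e % reload_every == 0]
    val_epochs = [e for e in range(max_epochs) if (e + 1) % check_val_every == 0]

    unused = []
    for i, start in enumerate(reload_epochs):
        # A loader created at `start` lives until the next reload (exclusive).
        end = reload_epochs[i + 1] if i + 1 < len(reload_epochs) else max_epochs
        if not any(start <= v < end for v in val_epochs):
            unused.append(start)
    return reload_epochs, val_epochs, unused


reload_epochs, val_epochs, unused = unused_reloads(2, 3, 12)
print("reload at:", reload_epochs)
print("validate at:", val_epochs)
print("never used:", unused)
```

Under these assumptions and over 12 epochs, the loader created at epoch 6 is reported as never used, as the comment says. The loader created at epoch 0 is also reported, because it is replaced at epoch 2 before the first validation run.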
🐛 Bug
When the `reload_dataloaders_every_n_epochs` and `check_val_every_n_epoch` flags of the Trainer are used, the validation dataloader may reload inconsistently or not reload at all.
A few examples:
When `reload_dataloaders_every_n_epochs` = 2 and `check_val_every_n_epoch` = 3, the validation dataloader reloads every 6 epochs (should be 3).
When `reload_dataloaders_every_n_epochs` = 2 and `check_val_every_n_epoch` = 2, the validation dataloader never reloads.
To Reproduce
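Independently of the linked demo, the two examples above can be simulated without Lightning. The sketch below is not the Trainer's code. It assumes zero-based epochs, that validation runs when `(epoch + 1)` is a multiple of `check_val_every_n_epoch`, and that the reload check `not current_epoch % n_epochs` is only evaluated in epochs where validation runs. It ignores the initial creation of the dataloader. The function name `observed_val_reloads` is hypothetical.

```python
def observed_val_reloads(reload_every, check_val_every, max_epochs):
    """Epochs at which the validation dataloader would reload under a
    purely modulo-based check (a simulation, not Lightning's code)."""
    reloads = []
    for epoch in range(max_epochs):
        runs_validation = (epoch + 1) % check_val_every == 0
        # Modulo-based check, as quoted under "Additional context".
        should_reload = not epoch % reload_every
        if runs_validation and should_reload:
            reloads.append(epoch)
    return reloads


print(observed_val_reloads(2, 3, 18))  # reloads are 6 epochs apart
print(observed_val_reloads(2, 2, 18))  # validation epochs are all odd, so no reloads
```

In the first case validation runs at epochs 2, 5, 8, ... but only every second one is even, which gives the 6-epoch spacing. In the second case validation only runs at odd epochs, so the modulo check never passes.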
BoringModel Demo
Expected behavior
When `reload_dataloaders_every_n_epochs` <= `check_val_every_n_epoch`, I expect the validation dataloader to reload before each validation run.
Environment
Additional context
The culprit is probably `_should_reload_dl_epoch`. The logic `not self.current_epoch % n_epochs` doesn't consider when the validation dataloader was last reloaded.
cc @Borda @justusschock @awaelchli @ninginthecloud @tchaton
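One way to take the last reload into account, as the additional-context note suggests, is to remember the epoch of the most recent reload and compare against it. This is a minimal sketch and not necessarily the fix that was merged. The `last_reload` variable and the function name `reloads_tracking_last` are hypothetical, and validation is assumed to run when `(epoch + 1)` is a multiple of `check_val_every_n_epoch`.

```python
def reloads_tracking_last(reload_every, check_val_every, max_epochs):
    """Epochs at which the validation dataloader reloads when the decision
    is based on the epoch of the previous reload (illustrative sketch)."""
    last_reload = None  # hypothetical state: epoch of the most recent reload
    reloads = []
    for epoch in range(max_epochs):
        if (epoch + 1) % check_val_every != 0:
            continue  # no validation this epoch, so nothing to reload
        if last_reload is None or epoch - last_reload >= reload_every:
            reloads.append(epoch)
            last_reload = epoch
    return reloads


print(reloads_tracking_last(2, 3, 12))  # reloads before every validation run
print(reloads_tracking_last(2, 2, 8))   # reloads before every validation run
print(reloads_tracking_last(4, 2, 12))  # reloads before every second validation run
```

With this rule, a reload interval that is no larger than the validation interval leads to a reload before each validation run, which matches the expected behavior described in the issue.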