Describe the bug
As the title says, the numbers of batches and epochs are currently calculated per split as follows:
```python
...
# Process training data
if self.train_data is not None or self.train_loader is not None:
    # Calculate the number of update steps during training given the
    # local_update_steps
    num_train_batch, num_train_batch_last_epoch, num_train_epoch, \
        num_total_train_batch = self.pre_calculate_batch_epoch_num(
            self.cfg.train.local_update_steps)

    self.num_train_epoch = num_train_epoch
    self.num_train_batch = num_train_batch
    self.num_train_batch_last_epoch = num_train_batch_last_epoch
    self.num_total_train_batch = num_total_train_batch

# Process evaluation data
for mode in ["val", "test"]:
    setattr(self, "num_{}_epoch".format(mode), 1)
    if self.get("{}_data".format(mode)) is not None or self.get(
            "{}_loader".format(mode)) is not None:
        setattr(
            self, "num_{}_batch".format(mode),
            getattr(self, "num_{}_data".format(mode)) //
            self.cfg.data.batch_size +
            int(not self.cfg.data.drop_last and bool(
                getattr(self, "num_{}_data".format(mode)) %
                self.cfg.data.batch_size)))
...
```
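For reference, the eval-split batch count above reduces to a ceiling-style division. Here is a minimal standalone sketch of that formula (the `num_batches` helper is hypothetical, only meant to illustrate the expression used in the snippet):

```python
# Hypothetical helper illustrating the batch-count expression above:
# floor-divide, then add one extra batch for a partial last batch,
# unless drop_last is set.
def num_batches(num_data, batch_size, drop_last):
    return num_data // batch_size + int(not drop_last and bool(num_data % batch_size))

assert num_batches(10, 4, drop_last=True) == 2   # the 2 leftover samples are dropped
assert num_batches(10, 4, drop_last=False) == 3  # the 2 leftover samples form a third batch
```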
and the finetune and training routines stop at:
```python
def _run_routine(self, ...):
    ...
    # Break in the final epoch
    if self.ctx.cur_mode == 'train' and epoch_i == \
            self.ctx.num_train_epoch - 1:
        if batch_i >= self.ctx.num_train_batch_last_epoch - 1:
            break
    ...
```
The problems are:

1. If the test or validation split is used for the training routine, `num_train_batch_last_epoch` and `num_train_epoch` are wrong, since they are calculated from the training split.
2. If we set different parameters (e.g., local update steps) for finetune and training, the two modes should have different `num_train_batch_last_epoch` and `num_train_epoch`, but currently they share the same values.
Expected behavior
The number of batches and epochs should follow the combination of mode and split.
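One possible shape for the fix, as a rough sketch only (the per-`(mode, split)` keying, the `split` argument to `pre_calculate_batch_epoch_num`, and `cfg.finetune.local_update_steps` below are assumptions for illustration, not necessarily the project's actual API):

```python
# Rough sketch (assumptions, not the actual FederatedScope implementation):
# store batch/epoch counts keyed by (mode, split) so that the training and
# finetune routines, possibly running on the val/test splits, each look up
# the counts that match their own configuration.
def setup_batch_and_epoch_nums(self):
    self.num_batch = {}             # (mode, split) -> batches per epoch
    self.num_batch_last_epoch = {}  # (mode, split) -> batches in the final epoch
    self.num_epoch = {}             # (mode, split) -> number of epochs
    mode_steps = [("train", self.cfg.train.local_update_steps),
                  ("finetune", self.cfg.finetune.local_update_steps)]
    for mode, local_update_steps in mode_steps:
        for split in ["train", "val", "test"]:
            if self.get("{}_data".format(split)) is None and \
                    self.get("{}_loader".format(split)) is None:
                continue
            # hypothetical: pre_calculate_batch_epoch_num takes the split, so it
            # uses that split's data size rather than always the train split's
            num_batch, num_batch_last, num_epoch, _ = \
                self.pre_calculate_batch_epoch_num(local_update_steps, split)
            self.num_batch[(mode, split)] = num_batch
            self.num_batch_last_epoch[(mode, split)] = num_batch_last
            self.num_epoch[(mode, split)] = num_epoch
```

With counts keyed this way, the break condition in `_run_routine` could look up `(cur_mode, cur_split)` instead of always reading the train-split attributes.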