The loss we backprop is normalized by the number of supervisions:

snowfall/egs/librispeech/asr/simple_v1/mmi_att_transformer_train.py
Lines 114 to 117 in 5d1b00d

But the loss we report is normalized by the number of frames:

snowfall/egs/librispeech/asr/simple_v1/mmi_att_transformer_train.py
Lines 267 to 278 in 5d1b00d

    'frames ({:.1f}% kept), current batch average objf: {:.6f} over {} frames ({:.1f}% kept) '
    'avg time waiting for batch {:.3f}s'.format(
        batch_idx, current_epoch, num_epochs,
        total_objf / total_frames, total_frames,
        100.0 * total_frames / total_all_frames,
        curr_batch_objf / (curr_batch_frames + 0.001),
        curr_batch_frames,
        100.0 * curr_batch_frames / curr_batch_all_frames,
        time_waiting_for_batch / max(1, batch_idx)))

It looks like this mismatch wasn't intended. The latter seems to make more sense to use in backprop (but we'd probably need to re-tune learning rates etc.) - WDYT?
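For concreteness, here is a minimal sketch of the difference (not the snowfall code; `num_supervisions`, `num_frames` and `tot_objf` are made-up stand-ins for the batch statistics in the training script). Switching the normalization used for backprop rescales every gradient by roughly frames-per-supervision, which is why learning rates would likely need re-tuning with a plain SGD-style optimizer:

```python
import torch

# Hypothetical batch statistics; in the real script these come from the
# supervision segments and the frame counts of the batch.
num_supervisions, num_frames = 8, 4000
tot_objf = torch.tensor(-12345.6)   # stand-in for the summed objective

w = torch.ones(1, requires_grad=True)                       # toy model parameter
loss_per_supervision = (w * tot_objf) / num_supervisions    # what is backpropped now
loss_per_frame = (w * tot_objf) / num_frames                # what the log reports

g_sup, = torch.autograd.grad(loss_per_supervision, w, retain_graph=True)
g_frame, = torch.autograd.grad(loss_per_frame, w)

# The two gradients differ by num_frames / num_supervisions (500x here).
print((g_sup / g_frame).item())   # -> ~500.0
```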
Yes, I think normalizing by the number of frames would be OK. With Adam optimizers this shouldn't make a difference to the results.
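A quick toy check of that claim (not the snowfall setup): Adam's update divides the first-moment estimate by the square root of the second-moment estimate, so multiplying the loss by a constant scales both moments consistently and roughly cancels out, up to the eps term.

```python
import copy
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model_a = torch.nn.Linear(10, 1)
model_b = copy.deepcopy(model_a)
opt_a = torch.optim.Adam(model_a.parameters(), lr=1e-3)
opt_b = torch.optim.Adam(model_b.parameters(), lr=1e-3)

x, y = torch.randn(32, 10), torch.randn(32, 1)
scale = 500.0   # e.g. roughly frames-per-supervision

for _ in range(5):
    opt_a.zero_grad()
    opt_b.zero_grad()
    F.mse_loss(model_a(x), y).backward()
    (F.mse_loss(model_b(x), y) * scale).backward()
    opt_a.step()
    opt_b.step()

# The two models stay essentially identical despite the 500x difference in
# loss scale; only the tiny eps in Adam's denominator breaks exact equality.
print(torch.max(torch.abs(model_a.weight - model_b.weight)).item())
```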
My preference in the abstract would be to simply not normalize at all, which would mean we wouldn't have to take accum_grad into account. But I think it's traditional in machine learning to normalize somehow, so I don't know whether people feel this might be confusing to readers.
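On the accum_grad point, a small sketch (again a toy, not the actual training loop) of why an un-normalized summed loss composes cleanly with gradient accumulation, whereas a mean-normalized loss would have to be divided by accum_grad to match one big batch:

```python
import torch

torch.manual_seed(0)
w = torch.randn(10, requires_grad=True)   # toy "model"
data = torch.randn(64, 10)
accum_grad = 4

# Gradient of the summed (un-normalized) loss over one big batch.
loss_big = (data @ w).pow(2).sum()
g_big, = torch.autograd.grad(loss_big, w)

# Accumulating summed losses over accum_grad micro-batches gives the same
# gradient with no extra factor.
for mb in data.chunk(accum_grad):
    (mb @ w).pow(2).sum().backward()      # gradients add up in w.grad
print(torch.allclose(g_big, w.grad))      # True

# With a mean-normalized loss, each micro-batch term would need to be divided
# by accum_grad (or the learning rate adjusted) to match the big-batch mean.
```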