How did the authors perform hyperparameter selection for the pretrained models? #164
-
No early stopping; the checkpoints are mainly for resuming if the job fails. There was also no hyperparameter tuning.
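For readers unfamiliar with this setup, here is a minimal sketch of what "checkpoints only for resuming" can look like: a fixed number of epochs, one checkpoint saved per epoch, and a restart that picks up from the newest file. This is not the authors' code. The file naming (`epoch_N.json`), the `train` and `latest_checkpoint` helpers, the `fail_after` switch, and the fake "steps" counter are all hypothetical stand-ins for illustration.

```python
import json
import os
import re
import tempfile

# Hypothetical naming scheme for per-epoch checkpoints (an assumption, not from the repo).
CKPT_PATTERN = re.compile(r"^epoch_(\d+)\.json$")


def latest_checkpoint(ckpt_dir):
    """Return (epoch, path) of the newest checkpoint, or (0, None) if none exist."""
    best_epoch, best_path = 0, None
    for name in os.listdir(ckpt_dir):
        match = CKPT_PATTERN.match(name)
        if match and int(match.group(1)) > best_epoch:
            best_epoch = int(match.group(1))
            best_path = os.path.join(ckpt_dir, name)
    return best_epoch, best_path


def train(ckpt_dir, num_epochs, fail_after=None):
    """Run a fixed epoch budget, saving after every epoch.

    There is no validation set and no early stopping: the loop always runs
    to num_epochs, and checkpoints exist only so a failed job can resume.
    """
    start_epoch, path = latest_checkpoint(ckpt_dir)
    state = {"steps": 0}
    if path is not None:
        with open(path) as f:
            state = json.load(f)  # restore the saved training state
    for epoch in range(start_epoch + 1, num_epochs + 1):
        if fail_after is not None and epoch > fail_after:
            raise RuntimeError("simulated job failure")
        state["steps"] += 100  # stand-in for one epoch of real training
        with open(os.path.join(ckpt_dir, f"epoch_{epoch}.json"), "w") as f:
            json.dump(state, f)
    return state


# Usage: the first run "crashes" after epoch 3, the second run resumes at epoch 4.
ckpt_dir = tempfile.mkdtemp()
try:
    train(ckpt_dir, num_epochs=5, fail_after=3)
except RuntimeError:
    pass
final = train(ckpt_dir, num_epochs=5)
```

In a setup like this the final checkpoint is simply the last epoch of the fixed budget, so no checkpoint is ever selected on a validation or benchmark score.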
-
I found that the pretraining phase does not use any validation data and instead saves a checkpoint every epoch. How did the authors decide on early stopping and hyperparameter tuning? Did they run all checkpoints on the evaluation benchmarks?