Unstable training with OpusTrainer #314
I redirected the output of OpusTrainer mixing (original 0.7, backtranslated 0.3) to a file instead of Marian, and it looks quite normal. https://firefox-ci-tc.services.mozilla.com/tasks/EPMUZ0nSTJe3awZ2II_zmg/runs/0
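For reference, this is roughly how one could sanity-check such a dump. This is a minimal sketch, not part of the pipeline; the file names are assumptions, and lines touched by augmentation modifiers will not match either corpus exactly:

```python
from collections import Counter

def load_lines(path):
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\n") for line in f}

original = load_lines("corpus.original.tsv")              # assumed file name
backtranslated = load_lines("corpus.backtranslated.tsv")  # assumed file name

counts = Counter()
with open("opustrainer_dump.tsv", encoding="utf-8") as f:  # the redirected OpusTrainer output
    for line in f:
        line = line.rstrip("\n")
        if line in original:
            counts["original"] += 1
        elif line in backtranslated:
            counts["backtranslated"] += 1
        else:
            # lines altered by augmentation modifiers (or garbled ones) end up here
            counts["modified_or_unknown"] += 1

total = sum(counts.values()) or 1
for name, n in counts.most_common():
    print(f"{name}: {n} ({n / total:.1%})")
```

If the first two buckets come out near 70/30 and the unknown bucket is small, the mixing itself looks healthy.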
After disabling back-translations completely and training the teachers only on the original corpus, we can still see the same behavior, so it might be a bug in OpusTrainer. The student has trained properly this time (due to the fixed splitting), even with augmentations enabled. https://firefox-ci-tc.services.mozilla.com/tasks/groups/K1iHndFUSxSEDRLg_H9l1A
More pictures!
Old en-ru run (no OpusTrainer, pre-training on back-translations separately):
en-hu (no OpusTrainer, pre-training on back-translations separately):
en-ca (with OpusTrainer and back-translations mixed in a dedicated stage):
It seems we have always had a similar issue when pre-training on noisier data. It's just that we used to run pre-training for a fixed number of epochs and then fine-tune until early stopping, so it didn't affect the overall training run. Two ways of fixing this:
After this investigation I plan to restart Train en-ca (#284) from scratch. I feel like this is the last blocker in my previous attempts.
It seems one of the training stages in OpusTrainer (the one that includes back-translations) reduces the model's performance. That's probably because we now start training on the original corpus and then switch to the mixed one. This might not be the best idea, because training will likely stop with early stopping, and we can't control training parameters separately per stage for now. Related to #293
For now we can either disable back-translations completely until an experiment proves that they help, or try changing the stages: a short stage of mixed back-translations first, then fine-tuning on the original corpus, and increasing early stopping to 30 or 40 from the default 20 (see the sketch below).
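To make the proposed schedule concrete, here is a toy simulation of that curriculum, not the actual OpusTrainer config or trainer code. The dataset names, stage lengths, and the sampling helper are illustrative assumptions:

```python
import random

def sample_stage(datasets, weights, num_lines, rng):
    """Yield num_lines sentence pairs, picking the source dataset per line by weight."""
    names = list(datasets)
    for _ in range(num_lines):
        name = rng.choices(names, weights=[weights[n] for n in names])[0]
        yield rng.choice(datasets[name])

rng = random.Random(42)
datasets = {
    "original": [(f"src orig {i}", f"trg orig {i}") for i in range(1000)],
    "backtranslated": [(f"src bt {i}", f"trg bt {i}") for i in range(1000)],
}

stages = [
    # short warm-up on the 0.7/0.3 mix with back-translations first ...
    {"weights": {"original": 0.7, "backtranslated": 0.3}, "lines": 5_000},
    # ... then fine-tuning on the clean original corpus until early stopping
    # (with the early-stopping patience raised to 30-40 from the default 20)
    {"weights": {"original": 1.0, "backtranslated": 0.0}, "lines": 20_000},
]

for i, stage in enumerate(stages, 1):
    for src, trg in sample_stage(datasets, stage["weights"], stage["lines"], rng):
        pass  # in the real pipeline these lines would be streamed to Marian
    print(f"finished stage {i}")
```

The point of the ordering is that the stage which ends the run (via early stopping) is the one trained on the clean original corpus, rather than the noisier mixed one.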
Current config:
Graphs for en-hu: