Ok, it might take a while, since we're focusing on general cleaning now and will only start a full retraining once that's done. I think we can close some of those robustness issues, as we clearly saw performance improvements on the evaluation datasets.
Our models are trained mostly on data that has proper capitalisation, but in the wild people and websites sometimes use ALL CAPS when typing. Since our models haven't seen those words during training, they mostly end up copying them to the target as opposed to translating them. We could probably fix this with the `--all-caps-every` option: https://github.com/marian-nmt/marian-dev/blob/601c9ac9807b5ffcbed298952435d9a17d954575/src/common/config_parser.cpp#L909
We should investigate what would be good values for that. Every 100? Every 75? Every 50?
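
To get a feel for what those intervals mean in practice, here is a minimal Python sketch that mimics the effect offline on a list of sentence pairs. It is not Marian's implementation: the `all_caps_every` helper is hypothetical, and it assumes that both the source and the target side of every Nth pair are upper-cased, which should be checked against the Marian source before relying on it.

```python
from typing import Iterable, Iterator, Tuple

Pair = Tuple[str, str]


def all_caps_every(pairs: Iterable[Pair], n: int) -> Iterator[Pair]:
    """Yield sentence pairs, upper-casing every n-th pair (1-based).

    Hypothetical helper for illustration only; assumes both sides
    of the pair are upper-cased together.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    for index, (src, tgt) in enumerate(pairs, start=1):
        if index % n == 0:
            yield src.upper(), tgt.upper()
        else:
            yield src, tgt


def count_upper(pairs: Iterable[Pair]) -> int:
    """Count pairs whose source side is entirely upper case."""
    return sum(1 for src, _ in pairs if src.isupper())


if __name__ == "__main__":
    corpus = [("hello world", "hallo welt")] * 300
    for n in (100, 75, 50):
        augmented = list(all_caps_every(corpus, n))
        print(n, count_upper(augmented), len(augmented))
```

Under this assumption, every 100 upper-cases 1% of the lines, every 75 about 1.3%, and every 50 exactly 2%, so the candidate values differ by at most a factor of two in how much all-caps data the model sees.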