
OpusTrainer should disable early stopping until the final stage #293

Closed · Tracked by #369
gregtatum opened this issue Dec 11, 2023 · 4 comments
Assignees: eu9ene
Labels: bug (Something is broken or not correct), quality (Improving robustness and translation quality)

Comments

@gregtatum (Member) commented Dec 11, 2023

While training, one of my teachers in the ensemble started out very poorly and was stopped by early stopping before it could get to the later stages. It is the yellow graph of bad behavior below.

Loss: [figure]

chrF: [figure]

This is in the task group PCOkERaaRtu6s6I7-xE5aA.

gregtatum added the bug label Dec 11, 2023
eu9ene added the quality label Dec 18, 2023
@gregtatum (Member, Author) commented:

@eu9ene I'm thinking the work here is to have two different training steps. The first step would be to run the first N-1 stages of the training schedule with early-stopping set to 0. Once that completes, we would run the final stage with early-stopping taken from the config.
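
For illustration only, here is a minimal sketch of what that two-step setup could look like, assuming a Python wrapper that launches OpusTrainer, which in turn feeds Marian. The opustrainer-train entry point, config paths, stage split, and argument layout are assumptions; the only real CLI detail relied on is Marian's --early-stopping option.

```python
import subprocess

# A minimal sketch (assumption: a Python wrapper launches OpusTrainer, which
# pipes the training data into Marian). The opustrainer-train entry point,
# config file names, and argument layout shown here are assumptions made for
# illustration; only Marian's --early-stopping option comes from the real CLI.

MARIAN_CMD = ["marian", "-c", "configs/training.yml"]


def run_stage(opustrainer_config: str, early_stopping: int) -> None:
    """Run one OpusTrainer pass, overriding Marian's --early-stopping value."""
    subprocess.run(
        [
            "opustrainer-train",
            "--config", opustrainer_config,
            *MARIAN_CMD,
            "--early-stopping", str(early_stopping),
        ],
        check=True,
    )


# Step 1: stages 1..N-1 with early stopping disabled (0), so a poor start
# cannot terminate training before the later stages are reached.
run_stage("configs/stages-pretrain.yml", early_stopping=0)

# Step 2: the final stage with the early-stopping value from the config
# (the default mentioned in this thread is 20).
run_stage("configs/stage-finetune.yml", early_stopping=20)
```

The same effect could instead come from OpusTrainer itself once per-stage training parameters are supported, which is what the following comments discuss.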

@eu9ene (Collaborator) commented Dec 18, 2023

> @eu9ene I'm thinking the work here is to have two different training steps. The first step would be to run the first N-1 stages of the training schedule with early-stopping set to 0. Once that completes, we would run the final stage with early-stopping taken from the config.

Yes, to do that we would need to implement support for per-stage training parameters on the OpusTrainer side. However, we used to train on the mixed dataset for 2 epochs with the default early stopping (20) and it worked fine, so we should investigate what changed here. Maybe it did in fact early-stop sometimes, but since we used a different task for fine-tuning, it didn't affect it. I think the main issue might just be the proportion of the back-translated data plus the pre-training on the original data. We might be able to fix it even without using different parameters for now. See #314.
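
As a purely hypothetical sketch of what per-stage training parameters could mean (OpusTrainer did not support this at the time; that is the request in hplt-project/OpusTrainer#44), a wrapper-side mapping from stage name to extra Marian arguments might look like the following; the stage names and dict-based layout are invented for illustration.

```python
# Hypothetical only: a mapping from OpusTrainer stage name to the extra Marian
# arguments that a wrapper (or a future OpusTrainer feature) would apply when
# launching that stage. Stage names and values are illustrative.
STAGE_TRAINING_PARAMS: dict[str, list[str]] = {
    # Every stage before the last one: never stop early.
    "pretrain": ["--early-stopping", "0"],
    # Final stage: use the pipeline's configured value (default 20).
    "finetune": ["--early-stopping", "20"],
}


def extra_marian_args(stage: str) -> list[str]:
    """Look up the extra Marian arguments for a given stage (empty if none)."""
    return STAGE_TRAINING_PARAMS.get(stage, [])
```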

@eu9ene (Collaborator) commented Dec 18, 2023

Created an issue for OpusTrainer: hplt-project/OpusTrainer#44

eu9ene self-assigned this Dec 22, 2023
@eu9ene (Collaborator) commented Jan 16, 2024

Closing. See discussion in hplt-project/OpusTrainer#44. The issue was #352 and not early stopping. It should train fine with the same parameters now.
