
OpusTrainer should disable early stopping until the final stage #293

Closed · Tracked by #369
gregtatum opened this issue Dec 11, 2023 · 4 comments
Assignees: eu9ene
Labels: bug (Something is broken or not correct), quality (Improving robustness and translation quality)

Comments

@gregtatum (Member) commented Dec 11, 2023

While training, one of my teachers in the ensemble started out very poorly and was stopped by early stopping before it could get to the later stages. It is the yellow graph of bad behavior below.

Loss: [figure]

chrF: [figure]

This is in the task group PCOkERaaRtu6s6I7-xE5aA.

gregtatum added the bug label Dec 11, 2023
eu9ene added the quality label Dec 18, 2023
@gregtatum (Member, Author) commented:

@eu9ene I'm thinking the work here is to have two different training steps. The first step would be to run the first N-1 stages of the training schedule with early-stopping set to 0. Once that completes, we would run the final stage with early-stopping taken from the config.
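
For illustration only, here is a minimal sketch of what that two-step setup could look like, assuming a Python wrapper that launches OpusTrainer, which in turn feeds Marian. The opustrainer-train entry point, config paths, stage split, and argument layout are assumptions; the only real CLI detail relied on is Marian's --early-stopping option.

```python
import subprocess

# A minimal sketch (assumption: a Python wrapper launches OpusTrainer, which
# pipes the training data into Marian). The opustrainer-train entry point,
# config file names, and argument layout shown here are assumptions made for
# illustration; only Marian's --early-stopping option comes from the real CLI.

MARIAN_CMD = ["marian", "-c", "configs/training.yml"]


def run_stage(opustrainer_config: str, early_stopping: int) -> None:
    """Run one OpusTrainer pass, overriding Marian's --early-stopping value."""
    subprocess.run(
        [
            "opustrainer-train",
            "--config", opustrainer_config,
            *MARIAN_CMD,
            "--early-stopping", str(early_stopping),
        ],
        check=True,
    )


# Step 1: stages 1..N-1 with early stopping disabled (0), so a poor start
# cannot terminate training before the later stages are reached.
run_stage("configs/stages-pretrain.yml", early_stopping=0)

# Step 2: the final stage with the early-stopping value from the config
# (the default mentioned in this thread is 20).
run_stage("configs/stage-finetune.yml", early_stopping=20)
```

The same effect could instead come from OpusTrainer itself once per-stage training parameters are supported, which is what the following comments discuss.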

@eu9ene (Collaborator) commented Dec 18, 2023

> @eu9ene I'm thinking the work here is to have two different training steps. The first step would be to run the first N-1 stages of the training schedule with early-stopping set to 0. Once that completes, we would run the final stage with early-stopping taken from the config.

Yes, to do that we would need to implement support for per-stage training parameters on the OpusTrainer side. However, we used to train on the mixed dataset for 2 epochs with the default early stopping (20) and it worked fine, so we should investigate what changed here. Maybe it did in fact early-stop sometimes, but since we used a different task for fine-tuning, it didn't affect it. I think the main issue might just be the proportion of the back-translated data plus the pre-training on the original data. We might be able to fix it even without using different parameters for now. See #314.
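
As a purely hypothetical sketch of what per-stage training parameters could mean (OpusTrainer did not support this at the time; that is the request in hplt-project/OpusTrainer#44), a wrapper-side mapping from stage name to extra Marian arguments might look like the following; the stage names and dict-based layout are invented for illustration.

```python
# Hypothetical only: a mapping from OpusTrainer stage name to the extra Marian
# arguments that a wrapper (or a future OpusTrainer feature) would apply when
# launching that stage. Stage names and values are illustrative.
STAGE_TRAINING_PARAMS: dict[str, list[str]] = {
    # Every stage before the last one: never stop early.
    "pretrain": ["--early-stopping", "0"],
    # Final stage: use the pipeline's configured value (default 20).
    "finetune": ["--early-stopping", "20"],
}


def extra_marian_args(stage: str) -> list[str]:
    """Look up the extra Marian arguments for a given stage (empty if none)."""
    return STAGE_TRAINING_PARAMS.get(stage, [])
```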

@eu9ene (Collaborator) commented Dec 18, 2023

Created an issue for OpusTrainer: hplt-project/OpusTrainer#44

eu9ene self-assigned this Dec 22, 2023
@eu9ene (Collaborator) commented Jan 16, 2024

Closing. See discussion in hplt-project/OpusTrainer#44. The issue was #352 and not early stopping. It should train fine with the same parameters now.
