Investigate removing teacher ensemble training #778
Labels
cost & perf
Speeding up and lowering cost for the pipeline
experiment
A training experiment with hypothesis and results
Training a second teacher improves performance only slightly. It may be more cost-efficient to accept the small quality hit and remove it.
Spreadsheet
For instance, if we spent 1000 GPU hours synthesizing student data with two teachers, dropping to one teacher would cut that to 500 GPU hours. Likewise, if we spent 100 GPU hours training teachers, that would drop to 50 GPU hours. We would also eliminate the serial gap in training time where we train one teacher, evaluate its quality, and then have to train a second teacher before moving on to the student step.
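The arithmetic above can be sketched as a quick cost model. The function name, parameter defaults, and the linear-scaling assumption are illustrative only, not taken from the pipeline code:

```python
# Hypothetical back-of-envelope cost model for the teacher-count decision.
# Assumes both synthesis and teacher-training cost scale linearly with the
# number of teachers (an assumption, not a measured property of the pipeline).

def pipeline_gpu_hours(n_teachers: int,
                       synth_hours_per_teacher: float = 500.0,
                       train_hours_per_teacher: float = 50.0) -> float:
    """Total GPU hours for teacher training plus student-data synthesis."""
    return n_teachers * (synth_hours_per_teacher + train_hours_per_teacher)

two_teachers = pipeline_gpu_hours(2)  # 1000 synth + 100 train = 1100.0
one_teacher = pipeline_gpu_hours(1)   # 500 synth + 50 train = 550.0
print(f"savings: {two_teachers - one_teacher} GPU hours")  # savings: 550.0 GPU hours
```

This ignores the serial-scheduling benefit (no wait between training teacher 1 and teacher 2), which would shorten wall-clock time on top of the GPU-hour savings.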
It would be worth testing this on student training to see whether the distillation quality gap takes an unexpected hit.