Improve GPU utilization for "translate" tasks #785

Open · Tracked by #453
eu9ene opened this issue Jul 31, 2024 · 6 comments
Labels: cost & perf (Speeding up and lowering cost for the pipeline)

Comments

@eu9ene (Collaborator) commented Jul 31, 2024

Currently, GPU utilization for the "translate" tasks is ~70%. We could try using a bigger batch size, but the effect also depends on the language.

[Screenshot (2024-07-31): GCP console showing GPU utilization for a translate-mono task]
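For reference, the batch that Marian uses at decoding time is controlled by marian-decoder flags. Below is a minimal sketch of invoking the decoder with larger batch settings; the model and vocab paths and the specific values are placeholders rather than the pipeline's actual configuration.

```python
# Minimal sketch (not the pipeline's real invocation): run marian-decoder
# with larger batch settings. Paths and values are placeholders.
import subprocess

cmd = [
    "marian-decoder",
    "--models", "teacher.npz",             # hypothetical model path
    "--vocabs", "vocab.spm", "vocab.spm",  # hypothetical shared vocab
    "--beam-size", "4",
    "--mini-batch", "64",        # sentences decoded per batch
    "--maxi-batch", "1000",      # sentences pre-loaded for length sorting
    "--maxi-batch-sort", "src",  # sort by source length to reduce padding
    "--workspace", "12000",      # MB of GPU memory reserved by Marian
    "--devices", "0",
]
with open("mono.src.txt") as src, open("mono.translated.txt", "w") as out:
    subprocess.run(cmd, stdin=src, stdout=out, check=True)
```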
@eu9ene added the cost & perf label on Jul 31, 2024
@eu9ene (Collaborator, Author) commented Jul 31, 2024

It appears to be even lower for translate-corpus: GCP console

[Screenshot (2024-07-31): GCP console showing GPU utilization for a translate-corpus task]

@gregtatum FYI

@gregtatum (Member) commented

Is it possible to dynamically determine this value? Like run N translations, measure and adjust?
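(A rough sketch of that idea: translate a small sample at a few candidate batch sizes, measure throughput, and keep the fastest setting for the full run. `translate_sample` below is a hypothetical helper that would wrap the actual decoder call.)

```python
# Rough sketch of the measure-and-adjust idea. translate_sample() is a
# hypothetical helper that decodes `sample_lines` with the given mini-batch size.
import time

def pick_mini_batch(sample_lines, translate_sample, candidates=(16, 32, 64, 128)):
    best_size, best_rate = None, 0.0
    for size in candidates:
        start = time.monotonic()
        translate_sample(sample_lines, mini_batch=size)
        rate = len(sample_lines) / (time.monotonic() - start)  # sentences/sec
        if rate > best_rate:
            best_size, best_rate = size, rate
    return best_size
```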

@ZJaume (Collaborator) commented Oct 22, 2024

I've also noticed this, and it has always been the case. I think the bottleneck is decoding: generating n-best lists with beam size 8 seems to use the GPU much less than decoding without n-best and with a beam size of around 4-6.

This won't increase GPU utilization, but I've been using --fp16 during inference and training without any significant quality drop. I haven't compared n-best generation, though.
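(For concreteness, the flags in question would look roughly like this; whether they are safe for the teachers still needs the quality comparison mentioned above.)

```python
# Sketch: marian-decoder flags discussed above. --fp16 enables half-precision
# inference; a smaller beam without n-best output appears to keep the GPU busier.
extra_args = [
    "--fp16",
    "--beam-size", "4",
    # "--n-best",  # n-best output with beam 8 is where utilization drops
]
```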

@ZJaume (Collaborator) commented Oct 22, 2024

Another alternative would be to compare with CTranslate2, which has faster inference than Marian.
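A minimal sketch of what the CTranslate2 path could look like, assuming the Marian teacher has already been converted to a CTranslate2 model directory (the paths, tokenization, and beam size below are placeholders):

```python
# Minimal CTranslate2 sketch: translate a batch of SentencePiece-tokenized lines.
# "ct2_model_dir" and "vocab.spm" are hypothetical paths.
import ctranslate2
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="vocab.spm")
translator = ctranslate2.Translator("ct2_model_dir", device="cuda")

sentences = ["This is a test.", "GPU utilization should be higher."]
tokens = [sp.encode(s, out_type=str) for s in sentences]

results = translator.translate_batch(tokens, beam_size=4, max_batch_size=64)
outputs = [sp.decode(r.hypotheses[0]) for r in results]
print(outputs)
```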

@eu9ene (Collaborator, Author) commented Oct 22, 2024

Related to #165

@gregtatum (Member) commented

Training uses dynamic batch sizes: it changes the batch size over time to find the best value, so there isn't really a need to adjust it. It starts somewhat inefficient but quickly dials in the number to be as efficient as it can.

Translate tasks, however, do not use dynamic batch sizes. I played with them in #931 and, by adjusting the batching behavior, got them optimized to be about as efficient as training. I think this ~70% is just the cap on Marian's ability to utilize the GPUs. CTranslate2 was able to reach ~96% utilization and was much faster at the same beam size.

It will take a bit more time to get COMET scores for CTranslate2 so we can cross-compare. CTranslate2 doesn't support ensemble decoding, so we'll have to compare against Marian decoding with a single teacher.
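For the cross-comparison, here is a rough sketch of scoring both systems' outputs with COMET (assuming the unbabel-comet package; the model name and input handling are placeholders to check against the pipeline's eval step):

```python
# Rough sketch: compare COMET system scores of Marian vs. CTranslate2 outputs.
# Assumes the unbabel-comet package; inputs are lists of sentence strings.
from comet import download_model, load_from_checkpoint

model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))

def comet_score(src_lines, mt_lines, ref_lines):
    data = [
        {"src": s, "mt": m, "ref": r}
        for s, m, r in zip(src_lines, mt_lines, ref_lines)
    ]
    return model.predict(data, batch_size=32, gpus=1).system_score

# comet_score(src, marian_out, ref) vs. comet_score(src, ct2_out, ref)
```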
