Improve GPU utilization for "translate" tasks #785
It appears to be even lower for translate-corpus (GCP console). @gregtatum FYI
Is it possible to dynamically determine this value? Like run N translations, measure and adjust?
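One rough way to approximate the "run N translations, measure and adjust" idea outside of Marian would be to time a fixed sample at a few mini-batch sizes before the full run and keep the fastest. A minimal sketch, assuming a `marian-decoder` binary on PATH; the paths, vocab files, and flag values are placeholders, not the pipeline's actual config:

```python
# Time marian-decoder on a fixed sample at several mini-batch sizes
# and pick the fastest. All file paths below are placeholders.
import subprocess
import time

SAMPLE = "sample.src"                # a few thousand representative lines
CANDIDATE_BATCHES = [16, 32, 64, 128]

def time_decode(mini_batch: int) -> float:
    cmd = [
        "marian-decoder",
        "--models", "model.npz",             # placeholder teacher model
        "--vocabs", "vocab.spm", "vocab.spm",
        "--input", SAMPLE,
        "--output", "/dev/null",
        "--beam-size", "4",
        "--mini-batch", str(mini_batch),
        "--maxi-batch", "1000",
        "--devices", "0",
    ]
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    return time.monotonic() - start

best = min(CANDIDATE_BATCHES, key=time_decode)
print(f"fastest mini-batch on this sample: {best}")
```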
I've also noticed this, and it's always been the same. I think the bottleneck is decoding. Doing n-best with beam 8 seems to make much less use of the GPU than not doing n-best with a beam of about 4-6. This won't increase GPU use, but I've been using
Another alternative would be comparing with CTranslate2, which has faster inference than Marian.
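A minimal sketch of what that comparison could look like, assuming the Marian teacher has already been converted to a CTranslate2 model directory (CTranslate2 ships a Marian converter) and that the SentencePiece vocab and file paths below are placeholders:

```python
# Decode a sample with CTranslate2 for comparison against marian-decoder.
import ctranslate2
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="vocab.spm")      # placeholder path
translator = ctranslate2.Translator("ct2_model_dir", device="cuda")

with open("sample.src", encoding="utf-8") as f:
    sources = [sp.encode(line.strip(), out_type=str) for line in f]

results = translator.translate_batch(
    sources,
    beam_size=4,          # match the Marian beam size for a fair comparison
    max_batch_size=2048,
    batch_type="tokens",  # batch by token count, similar to mini-batch-words
)

for res in results:
    print(sp.decode(res.hypotheses[0]))
```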
Related to #165
Training uses dynamic batch sizes, so it changes the batch size over time to find the best value; there's not really a need to adjust it. It starts somewhat inefficient, but quickly dials in the number to be as efficient as it can. Translate tasks, however, do not use dynamic batch sizes. I played with them in #931 and, by adjusting the batching behavior, got them to be about as efficient as training. I think this 70% is just the cap on Marian's ability to utilize the GPUs. CTranslate2 was able to get ~96% utilization and was much faster given the same beam size. It'll take a bit more time to get COMET scores for CTranslate2 to cross-compare. CTranslate2 doesn't support ensemble decoding, so we'll have to compare against Marian single-teacher decoding.
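For the cross-comparison itself, a minimal COMET sketch, assuming the unbabel-comet package and placeholder file names; the checkpoint below is an assumption, not necessarily the one the pipeline uses:

```python
# Score Marian and CTranslate2 outputs against the same references
# and compare the COMET system scores.
from comet import download_model, load_from_checkpoint

model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))

def system_score(hyp_path: str) -> float:
    with open("sample.src", encoding="utf-8") as src, \
         open(hyp_path, encoding="utf-8") as hyp, \
         open("sample.ref", encoding="utf-8") as ref:
        data = [
            {"src": s.strip(), "mt": m.strip(), "ref": r.strip()}
            for s, m, r in zip(src, hyp, ref)
        ]
    return model.predict(data, batch_size=32, gpus=1).system_score

print("marian     :", system_score("marian.out"))
print("ctranslate2:", system_score("ct2.out"))
```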
Currently, it's ~70%. We could try using a bigger batch, but it also depends on the language.
GCP console for translate-mono task
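Outside of the GCP console, roughly the same number can be reproduced by polling nvidia-smi while the task runs and averaging the samples; a minimal sketch, with arbitrary interval and duration:

```python
# Poll GPU utilization while a translate task is running and report the mean.
import subprocess
import time

samples = []
for _ in range(60):                      # ~5 minutes at 5 s intervals
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    samples.extend(int(v) for v in out.split())
    time.sleep(5)

print(f"mean GPU utilization: {sum(samples) / len(samples):.1f}%")
```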