From 05a2702c7ee456013723b86c82d6700af56cbc26 Mon Sep 17 00:00:00 2001
From: thucpham
Date: Mon, 4 Mar 2024 12:23:35 +0100
Subject: [PATCH] small fix

---
 docs/parallel.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/docs/parallel.md b/docs/parallel.md
index 887a1d744..ba827d7b2 100644
--- a/docs/parallel.md
+++ b/docs/parallel.md
@@ -42,8 +42,8 @@ Parallelization with multiple Python threads is possible because all computation
 ```
 
 ## Model and tensor parallelism
-Models as the [`Translator`](python/ctranslate2.Translator.rst) and [`Generator`](python/ctranslate2.Generator.rst) can be split into multiple GPUs different.
-This is very helpful when the model is too big to be load in only 1 GPU.
+Models used with [`Translator`](python/ctranslate2.Translator.rst) and [`Generator`](python/ctranslate2.Generator.rst) can be split across multiple GPUs.
+This is very useful when the model is too big to be loaded on a single GPU.
 
 ```python
 translator = ctranslate2.Translator(model_path, device="cuda", tensor_parallel=True)
@@ -58,12 +58,12 @@ Setup environment:
 ```
 
 Run:
-* Run the application in multiprocess to using tensor parallel:
+* Run the application in multiple processes to use tensor parallelism:
 ```bash
 mpirun -np nbGPUExpected -hostfile hostfile python3 script
 ```
 
-If you're trying to run the tensor parallelism in multiple machine, there are additional configuration is needed:
+If you're trying to use tensor parallelism across multiple machines, some additional configuration is needed:
 * Make sure Master and Slave can connect to each other as a pair with ssh + pubkey
 * Export all necessary environment variables from Master to Slave like the example below:
 ```bash
@@ -71,20 +71,20 @@ mpirun -x VIRTUAL_ENV_PROMPT -x PATH -x VIRTUAL_ENV -x _ -x LD_LIBRARY_PATH -np
 ```
 
 Read more [open-mpi docs](https://www.open-mpi.org/doc/) for more information.
-* In this mode, the application will be run in multiprocess. We can filter out the master process by using:
+* In this mode, the application runs in multiple processes. We can filter out the master process with:
 ```python
 if ctranslate2.MpiInfo.getCurRank() == 0:
   print(...)
 ```
 
 ```{note}
-Running model in tensor parallel mode in one machine can boost the performance but if running the model shared between multiple
-machine could be slower because of the latency in the connectivity.
+Running a model in tensor parallel mode on a single machine can boost performance, but sharing the model across multiple
+machines can be slower because of network latency.
 ```
 
 ```{note}
 In mode tensor parallel, `inter_threads` is always supported to run multiple workers. Otherwise, `device_index` no longer has any effect
-because tensor parallel mode will check only available gpus on the system and number of gpu that you want to use.
+because tensor parallel mode only checks the GPUs available on the system and the number of GPUs you want to use.
 ```
 
 ## Asynchronous execution
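The `mpirun` commands in the patched doc reference a `hostfile` without showing its contents. As a rough sketch (the hostnames and slot counts below are placeholders, not values from the patch), an Open MPI hostfile lists one machine per line with the number of processes to launch on it — here, one process per GPU:

```bash
# Hypothetical hostfile for the two-machine setup described above.
# "slots" is the number of MPI processes (one per GPU) to start on that host.
cat > hostfile <<'EOF'
master-node slots=2
slave-node slots=2
EOF

# Launch 4 processes total (2 GPUs on each machine); the script name is a
# placeholder for the application using ctranslate2 with tensor_parallel=True.
mpirun -np 4 -hostfile hostfile python3 script.py
```

With this layout, `-np` should match the total number of slots (and GPUs) you intend to use across all machines.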