Commit 05a2702: small fix
minhthuc2502 committed Mar 4, 2024
1 parent ac8f7ae, commit 05a2702
Showing 1 changed file with 8 additions and 8 deletions.

docs/parallel.md: 8 additions & 8 deletions
@@ -42,8 +42,8 @@ Parallelization with multiple Python threads is possible because all computation
```

## Model and tensor parallelism
-Models as the [`Translator`](python/ctranslate2.Translator.rst) and [`Generator`](python/ctranslate2.Generator.rst) can be split into multiple GPUs different.
-This is very helpful when the model is too big to be load in only 1 GPU.
+Models used with [`Translator`](python/ctranslate2.Translator.rst) and [`Generator`](python/ctranslate2.Generator.rst) can be split across multiple GPUs.
+This is very useful when the model is too big to be loaded on a single GPU.

```python
translator = ctranslate2.Translator(model_path, device="cuda", tensor_parallel=True)
@@ -58,33 +58,33 @@ Setup environment:
```

Run:
-* Run the application in multiprocess to using tensor parallel:
+* Run the application in multiple processes to use tensor parallelism:
```bash
mpirun -np nbGPUExpected -hostfile hostfile python3 script
```
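The `hostfile` passed to `mpirun` lists the machines to launch on. A minimal sketch in Open MPI hostfile syntax, with hypothetical hostnames:

```
# one line per machine; slots = number of processes (one per GPU) to start there
gpu-node-1 slots=2
gpu-node-2 slots=2
```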

-If you're trying to run the tensor parallelism in multiple machine, there are additional configuration is needed:
+If you're trying to use tensor parallelism across multiple machines, some additional configuration is needed:
* Make sure Master and Slave can connect to each other with ssh + public key authentication
* Export all necessary environment variables from Master to Slave, as in the example below:
```bash
mpirun -x VIRTUAL_ENV_PROMPT -x PATH -x VIRTUAL_ENV -x _ -x LD_LIBRARY_PATH -np nbGPUExpected -hostfile hostfile python3 script
```
See the [open-mpi docs](https://www.open-mpi.org/doc/) for more information.

-* In this mode, the application will be run in multiprocess. We can filter out the master process by using:
+* In this mode, the application runs as multiple processes. We can filter out the master process by using:
```python
if ctranslate2.MpiInfo.getCurRank() == 0:
    print(...)
```
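The same filtering idea can be sketched without ctranslate2: Open MPI exports `OMPI_COMM_WORLD_RANK` to every process it spawns, so a hypothetical helper (a sketch under that assumption, not the library's API) can read the rank from the environment:

```python
import os

def mpi_rank() -> int:
    # mpirun (Open MPI) sets OMPI_COMM_WORLD_RANK for each spawned process;
    # default to 0 so the script also works when started without mpirun.
    return int(os.environ.get("OMPI_COMM_WORLD_RANK", "0"))

if mpi_rank() == 0:
    print("master process: safe place for logging and writing outputs")
```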

```{note}
-Running model in tensor parallel mode in one machine can boost the performance but if running the model shared between multiple
-machine could be slower because of the latency in the connectivity.
+Running the model in tensor parallel mode on one machine can boost performance, but running the model shared between
+multiple machines can be slower because of network latency.
```

```{note}
In tensor parallel mode, `inter_threads` is still supported to run multiple workers. However, `device_index` no longer has any effect
-because tensor parallel mode will check only available gpus on the system and number of gpu that you want to use.
+because tensor parallel mode only checks the GPUs available on the system and the number of GPUs you want to use.
```

## Asynchronous execution
