Commit

Merge pull request #579 from zhenghh04/main
oneCCL environment variables rewording
kaushikvelusamy authored Dec 12, 2024
2 parents a008057 + f02a8a7 commit 8e6173e
Showing 2 changed files with 5 additions and 7 deletions.
10 changes: 4 additions & 6 deletions docs/aurora/data-science/frameworks/oneCCL.md
@@ -19,15 +19,14 @@ kaushikvelusamy@aurora-uan-0012:~> module load frameworks
```

<!-- --8<-- [start:onecclenv] -->
**OneCCL mandatory environment variables**
**oneCCL environment variables**

The parameters below are recommended to be set at all times, as they either give the best performance for all applications or are required to address potential hangs or crashes at large scale.
We have identified a set of environment settings that give better performance or address potential hangs or crashes at large scale. These settings are still **EXPERIMENTAL** and might change as we discover better ones. Users are encouraged to check this page regularly.

```bash
export CCL_PROCESS_LAUNCHER=pmix
export CCL_ATL_TRANSPORT=mpi
export CCL_ALLREDUCE=topo
export CCL_ALLREDUCE_SCALEOUT=rabenseifner # currently best allreduce algorithm at large scale
export CCL_ALLREDUCE_SCALEOUT="direct:0-1048576;rabenseifner:1048577-max" # currently best allreduce algorithm at large scale
export CCL_BCAST=double_tree # currently best bcast algorithm at large scale

export CCL_KVS_MODE=mpi
@@ -44,8 +43,7 @@ unset MPIR_CVAR_CH4_COLL_SELECTION_TUNING_JSON_FILE
unset MPIR_CVAR_COLL_SELECTION_TUNING_JSON_FILE
```
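
The scale-out allreduce value contains a `;`, which bash treats as a command separator unless the value is quoted. A minimal sketch of how one might set it and confirm what the shell actually stored before launching a job (the reading of the two ranges as message sizes in bytes is an assumption, not something stated in this diff):

```bash
# Quoted, the whole string is one value. Unquoted, bash would stop the
# assignment at ';' and try to run 'rabenseifner:1048577-max' as a command.
export CCL_ALLREDUCE_SCALEOUT="direct:0-1048576;rabenseifner:1048577-max"

# Print the stored value so a truncated setting is caught before the job runs.
printenv CCL_ALLREDUCE_SCALEOUT
```

If the printed value stops at `direct:0-1048576`, the quotes were lost somewhere between the script and the shell.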

**OneCCL optional environment variables**
The impact of the following environment variables might be application dependent. Users are encouraged to try setting them and see whether they help their applications.
The effect of the following additional environment variable settings might be application dependent. Users are encouraged to try them and see whether they help their applications.

```bash
ulimit -c unlimited
2 changes: 1 addition & 1 deletion docs/aurora/data-science/frameworks/pytorch.md
@@ -129,7 +129,7 @@ The following environmental variables should be set on the batch submission
script (PBSPro script) in the case of attempting to run beyond 16 nodes.

<!-- --8<-- [start:commononecclenv] -->
#### oneCCL environment variable
#### oneCCL environment variables
--8<-- "./docs/aurora/data-science/frameworks/oneCCL.md:onecclenv"

These environment variable settings will probably be included in the framework module file in the future. But for now, users need to explicitly set these in the submission script.
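
Until the module file sets these, one way to keep a submission script tidy is to group the settings in a shell function and call it before the launch command. This is only a sketch: the function name `set_oneccl_env` is hypothetical, and it covers only the variables visible in this diff, so the full list should be taken from `oneCCL.md`.

```bash
#!/bin/bash
# Hypothetical helper; the name is illustrative and not part of the ALCF docs.
# It contains only the settings shown in this diff, not the complete list.
set_oneccl_env() {
    export CCL_PROCESS_LAUNCHER=pmix
    export CCL_ATL_TRANSPORT=mpi
    export CCL_ALLREDUCE=topo
    # Quoted because the value contains ';'.
    export CCL_ALLREDUCE_SCALEOUT="direct:0-1048576;rabenseifner:1048577-max"
    export CCL_BCAST=double_tree
    export CCL_KVS_MODE=mpi

    unset MPIR_CVAR_CH4_COLL_SELECTION_TUNING_JSON_FILE
    unset MPIR_CVAR_COLL_SELECTION_TUNING_JSON_FILE
}

# Call once in the submission script, before the application is launched.
set_oneccl_env
```

Keeping the settings in one function makes them easy to remove once the framework module provides them.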
