Merge branch 'main' into smartsim
rickybalin committed Nov 8, 2024
2 parents 99a1a60 + a680538 commit c5ceb1d
Showing 14 changed files with 291 additions and 119 deletions.
61 changes: 33 additions & 28 deletions docs/aurora/data-management/daos/daos-overview.md
@@ -10,7 +10,7 @@ DAOS is fully integrated with the wider Aurora compute fabric as can be seen in



# DAOS Overview
## DAOS Overview

The first step in using DAOS is to get DAOS POOL space allocated for your project.
Users should submit a request as noted below to have a DAOS pool created for your project.
@@ -66,20 +66,18 @@ Total size: 6.0 TB
Rebuild done, 4 objs, 0 recs
```
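Once the pool has been created, a quick sanity check is to query it; a minimal sketch, assuming the `daos` module used later on this page and the example pool label `datascience`:

```bash
module use /soft/modulefiles
module load daos
daos pool query datascience   # replace "datascience" with your project's pool label
# Healthy output reports total/free size and a rebuild status like the one shown above.
```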



## DAOS Container

The container is the basic unit of storage. A POSIX container can contain hundreds of millions of files, and you can use it to store all of your data.
You only need a small set of containers; perhaps just one per major unit of project work is sufficient.

There are three modes in which we can operate with DAOS containers (a creation sketch for the POSIX case follows the list):
1. Posix container Posix Mode
2. Posix Container MPI-IO Mode
1. POSIX container POSIX Mode
2. POSIX Container MPI-IO Mode
3. DFS container through DAOS APIs.
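For the POSIX case (mode 1), a minimal creation sketch; the pool and container labels are placeholders, and the exact flag spelling can vary between DAOS releases:

```bash
DAOS_POOL_NAME=datascience        # your project's pool label
DAOS_CONT_NAME=my_posix_cont      # any container label you choose

# Create a POSIX-type container in the pool.
daos container create --type=POSIX ${DAOS_POOL_NAME} ${DAOS_CONT_NAME}

# Sanity-check it, mirroring the check shown later in this section.
daos container check --pool=${DAOS_POOL_NAME} --cont=${DAOS_CONT_NAME}
```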


### Create a posix container
### Create a POSIX container


```bash
@@ -128,11 +126,11 @@ daos container check --pool=$DAOS_POOL_NAME --cont=$DAOS_CONT_NAME
```
### Mount a posix container
### Mount a POSIX container
Currently, you must manually mount your container prior to use on any node you are working on.
In the future, we hope to automate some of this via additional `qsub` options.
#### To mount a posix container on a login node
#### To mount a POSIX container on a login node
```bash
@@ -151,7 +149,7 @@ fusermount3 -u /tmp/${DAOS_POOL}/${DAOS_CONT} # To unmount
```
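A fuller sketch of the login-node mount, assuming the `DAOS_POOL` and `DAOS_CONT` variables used elsewhere on this page and a mount point under `/tmp` (pool and container labels are placeholders):

```bash
module use /soft/modulefiles
module load daos

export DAOS_POOL=datascience        # placeholder pool label
export DAOS_CONT=my_posix_cont      # placeholder container label

mkdir -p /tmp/${DAOS_POOL}/${DAOS_CONT}
# Single-process dfuse mount on the login node.
dfuse --pool=${DAOS_POOL} --container=${DAOS_CONT} -m /tmp/${DAOS_POOL}/${DAOS_CONT}
mount | grep dfuse                              # confirm the mount
ls /tmp/${DAOS_POOL}/${DAOS_CONT}               # basic sanity check
fusermount3 -u /tmp/${DAOS_POOL}/${DAOS_CONT}   # unmount when done
```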
#### To mount a posix container on Compute Nodes
#### To mount a POSIX container on Compute Nodes
You need to mount the container on all compute nodes.
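A hedged sketch of doing this from inside a batch job by starting one `dfuse` process per node (the node-count logic and launcher flags are assumptions, not taken from this page):

```bash
NNODES=$(wc -l < ${PBS_NODEFILE})

# Create the mount point and start one dfuse instance on every compute node.
mpiexec -np ${NNODES} -ppn 1 mkdir -p /tmp/${DAOS_POOL}/${DAOS_CONT}
mpiexec -np ${NNODES} -ppn 1 dfuse --pool=${DAOS_POOL} --container=${DAOS_CONT} \
        -m /tmp/${DAOS_POOL}/${DAOS_CONT}

# When the job is done, unmount on every node.
mpiexec -np ${NNODES} -ppn 1 fusermount3 -u /tmp/${DAOS_POOL}/${DAOS_CONT}
```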
@@ -196,7 +194,7 @@ CPU_BINDING1=list:4:9:14:19:20:25:56:61:66:71:74:79
## Interception library for posix containers
## Interception library for POSIX containers
The interception library (IL) is the next step in improving DAOS performance, providing kernel bypass for I/O data.
The libioil IL intercepts basic read and write POSIX calls, while all metadata calls still go through dFuse. The libpil4dfs IL should be used to intercept both data and metadata calls.
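A minimal sketch of enabling the libpil4dfs IL for an application launch (the library path matches the one used in the Darshan example below; the application name and rank counts are placeholders):

```bash
# Intercept both data and metadata POSIX calls with libpil4dfs.
mpiexec -np ${NRANKS} -ppn ${RANKS_PER_NODE} \
        --env LD_PRELOAD=/usr/lib64/libpil4dfs.so \
        ./my_app /tmp/${DAOS_POOL}/${DAOS_CONT}/output.dat
```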
@@ -377,7 +375,7 @@ Each DAOS server node is based on the Intel Coyote Pass platform.
## Darshan profiler for DAOS
Currently, you need to install your own local build of the darshan-daos profiler.
You need to use DFS mode (3) or Posix with interception library to profile
You need to use DFS mode (3) or POSIX with the interception library to profile.
```bash
module use /soft/modulefiles
@@ -403,7 +401,7 @@ cd /home/kaushikvelusamy/soft/profilers/darshan-daos/darshan-logs

```
Preload darshan first then daos interception library
Preload Darshan first, then the DAOS interception library:
```
mpiexec --env LD_PRELOAD=~/soft/profilers/darshan-daos/darshan-install/lib/libdarshan.so:/usr/lib64/libpil4dfs.so
@@ -413,7 +411,7 @@ mpiexec --env LD_PRELOAD=~/soft/profilers/darshan-daos/darshan-install/lib/libda
```
install darshan-util from laptop
Install darshan-util on your laptop:
```bash
@@ -432,7 +430,7 @@ python3 -m darshan summary ~/Downloads/kaushikv_ior_id917110-44437_10-23-55830-6
## Cluster Size
DAOS Cluster size is the number of available DAOS servers. While we are working towards bringing up the entire 1024 daos server available users, currently different number of daos nodes could be up. Please check with support or run an IOR test to get an estimate on the current number of daos servers available.
DAOS cluster size is the number of available DAOS servers. While we are working towards making all 1024 DAOS servers available to users, a varying number of DAOS nodes may be up at any given time. Please check with support or run an IOR test to estimate the number of DAOS servers currently available.
![expected Bandwidth](expectedBW.png "Expected number of daos servers and its approximate expected bandwidth")
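A hedged IOR sketch for such an estimate, doing file-per-process writes and reads through a dfuse-mounted container with the interception library (rank counts, block size, and transfer size are illustrative assumptions):

```bash
# POSIX API through the dfuse mount; -w/-r write then read, -F file-per-process.
mpiexec -np ${NRANKS} -ppn ${RANKS_PER_NODE} \
        --env LD_PRELOAD=/usr/lib64/libpil4dfs.so \
        ior -a POSIX -b 16m -t 2m -w -r -F \
            -o /tmp/${DAOS_POOL}/${DAOS_CONT}/ior_testfile
```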
@@ -441,18 +439,25 @@ DAOS Cluster size is the number of available DAOS servers. While we are working
## Best practices
```bash
Check qsub –l daos=default
Daos sanity checks mentioned above
Did you load DAOS module? module load daos
Do you have your DAOS pool allocated? daos pool query datascience
Is Daos client running on all your nodes? ps –ef | grep daos
Is your container mounted on all nodes? mount | grep dfuse
Can you ls in your container? ls /tmp/${DAOS_POOL}/${DAOS_CONT}
Did your I/O Actually fail?
What is the health property in your container? daos container get-prop $DAOS_POOL $CONT
Is your space full? Min and max daos pool query datascience
Does your query show failed targets or rebuild in process? daos pool query datascience
daos pool autotest
Daos container check

# Check that you requested DAOS
qsub -l daos=default

# Did you load the DAOS module?
module load daos

# Do you have your DAOS pool allocated?
daos pool query datascience

# Is the DAOS client running on all your nodes?
ps -ef | grep daos

# Is your container mounted on all nodes?
mount | grep dfuse

# Can you ls in your container?
ls /tmp/${DAOS_POOL}/${DAOS_CONT}

# Did your I/O actually fail?

# What is the health property in your container?
daos container get-prop $DAOS_POOL $CONT

# Is your space full? Check min and max.
daos pool query datascience

# Does your query show failed targets or rebuild in progress?
daos pool query datascience

daos pool autotest
daos container check
```
Binary file not shown.
1 change: 1 addition & 0 deletions docs/aurora/data-management/lustre/flare.md
@@ -4,3 +4,4 @@

Home is a 12 PB **Gecko** Lustre filesystem with 32 OSTs and 12 MDTs.

[Follow this link for basic information on I/O optimization for the Lustre filesystem](https://anl.box.com/s/uqmgnkn7i3z22c9xrwef8nn702wl22uy)
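As a first pass at Lustre I/O tuning, checking and adjusting the stripe layout of a project directory is a common starting point; a minimal sketch (the path, stripe count, and stripe size below are placeholders, not recommendations):

```bash
# Inspect the current stripe layout (path is a placeholder).
lfs getstripe /path/to/project/dir
# Stripe new files in this directory across 8 OSTs with 16 MiB stripes (illustrative values).
lfs setstripe -c 8 -S 16m /path/to/project/dir
```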
21 changes: 10 additions & 11 deletions docs/aurora/data-science/frameworks/oneCCL.md
@@ -18,16 +18,12 @@ kaushikvelusamy@aurora-uan-0012:~> module load frameworks
/opt/aurora/24.180.0/CNDA/oneapi/ccl/2021.13.1_20240808.145507
```


<!-- --8<-- [start:onecclenv] -->
**OneCCL mandatory environment variables**

```bash
module load frameworks
echo $CCL_ROOT
export LD_LIBRARY_PATH=$CCL_ROOT/lib:$LD_LIBRARY_PATH
export CPATH=$CCL_ROOT/include:$CPATH
export LIBRARY_PATH=$CCL_ROOT/lib:$LIBRARY_PATH
The parameters below are recommended to be set at all times, as they either give the best performance for all applications or are required to address potential hangs or crashes at large scale.

```bash
export CCL_PROCESS_LAUNCHER=pmix
export CCL_ATL_TRANSPORT=mpi
export CCL_ALLREDUCE=topo
@@ -41,9 +37,15 @@ export CCL_KVS_CONNECTION_TIMEOUT=600

export CCL_ZE_CACHE_OPEN_IPC_HANDLES_THRESHOLD=1024
export CCL_KVS_USE_MPI_RANKS=1

export MPI_PROVIDER=$FI_PROVIDER
unset MPIR_CVAR_CH4_POSIX_COLL_SELECTION_TUNING_JSON_FILE
unset MPIR_CVAR_CH4_COLL_SELECTION_TUNING_JSON_FILE
unset MPIR_CVAR_COLL_SELECTION_TUNING_JSON_FILE
```

**OneCCL optional environment variables**
The impact of the following environment variables might be application dependent. Users are encouraged to try setting them and see whether they help their applications.

```bash
ulimit -c unlimited
@@ -53,17 +55,14 @@ export FI_CXI_RX_MATCH_MODE=hybrid
export FI_CXI_OFLOW_BUF_SIZE=8388608
export FI_CXI_DEFAULT_CQ_SIZE=1048576
export FI_CXI_CQ_FILL_PERCENT=30
export MPI_PROVIDER=$FI_PROVIDER
unset MPIR_CVAR_CH4_COLL_SELECTION_TUNING_JSON_FILE
unset MPIR_CVAR_COLL_SELECTION_TUNING_JSON_FILE
export INTELGT_AUTO_ATTACH_DISABLE=1
export PALS_PING_PERIOD=240
export PALS_RPC_TIMEOUT=240
export MPIR_CVAR_GATHERV_INTER_SSEND_MIN_PROCS=-1 # to work around a synchronous-send issue that causes a Horovod segfault
export CCL_ATL_SYNC_COLL=1 #to avoid potential hang at large scale
export CCL_OP_SYNC=1 #to avoid potential hang at large scale
```

<!-- --8<-- [end:onecclenv] -->

**Algorithm selection**

59 changes: 40 additions & 19 deletions docs/aurora/data-science/frameworks/pytorch.md
@@ -12,15 +12,15 @@ the frameworks module. To use it from a compute node, please load the following

```
module use /soft/modulefiles/
module load frameworks/2023.12.15.001
module load frameworks
```
Then you can `import` PyTorch as usual; the following is an output from the
`frameworks/2023.12.15.001` module
`frameworks` module

```
>>> import torch
>>> torch.__version__
'2.0.1a0+cxx11.abi'
'2.3.1+cxx11.abi'
```
A simple but useful check could be to use PyTorch to get device information on
a compute node. You can do this in the following way:
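A minimal sketch of such a check, assuming the frameworks module provides Intel Extension for PyTorch (`intel_extension_for_pytorch`) and the XPU backend:

```bash
module use /soft/modulefiles/
module load frameworks
# Query the XPU (GPU) devices visible to PyTorch.
python - <<'EOF'
import torch
import intel_extension_for_pytorch as ipex  # assumed to ship with the frameworks module
print(torch.__version__)
print("XPU available:", torch.xpu.is_available())
print("XPU device count:", torch.xpu.device_count())
EOF
```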
@@ -128,22 +128,12 @@ Some of the Aurora specific details might be helpful to you:
The following environment variables should be set in the batch submission
script (PBSPro script) when attempting to run beyond 16 nodes.

```shell
# This is a fix for running over 16 nodes:
export FI_CXI_DEFAULT_CQ_SIZE=131072
export FI_CXI_OFLOW_BUF_SIZE=8388608
export FI_CXI_CQ_FILL_PERCENT=20
<!-- --8<-- [start:commononecclenv] -->
#### oneCCL environment variables
--8<-- "./docs/aurora/data-science/frameworks/oneCCL.md:onecclenv"

export FI_LOG_LEVEL=warn
#export FI_LOG_PROV=tcp
export FI_LOG_PROV=cxi

export MPIR_CVAR_ENABLE_GPU=0
# This is to disable certain GPU optimizations like the use of XeLinks between
# GPUs, collectives with GPU-placed data etc., in order to reduce `MPI_Init`
# overheads. Benefits are application dependent.
export CCL_KVS_GET_TIMEOUT=600
```
These environment variable settings will probably be included in the frameworks module file in the future, but for now users need to set them explicitly in the submission script.
<!-- --8<-- [end:commononecclenv] -->

In order to run an application with the `TF32` precision type, one must set the
following environment variable:
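For example (this same setting appears in the job-script example later on this page):

```bash
# Ask IPEX to use TF32 math mode for FP32 operations.
export IPEX_FP32_MATH_MODE=TF32
```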
@@ -314,7 +304,7 @@ export IPEX_FP32_MATH_MODE=TF32
#####################################################################

module use /soft/modulefiles
module load frameworks/2023.12.15.001
module load frameworks

export NUMEXPR_NUM_THREADS=64
# This is to resolve an issue due to a package called "numexpr".
@@ -333,6 +323,37 @@ export NUMEXPR_NUM_THREADS=64
# JOB LAUNCH
######################################################################


## CCL setup
export FI_CXI_DEFAULT_CQ_SIZE=131072
export FI_CXI_OFLOW_BUF_SIZE=8388608
export FI_CXI_CQ_FILL_PERCENT=20

export FI_LOG_LEVEL=warn
#export FI_LOG_PROV=tcp
export FI_LOG_PROV=cxi

export CCL_KVS_GET_TIMEOUT=600

export LD_LIBRARY_PATH=$CCL_ROOT/lib:$LD_LIBRARY_PATH
export CPATH=$CCL_ROOT/include:$CPATH
export LIBRARY_PATH=$CCL_ROOT/lib:$LIBRARY_PATH

export CCL_PROCESS_LAUNCHER=pmix
export CCL_ATL_TRANSPORT=mpi
export CCL_ALLREDUCE=topo
export CCL_ALLREDUCE_SCALEOUT=rabenseifner # currently best allreduce algorithm at large scale
export CCL_BCAST=double_tree # currently best bcast algorithm at large scale

export CCL_KVS_MODE=mpi
export CCL_CONFIGURATION_PATH=""
export CCL_CONFIGURATION=cpu_gpu_dpcpp
export CCL_KVS_CONNECTION_TIMEOUT=600

export CCL_ZE_CACHE_OPEN_IPC_HANDLES_THRESHOLD=1024
export CCL_KVS_USE_MPI_RANKS=1


export CCL_LOG_LEVEL="WARN"
export CPU_BIND="verbose,list:2-4:10-12:18-20:26-28:34-36:42-44:54-56:62-64:70-72:78-80:86-88:94-96"
HOROVOD_THREAD_AFFINITY="4,12,20,28,36,44,56,64,72,80,88,96"