This repository has been archived by the owner on Oct 19, 2024. It is now read-only.

[AWS] updates for P4 & EFA #857

Merged 1 commit on Jan 17, 2023
1 change: 1 addition & 0 deletions alpa/device_mesh.py
@@ -1092,6 +1092,7 @@ def launch_xla_servers(self):
"FI_EFA_USE_DEVICE_RDMA": "1",
"LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH",
""), # For libnccl-net.so
+ "NCCL_PROTO": "simple",
})

bundle_index = device_bundle_idx_list[i]
1 change: 1 addition & 0 deletions benchmark/cupy/profile_communication.py
@@ -246,6 +246,7 @@ def sync(self):
"FI_PROVIDER": "efa",
"FI_EFA_USE_DEVICE_RDMA": "1",
"LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH", ""), # For libnccl-net.so
+ "NCCL_PROTO": "simple",
}
elif args.ib:
env_vars = {
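The two hunks above add the same `NCCL_PROTO` entry to the environment used on the EFA path. As a rough illustration only, the shared set of variables could be built by one helper. This is a sketch, not code from the PR: `efa_nccl_env`, its `base_env` parameter, and the example library path are hypothetical names introduced here, and the keys are taken from the `benchmark/cupy/profile_communication.py` hunk.

```python
import os


def efa_nccl_env(base_env=None):
    """Return the EFA-related variables shown in the diff as a new dict.

    Hypothetical helper for illustration; it is not part of the PR.
    """
    # Fall back to the current process environment when no mapping is given.
    base_env = os.environ if base_env is None else base_env
    return {
        "FI_PROVIDER": "efa",
        "FI_EFA_USE_DEVICE_RDMA": "1",
        # Forwarded unchanged; the diff notes this is for libnccl-net.so.
        "LD_LIBRARY_PATH": base_env.get("LD_LIBRARY_PATH", ""),
        # The entry this PR adds.
        "NCCL_PROTO": "simple",
    }


# Example call with a made-up library path (an assumption for the demo).
env_vars = efa_nccl_env({"LD_LIBRARY_PATH": "/opt/example/lib"})
print(env_vars["NCCL_PROTO"])
```

Passing the base environment as an argument keeps the helper easy to exercise without touching the real process environment.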
7 changes: 5 additions & 2 deletions docs/install.rst
@@ -20,8 +20,8 @@ Regardless of installing from wheels or from source, there are a few prerequisit
# Update pip
pip3 install --upgrade pip

- # Use your own CUDA version. Here cuda-cuda114 means cuda 11.4
- pip3 install cupy-cuda114
+ # Install cupy
+ pip3 install cupy-cuda11x

Then, check whether your system already has NCCL installed.

@@ -32,6 +32,9 @@ Regardless of installing from wheels or from source, there are a few prerequisit
If it prints nothing, then NCCL has already been installed.
Otherwise, follow the printed instructions to install NCCL.

+ .. code:: bash
+
+ python3 -m cupyx.tools.install_library --cuda 11.x --library nccl

Methods
-------