add Huawei Ascend NPU setup guide #6445

Merged · 7 commits · Aug 27, 2024
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -59,7 +59,7 @@ repos:
# Do not check files that are automatically generated
'--skip=docs/Gemfile.lock,tests/unit/gpt2-merges.txt,tests/unit/gpt2-vocab.json',
'--ignore-regex=\\n', # Do not count the 'n' in an escaped newline as part of a word
'--ignore-words-list=youn,unsupport,noe', # Word used in error messages that need rewording
'--ignore-words-list=youn,unsupport,noe,cann', # Word used in error messages that need rewording
--check-filenames,
--check-hidden
]
1 change: 1 addition & 0 deletions docs/_tutorials/accelerator-abstraction-interface.md
@@ -81,6 +81,7 @@ torch.distributed.init_process_group(get_accelerator().communication_backend_name())
[Accelerator Setup Guide](accelerator-setup-guide.md) provides a guide on how to set up different accelerators for DeepSpeed. It also includes simple examples of how to run DeepSpeed on each accelerator. The following guides are provided:
1. Run DeepSpeed model on CPU
2. Run DeepSpeed model on XPU
3. Run DeepSpeed model on Huawei Ascend NPU
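All three guides rely on the same device-agnostic pattern from the accelerator abstraction interface: the script queries DeepSpeed for the device and communication backend rather than hard-coding them. A minimal sketch (the try/except fallback is only an assumption for machines without DeepSpeed installed; actual values depend on the detected hardware):

```python
# Device-agnostic selection via DeepSpeed's accelerator abstraction.
# On Ascend hardware this resolves to something like 'npu:0' / 'hccl';
# on other machines it may be 'cuda:0' / 'nccl', 'xpu:0' / 'ccl', or 'cpu' / 'gloo'.
try:
    from deepspeed.accelerator import get_accelerator

    accel = get_accelerator()
    device = accel.device_name(0)                 # first device of the detected family
    backend = accel.communication_backend_name()  # matching collective backend
except ImportError:
    # Fallback for environments without DeepSpeed installed.
    device, backend = "cpu", "gloo"

print("device:", device, "backend:", backend)
```

The same script then runs unchanged on CPU, XPU, or NPU, which is the point of the abstraction interface described above.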

# Implement new accelerator extension
It is possible to implement a new DeepSpeed accelerator extension to support new accelerator in DeepSpeed. An example to follow is _[Intel Extension For DeepSpeed](https://github.com/intel/intel-extension-for-deepspeed/)_. An accelerator extension contains the following components:
113 changes: 113 additions & 0 deletions docs/_tutorials/accelerator-setup-guide.md
@@ -8,6 +8,7 @@ tags: getting-started
- [Introduction](#introduction)
- [Intel Architecture (IA) CPU](#intel-architecture-ia-cpu)
- [Intel XPU](#intel-xpu)
- [Huawei Ascend NPU](#huawei-ascend-npu)

# Introduction
DeepSpeed supports accelerators from different vendors. Setup steps can differ from one accelerator to another, so this guide lets users look up setup instructions for the accelerator family and hardware they are using.
@@ -132,3 +133,115 @@ accelerator: xpu

## More example for using DeepSpeed on Intel XPU
Refer to https://github.com/intel/intel-extension-for-pytorch/tree/release/xpu/2.1.40/examples/gpu/inference/python/llm for more extensive guide.


# Huawei Ascend NPU

DeepSpeed has been verified on the following Huawei Ascend NPU products:
* Atlas 300T A2

## Installation steps for Huawei Ascend NPU

The following steps outline the process for installing DeepSpeed on a Huawei Ascend NPU:
1. Install the Huawei Ascend NPU Driver and Firmware
<details>
<summary>Click to expand</summary>

Before proceeding with the installation, please download the necessary files from [Huawei Ascend NPU Driver and Firmware](https://www.hiascend.com/en/hardware/firmware-drivers/commercial?product=4&model=11).

The following instructions are sourced from the [Ascend Community](https://www.hiascend.com/document/detail/en/canncommercial/700/quickstart/quickstart/quickstart_18_0002.html) (refer to the [Chinese version](https://www.hiascend.com/document/detail/zh/canncommercial/700/quickstart/quickstart/quickstart_18_0002.html)):

- Execute the following command to install the driver:
```
./Ascend-hdk-<soc_version>-npu-driver_x.x.x_linux-{arch}.run --full --install-for-all
```

- Execute the following command to install the firmware:
```
./Ascend-hdk-<soc_version>-npu-firmware_x.x.x.x.X.run --full
```
</details>

2. Install CANN
<details>
<summary>Click to expand</summary>

Prior to installation, download the [CANN Toolkit](https://www.hiascend.com/en/software/cann/commercial).

- Install third-party dependencies.
- Ubuntu (The operations are the same for Debian, UOS20, and Linux.)
```
apt-get install -y gcc g++ make cmake zlib1g zlib1g-dev openssl libsqlite3-dev libssl-dev libffi-dev unzip pciutils net-tools libblas-dev gfortran libblas3
```
- openEuler (The operations are the same for EulerOS, CentOS, and BC-Linux.)
```
yum install -y gcc gcc-c++ make cmake unzip zlib-devel libffi-devel openssl-devel pciutils net-tools sqlite-devel lapack-devel gcc-gfortran
```
- Install the required Python dependencies:
```
pip3 install attrs numpy decorator sympy cffi pyyaml pathlib2 psutil protobuf scipy requests absl-py wheel typing_extensions
```
- Install the CANN Toolkit.
```
./Ascend-cann-toolkit_x.x.x_linux-{arch}.run --install
```
</details>

3. Install PyTorch \
`pip install torch torch_npu`

4. Install DeepSpeed \
`pip install deepspeed`
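Before inspecting the full report, a quick sanity check is to confirm the driver from step 1 is visible. This sketch assumes `npu-smi`, the device-status utility shipped with the Ascend driver; the fallback message is only for machines without it:

```shell
# Check for the Ascend driver's npu-smi utility and print device status if present.
if command -v npu-smi >/dev/null 2>&1; then
  npu-smi info   # lists NPU cards, health status, and driver version
else
  echo "npu-smi not found: Ascend driver is not installed on this machine"
fi
```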

You can view the installation results using the `ds_report` command. Here is an example:
```
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
deepspeed_not_implemented [NO] ....... [OKAY]
async_io ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/root/miniconda3/envs/ds/lib/python3.10/site-packages/torch']
torch version .................... 2.2.0
deepspeed install path ........... ['/root/miniconda3/envs/ds/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.14.4, unknown, unknown
deepspeed wheel compiled w. ...... torch 2.2
torch_npu install path ........... ['/root/miniconda3/envs/ds/lib/python3.10/site-packages/torch_npu']
torch_npu version ................ 2.2.0
ascend_cann version .............. 8.0.RC2.alpha002
shared memory (/dev/shm) size .... 20.00 GB
```

## How to launch DeepSpeed on Huawei Ascend NPU

To validate that the Huawei Ascend NPU is available and that the accelerator is correctly selected, here is an example (Huawei Ascend NPU detection is automatic starting with DeepSpeed v0.12.6):
```
>>> import torch
>>> print('torch:',torch.__version__)
torch: 2.2.0
>>> import torch_npu
>>> print('torch_npu:',torch.npu.is_available(),",version:",torch_npu.__version__)
torch_npu: True ,version: 2.2.0
>>> from deepspeed.accelerator import get_accelerator
>>> print('accelerator:', get_accelerator()._name)
accelerator: npu
```

## Multi-card parallel training using Huawei Ascend NPU

To perform model training across multiple Huawei Ascend NPU cards using DeepSpeed, see the examples provided in [DeepSpeed Examples](https://github.com/microsoft/DeepSpeedExamples/blob/master/training/cifar/cifar10_deepspeed.py).
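As a minimal sketch of how such a run is launched (the node names, slot counts, and script path below are hypothetical, not values from this guide): the `deepspeed` launcher reads a hostfile listing each node and its card count, then spawns one process per card.

```shell
# Write a hostfile describing a hypothetical two-node cluster with 8 NPU cards each.
# 'slots' is the number of accelerator cards the launcher may use on that node.
cat > hostfile <<'EOF'
node1 slots=8
node2 slots=8
EOF

# The launcher then starts one worker per card across all listed nodes, e.g.:
# deepspeed --hostfile=hostfile cifar10_deepspeed.py --deepspeed
```

For a single machine, the hostfile can be omitted and the card count passed directly, e.g. `deepspeed --num_gpus=8 cifar10_deepspeed.py --deepspeed`.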