diff --git a/docs/model_serving_framework/GPU_support.md b/docs/model_serving_framework/GPU_support.md
new file mode 100644
index 0000000000..accda7ed36
--- /dev/null
+++ b/docs/model_serving_framework/GPU_support.md
@@ -0,0 +1,436 @@

With the model serving framework (released in 2.4 as an experimental feature), users can upload deep learning NLP models
(only text embedding models are supported for now) to an OpenSearch cluster and run them on an [ML node](https://opensearch.org/docs/latest/ml-commons-plugin/index/#ml-node).
To get better performance, we need GPU acceleration; GPU ML nodes will be supported starting with 2.5. This doc explains how to
prepare a GPU ML node to run the model serving framework (the setup is a one-time effort). It focuses on two types of GPU
devices: NVIDIA GPU and AWS Inferentia.

# 1. NVIDIA GPU

Tested on AWS EC2 `g5.xlarge`, 64-bit (x86)

- Ubuntu AMI: `Deep Learning AMI GPU PyTorch 1.12.1 (Ubuntu 20.04) 20221114`
- Amazon Linux AMI: `Deep Learning AMI GPU PyTorch 1.12.1 (Amazon Linux 2) 20221114`
- PyTorch: 1.12.1

## 1.1 Mount the nvidia-uvm device

Check whether you can see `nvidia-uvm` and `nvidia-uvm-tools` under `/dev` by running

```
ls -al /dev | grep nvidia-uvm
```

If they are not found, run the script `nvidia-uvm-init.sh` below. You may need to run it with sudo.

Content of `nvidia-uvm-init.sh` (refer to the [NVIDIA doc](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#runfile-verifications)):

```
#!/bin/bash
## Script to initialize nvidia device nodes.
## https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#runfile-verifications

/sbin/modprobe nvidia

if [ "$?" -eq 0 ]; then
  # Count the number of NVIDIA controllers found.
  NVDEVS=`lspci | grep -i NVIDIA`
  N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
  NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`

  N=`expr $N3D + $NVGA - 1`
  for i in `seq 0 $N`; do
    mknod -m 666 /dev/nvidia$i c 195 $i
  done

  mknod -m 666 /dev/nvidiactl c 195 255
else
  exit 1
fi

/sbin/modprobe nvidia-uvm

if [ "$?" -eq 0 ]; then
  # Find out the major device number used by the nvidia-uvm driver
  D=`grep nvidia-uvm /proc/devices | awk '{print $1}'`

  mknod -m 666 /dev/nvidia-uvm c $D 0
  mknod -m 666 /dev/nvidia-uvm-tools c $D 0
else
  exit 1
fi
```

Once you can see `nvidia-uvm` and `nvidia-uvm-tools` under `/dev`, you can start OpenSearch.
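As an optional sanity check before starting OpenSearch, you can confirm that the driver is healthy and that PyTorch can see the GPU. This is a minimal sketch, assuming the AMI's bundled PyTorch environment is active on the node:

```
# Driver-level check: should list the GPU(s), driver version, and CUDA version
nvidia-smi

# PyTorch-level check: should print "True" on a working GPU node
python -c "import torch; print(torch.cuda.is_available())"
```

If `nvidia-smi` works but the PyTorch check prints "False", a common cause is a CPU-only PyTorch build in the active Python environment.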
# 2. AWS Inferentia

Tested on AWS EC2 `inf1.xlarge`, 64-bit (x86)

- Ubuntu AMI: `Deep Learning AMI GPU PyTorch 1.12.1 (Ubuntu 20.04) 20221114`
- Amazon Linux AMI: `Deep Learning AMI GPU PyTorch 1.12.1 (Amazon Linux 2) 20221114`
- PyTorch: 1.12.1

## 2.1 Fresh setup script

You can use these scripts to set up a new ML node. You can also check [2.2 Manual way](#22-manual-way) for more details.

### 2.1.1 Ubuntu 20.04

Tested on AWS EC2 `inf1.xlarge`, 64-bit (x86)

Ubuntu AMI: `Deep Learning AMI GPU PyTorch 1.12.1 (Ubuntu 20.04) 20221114`

Download OpenSearch and set `OS_HOME` first. In this example, we install OpenSearch in the home folder.

```
cd ~; wget https://artifacts.opensearch.org/releases/bundle/opensearch/2.5.0/opensearch-2.5.0-linux-x64.tar.gz
tar -xvf opensearch-2.5.0-linux-x64.tar.gz

echo "export OS_HOME=~/opensearch-2.5.0" | tee -a ~/.bash_profile
echo "export PYTORCH_VERSION=1.12.1" | tee -a ~/.bash_profile
source ~/.bash_profile
```

Create the shell script file `prepare_torch_neuron.sh` and run it.

Content of `prepare_torch_neuron.sh`:

```
# Configure Linux for Neuron repository updates
. /etc/os-release
sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF
deb https://apt.repos.neuron.amazonaws.com ${VERSION_CODENAME} main
EOF
wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -

# Update OS packages and install the Neuron driver and tools
# (package names per the AWS Neuron setup guide linked in section 2.2)
sudo apt-get update -y
sudo apt-get install aws-neuronx-dkms aws-neuronx-tools -y

# Create and activate a Python virtual environment
sudo apt-get install -y python3.8-venv g++
python3.8 -m venv pytorch_venv
source pytorch_venv/bin/activate
pip install -U pip

# Point pip at the Neuron repository and install PyTorch Neuron
pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com
pip install torch-neuron

# Copy the torch_neuron lib into OpenSearch and register it in ~/.bash_profile
# (adjust python3.8 if your venv uses a different Python version)
PYTORCH_NEURON_LIB_PATH=~/pytorch_venv/lib/python3.8/site-packages/torch_neuron/lib/
mkdir -p $OS_HOME/lib/torch_neuron; cp -r $PYTORCH_NEURON_LIB_PATH/ $OS_HOME/lib/torch_neuron
echo "export PYTORCH_EXTRA_LIBRARY_PATH=$OS_HOME/lib/torch_neuron/lib/libtorchneuron.so" | tee -a ~/.bash_profile

# Increase JVM stack size to >=2MB
echo "-Xss2m" | tee -a $OS_HOME/config/jvm.options
# Increase max file descriptors to 65535
echo "$(whoami) - nofile 65535" | sudo tee -a /etc/security/limits.conf
# Increase max virtual memory areas vm.max_map_count to 262144
sudo sysctl -w vm.max_map_count=262144
```

Exit the current terminal and open a new one (so the settings take effect), then start OpenSearch.

### 2.1.2 Amazon Linux 2

Tested on AWS EC2 `inf1.xlarge`, 64-bit (x86)

Amazon Linux AMI: `Deep Learning AMI GPU PyTorch 1.12.1 (Amazon Linux 2) 20221114`

Download OpenSearch and set `OS_HOME` first. In this example, we install OpenSearch in the home folder.

```
cd ~; wget https://artifacts.opensearch.org/releases/bundle/opensearch/2.5.0/opensearch-2.5.0-linux-x64.tar.gz
tar -xvf opensearch-2.5.0-linux-x64.tar.gz

echo "export OS_HOME=~/opensearch-2.5.0" | tee -a ~/.bash_profile
echo "export PYTORCH_VERSION=1.12.1" | tee -a ~/.bash_profile
source ~/.bash_profile
```

Create the shell script file `prepare_torch_neuron.sh` and run it.

Content of `prepare_torch_neuron.sh`:

```
# Configure Linux for Neuron repository updates
sudo tee /etc/yum.repos.d/neuron.repo > /dev/null <<EOF
[neuron]
name=Neuron YUM Repository
baseurl=https://yum.repos.neuron.amazonaws.com
enabled=1
metadata_expire=0
EOF
sudo rpm --import https://yum.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB

# Update OS packages and install the Neuron driver and tools
# (package names per the AWS Neuron setup guide linked in section 2.2)
sudo yum update -y
sudo yum install aws-neuronx-dkms aws-neuronx-tools -y

# Create and activate a Python virtual environment
sudo yum install -y python3 gcc-c++
python3 -m venv pytorch_venv
source pytorch_venv/bin/activate
pip install -U pip

# Point pip at the Neuron repository and install PyTorch Neuron
pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com
pip install torch-neuron

# Copy the torch_neuron lib into OpenSearch and register it in ~/.bash_profile
# (Amazon Linux 2 venvs use Python 3.7; adjust the path if yours differs)
PYTORCH_NEURON_LIB_PATH=~/pytorch_venv/lib/python3.7/site-packages/torch_neuron/lib/
mkdir -p $OS_HOME/lib/torch_neuron; cp -r $PYTORCH_NEURON_LIB_PATH/ $OS_HOME/lib/torch_neuron
echo "export PYTORCH_EXTRA_LIBRARY_PATH=$OS_HOME/lib/torch_neuron/lib/libtorchneuron.so" | tee -a ~/.bash_profile

# Increase JVM stack size to >=2MB
echo "-Xss2m" | tee -a $OS_HOME/config/jvm.options
# Increase max file descriptors to 65535
echo "$(whoami) - nofile 65535" | sudo tee -a /etc/security/limits.conf
# Increase max virtual memory areas vm.max_map_count to 262144
sudo sysctl -w vm.max_map_count=262144
```

Exit the current terminal and open a new one (so the settings take effect), then start OpenSearch.

## 2.2 Manual way

### 2.2.1 Install Driver

Refer to [Deploy on AWS ML accelerator instance](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuron/setup/pytorch-install.html#deploy-on-aws-ml-accelerator-instance) and choose the tab "**Ubuntu 18 AMI/Ubuntu 20 AMI**"; if you are using a different operating system, choose the corresponding tab.
The content is copied here for easy reference.

```
# Configure Linux for Neuron repository updates
. /etc/os-release
sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null <<EOF
deb https://apt.repos.neuron.amazonaws.com ${VERSION_CODENAME} main
EOF
wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | sudo apt-key add -

# Update OS packages
sudo apt-get update -y

# Install the Neuron driver and tools
# (package names per the AWS Neuron setup guide linked above)
sudo apt-get install aws-neuronx-dkms aws-neuronx-tools -y

# Install Python venv and activate the virtual environment
sudo apt-get install -y python3.8-venv g++
python3.8 -m venv pytorch_venv
source pytorch_venv/bin/activate
pip install -U pip

# Point pip at the Neuron repository and install PyTorch Neuron
pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com
pip install torch-neuron
```

### 2.2.2 Copy torch neuron lib to OpenSearch

```
# Set OS_HOME to your OpenSearch install folder first.
# For example, if you install OpenSearch in your home folder, it will be
# OS_HOME=~/opensearch-2.5.0

# Activate pytorch_venv first if you haven't. Refer to the "Install Driver" part.
source pytorch_venv/bin/activate

# Set the pytorch neuron lib path. In this example, we created pytorch_venv in the home folder, so
PYTORCH_NEURON_LIB_PATH=~/pytorch_venv/lib/python3.7/site-packages/torch_neuron/lib/

mkdir -p $OS_HOME/lib/torch_neuron; cp -r $PYTORCH_NEURON_LIB_PATH/ $OS_HOME/lib/torch_neuron
export PYTORCH_EXTRA_LIBRARY_PATH=$OS_HOME/lib/torch_neuron/lib/libtorchneuron.so
```

Increase the JVM stack size to >=2MB:

```
echo "-Xss2m" | sudo tee -a $OS_HOME/config/jvm.options
```

Then you can start OpenSearch and upload/load a traced Neuron model.

You may see errors like these when starting OpenSearch:

```
[1]: max file descriptors [8192] for opensearch process is too low, increase to at least [65535]
[2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
```

For the first one, run this command (you need to log in from a new terminal for it to take effect):

```
echo "$(whoami) - nofile 65535" | sudo tee -a /etc/security/limits.conf
```

For the second one, run this:

```
sudo sysctl -w vm.max_map_count=262144
```
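If model upload or load still fails on an Inferentia node, it helps to rule out driver problems at the device level first. This is a minimal sketch, assuming the Neuron driver and tools from the steps above are installed (`neuron-ls` ships with the Neuron tools package and is typically installed under `/opt/aws/neuron/bin`):

```
# Device nodes created by the Neuron driver (an inf1.xlarge exposes one device, neuron0)
ls /dev/neuron*

# Summary of the Inferentia devices and the processes currently using them
neuron-ls
```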