From 0d3d696087a959dac33b7d30c942770b77388d5f Mon Sep 17 00:00:00 2001
From: Praveen
Date: Thu, 17 Aug 2023 16:01:13 -0700
Subject: [PATCH] Documentation and example for running simple NLP service on
 kuberay (#1340)

* add service yaml for nlp

* Documentation fixes

* Fix instructions

* Apply suggestions from code review

Co-authored-by: Kai-Hsun Chen
Signed-off-by: Praveen

* Fix tolerations comment

* review comments

* Update docs/guidance/stable-diffusion-rayservice.md

Signed-off-by: Kai-Hsun Chen

---------

Signed-off-by: Praveen
Signed-off-by: Kai-Hsun Chen
Co-authored-by: Kai-Hsun Chen
---
 docs/guidance/aws-eks-gpu-cluster.md         |  2 +-
 docs/guidance/stable-diffusion-rayservice.md | 10 ++-
 docs/guidance/text-summarizer-rayservice.md  | 69 ++++++++++++++++
 .../samples/ray-service.text-sumarizer.yaml  | 79 +++++++++++++++++++
 4 files changed, 158 insertions(+), 2 deletions(-)
 create mode 100644 docs/guidance/text-summarizer-rayservice.md
 create mode 100644 ray-operator/config/samples/ray-service.text-sumarizer.yaml

diff --git a/docs/guidance/aws-eks-gpu-cluster.md b/docs/guidance/aws-eks-gpu-cluster.md
index e77470403d2..f18998b3ae2 100644
--- a/docs/guidance/aws-eks-gpu-cluster.md
+++ b/docs/guidance/aws-eks-gpu-cluster.md
@@ -32,7 +32,7 @@ Create a GPU node group for Ray GPU workers.

   > **Note:** If you encounter permission issues with `kubectl`, follow "Step 2: Configure your computer to communicate with your cluster" in the [AWS documentation](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-console.html#).

-2. Please install the NVIDIA device plugin.
+2. Please install the NVIDIA device plugin. Note: You don't need this if you used the `BOTTLEROCKET_x86_64_NVIDIA` image in the step above.
   * Install the DaemonSet for NVIDIA device plugin to run GPU enabled containers in your Amazon EKS cluster. You can refer to the [Amazon EKS optimized accelerated Amazon Linux AMIs](https://docs.aws.amazon.com/eks/latest/userguide/eks-optimized-ami.html#gpu-ami) or [NVIDIA/k8s-device-plugin](https://github.com/NVIDIA/k8s-device-plugin) repository for more details.
   * If the GPU nodes have taints, add `tolerations` to `nvidia-device-plugin.yml` to enable the DaemonSet to schedule Pods on the GPU nodes.
diff --git a/docs/guidance/stable-diffusion-rayservice.md b/docs/guidance/stable-diffusion-rayservice.md
index 5421660476e..94532531972 100644
--- a/docs/guidance/stable-diffusion-rayservice.md
+++ b/docs/guidance/stable-diffusion-rayservice.md
@@ -21,7 +21,7 @@ kubectl apply -f ray-service.stable-diffusion.yaml

 This RayService configuration contains some important settings:

-* Its `tolerations` for workers match the taints on the GPU node group. Without the tolerations, worker Pods won't be scheduled on GPU nodes.
+* The `tolerations` for workers allow them to be scheduled on nodes without any taints or on nodes with specific taints. However, workers will only be scheduled on GPU nodes because we set `nvidia.com/gpu: 1` in the Pod's resource configurations.
   ```yaml
   # Please add the following taints to the GPU node.
   tolerations:
     - key: "ray.io/node-type"
       operator: "Equal"
       value: "worker"
       effect: "NoSchedule"
   ```
@@ -34,6 +34,14 @@ This RayService configuration contains some important settings:

 ## Step 4: Forward the port of Serve

+First, get the service name from this command:
+
+```sh
+kubectl get services
+```
+
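+If `stable-diffusion-serve-svc` is not listed yet, the Serve applications are probably still starting. As a quick check (assuming you kept the sample YAML's default RayService name, `stable-diffusion`), you can inspect the RayService status:
+
+```sh
+kubectl describe rayservice stable-diffusion
+```
+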
+Then, port-forward to the Serve service.
+
 ```sh
 kubectl port-forward svc/stable-diffusion-serve-svc 8000
 ```
diff --git a/docs/guidance/text-summarizer-rayservice.md b/docs/guidance/text-summarizer-rayservice.md
new file mode 100644
index 00000000000..ca758de7687
--- /dev/null
+++ b/docs/guidance/text-summarizer-rayservice.md
@@ -0,0 +1,69 @@
+# Serve a text summarizer using RayService
+
+> **Note:** The Python files for the Ray Serve application and its client are in the [ray-project/serve_config_examples](https://github.com/ray-project/serve_config_examples) repo.
+
+## Step 1: Create a Kubernetes cluster with GPUs
+
+Follow [aws-eks-gpu-cluster.md](./aws-eks-gpu-cluster.md) or [gcp-gke-gpu-cluster.md](./gcp-gke-gpu-cluster.md) to create a Kubernetes cluster with 1 CPU node and 1 GPU node.
+
+## Step 2: Install KubeRay operator
+
+Follow [this document](../../helm-chart/kuberay-operator/README.md) to install the latest stable KubeRay operator via the Helm repository.
+Please note that the YAML file in this example uses `serveConfigV2`, which is supported starting from KubeRay v0.6.0.
+
+## Step 3: Install a RayService
+
+```sh
+# path: ray-operator/config/samples/
+kubectl apply -f ray-service.text-sumarizer.yaml
+```
+
+This RayService configuration contains some important settings:
+
+* The `tolerations` for workers allow them to be scheduled on nodes without any taints or on nodes with specific taints. However, workers will only be scheduled on GPU nodes because we set `nvidia.com/gpu: 1` in the Pod's resource configurations.
+  ```yaml
+  # Please add the following taints to the GPU node.
+  tolerations:
+    - key: "ray.io/node-type"
+      operator: "Equal"
+      value: "worker"
+      effect: "NoSchedule"
+  ```
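+
+Before moving on, you can verify that the RayService and its Pods are up. The resource name `text-summarizer` below comes from `metadata.name` in the sample YAML:
+
+```sh
+kubectl get rayservice text-summarizer
+kubectl get pods
+```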
+
+## Step 4: Forward the port of Serve
+
+First, get the service name from this command:
+
+```sh
+kubectl get services
+```
+
+Then, port-forward to the Serve service.
+
+```sh
+kubectl port-forward svc/text-summarizer-serve-svc 8000
+```
+
+Note that the RayService's Kubernetes service will be created after the Serve applications are ready and running. This process may take approximately 1 minute after all Pods in the RayCluster are running.
+
+## Step 5: Send a request to the text_summarizer model
+
+```sh
+# Step 5.1: Download `text_summarizer_req.py`
+curl -LO https://raw.githubusercontent.com/ray-project/serve_config_examples/master/text_summarizer/text_summarizer_req.py
+
+# Step 5.2: Send a request to the Summarizer model.
+python text_summarizer_req.py
+# Check the output printed to the console.
+```
+
+## Step 6: Delete your service
+
+```sh
+# path: ray-operator/config/samples/
+kubectl delete -f ray-service.text-sumarizer.yaml
+```
+
+## Step 7: Uninstall your KubeRay operator
+
+Follow [this document](../../helm-chart/kuberay-operator/README.md) to uninstall the latest stable KubeRay operator via the Helm repository.
\ No newline at end of file
diff --git a/ray-operator/config/samples/ray-service.text-sumarizer.yaml b/ray-operator/config/samples/ray-service.text-sumarizer.yaml
new file mode 100644
index 00000000000..fbf9d3e6464
--- /dev/null
+++ b/ray-operator/config/samples/ray-service.text-sumarizer.yaml
@@ -0,0 +1,79 @@
+apiVersion: ray.io/v1alpha1
+kind: RayService
+metadata:
+  name: text-summarizer
+spec:
+  serviceUnhealthySecondThreshold: 900 # Config for the health check threshold for Ray Serve applications. Default value is 900.
+  deploymentUnhealthySecondThreshold: 300 # Config for the health check threshold for Ray dashboard agent. Default value is 300.
+  serveConfigV2: |
+    applications:
+      - name: text_summarizer
+        import_path: text_summarizer.text_summarizer:deployment
+        runtime_env:
+          working_dir: "https://github.com/ray-project/serve_config_examples/archive/refs/heads/master.zip"
+  rayClusterConfig:
+    rayVersion: '2.6.3' # Should match the Ray version in the image of the containers
+    ######################headGroupSpecs#################################
+    # Ray head pod template.
+    headGroupSpec:
+      # The `rayStartParams` are used to configure the `ray start` command.
+      # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
+      # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
+      rayStartParams:
+        dashboard-host: '0.0.0.0'
+      # Pod template
+      template:
+        spec:
+          containers:
+            - name: ray-head
+              image: rayproject/ray-ml:2.6.3
+              ports:
+                - containerPort: 6379
+                  name: gcs
+                - containerPort: 8265
+                  name: dashboard
+                - containerPort: 10001
+                  name: client
+                - containerPort: 8000
+                  name: serve
+              volumeMounts:
+                - mountPath: /tmp/ray
+                  name: ray-logs
+              resources:
+                limits:
+                  cpu: "2"
+                  memory: "8G"
+                requests:
+                  cpu: "2"
+                  memory: "8G"
+          volumes:
+            - name: ray-logs
+              emptyDir: {}
+    workerGroupSpecs:
+      # The pod replicas in this group typed worker
+      - replicas: 1
+        minReplicas: 1
+        maxReplicas: 10
+        groupName: gpu-group
+        rayStartParams: {}
+        # Pod template
+        template:
+          spec:
+            containers:
+              - name: ray-worker
+                image: rayproject/ray-ml:2.6.3
+                resources:
+                  limits:
+                    cpu: 4
+                    memory: "16G"
+                    nvidia.com/gpu: 1
+                  requests:
+                    cpu: 3
+                    memory: "12G"
+                    nvidia.com/gpu: 1
+            # Please add the following taints to the GPU node.
+            tolerations:
+              - key: "ray.io/node-type"
+                operator: "Equal"
+                value: "worker"
+                effect: "NoSchedule"
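+            # For reference, one way to add a matching taint to a GPU node (an
+            # illustrative command; replace <gpu-node-name> with the name of your GPU node):
+            #   kubectl taint nodes <gpu-node-name> ray.io/node-type=worker:NoSchedule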