Documentation and example for running simple NLP service on kuberay (ray-project#1340)

* add service yaml for nlp

* Documentation fixes

* Fix instructions

* Apply suggestions from code review

Co-authored-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: Praveen <[email protected]>

* Fix tolerations comment

* review comments

* Update docs/guidance/stable-diffusion-rayservice.md

Signed-off-by: Kai-Hsun Chen <[email protected]>

---------

Signed-off-by: Praveen <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
Co-authored-by: Kai-Hsun Chen <[email protected]>
2 people authored and blublinsky committed Aug 25, 2023
1 parent 106490e commit 0d3d696
Showing 4 changed files with 158 additions and 2 deletions.
2 changes: 1 addition & 1 deletion docs/guidance/aws-eks-gpu-cluster.md
@@ -32,7 +32,7 @@ Create a GPU node group for Ray GPU workers.
> **Note:** If you encounter permission issues with `kubectl`, follow "Step 2: Configure your computer to communicate with your cluster"
in the [AWS documentation](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-console.html#).
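
For example, a typical way to point `kubectl` at your new cluster with the AWS CLI is shown below; the region and cluster name are placeholders:

```sh
# Update your kubeconfig for the EKS cluster (replace the region and cluster name with your own).
aws eks update-kubeconfig --region us-west-2 --name my-eks-cluster
```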

2. Please install the NVIDIA device plugin.
2. Please install the NVIDIA device plugin. Note: you don't need this step if you used the `BOTTLEROCKET_x86_64_NVIDIA` image in the step above.
* Install the NVIDIA device plugin DaemonSet so that GPU-enabled containers can run in your Amazon EKS cluster (see the example command after this list). You can refer to the [Amazon EKS optimized accelerated Amazon Linux AMIs](https://docs.aws.amazon.com/eks/latest/userguide/eks-optimized-ami.html#gpu-ami)
or the [NVIDIA/k8s-device-plugin](https://github.com/NVIDIA/k8s-device-plugin) repository for more details.
* If the GPU nodes have taints, add `tolerations` to `nvidia-device-plugin.yml` to enable the DaemonSet to schedule Pods on the GPU nodes.
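
As a sketch, one common way to install the device plugin is shown below; the release tag is an assumption, so check the NVIDIA/k8s-device-plugin repository for the version you want:

```sh
# Install the NVIDIA device plugin DaemonSet (replace v0.14.1 with the desired release).
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml
```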
10 changes: 9 additions & 1 deletion docs/guidance/stable-diffusion-rayservice.md
@@ -21,7 +21,7 @@ kubectl apply -f ray-service.stable-diffusion.yaml

This RayService configuration contains some important settings:

* Its `tolerations` for workers match the taints on the GPU node group. Without the tolerations, worker Pods won't be scheduled on GPU nodes.
* The `tolerations` for workers allow them to be scheduled on nodes without any taints or on nodes with specific taints. However, workers will only be scheduled on GPU nodes because we set `nvidia.com/gpu: 1` in the Pod's resource configurations.
```yaml
# Please add the following taints to the GPU node.
tolerations:
  - key: "ray.io/node-type"
    operator: "Equal"
    value: "worker"
    effect: "NoSchedule"
```

@@ -34,6 +34,14 @@ This RayService configuration contains some important settings:

## Step 4: Forward the port of Serve

First, get the Serve service name with this command:

```sh
kubectl get services
```

Then, port-forward to the Serve service:

```sh
kubectl port-forward svc/stable-diffusion-serve-svc 8000
```
69 changes: 69 additions & 0 deletions docs/guidance/text-summarizer-rayservice.md
@@ -0,0 +1,69 @@
# Serve a text summarizer using RayService

> **Note:** The Python files for the Ray Serve application and its client are in the [ray-project/serve_config_examples](https://github.com/ray-project/serve_config_examples) repo.

## Step 1: Create a Kubernetes cluster with GPUs

Follow [aws-eks-gpu-cluster.md](./aws-eks-gpu-cluster.md) or [gcp-gke-gpu-cluster.md](./gcp-gke-gpu-cluster.md) to create a Kubernetes cluster with 1 CPU node and 1 GPU node.
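
If you are on AWS EKS, a cluster of that shape can be created with `eksctl` roughly as follows; the cluster name, region, and instance types here are placeholders, and the linked guides remain the authoritative steps (including GPU AMI selection):

```sh
# Create an EKS cluster with one CPU node (placeholder name, region, and instance type).
eksctl create cluster --name text-summarizer --region us-west-2 --nodes 1 --node-type m5.xlarge

# Add a GPU node group for the Ray workers (placeholder instance type).
eksctl create nodegroup --cluster text-summarizer --region us-west-2 \
  --name gpu-node-group --node-type g5.xlarge --nodes 1
```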

## Step 2: Install KubeRay operator

Follow [this document](../../helm-chart/kuberay-operator/README.md) to install the latest stable KubeRay operator via Helm repository.
Please note that the YAML file in this example uses `serveConfigV2`, which is supported starting from KubeRay v0.6.0.
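
For reference, installing the operator with Helm typically looks like this; the chart version shown is an assumption, and any release at or above v0.6.0 works for `serveConfigV2`:

```sh
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update

# Install the KubeRay operator (v0.6.0 or later supports serveConfigV2).
helm install kuberay-operator kuberay/kuberay-operator --version 0.6.0
```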

## Step 3: Install a RayService

```sh
# path: ray-operator/config/samples/
kubectl apply -f ray-service.text-sumarizer.yaml
```

This RayService configuration contains some important settings:

* The `tolerations` for workers allow them to be scheduled on nodes without any taints or on nodes with specific taints. However, workers will only be scheduled on GPU nodes because we set `nvidia.com/gpu: 1` in the Pod's resource configurations.
```yaml
# Please add the following taints to the GPU node.
tolerations:
  - key: "ray.io/node-type"
    operator: "Equal"
    value: "worker"
    effect: "NoSchedule"
```
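
Before moving on, you can confirm that the RayService and its Pods are coming up, for example:

```sh
# Check the RayService status and the RayCluster Pods.
kubectl get rayservices
kubectl get pods
```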

## Step 4: Forward the port of Serve

First, get the Serve service name with this command:
```sh
kubectl get services
```

Then, port-forward to the Serve service:

```sh
kubectl port-forward svc/text-summarizer-serve-svc 8000
```

Note that the RayService's Kubernetes service will be created after the Serve applications are ready and running. This process may take approximately 1 minute after all Pods in the RayCluster are running.
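
If the service does not exist yet, you can watch the Pods until they are all running, for example:

```sh
# Watch the RayCluster Pods until every Pod reaches the Running state.
kubectl get pods -w
```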

## Step 5: Send a request to the text_summarizer model

```sh
# Step 5.1: Download `text_summarizer_req.py`
curl -LO https://raw.githubusercontent.com/ray-project/serve_config_examples/master/text_summarizer/text_summarizer_req.py

# Step 5.2: Send a request to the Summarizer model.
python text_summarizer_req.py
# Check the output printed to the console.
```

## Step 6: Delete your service

```sh
# path: ray-operator/config/samples/
kubectl delete -f ray-service.text-sumarizer.yaml
```

## Step 7: Uninstall your KubeRay operator

Follow [this document](../../helm-chart/kuberay-operator/README.md) to uninstall the latest stable KubeRay operator via Helm repository.
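
Assuming the operator was installed with the Helm release name `kuberay-operator` (as in the example in Step 2), uninstalling typically looks like:

```sh
helm uninstall kuberay-operator
```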
79 changes: 79 additions & 0 deletions ray-operator/config/samples/ray-service.text-sumarizer.yaml
@@ -0,0 +1,79 @@
apiVersion: ray.io/v1alpha1
kind: RayService
metadata:
  name: text-summarizer
spec:
  serviceUnhealthySecondThreshold: 900 # Config for the health check threshold for Ray Serve applications. Default value is 900.
  deploymentUnhealthySecondThreshold: 300 # Config for the health check threshold for Ray dashboard agent. Default value is 300.
  serveConfigV2: |
    applications:
      - name: text_summarizer
        import_path: text_summarizer.text_summarizer:deployment
        runtime_env:
          working_dir: "https://github.com/ray-project/serve_config_examples/archive/refs/heads/master.zip"
  rayClusterConfig:
    rayVersion: '2.6.3' # Should match the Ray version in the image of the containers
    ######################headGroupSpecs#################################
    # Ray head pod template.
    headGroupSpec:
      # The `rayStartParams` are used to configure the `ray start` command.
      # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
      # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
      rayStartParams:
        dashboard-host: '0.0.0.0'
      # Pod template
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray-ml:2.6.3
              ports:
                - containerPort: 6379
                  name: gcs
                - containerPort: 8265
                  name: dashboard
                - containerPort: 10001
                  name: client
                - containerPort: 8000
                  name: serve
              volumeMounts:
                - mountPath: /tmp/ray
                  name: ray-logs
              resources:
                limits:
                  cpu: "2"
                  memory: "8G"
                requests:
                  cpu: "2"
                  memory: "8G"
          volumes:
            - name: ray-logs
              emptyDir: {}
    workerGroupSpecs:
      # The pod replicas in this group typed worker
      - replicas: 1
        minReplicas: 1
        maxReplicas: 10
        groupName: gpu-group
        rayStartParams: {}
        # Pod template
        template:
          spec:
            containers:
              - name: ray-worker
                image: rayproject/ray-ml:2.6.3
                resources:
                  limits:
                    cpu: 4
                    memory: "16G"
                    nvidia.com/gpu: 1
                  requests:
                    cpu: 3
                    memory: "12G"
                    nvidia.com/gpu: 1
            # Please add the following taints to the GPU node.
            tolerations:
              - key: "ray.io/node-type"
                operator: "Equal"
                value: "worker"
                effect: "NoSchedule"
