Documentation and example for running simple NLP service on kuberay (ray-project#1340)

* add service yaml for nlp

* Documentation fixes

* Fix instructions

* Apply suggestions from code review

Co-authored-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: Praveen <[email protected]>

* Fix tolerations comment

* review comments

* Update docs/guidance/stable-diffusion-rayservice.md

Signed-off-by: Kai-Hsun Chen <[email protected]>

---------

Signed-off-by: Praveen <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
Co-authored-by: Kai-Hsun Chen <[email protected]>
2 people authored and blublinsky committed Aug 25, 2023
1 parent 106490e commit 0d3d696
Showing 4 changed files with 158 additions and 2 deletions.
2 changes: 1 addition & 1 deletion docs/guidance/aws-eks-gpu-cluster.md
@@ -32,7 +32,7 @@ Create a GPU node group for Ray GPU workers.
> **Note:** If you encounter permission issues with `kubectl`, follow "Step 2: Configure your computer to communicate with your cluster"
in the [AWS documentation](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-console.html#).
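
For example, a typical way to point `kubectl` at your new cluster with the AWS CLI is shown below; the region and cluster name are placeholders:

```sh
# Update your kubeconfig for the EKS cluster (replace the region and cluster name with your own).
aws eks update-kubeconfig --region us-west-2 --name my-eks-cluster
```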

2. Please install the NVIDIA device plugin.
2. Please install the NVIDIA device plugin. Note: you don't need this step if you used the `BOTTLEROCKET_x86_64_NVIDIA` image in the step above.
* Install the NVIDIA device plugin DaemonSet so that GPU-enabled containers can run in your Amazon EKS cluster (see the example command after this list). You can refer to the [Amazon EKS optimized accelerated Amazon Linux AMIs](https://docs.aws.amazon.com/eks/latest/userguide/eks-optimized-ami.html#gpu-ami)
or the [NVIDIA/k8s-device-plugin](https://github.com/NVIDIA/k8s-device-plugin) repository for more details.
* If the GPU nodes have taints, add `tolerations` to `nvidia-device-plugin.yml` to enable the DaemonSet to schedule Pods on the GPU nodes.
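
As a sketch, one common way to install the device plugin is shown below; the release tag is an assumption, so check the NVIDIA/k8s-device-plugin repository for the version you want:

```sh
# Install the NVIDIA device plugin DaemonSet (replace v0.14.1 with the desired release).
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml
```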
10 changes: 9 additions & 1 deletion docs/guidance/stable-diffusion-rayservice.md
@@ -21,7 +21,7 @@ kubectl apply -f ray-service.stable-diffusion.yaml

This RayService configuration contains some important settings:

* Its `tolerations` for workers match the taints on the GPU node group. Without the tolerations, worker Pods won't be scheduled on GPU nodes.
* The `tolerations` for workers allow them to be scheduled on nodes without any taints or on nodes with specific taints. However, workers will only be scheduled on GPU nodes because we set `nvidia.com/gpu: 1` in the Pod's resource configurations.
```yaml
# Please add the following taints to the GPU node.
tolerations:
  - key: "ray.io/node-type"
    operator: "Equal"
    value: "worker"
    effect: "NoSchedule"
```

@@ -34,6 +34,14 @@ This RayService configuration contains some important settings:

## Step 4: Forward the port of Serve

First, get the Serve service name with this command:

```sh
kubectl get services
```

Then, port-forward to the Serve service:

```sh
kubectl port-forward svc/stable-diffusion-serve-svc 8000
```
69 changes: 69 additions & 0 deletions docs/guidance/text-summarizer-rayservice.md
@@ -0,0 +1,69 @@
# Serve a text summarizer using RayService

> **Note:** The Python files for the Ray Serve application and its client are in the [ray-project/serve_config_examples](https://github.com/ray-project/serve_config_examples) repo.

## Step 1: Create a Kubernetes cluster with GPUs

Follow [aws-eks-gpu-cluster.md](./aws-eks-gpu-cluster.md) or [gcp-gke-gpu-cluster.md](./gcp-gke-gpu-cluster.md) to create a Kubernetes cluster with 1 CPU node and 1 GPU node.
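
If you are on AWS EKS, a cluster of that shape can be created with `eksctl` roughly as follows; the cluster name, region, and instance types here are placeholders, and the linked guides remain the authoritative steps (including GPU AMI selection):

```sh
# Create an EKS cluster with one CPU node (placeholder name, region, and instance type).
eksctl create cluster --name text-summarizer --region us-west-2 --nodes 1 --node-type m5.xlarge

# Add a GPU node group for the Ray workers (placeholder instance type).
eksctl create nodegroup --cluster text-summarizer --region us-west-2 \
  --name gpu-node-group --node-type g5.xlarge --nodes 1
```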

## Step 2: Install KubeRay operator

Follow [this document](../../helm-chart/kuberay-operator/README.md) to install the latest stable KubeRay operator via Helm repository.
Please note that the YAML file in this example uses `serveConfigV2`, which is supported starting from KubeRay v0.6.0.
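
For reference, installing the operator with Helm typically looks like this; the chart version shown is an assumption, and any release at or above v0.6.0 works for `serveConfigV2`:

```sh
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update

# Install the KubeRay operator (v0.6.0 or later supports serveConfigV2).
helm install kuberay-operator kuberay/kuberay-operator --version 0.6.0
```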

## Step 3: Install a RayService

```sh
# path: ray-operator/config/samples/
kubectl apply -f ray-service.text-sumarizer.yaml
```

This RayService configuration contains some important settings:

* The `tolerations` for workers allow them to be scheduled on nodes without any taints or on nodes with specific taints. However, workers will only be scheduled on GPU nodes because we set `nvidia.com/gpu: 1` in the Pod's resource configurations.
```yaml
# Please add the following taints to the GPU node.
tolerations:
  - key: "ray.io/node-type"
    operator: "Equal"
    value: "worker"
    effect: "NoSchedule"
```
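
Before moving on, you can confirm that the RayService and its Pods are coming up, for example:

```sh
# Check the RayService status and the RayCluster Pods.
kubectl get rayservices
kubectl get pods
```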

## Step 4: Forward the port of Serve

First, get the Serve service name with this command:
```sh
kubectl get services
```

Then, port-forward to the Serve service:

```sh
kubectl port-forward svc/text-summarizer-serve-svc 8000
```

Note that the RayService's Kubernetes service will be created after the Serve applications are ready and running. This process may take approximately 1 minute after all Pods in the RayCluster are running.
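
If the service does not exist yet, you can watch the Pods until they are all running, for example:

```sh
# Watch the RayCluster Pods until every Pod reaches the Running state.
kubectl get pods -w
```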

## Step 5: Send a request to the text_summarizer model

```sh
# Step 5.1: Download `text_summarizer_req.py`
curl -LO https://raw.githubusercontent.com/ray-project/serve_config_examples/master/text_summarizer/text_summarizer_req.py

# Step 5.2: Send a request to the Summarizer model.
python text_summarizer_req.py
# Check the output printed to the console.
```

## Step 6: Delete your service

```sh
# path: ray-operator/config/samples/
kubectl delete -f ray-service.text-sumarizer.yaml
```

## Step 7: Uninstall your KubeRay operator

Follow [this document](../../helm-chart/kuberay-operator/README.md) to uninstall the latest stable KubeRay operator via Helm repository.
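
Assuming the operator was installed with the Helm release name `kuberay-operator` (as in the example in Step 2), uninstalling typically looks like:

```sh
helm uninstall kuberay-operator
```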
79 changes: 79 additions & 0 deletions ray-operator/config/samples/ray-service.text-sumarizer.yaml
@@ -0,0 +1,79 @@
apiVersion: ray.io/v1alpha1
kind: RayService
metadata:
  name: text-summarizer
spec:
  serviceUnhealthySecondThreshold: 900 # Config for the health check threshold for Ray Serve applications. Default value is 900.
  deploymentUnhealthySecondThreshold: 300 # Config for the health check threshold for Ray dashboard agent. Default value is 300.
  serveConfigV2: |
    applications:
      - name: text_summarizer
        import_path: text_summarizer.text_summarizer:deployment
        runtime_env:
          working_dir: "https://github.com/ray-project/serve_config_examples/archive/refs/heads/master.zip"
  rayClusterConfig:
    rayVersion: '2.6.3' # Should match the Ray version in the image of the containers
    ######################headGroupSpecs#################################
    # Ray head pod template.
    headGroupSpec:
      # The `rayStartParams` are used to configure the `ray start` command.
      # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
      # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
      rayStartParams:
        dashboard-host: '0.0.0.0'
      # Pod template
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray-ml:2.6.3
              ports:
                - containerPort: 6379
                  name: gcs
                - containerPort: 8265
                  name: dashboard
                - containerPort: 10001
                  name: client
                - containerPort: 8000
                  name: serve
              volumeMounts:
                - mountPath: /tmp/ray
                  name: ray-logs
              resources:
                limits:
                  cpu: "2"
                  memory: "8G"
                requests:
                  cpu: "2"
                  memory: "8G"
          volumes:
            - name: ray-logs
              emptyDir: {}
    workerGroupSpecs:
      # The pod replicas in this group typed worker
      - replicas: 1
        minReplicas: 1
        maxReplicas: 10
        groupName: gpu-group
        rayStartParams: {}
        # Pod template
        template:
          spec:
            containers:
              - name: ray-worker
                image: rayproject/ray-ml:2.6.3
                resources:
                  limits:
                    cpu: 4
                    memory: "16G"
                    nvidia.com/gpu: 1
                  requests:
                    cpu: 3
                    memory: "12G"
                    nvidia.com/gpu: 1
            # Please add the following taints to the GPU node.
            tolerations:
              - key: "ray.io/node-type"
                operator: "Equal"
                value: "worker"
                effect: "NoSchedule"
