Commit: Documentation and example for running simple NLP service on kuberay (ray-project#1340)

* add service yaml for nlp
* Documentation fixes
* Fix instructions
* Apply suggestions from code review
* Fix tolerations comment
* review comments
* Update docs/guidance/stable-diffusion-rayservice.md

Signed-off-by: Praveen <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
Co-authored-by: Kai-Hsun Chen <[email protected]>
Commit 0d3d696 (parent 106490e). 4 changed files with 158 additions and 2 deletions.
# Serve a text summarizer using RayService

> **Note:** The Python files for the Ray Serve application and its client are in the [ray-project/serve_config_examples](https://github.com/ray-project/serve_config_examples) repo.
## Step 1: Create a Kubernetes cluster with GPUs

Follow [aws-eks-gpu-cluster.md](./aws-eks-gpu-cluster.md) or [gcp-gke-gpu-cluster.md](./gcp-gke-gpu-cluster.md) to create a Kubernetes cluster with 1 CPU node and 1 GPU node.
## Step 2: Install the KubeRay operator

Follow [this document](../../helm-chart/kuberay-operator/README.md) to install the latest stable KubeRay operator via the Helm repository.
Note that the YAML file in this example uses `serveConfigV2`, which is supported starting from KubeRay v0.6.0.
## Step 3: Install a RayService

```sh
# path: ray-operator/config/samples/
kubectl apply -f ray-service.text-sumarizer.yaml
```

This RayService configuration contains some important settings:
* The `tolerations` for workers allow them to be scheduled on nodes without any taints, or on nodes tainted with `ray.io/node-type=worker:NoSchedule`. However, workers will only be scheduled on GPU nodes, because we set `nvidia.com/gpu: 1` in the Pod's resource configuration.

  ```yaml
  # Please add the following taints to the GPU node.
  tolerations:
    - key: "ray.io/node-type"
      operator: "Equal"
      value: "worker"
      effect: "NoSchedule"
  ```
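The comment in the sample asks you to add the matching taint to the GPU node yourself; without it, the toleration is a no-op and CPU-only Pods may also land on the GPU node. A sketch of what that taint looks like on the Node object (the node name `gpu-node-1` is a placeholder for your own node):

```yaml
# Equivalent imperative command (hypothetical node name):
#   kubectl taint nodes gpu-node-1 ray.io/node-type=worker:NoSchedule
# Declarative form, as it appears in the Node spec:
spec:
  taints:
    - key: "ray.io/node-type"
      value: "worker"
      effect: "NoSchedule"
```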
## Step 4: Forward the port of Serve

First, get the name of the Serve service:

```sh
kubectl get services
```

Then, port-forward to the Serve service:

```sh
kubectl port-forward svc/text-summarizer-serve-svc 8000
```
Note that the RayService's Kubernetes service is created after the Serve applications are ready and running. This process may take approximately 1 minute after all Pods in the RayCluster are running.
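Once the port-forward is running, you can confirm the local port is reachable before sending any requests. A small check in plain Python (it assumes only that the forward from the previous step is active on port 8000):

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # True once `kubectl port-forward ... 8000` is active.
    print(port_open("127.0.0.1", 8000))
```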
## Step 5: Send a request to the text_summarizer model

```sh
# Step 5.1: Download `text_summarizer_req.py`
curl -LO https://raw.githubusercontent.com/ray-project/serve_config_examples/master/text_summarizer/text_summarizer_req.py

# Step 5.2: Send a request to the summarizer model.
python text_summarizer_req.py
# Check the response printed to the console.
```
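`text_summarizer_req.py` is a short HTTP client for the Serve application. The general shape of such a client looks roughly like this; note this is a sketch, not the real script — the `/summarize` route and the raw-text request body are assumptions, so check the downloaded file for the exact request format:

```python
import urllib.request

# Hypothetical route; the real one is defined in text_summarizer_req.py.
ENDPOINT = "http://localhost:8000/summarize"


def summarize(text: str, endpoint: str = ENDPOINT) -> str:
    """POST the input text to the Serve app and return the response body."""
    req = urllib.request.Request(
        endpoint, data=text.encode("utf-8"), method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(summarize("Ray Serve scales Python model-serving applications on Ray."))
```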
## Step 6: Delete your service

```sh
# path: ray-operator/config/samples/
kubectl delete -f ray-service.text-sumarizer.yaml
```
## Step 7: Uninstall the KubeRay operator

Follow [this document](../../helm-chart/kuberay-operator/README.md) to uninstall the KubeRay operator via the Helm repository.
## ray-operator/config/samples/ray-service.text-sumarizer.yaml (79 additions, 0 deletions)

```yaml
apiVersion: ray.io/v1alpha1
kind: RayService
metadata:
  name: text-summarizer
spec:
  serviceUnhealthySecondThreshold: 900 # Config for the health check threshold for Ray Serve applications. Default value is 900.
  deploymentUnhealthySecondThreshold: 300 # Config for the health check threshold for Ray dashboard agent. Default value is 300.
  serveConfigV2: |
    applications:
      - name: text_summarizer
        import_path: text_summarizer.text_summarizer:deployment
        runtime_env:
          working_dir: "https://github.com/ray-project/serve_config_examples/archive/refs/heads/master.zip"
  rayClusterConfig:
    rayVersion: '2.6.3' # Should match the Ray version in the image of the containers
    ######################headGroupSpecs#################################
    # Ray head pod template.
    headGroupSpec:
      # The `rayStartParams` are used to configure the `ray start` command.
      # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
      # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
      rayStartParams:
        dashboard-host: '0.0.0.0'
      # Pod template
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray-ml:2.6.3
              ports:
                - containerPort: 6379
                  name: gcs
                - containerPort: 8265
                  name: dashboard
                - containerPort: 10001
                  name: client
                - containerPort: 8000
                  name: serve
              volumeMounts:
                - mountPath: /tmp/ray
                  name: ray-logs
              resources:
                limits:
                  cpu: "2"
                  memory: "8G"
                requests:
                  cpu: "2"
                  memory: "8G"
          volumes:
            - name: ray-logs
              emptyDir: {}
    workerGroupSpecs:
      # The Pod replicas in this worker group
      - replicas: 1
        minReplicas: 1
        maxReplicas: 10
        groupName: gpu-group
        rayStartParams: {}
        # Pod template
        template:
          spec:
            containers:
              - name: ray-worker
                image: rayproject/ray-ml:2.6.3
                resources:
                  limits:
                    cpu: 4
                    memory: "16G"
                    nvidia.com/gpu: 1
                  requests:
                    cpu: 3
                    memory: "12G"
                    nvidia.com/gpu: 1
            # Please add the following taints to the GPU node.
            tolerations:
              - key: "ray.io/node-type"
                operator: "Equal"
                value: "worker"
                effect: "NoSchedule"
```
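The worker group requests fewer resources than its limits (3 vs. 4 CPUs, 12G vs. 16G memory), as Kubernetes requires: a Pod whose requests exceed its limits is rejected at admission. A quick sanity check of the values above (plain Python; `to_bytes` is an illustrative helper that parses only the decimal suffixes used in this file, not the full Kubernetes quantity syntax):

```python
def to_bytes(q: str) -> int:
    """Parse a tiny subset of Kubernetes quantity syntax ("16G", "500M")."""
    units = {"G": 10**9, "M": 10**6, "K": 10**3}
    if q[-1] in units:
        return int(q[:-1]) * units[q[-1]]
    return int(q)


# Worker-group values from the sample manifest above.
limits = {"cpu": 4, "memory": to_bytes("16G")}
requests_ = {"cpu": 3, "memory": to_bytes("12G")}

for key in requests_:
    assert requests_[key] <= limits[key], f"request exceeds limit for {key}"
print("requests fit within limits")  # prints "requests fit within limits"
```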