diff --git a/README.md b/README.md
index 606ac06e2f0..01dd24db70f 100644
--- a/README.md
+++ b/README.md
@@ -40,7 +40,7 @@
 Archived
-
+
 - [July, 2023] Self-Hosted **Llama-2 Chatbot** on Any Cloud: [**example**](./llm/llama-2/)
 - [April, 2023] [SkyPilot YAMLs](./llm/vicuna/) for finetuning & serving the [Vicuna LLM](https://lmsys.org/blog/2023-03-30-vicuna/) with a single command!
@@ -164,7 +164,7 @@ Runnable examples:
 - [LocalGPT](./llm/localgpt)
 - [Falcon](./llm/falcon)
 - Add yours here & see more in [`llm/`](./llm)!
-- Framework examples: [PyTorch DDP](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_torch.yaml), [DeepSpeed](./examples/deepspeed-multinode/sky.yaml), [JAX/Flax on TPU](https://github.com/skypilot-org/skypilot/blob/master/examples/tpu/tpuvm_mnist.yaml), [Stable Diffusion](https://github.com/skypilot-org/skypilot/tree/master/examples/stable_diffusion), [Detectron2](https://github.com/skypilot-org/skypilot/blob/master/examples/detectron2_docker.yaml), [Distributed](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_tf_app.py) [TensorFlow](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_app_storage.yaml), [Ray Train](examples/distributed_ray_train/ray_train.yaml), [NeMo](https://github.com/skypilot-org/skypilot/blob/master/examples/nemo/nemo.yaml), [programmatic grid search](https://github.com/skypilot-org/skypilot/blob/master/examples/huggingface_glue_imdb_grid_search_app.py), [Docker](https://github.com/skypilot-org/skypilot/blob/master/examples/docker/echo_app.yaml), and [many more (`examples/`)](./examples).
+- Framework examples: [PyTorch DDP](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_torch.yaml), [DeepSpeed](./examples/deepspeed-multinode/sky.yaml), [JAX/Flax on TPU](https://github.com/skypilot-org/skypilot/blob/master/examples/tpu/tpuvm_mnist.yaml), [Stable Diffusion](https://github.com/skypilot-org/skypilot/tree/master/examples/stable_diffusion), [Detectron2](https://github.com/skypilot-org/skypilot/blob/master/examples/detectron2_docker.yaml), [Distributed](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_tf_app.py) [TensorFlow](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_app_storage.yaml), [Ray Train](examples/distributed_ray_train/ray_train.yaml), [NeMo](https://github.com/skypilot-org/skypilot/blob/master/examples/nemo/nemo.yaml), [programmatic grid search](https://github.com/skypilot-org/skypilot/blob/master/examples/huggingface_glue_imdb_grid_search_app.py), [Docker](https://github.com/skypilot-org/skypilot/blob/master/examples/docker/echo_app.yaml), [Cog](https://github.com/skypilot-org/skypilot/blob/master/examples/cog/), and [many more (`examples/`)](./examples).
 Follow updates:
 - [Twitter](https://twitter.com/skypilot_org)
diff --git a/docs/source/index.rst b/docs/source/index.rst
index fbf03b3f552..3ad6d158267 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -85,7 +85,7 @@ Runnable examples:
 * `Falcon `_
 * Add yours here & see more in `llm/ `_!
-* Framework examples: `PyTorch DDP `_, `DeepSpeed `_, `JAX/Flax on TPU `_, `Stable Diffusion `_, `Detectron2 `_, `Distributed `_ `TensorFlow `_, `NeMo `_, `programmatic grid search `_, `Docker `_, and `many more `_.
+* Framework examples: `PyTorch DDP `_, `DeepSpeed `_, `JAX/Flax on TPU `_, `Stable Diffusion `_, `Detectron2 `_, `Distributed `_ `TensorFlow `_, `NeMo `_, `programmatic grid search `_, `Docker `_, `Cog `_, and `many more `_.
 Follow updates:
diff --git a/examples/cog/README.md b/examples/cog/README.md
new file mode 100644
index 00000000000..4fa4890420f
--- /dev/null
+++ b/examples/cog/README.md
@@ -0,0 +1,35 @@
+# Example: Cog + SkyPilot
+
+Use SkyPilot to self-host any Cog-packaged project.
+
+This is the "Blur" example from https://github.com/replicate/cog-examples/blob/main/blur/README.md
+
+## Serve using a single instance
+```console
+sky launch -c cog ./sky.yaml
+
+IP=$(sky status --ip cog)
+
+curl http://$IP:5000/predictions -X POST \
+  -H 'Content-Type: application/json' \
+  -d '{"input": {"image": "https://blog.skypilot.co/introducing-sky-serve/images/sky-serve-thumbnail.png"}}' \
+  | jq -r '.output | split(",")[1]' | base64 --decode > output.png
+```
+
+## Scale up the deployment using SkyServe
+We can use SkyServe (`sky serve`) to scale up the deployment to multiple instances, while enjoying load balancing, autoscaling, and other [SkyServe features](https://skypilot.readthedocs.io/en/latest/serving/sky-serve.html).
+```console
+sky serve up -n cog ./sky.yaml
+```
+
+Notice the only change is from `sky launch` to `sky serve up`. The same YAML can be used without changes.
+
+After the service is launched, access the deployment with the following:
+```console
+ENDPOINT=$(sky serve status --endpoint cog)
+
+curl -L http://$ENDPOINT/predictions -X POST \
+  -H 'Content-Type: application/json' \
+  -d '{"input": {"image": "https://blog.skypilot.co/introducing-sky-serve/images/sky-serve-thumbnail.png"}}' \
+  | jq -r '.output | split(",")[1]' | base64 --decode > output.png
+```
diff --git a/examples/cog/cog.yaml b/examples/cog/cog.yaml
new file mode 100644
index 00000000000..38de2f9f940
--- /dev/null
+++ b/examples/cog/cog.yaml
@@ -0,0 +1,8 @@
+build:
+  python_version: "3.8"
+  python_packages:
+    - "pillow==8.2.0"
+  system_packages:
+    - "libpng-dev"
+    - "libjpeg-dev"
+predict: "predict.py:Predictor"
diff --git a/examples/cog/predict.py b/examples/cog/predict.py
new file mode 100644
index 00000000000..21637d26b3a
--- /dev/null
+++ b/examples/cog/predict.py
@@ -0,0 +1,21 @@
+import tempfile
+
+import cog
+from PIL import Image
+from PIL import ImageFilter
+
+
+class Predictor(cog.BasePredictor):
+
+    def predict(
+        self,
+        image: cog.Path = cog.Input(description='Input image'),
+        blur: float = cog.Input(description='Blur radius', default=5),
+    ) -> cog.Path:
+        if blur == 0:
+            return image
+        im = Image.open(str(image))
+        im = im.filter(ImageFilter.BoxBlur(blur))
+        out_path = cog.Path(tempfile.mkdtemp()) / 'out.png'
+        im.save(str(out_path))
+        return out_path
diff --git a/examples/cog/sky.yaml b/examples/cog/sky.yaml
new file mode 100644
index 00000000000..06f8e3656eb
--- /dev/null
+++ b/examples/cog/sky.yaml
@@ -0,0 +1,39 @@
+# Example: Cog + SkyPilot.
+#
+# This is the "Blur" example from https://github.com/replicate/cog-examples/blob/main/blur/README.md
+#
+# Usage (1 serving instance):
+#
+#   sky launch -c cog ./sky.yaml
+#
+#   IP=$(sky status --ip cog)
+#   curl http://$IP:5000/predictions -X POST \
+#     -H 'Content-Type: application/json' \
+#     -d '{"input": {"image": "https://blog.skypilot.co/introducing-sky-serve/images/sky-serve-thumbnail.png"}}' \
+#     | jq -r '.output | split(",")[1]' | base64 --decode > output.png
+#
+# Usage (SkyServe): See README.md
+
+service:
+  readiness_probe:
+    path: /predictions
+    post_data:
+      input: {"image": "https://blog.skypilot.co/introducing-sky-serve/images/sky-serve-thumbnail.png"}
+  replicas: 2
+
+resources:
+  accelerators: {L4, T4, A10G}
+  ports:
+    - 5000
+
+workdir: .
+
+setup: |
+  set -e
+  sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m)"
+  sudo chmod +x /usr/local/bin/cog
+
+  cog build -t my-model
+
+run: |
+  docker run -d -p 5000:5000 --gpus all my-model
diff --git a/sky/backends/cloud_vm_ray_backend.py b/sky/backends/cloud_vm_ray_backend.py
index fc5d3d34b56..5054b2288b9 100644
--- a/sky/backends/cloud_vm_ray_backend.py
+++ b/sky/backends/cloud_vm_ray_backend.py
@@ -287,7 +287,7 @@ def get_or_fail(futures, pg) -> List[int]:
                 sys.stdout.flush()
                 sys.stderr.flush()
                 return returncodes
-
+
             run_fn = None
             futures = []
             """),
@@ -3273,7 +3273,7 @@ def _execute(
         # Handle multiple resources exec case.
         task_copy.set_resources(valid_resource)
         if len(task.resources) > 1:
-            logger.info('Multiple resources are specified'
+            logger.info('Multiple resources are specified '
                         f'for the task, using: {valid_resource}')
         task_copy.best_resources = None
         resources_str = backend_utils.get_task_resources_str(task_copy)
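
Reviewer note: the `jq -r '.output | split(",")[1]' | base64 --decode` step in `examples/cog/README.md` suggests the `output` field is a base64 data URI. Below is a minimal Python sketch of that decoding step for clients that would rather not shell out to `jq`. It assumes the `data:<mime>;base64,<payload>` shape implied by the pipeline; `decode_data_uri` is a hypothetical helper name, not part of Cog or SkyPilot.

```python
import base64


def decode_data_uri(output: str) -> bytes:
    """Decode a 'data:<mime>;base64,<payload>' string into raw bytes.

    Hypothetical helper mirroring `split(",")[1] | base64 --decode`;
    the data-URI shape of the response is an assumption.
    """
    # Everything before the first comma is the header, the rest is payload.
    header, sep, payload = output.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    return base64.b64decode(payload)


# Offline round trip with a fake payload instead of a real prediction.
sample = "data:image/png;base64," + base64.b64encode(b"\x89PNG").decode("ascii")
assert decode_data_uri(sample) == b"\x89PNG"
```

With a real response, the decoded bytes could be written to `output.png`, matching the shell pipeline in the README.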