Update ingress gateway access instruction and add KFServing repo badges (kubeflow#1008)

* Add report cards

* Add shields.io badges

* Fix badges

* Update ingress access instruction

* Update min requirement

* Add link for minikube

* Update ingress access instruction for examples

* Update link

* Add istio link
yuzisun authored Aug 11, 2020
1 parent bf7b559 commit 9b3ef3f
Showing 19 changed files with 147 additions and 111 deletions.
98 changes: 78 additions & 20 deletions README.md
@@ -1,4 +1,11 @@
# KFServing
[![go.dev reference](https://img.shields.io/badge/go.dev-reference-007d9c?logo=go&logoColor=white)](https://pkg.go.dev/github.com/kubeflow/kfserving)
[![Coverage Status](https://coveralls.io/repos/github/kubeflow/kfserving/badge.svg?branch=master)](https://coveralls.io/github/kubeflow/kfserving?branch=master)
[![Go Report Card](https://goreportcard.com/badge/github.com/kubeflow/kfserving)](https://goreportcard.com/report/github.com/kubeflow/kfserving)
[![Releases](https://img.shields.io/github/release-pre/kubeflow/kfserving.svg?sort=semver)](https://github.com/kubeflow/kfserving/releases)
[![LICENSE](https://img.shields.io/github/license/kubeflow/kfserving.svg)](https://github.com/kubeflow/kfserving/blob/master/LICENSE)
[![Slack Status](https://img.shields.io/badge/slack-join_chat-white.svg?logo=slack&style=social)](https://kubeflow.slack.com/join/shared_invite/zt-cpr020z4-PfcAue_2nw67~iIDy7maAQ)

KFServing provides a Kubernetes [Custom Resource Definition](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) for serving machine learning (ML) models on arbitrary frameworks. It aims to solve production model serving use cases by providing performant, high-abstraction interfaces for common ML frameworks like TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX.

It encapsulates the complexity of autoscaling, networking, health checking, and server configuration to bring cutting-edge serving features like GPU Autoscaling, Scale to Zero, and Canary Rollouts to your ML deployments. It enables a simple, pluggable, and complete story for Production ML Serving, including prediction, pre-processing, post-processing, and explainability. KFServing is being [used across various organizations](./ADOPTERS.md).
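
As a quick illustration, the sketch below shows roughly what an `InferenceService` resource looks like. It is modeled on the scikit-learn sample in `docs/samples/sklearn/sklearn.yaml`; the API version and field names shown here are assumptions that may differ in your KFServing release.

```bash
# Minimal InferenceService sketch, modeled on docs/samples/sklearn/sklearn.yaml
# (assumption: the v1alpha2 API shown here matches your KFServing release).
kubectl apply -f - <<EOF
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  default:
    predictor:
      sklearn:
        storageUri: "gs://kfserving-samples/models/sklearn/iris"
EOF
```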
@@ -62,29 +69,29 @@ If you are using Kubeflow dashboard or [profile controller](https://www.kubeflow
#### Install KFServing in 5 Minutes (On your local machine)

Make sure you have
[kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-kubectl-on-linux),
[helm 3](https://helm.sh/docs/intro/install) installed before you start. (2 mins for setup)
1) If you do not have an existing kubernetes cluster, you can create a quick kubernetes local cluster with [kind](https://github.com/kubernetes-sigs/kind#installation-and-usage). (this takes 30s)
[kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-kubectl-on-linux) installed.

1) If you do not have an existing kubernetes cluster,
you can create a quick kubernetes local cluster with [kind](https://github.com/kubernetes-sigs/kind#installation-and-usage).

Note that the minimum requirement for running KFServing is 4 CPUs and 8Gi of memory,
so you need to change the [docker resource setting](https://docs.docker.com/docker-for-mac/#advanced) to allocate at least 4 CPUs and 8Gi of memory.
```bash
kind create cluster
```
Alternatively, you can use [Minikube](https://kubernetes.io/docs/setup/learning-environment/minikube):
```bash
minikube start --cpus 4 --memory 8192
```

2) Install the Istio lean version, Knative Serving, and KFServing all in one. (this takes 30s)
```bash
./hack/quick_install.sh
```
#### Ingress Setup and Monitoring Stack
- [Configure Custom Ingress Gateway](https://knative.dev/docs/serving/setting-up-custom-ingress-gateway/)
- In addition you need to update [KFServing configmap](config/default/configmap/inferenceservice.yaml) to use the custom ingress gateway.
- [Configure HTTPS Connection](https://knative.dev/docs/serving/using-a-tls-cert/)
- [Configure Custom Domain](https://knative.dev/docs/serving/using-a-custom-domain/)
- [Metrics](https://knative.dev/docs/serving/accessing-metrics/)
- [Tracing](https://knative.dev/docs/serving/accessing-traces/)
- [Logging](https://knative.dev/docs/serving/accessing-logs/)
- [Dashboard for ServiceMesh](https://istio.io/latest/docs/tasks/observability/kiali/)

### Test KFServing Installation

1) To check if KFServing Controller is installed correctly, please run the following command
#### Check KFServing controller installation
```shell
kubectl get po -n kfserving-system
NAME READY STATUS RESTARTS AGE
@@ -93,28 +100,65 @@ kfserving-controller-manager-0 2/2 Running 2 13m

Please refer to our [troubleshooting section](docs/DEVELOPER_GUIDE.md#troubleshooting) for recommendations and tips for issues with installation.
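
If the controller pod is not `Running`, the commands below are a reasonable starting point. This is a sketch: the pod name is taken from the output above, while the `manager` container name is an assumption that may differ in your installation.

```bash
# Inspect the controller pod and read its logs (sketch; the container name is an assumption)
kubectl describe pod kfserving-controller-manager-0 -n kfserving-system
kubectl logs kfserving-controller-manager-0 -n kfserving-system -c manager
```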

2) Wait for all pods to be ready, then launch the KFServing `InferenceService`. (wait 1 min for everything to be ready and 40s to
launch the `InferenceService`)
#### Create KFServing test inference service
```bash
kubectl create namespace kfserving-test
kubectl apply -f docs/samples/sklearn/sklearn.yaml -n kfserving-test
```
3) Check KFServing `InferenceService` status.
#### Check KFServing `InferenceService` status
```bash
kubectl get inferenceservices sklearn-iris -n kfserving-test
NAME URL READY DEFAULT TRAFFIC CANARY TRAFFIC AGE
sklearn-iris http://sklearn-iris.kfserving-test.example.com/v1/models/sklearn-iris True 100 109s
```
4) Curl the `InferenceService`

#### Determine the ingress IP and ports
Execute the following command to determine if your kubernetes cluster is running in an environment that supports external load balancers:
```bash
$ kubectl get svc istio-ingressgateway -n istio-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
istio-ingressgateway LoadBalancer 172.21.109.129 130.211.10.121 ... 17h
```
If the EXTERNAL-IP value is set, your environment has an external load balancer that you can use for the ingress gateway.

```bash
export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
```

If the EXTERNAL-IP value is none (or perpetually pending), your environment does not provide an external load balancer for the ingress gateway. In this case, you can access the gateway using the service’s node port.
```bash
# GKE
export INGRESS_HOST=worker-node-address
# Minikube
export INGRESS_HOST=$(minikube ip)
# Other environment (on-prem)
export INGRESS_HOST=$(kubectl get po -l istio=ingressgateway -n istio-system -o jsonpath='{.items[0].status.hostIP}')

export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
```

Alternatively, you can use port forwarding for testing purposes:
```bash
INGRESS_GATEWAY_SERVICE=$(kubectl get svc --namespace istio-system --selector="app=istio-ingressgateway" --output jsonpath='{.items[0].metadata.name}')
kubectl port-forward --namespace istio-system svc/${INGRESS_GATEWAY_SERVICE} 8080:80

# start another terminal
export INGRESS_HOST=localhost
export INGRESS_PORT=8080
```

#### Curl the `InferenceService`
Curl from the ingress gateway:
```bash
SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -n kfserving-test -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://localhost:8080/v1/models/sklearn-iris:predict -d @./docs/samples/sklearn/iris-input.json
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:{INGRESS_PORT}/v1/models/sklearn-iris:predict -d @./docs/samples/sklearn/iris-input.json
```
5) Run Performance Test
Curl from the local cluster gateway (run this from within the cluster):
```bash
curl -v http://sklearn-iris.kfserving-test/v1/models/sklearn-iris:predict -d @./docs/samples/sklearn/iris-input.json
```

#### Run Performance Test
```bash
kubectl create -f docs/samples/sklearn/perf.yaml -n kfserving-test
# wait for the job to finish, then check the log
@@ -129,6 +173,19 @@ Status Codes [code:count] 200:30000
Error Set:
```

### Setup Ingress Gateway
If the default ingress gateway setup does not fit your needs, you can set up a custom ingress gateway:
- [Configure Custom Ingress Gateway](https://knative.dev/docs/serving/setting-up-custom-ingress-gateway/)
- In addition you need to update the [KFServing configmap](config/default/configmap/inferenceservice.yaml) to use the custom ingress gateway (see the sketch after this list).
- [Configure Custom Domain](https://knative.dev/docs/serving/using-a-custom-domain/)
- [Configure HTTPS Connection](https://knative.dev/docs/serving/using-a-tls-cert/)
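
For example, here is a sketch of pointing KFServing at a custom gateway by editing the configmap; the configmap name and namespace below are assumptions based on the default install, and the authoritative keys live in [config/default/configmap/inferenceservice.yaml](config/default/configmap/inferenceservice.yaml).

```bash
# Sketch: edit the KFServing configmap (name/namespace are assumptions based on the default install)
kubectl edit configmap inferenceservice-config -n kfserving-system
# then update the ingress/gateway entries to reference your custom ingress gateway
```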

### Setup Monitoring
- [Metrics](https://knative.dev/docs/serving/accessing-metrics/)
- [Tracing](https://knative.dev/docs/serving/accessing-traces/)
- [Logging](https://knative.dev/docs/serving/accessing-logs/)
- [Dashboard for ServiceMesh](https://istio.io/latest/docs/tasks/observability/kiali/)

### Use KFServing SDK
* Install the SDK
```
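# A minimal sketch (assumption: the SDK is published on PyPI as "kfserving")
pip install kfserving
```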
@@ -141,6 +198,7 @@ Error Set:
### KFServing Features and Examples
[KFServing Features and Examples](./docs/samples/README.md)

### KFServing Presentations and Demos
[KFServing Presentations and Demos](./docs/PRESENTATIONS.md)

### KFServing Roadmap
19 changes: 10 additions & 9 deletions docs/samples/autoscaling/README.md
@@ -23,7 +23,7 @@
# Autoscale InferenceService with your inference workload
## Setup
1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/#install-kfserving).
2. Your cluster's Istio Ingress gateway must be network accessible.
2. Your cluster's Istio Ingress gateway must be [network accessible](https://istio.io/latest/docs/tasks/traffic-management/ingress/ingress-control/).
3. [Metrics installation](https://knative.dev/docs/serving/installing-logging-metrics-traces) for viewing scaling graphs (optional).
4. The [hey](https://github.com/rakyll/hey) load generator installed (`go get -u github.com/rakyll/hey`).

@@ -41,14 +41,14 @@ $ inferenceservice.serving.kubeflow.org/flowers-sample configured
```

### Load InferenceService with concurrent requests
The first step is to [determine the ingress IP and ports](../../../README.md#determine-the-ingress-ip-and-ports) and set `INGRESS_HOST` and `INGRESS_PORT`.

Send traffic in 30-second spurts, maintaining 5 in-flight requests.
```
MODEL_NAME=flowers-sample
INPUT_PATH=../tensorflow/input.json
INGRESS_GATEWAY=istio-ingressgateway
CLUSTER_IP=$(kubectl -n istio-system get service $INGRESS_GATEWAY -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
HOST=$(kubectl get inferenceservice $MODEL_NAME -o jsonpath='{.status.url}' | cut -d "/" -f 3)
hey -z 30s -c 5 -m POST -host ${HOST} -D $INPUT_PATH http://$CLUSTER_IP/v1/models/$MODEL_NAME:predict
hey -z 30s -c 5 -m POST -host ${HOST} -D $INPUT_PATH http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict
```
Expected Output
```shell
@@ -134,14 +134,14 @@ $ inferenceservice.serving.kubeflow.org/flowers-sample configured
```

### Load InferenceService with target QPS
The first step is to [determine the ingress IP and ports](../../../README.md#determine-the-ingress-ip-and-ports) and set `INGRESS_HOST` and `INGRESS_PORT`.

Send 30 seconds of traffic, maintaining 50 QPS.
```bash
MODEL_NAME=flowers-sample
INPUT_PATH=../tensorflow/input.json
INGRESS_GATEWAY=istio-ingressgateway
CLUSTER_IP=$(kubectl -n istio-system get service $INGRESS_GATEWAY -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
HOST=$(kubectl get inferenceservice $MODEL_NAME -o jsonpath='{.status.url}' | cut -d "/" -f 3)
hey -z 30s -q 50 -m POST -host ${HOST} -D $INPUT_PATH http://$CLUSTER_IP/v1/models/$MODEL_NAME:predict
hey -z 30s -q 50 -m POST -host ${HOST} -D $INPUT_PATH http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict
```

```shell
@@ -233,13 +233,14 @@ kubectl apply -f autoscale_gpu.yaml
```

### Load InferenceService with concurrent requests
The first step is to [determine the ingress IP and ports](../../../README.md#determine-the-ingress-ip-and-ports) and set `INGRESS_HOST` and `INGRESS_PORT`.

Send 30 seconds of traffic, maintaining 5 in-flight requests.
```
MODEL_NAME=flowers-sample-gpu
INPUT_PATH=../tensorflow/input.json
CLUSTER_IP=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
HOST=$(kubectl get inferenceservice $MODEL_NAME -o jsonpath='{.status.url}' | cut -d "/" -f 3)
hey -z 30s -c 5 -m POST -host ${HOST} -D $INPUT_PATH http://$CLUSTER_IP/v1/models/$MODEL_NAME:predict
hey -z 30s -c 5 -m POST -host ${HOST} -D $INPUT_PATH http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict
```
Expected output
```shell
5 changes: 2 additions & 3 deletions docs/samples/batcher/basic/README.md
@@ -26,14 +26,13 @@ kubectl create -f pytorch-batcher.yaml
```

We can now send requests to the pytorch model using hey.
The first step is to [determine the ingress IP and ports](../../../../README.md#determine-the-ingress-ip-and-ports) and set `INGRESS_HOST` and `INGRESS_PORT`.

```
MODEL_NAME=pytorch-cifar10
INPUT_PATH=@./input.json
INGRESS_GATEWAY=istio-ingressgateway
CLUSTER_IP=$(kubectl -n istio-system get service $INGRESS_GATEWAY -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
SERVICE_HOSTNAME=$(kubectl get inferenceservice pytorch-cifar10 -o jsonpath='{.status.url}' | cut -d "/" -f 3)
hey -z 10s -c 5 -m POST -host "${SERVICE_HOSTNAME}" -H "Content-Type: application/json" -D ./input.json "http://$CLUSTER_IP/v1/models/$MODEL_NAME:predict"
hey -z 10s -c 5 -m POST -host "${SERVICE_HOSTNAME}" -H "Content-Type: application/json" -D ./input.json "http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict"
```

The request goes to the batcher container first; the batcher groups incoming requests into batches and forwards the batched request to the predictor container, as sketched below.
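
For reference, the batcher is enabled through a `batcher` block on the predictor spec. The sketch below shows roughly how that looks; the field names and values are assumptions based on the KFServing batcher docs, so treat `pytorch-batcher.yaml` in this directory as the authoritative spec.

```bash
# Sketch: enabling the batcher on a predictor (field names/values are assumptions;
# see pytorch-batcher.yaml in this directory for the actual spec used by this sample).
kubectl apply -f - <<EOF
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "pytorch-cifar10"
spec:
  default:
    predictor:
      batcher:
        maxBatchSize: 32   # maximum number of requests grouped into one batch
        maxLatency: 5000   # maximum milliseconds to wait before dispatching a partial batch
      pytorch:
        storageUri: "gs://<your-model-bucket>/cifar10"   # hypothetical placeholder
EOF
```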
9 changes: 4 additions & 5 deletions docs/samples/bentoml/README.md
@@ -22,7 +22,7 @@ workflow, with DevOps best practices baked in.
Before starting this guide, make sure you have the following:

* Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/#install-kfserving).
* Your cluster's Istio Ingress gateway must be network accessible.
* Your cluster's Istio Ingress gateway must be [network accessible](https://istio.io/latest/docs/tasks/traffic-management/ingress/ingress-control/).
* Docker and Docker Hub must be properly configured on your local system
* Python 3.6 or above
* Install required packages `bentoml` and `scikit-learn` on your local system:
@@ -158,18 +158,17 @@ kubectl apply -f bentoml.yaml
```
### Run prediction
The first step is to [determine the ingress IP and ports](../../../README.md#determine-the-ingress-ip-and-ports) and set `INGRESS_HOST` and `INGRESS_PORT`.
```shell
MODEL_NAME=iris-classifier
INGRESS_GATEWAY=istio-ingressgateway
CLUSTER_IP=$(kubectl -n istio-system get service $INGRESS_GATEWAY -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
SERVICE_HOSTNAME=$(kubectl get route ${MODEL_NAME}-predictor-default -o jsonpath='{.status.url}' | cut -d "/" -f 3)
SERVICE_HOSTNAME=$(kubectl get inferenceservice ${MODEL_NAME} -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v -H "Host: ${SERVICE_HOSTNAME}" \
--header "Content-Type: application/json" \
--request POST \
--data '[[5.1, 3.5, 1.4, 0.2]]' \
http://$CLUSTER_IP/predict
http://${INGRESS_HOST}:${INGRESS_PORT}/predict
```
### Delete deployment
7 changes: 3 additions & 4 deletions docs/samples/custom/hello-world/README.md
@@ -3,7 +3,7 @@
## Setup

1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/#install-kfserving).
2. Your cluster's Istio Ingress gateway must be network accessible.
2. Your cluster's Istio Ingress gateway must be [network accessible](https://istio.io/latest/docs/tasks/traffic-management/ingress/ingress-control/).

## Build and push the sample Docker Image

Expand Down Expand Up @@ -36,14 +36,13 @@ $ inferenceservice.serving.kubeflow.org/custom-sample created
```

## Run a prediction
The first step is to [determine the ingress IP and ports](../../../../README.md#determine-the-ingress-ip-and-ports) and set `INGRESS_HOST` and `INGRESS_PORT`.

```
MODEL_NAME=custom-sample
INGRESS_GATEWAY=istio-ingressgateway
CLUSTER_IP=$(kubectl -n istio-system get service $INGRESS_GATEWAY -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
SERVICE_HOSTNAME=$(kubectl get inferenceservice ${MODEL_NAME} -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://$CLUSTER_IP/v1/models/${MODEL_NAME}:predict
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict
```

Expected Output
7 changes: 3 additions & 4 deletions docs/samples/custom/kfserving-custom-model/README.md
@@ -31,7 +31,7 @@ Follow the instructions in the notebook to deploy the InferenceService with the
### Setup

1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/#install-kfserving).
2. Your cluster's Istio Ingress gateway must be network accessible.
2. Your cluster's Istio Ingress gateway must be [network accessible](https://istio.io/latest/docs/tasks/traffic-management/ingress/ingress-control/).

### Build and push the sample Docker Image

@@ -62,15 +62,14 @@ $ inferenceservice.serving.kubeflow.org/kfserving-custom-model created
```

### Run a prediction
The first step is to [determine the ingress IP and ports](../../../../README.md#determine-the-ingress-ip-and-ports) and set `INGRESS_HOST` and `INGRESS_PORT`.

```
MODEL_NAME=kfserving-custom-model
INPUT_PATH=@./input.json
INGRESS_GATEWAY=istio-ingressgateway
CLUSTER_IP=$(kubectl -n istio-system get service $INGRESS_GATEWAY -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
SERVICE_HOSTNAME=$(kubectl get inferenceservice ${MODEL_NAME} -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${CLUSTER_IP}/v1/models/${MODEL_NAME}:predict -d $INPUT_PATH
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/${MODEL_NAME}:predict -d $INPUT_PATH
```

Expected Output:
7 changes: 3 additions & 4 deletions docs/samples/custom/prebuilt-image/README.md
@@ -3,7 +3,7 @@
## Setup

1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/#install-kfserving).
2. Your cluster's Istio Ingress gateway must be network accessible.
2. Your cluster's Istio Ingress gateway must be [network accessible](https://istio.io/latest/docs/tasks/traffic-management/ingress/ingress-control/).

## Create the InferenceService

@@ -20,15 +20,14 @@ inferenceservice.serving.kubeflow.org/custom-prebuilt-image
```

## Run a prediction
The first step is to [determine the ingress IP and ports](../../../../README.md#determine-the-ingress-ip-and-ports) and set `INGRESS_HOST` and `INGRESS_PORT`.

This example uses the [codait/max-object-detector](https://github.com/IBM/MAX-Object-Detector) image. The Max Object Detector API server expects a POST request to the `/model/predict` endpoint that includes an `image` multipart/form-data field and an optional `threshold` query string.

```
MODEL_NAME=custom-prebuilt-image
INGRESS_GATEWAY=istio-ingressgateway
CLUSTER_IP=$(kubectl -n istio-system get service $INGRESS_GATEWAY -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
SERVICE_HOSTNAME=$(kubectl get inferenceservice ${MODEL_NAME} -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v -F "[email protected]" http://${CLUSTER_IP}/model/predict -H "Host: ${SERVICE_HOSTNAME}"
curl -v -F "[email protected]" http://${INGRESS_HOST}:${INGRESS_PORT}/model/predict -H "Host: ${SERVICE_HOSTNAME}"
```

Expected output