feat: Update Triton model support #485

Merged: 9 commits, Mar 13, 2024
28 changes: 28 additions & 0 deletions config/example-isvcs/example-triton-xgboost-isvc.yaml
@@ -0,0 +1,28 @@
# Copyright 2022 IBM Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-xgboost-mushroom-fil
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: xgboost
      runtime: triton-2.x
      storage:
        key: localMinIO
        path: xgboost/mushroom-fil
9 changes: 9 additions & 0 deletions config/runtimes/triton-2.x.yaml
@@ -39,6 +39,15 @@ spec:
    - name: tensorrt
      version: "7" # 7.2.1
      autoSelect: true
    - name: sklearn
      version: "0" # v0.23.1
      autoSelect: false
    - name: xgboost
      version: "1" # v1.1.1
      autoSelect: false
    - name: lightgbm
      version: "3" # v3.2.1
      autoSelect: false

  protocolVersions:
    - grpc-v2
39 changes: 39 additions & 0 deletions docs/example-models.md
@@ -28,6 +28,10 @@ s3://modelmesh-example-models/
│ └── mnist.h5
├── lightgbm
│   ├── mushroom.bst
│   └── mushroom-fil
│       ├── 1
│       │   └── model.txt
│       └── config.pbtxt
├── onnx
│ └── mnist.onnx
├── pytorch
@@ -45,6 +49,10 @@ s3://modelmesh-example-models/
│ └── variables.index
└── xgboost
    ├── mushroom.json
    └── mushroom-fil
        ├── 1
        │   └── xgboost.json
        └── config.pbtxt
```
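The `mushroom-fil` entries follow Triton's model repository layout: a numbered version directory holding the serialized model, plus a `config.pbtxt` that tells the FIL backend how to load it. The PR does not show that file's contents; as a rough sketch only (field values are illustrative, not taken from this repo), a `config.pbtxt` for the XGBoost JSON model could look like:

```
backend: "fil"
max_batch_size: 32768
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ 126 ]
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]
instance_group [{ kind: KIND_CPU }]
parameters [
  {
    key: "model_type"
    value: { string_value: "xgboost_json" }
  },
  {
    key: "output_class"
    value: { string_value: "false" }
  }
]
```

The `model_type` parameter selects the deserializer (e.g. `xgboost_json` for XGBoost JSON dumps, `lightgbm` for LightGBM text dumps); see the FIL backend README for the full set of options.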

### Example Inference Requests
@@ -277,3 +285,34 @@ Response:
]
}
```

#### XGBoost (Triton FIL):

This is a sample inference request to an XGBoost model trained on a [mushroom dataset](https://archive.ics.uci.edu/ml/datasets/Mushroom) and served using the [FIL backend for Triton](https://github.com/triton-inference-server/fil_backend):

```shell
MODEL_NAME=example-xgboost-mushroom-fil
grpcurl \
-plaintext \
-proto fvt/proto/kfs_inference_v2.proto \
-d '{ "model_name": "'"${MODEL_NAME}"'", "inputs": [{ "name": "input__0", "shape": [1, 126], "datatype": "FP32", "contents": { "fp32_contents": [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0] }}]}' \
localhost:8033 \
inference.GRPCInferenceService.ModelInfer
```

Response:

```json
{
  "modelName": "example-xgboost-mushroom-fil__isvc-ffe6a3f20b",
  "modelVersion": "1",
  "outputs": [
    {
      "name": "output__0",
      "datatype": "FP32",
      "shape": ["1"]
    }
  ],
  "rawOutputContents": ["B1xLPA=="]
}
```
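Note that the FIL backend returns the FP32 result as base64-encoded raw tensor bytes in `rawOutputContents` rather than as `fp32_contents`. A small Python sketch to decode such a payload, assuming the little-endian FP32 layout used by the KServe v2 protocol:

```python
import base64
import struct

# base64-encoded raw FP32 tensor bytes, as returned in "rawOutputContents"
raw = base64.b64decode("B1xLPA==")

# "<f" = one little-endian float32 per 4 bytes
values = [struct.unpack("<f", raw[i:i + 4])[0] for i in range(0, len(raw), 4)]
print(values)
```

Here the single value decodes to roughly 0.012, consistent with the near-zero "edible" prediction that the FVT helpers check for the other mushroom models.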
22 changes: 11 additions & 11 deletions docs/model-formats/README.md
@@ -11,16 +11,16 @@ By leveraging existing third-party model servers, we support a number of standard
- [TensorFlow](tensorflow.md)
- [XGBoost](xgboost.md)

| Model Type | Framework | Supported via ServingRuntime |
| ----------- | ---------------- | ---------------------------- |
| keras | TensorFlow | Triton (C++) |
| lightgbm | LightGBM | MLServer (python) |
| onnx | ONNX | Triton (C++), OVMS (C++) |
| openvino_ir | Intel OpenVINO\* | OVMS (C++) |
| pytorch | PyTorch | Triton (C++) |
| sklearn | scikit-learn | MLServer (python) |
| tensorflow | TensorFlow | Triton (C++) |
| xgboost | XGBoost | MLServer (python) |
| any | Custom | [Custom](../runtimes) (any) |
| Model Type | Framework | Supported via ServingRuntime |
| ----------- | ---------------- | ------------------------------- |
| keras | TensorFlow | Triton (C++) |
| lightgbm | LightGBM | MLServer (python), Triton (C++) |
| onnx | ONNX | Triton (C++), OVMS (C++) |
| openvino_ir | Intel OpenVINO\* | OVMS (C++) |
| pytorch | PyTorch | Triton (C++) |
| sklearn | scikit-learn | MLServer (python), Triton (C++) |
| tensorflow | TensorFlow | Triton (C++) |
| xgboost | XGBoost | MLServer (python), Triton (C++) |
| any | Custom | [Custom](../runtimes) (any) |

(\*) Models from many ML frameworks, including Caffe, TensorFlow, MXNet, PaddlePaddle, and ONNX, can be converted to the OpenVINO IR format; see the OpenVINO Model Server docs [here](https://docs.openvino.ai/latest/ovms_what_is_openvino_model_server.html).
31 changes: 30 additions & 1 deletion docs/model-formats/lightgbm.md
@@ -32,11 +32,18 @@ The storage path can point directly to a serialized model

```
s3://modelmesh-example-models/
└── lightgbm/mushroom.bst
└── lightgbm
    ├── mushroom.bst
    └── mushroom-fil
        ├── 1
        │   └── model.txt
        └── config.pbtxt
```

**InferenceService**

For MLServer:

```yaml
kind: InferenceService
metadata:
@@ -54,3 +61,25 @@ spec:
        parameters:
          bucket: modelmesh-example-models
```

For Triton:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: lightgbm-example
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: lightgbm
      runtime: triton-2.x
      storage:
        key: localMinIO
        path: lightgbm/mushroom-fil
        parameters:
          bucket: modelmesh-example-models
```
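The `mushroom-fil` path referenced above must contain a FIL-backend `config.pbtxt` alongside the versioned model directory; for a LightGBM text dump, the backend's `model_type` parameter is `"lightgbm"`. A rough sketch (dims and other values are illustrative, not taken from this repo):

```
backend: "fil"
max_batch_size: 32768
input [
  { name: "input__0" data_type: TYPE_FP32 dims: [ 126 ] }
]
output [
  { name: "output__0" data_type: TYPE_FP32 dims: [ 1 ] }
]
parameters [
  { key: "model_type" value: { string_value: "lightgbm" } }
]
```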
31 changes: 30 additions & 1 deletion docs/model-formats/xgboost.md
@@ -33,11 +33,18 @@ The storage path can point directly to a serialized model

```
s3://modelmesh-example-models/
└── xgboost/mushroom.json
└── xgboost
    ├── mushroom.json
    └── mushroom-fil
        ├── 1
        │   └── xgboost.json
        └── config.pbtxt
```

**InferenceService**

For MLServer:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
@@ -56,3 +63,25 @@ spec:
        parameters:
          bucket: modelmesh-example-models
```

For Triton:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: xgboost-example
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: xgboost
      runtime: triton-2.x
      storage:
        key: localMinIO
        path: xgboost/mushroom-fil
        parameters:
          bucket: modelmesh-example-models
```
40 changes: 40 additions & 0 deletions fvt/inference.go
@@ -302,6 +302,26 @@ func ExpectSuccessfulInference_lightgbmMushroom(predictorName string) {
	Expect(math.Round(float64(inferResponse.Outputs[0].Contents.Fp64Contents[0])*10) / 10).To(BeEquivalentTo(0.0))
}

// LightGBM Mushroom via Triton
// COS path: fvt/lightgbm/mushroom-fil
func ExpectSuccessfulInference_lightgbmFILMushroom(predictorName string) {
	// build the grpc inference call
	inferInput := &inference.ModelInferRequest_InferInputTensor{
		Name:     "input__0",
		Shape:    []int64{1, 126},
		Datatype: "FP32",
		Contents: &inference.InferTensorContents{Fp32Contents: mushroomInputData},
	}
	inferRequest := &inference.ModelInferRequest{
		ModelName: predictorName,
		Inputs:    []*inference.ModelInferRequest_InferInputTensor{inferInput},
	}

	inferResponse, err := FVTClientInstance.RunKfsInference(inferRequest)
	Expect(err).ToNot(HaveOccurred())
	Expect(inferResponse).ToNot(BeNil())
}

// XGBoost Mushroom
// COS path: fvt/xgboost/mushroom
func ExpectSuccessfulInference_xgboostMushroom(predictorName string) {
@@ -324,6 +344,26 @@ func ExpectSuccessfulInference_xgboostMushroom(predictorName string) {
	Expect(math.Round(float64(inferResponse.Outputs[0].Contents.Fp32Contents[0])*10) / 10).To(BeEquivalentTo(0.0))
}

// XGBoost Mushroom via Triton
// COS path: fvt/xgboost/mushroom-fil
func ExpectSuccessfulInference_xgboostFILMushroom(predictorName string) {
	// build the grpc inference call
	inferInput := &inference.ModelInferRequest_InferInputTensor{
		Name:     "input__0",
		Shape:    []int64{1, 126},
		Datatype: "FP32",
		Contents: &inference.InferTensorContents{Fp32Contents: mushroomInputData},
	}
	inferRequest := &inference.ModelInferRequest{
		ModelName: predictorName,
		Inputs:    []*inference.ModelInferRequest_InferInputTensor{inferInput},
	}

	inferResponse, err := FVTClientInstance.RunKfsInference(inferRequest)
	Expect(err).ToNot(HaveOccurred())
	Expect(inferResponse).ToNot(BeNil())
}

// Helpers

var mushroomInputData []float32 = []float32{1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0}