Forked from opensearch-project/ml-commons

Commit e371f4b (parent 446a592): add offline batch inference connector blueprints (opensearch-project#2768) (opensearch-project#2771)
Signed-off-by: Xun Zhang <[email protected]>
(cherry picked from commit 62b33fd)
Co-authored-by: Xun Zhang <[email protected]>

2 changed files with 303 additions and 0 deletions.
**docs/remote_inference_blueprints/batch_inference_openAI_connector_blueprint.md** (148 additions)
### OpenAI connector blueprint example for batch inference

Read more details at https://opensearch.org/docs/latest/ml-commons-plugin/remote-models/blueprints/

Integrate the OpenAI Batch API using the connector below, which introduces a new action type, `batch_predict`.
For more details about the OpenAI Batch API, see https://platform.openai.com/docs/guides/batch/overview.

#### 1. Create your Model connector and Model group

##### 1a. Register Model group
```json
POST /_plugins/_ml/model_groups/_register
{
  "name": "openAI_model_group",
  "description": "Your openAI model group"
}
```
The response returns a `model_group_id`; note it down.
Sample response:
```json
{
  "model_group_id": "IMobmY8B8aiZvtEZeO_i",
  "status": "CREATED"
}
```

##### 1b. Create Connector
```json
POST /_plugins/_ml/connectors/_create
{
  "name": "OpenAI Embedding model",
  "description": "OpenAI embedding model for testing offline batch",
  "version": "1",
  "protocol": "http",
  "parameters": {
    "model": "text-embedding-ada-002",
    "input_file_id": "file-YbowBByiyVJN89oSZo2Enu9W",
    "endpoint": "/v1/embeddings"
  },
  "credential": {
    "openAI_key": "<your openAI key>"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://api.openai.com/v1/embeddings",
      "headers": {
        "Authorization": "Bearer ${credential.openAI_key}"
      },
      "request_body": "{ \"input\": ${parameters.input}, \"model\": \"${parameters.model}\" }",
      "pre_process_function": "connector.pre_process.openai.embedding",
      "post_process_function": "connector.post_process.openai.embedding"
    },
    {
      "action_type": "batch_predict",
      "method": "POST",
      "url": "https://api.openai.com/v1/batches",
      "headers": {
        "Authorization": "Bearer ${credential.openAI_key}"
      },
      "request_body": "{ \"input_file_id\": \"${parameters.input_file_id}\", \"endpoint\": \"${parameters.endpoint}\", \"completion_window\": \"24h\" }"
    }
  ]
}
```
To obtain the `input_file_id` used in the connector parameters, prepare your batch file and upload it to the OpenAI service through the Files API. For details, see the [OpenAI Files API documentation](https://platform.openai.com/docs/api-reference/files).
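The batch file you upload must be JSONL: one request object per line, each with a `custom_id` so results can be matched back to inputs. As a minimal sketch of preparing such a file (the `custom_id` values and input texts below are illustrative placeholders, not from this blueprint):

```python
import json

# Each line of the batch file is one embedding request; "custom_id"
# lets you match results back to inputs. Values here are placeholders.
requests = [
    {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/v1/embeddings",
        "body": {"model": "text-embedding-ada-002", "input": text},
    }
    for i, text in enumerate(["hello world", "offline batch inference"])
]

# Serialize as JSONL: one JSON object per line, as the Batch API expects.
jsonl = "\n".join(json.dumps(r) for r in requests)
with open("batch_input.jsonl", "w") as f:
    f.write(jsonl)

print(jsonl.splitlines()[0])
```

The resulting `batch_input.jsonl` is what you upload with `purpose=batch`; the upload response contains the file id to place in the connector's `input_file_id` parameter.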
#### Sample response
```json
{
  "connector_id": "XU5UiokBpXT9icfOM0vt"
}
```

### 2. Register the model to the model group and link the created connector

```json
POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "OpenAI model for realtime embedding and offline batch inference",
  "function_name": "remote",
  "model_group_id": "IMobmY8B8aiZvtEZeO_i",
  "description": "OpenAI text embedding model",
  "connector_id": "XU5UiokBpXT9icfOM0vt"
}
```
Sample response:
```json
{
  "task_id": "rMormY8B8aiZvtEZIO_j",
  "status": "CREATED",
  "model_id": "lyjxwZABNrAVdFa9zrcZ"
}
```

### 3. Test offline batch inference using the connector

```json
POST /_plugins/_ml/models/lyjxwZABNrAVdFa9zrcZ/_batch_predict
{
  "parameters": {
    "model": "text-embedding-ada-002"
  }
}
```
Sample response:
```json
{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "dataAsMap": {
            "id": "batch_khFSJIzT0eev9PuxVDsIGxv6",
            "object": "batch",
            "endpoint": "/v1/embeddings",
            "errors": null,
            "input_file_id": "file-YbowBByiyVJN89oSZo2Enu9W",
            "completion_window": "24h",
            "status": "validating",
            "output_file_id": null,
            "error_file_id": null,
            "created_at": 1722037257,
            "in_progress_at": null,
            "expires_at": 1722123657,
            "finalizing_at": null,
            "completed_at": null,
            "failed_at": null,
            "expired_at": null,
            "cancelling_at": null,
            "cancelled_at": null,
            "request_counts": {
              "total": 0,
              "completed": 0,
              "failed": 0
            },
            "metadata": null
          }
        }
      ],
      "status_code": 200
    }
  ]
}
```
For the definition of each field in the result, see https://platform.openai.com/docs/guides/batch.
Once the batch completes, you can retrieve the batch by the `id` in the output and download the results through the OpenAI Files API.
**docs/remote_inference_blueprints/batch_inference_sagemaker_connector_blueprint.md** (155 additions)
### SageMaker connector blueprint example for batch inference

Read more details at https://opensearch.org/docs/latest/ml-commons-plugin/remote-models/blueprints/

Integrate the SageMaker Batch Transform API using the connector below, which introduces a new action type, `batch_predict`.
For more details on using batch transform to run inference with Amazon SageMaker, see https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html.

#### 1. Create your Model connector and Model group

##### 1a. Register Model group
```json
POST /_plugins/_ml/model_groups/_register
{
  "name": "sagemaker_model_group",
  "description": "Your sagemaker model group"
}
```
The response returns a `model_group_id`; note it down.
Sample response:
```json
{
  "model_group_id": "IMobmY8B8aiZvtEZeO_i",
  "status": "CREATED"
}
```

##### 1b. Create Connector
```json
POST /_plugins/_ml/connectors/_create
{
  "name": "DJL Sagemaker Connector: all-MiniLM-L6-v2",
  "version": "1",
  "description": "The connector to sagemaker embedding model all-MiniLM-L6-v2",
  "protocol": "aws_sigv4",
  "credential": {
    "access_key": "<your access_key>",
    "secret_key": "<your secret_key>",
    "session_token": "<your session_token>"
  },
  "parameters": {
    "region": "us-east-1",
    "service_name": "sagemaker",
    "DataProcessing": {
      "InputFilter": "$.content",
      "JoinSource": "Input",
      "OutputFilter": "$"
    },
    "ModelName": "DJL-Text-Embedding-Model-imageforjsonlines",
    "TransformInput": {
      "ContentType": "application/json",
      "DataSource": {
        "S3DataSource": {
          "S3DataType": "S3Prefix",
          "S3Uri": "s3://offlinebatch/sagemaker_djl_batch_input.json"
        }
      },
      "SplitType": "Line"
    },
    "TransformJobName": "SM-offline-batch-transform-07-12-13-30",
    "TransformOutput": {
      "AssembleWith": "Line",
      "Accept": "application/json",
      "S3OutputPath": "s3://offlinebatch/output"
    },
    "TransformResources": {
      "InstanceCount": 1,
      "InstanceType": "ml.c5.xlarge"
    },
    "BatchStrategy": "SingleRecord"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "headers": {
        "content-type": "application/json"
      },
      "url": "https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/OpenSearch-sagemaker-060124023703/invocations",
      "request_body": "${parameters.input}",
      "pre_process_function": "connector.pre_process.default.embedding",
      "post_process_function": "connector.post_process.default.embedding"
    },
    {
      "action_type": "batch_predict",
      "method": "POST",
      "headers": {
        "content-type": "application/json"
      },
      "url": "https://api.sagemaker.us-east-1.amazonaws.com/CreateTransformJob",
      "request_body": "{ \"BatchStrategy\": \"${parameters.BatchStrategy}\", \"ModelName\": \"${parameters.ModelName}\", \"DataProcessing\" : ${parameters.DataProcessing}, \"TransformInput\": ${parameters.TransformInput}, \"TransformJobName\" : \"${parameters.TransformJobName}\", \"TransformOutput\" : ${parameters.TransformOutput}, \"TransformResources\" : ${parameters.TransformResources}}"
    }
  ]
}
```
SageMaker data processing supports a subset of the defined JSONPath operators and can associate inference results with their input records.
For details, see the [AWS documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform-data-processing.html).
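To make the `DataProcessing` settings above concrete, here is a self-contained local simulation of what `InputFilter: "$.content"` with `JoinSource: "Input"` does to one record. This is an illustration only, not SageMaker's actual implementation; `fake_embed` and the record fields are made up:

```python
import json

def fake_embed(text: str) -> list[float]:
    # Stand-in for the real embedding model; returns a dummy vector.
    return [float(len(text)), 0.0]

def process_record(line: str) -> str:
    record = json.loads(line)
    # InputFilter "$.content": only this field is sent to the model.
    model_input = record["content"]
    prediction = fake_embed(model_input)
    # JoinSource "Input" with OutputFilter "$": the prediction is joined
    # back onto the original input record under "SageMakerOutput".
    record["SageMakerOutput"] = prediction
    return json.dumps(record)

line = '{"id": 1, "content": "hello world"}'
print(process_record(line))
```

Because `SplitType` is `Line` and `AssembleWith` is `Line`, SageMaker applies this kind of per-record processing to each line of the input file and writes one joined record per line of output.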
#### Sample response
```json
{
  "connector_id": "XU5UiokBpXT9icfOM0vt"
}
```

### 2. Register the model to the model group and link the created connector

```json
POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "SageMaker model for realtime embedding and offline batch inference",
  "function_name": "remote",
  "model_group_id": "IMobmY8B8aiZvtEZeO_i",
  "description": "SageMaker hosted DJL model",
  "connector_id": "XU5UiokBpXT9icfOM0vt"
}
```
Sample response:
```json
{
  "task_id": "rMormY8B8aiZvtEZIO_j",
  "status": "CREATED",
  "model_id": "lyjxwZABNrAVdFa9zrcZ"
}
```

### 3. Test offline batch inference using the connector

```json
POST /_plugins/_ml/models/dBK3t5ABrxVhHgFYhg7Q/_batch_predict
{
  "parameters": {
    "TransformJobName": "SM-offline-batch-transform-07-15-11-30"
  }
}
```
Sample response:
```json
{
  "inference_results": [
    {
      "output": [
        {
          "name": "response",
          "dataAsMap": {
            "job_arn": "arn:aws:sagemaker:us-east-1:802041417063:transform-job/SM-offline-batch-transform"
          }
        }
      ],
      "status_code": 200
    }
  ]
}
```
The `job_arn` is returned immediately from this request; you can use it to check the job status in the SageMaker service. Once the job is done, you can find your batch inference results in the S3 location specified in the `S3OutputPath` field of your connector.
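The transform-job name needed for status checks is the last path segment of the `job_arn`. A small sketch using the ARN from the sample response above:

```python
def job_name_from_arn(job_arn: str) -> str:
    """Extract the transform-job name from a SageMaker job ARN."""
    # ARN format: arn:aws:sagemaker:<region>:<account>:transform-job/<name>
    return job_arn.split("/")[-1]

arn = "arn:aws:sagemaker:us-east-1:802041417063:transform-job/SM-offline-batch-transform"
name = job_name_from_arn(arn)
print(name)  # SM-offline-batch-transform

# You can then poll the job status, e.g. with the AWS CLI:
#   aws sagemaker describe-transform-job --transform-job-name SM-offline-batch-transform
```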