Add Connectors and ML updates for 2.9 #4554
Signed-off-by: Naarcha-AWS <[email protected]>
LGTM except for some comments.
Signed-off-by: Naarcha-AWS <[email protected]>
@Naarcha-AWS A few more changes. Let me know if I reviewed what you needed or if there's other content that needs my review.
Signed-off-by: Naarcha-AWS <[email protected]>
Co-authored-by: Nathan Bower <[email protected]> Signed-off-by: Naarcha-AWS <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

### Adding trusted endpoints

To configure connectors in OpenSearch, add the trusted endpoints to your cluster settings using the `plugins.ml_commons.trusted_connector_endpoints_regex` setting, which supports Java regular expressions, as shown in the following example:
Add this to the settings page.
"^https://runtime\\.sagemaker\\..*\\.amazonaws\\.com/.*$",
"^https://api\\.openai\\.com/.*$",
"^https://api\\.cohere\\.ai/.*$",
"^https://bedrock\\..*\\.amazonaws\\.com/.*$"
remove this?
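To see how patterns like these behave, here is a minimal Python sketch that checks candidate URLs against an allowlist modeled on the excerpt above. It assumes that Python's `re` module treats these simple patterns the same way as the Java regex engine that OpenSearch uses. `TRUSTED_ENDPOINT_PATTERNS` and `is_trusted` are hypothetical names for illustration, not part of ML Commons.

```python
import re

# Allowlist modeled on the excerpt above. JSON needs "\\." to express a literal
# dot; in a Python raw string a single backslash is enough.
TRUSTED_ENDPOINT_PATTERNS = [
    r"^https://runtime\.sagemaker\..*\.amazonaws\.com/.*$",
    r"^https://api\.openai\.com/.*$",
    r"^https://api\.cohere\.ai/.*$",
]


def is_trusted(url, patterns=TRUSTED_ENDPOINT_PATTERNS):
    """Return True if the URL matches at least one allowlist pattern."""
    return any(re.match(pattern, url) for pattern in patterns)


if __name__ == "__main__":
    for candidate in (
        "https://api.openai.com/v1/chat/completions",
        "http://api.openai.com/v1/chat/completions",
        "https://api.openai.com.example.org/v1/chat/completions",
    ):
        print(candidate, is_trusted(candidate))
```

Escaping every dot matters here: an unescaped `.` matches any character, which makes the allowlist looser than intended.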

### Enabling ML nodes

Most connectors require the use of dedicated ML nodes. To make sure you have ML nodes enabled, update the following cluster settings:
"Most connectors require the use of dedicated ML nodes" -> "By default, connectors require the use of dedicated ML nodes". This is the default setting: `"plugins.ml_commons.only_run_on_ml_node": true`.
A remote connector consumes far fewer resources, so running it on a data node should be fine if the user prefers that. Users who don't have a dedicated ML node and prefer to run on data nodes can set the following:
PUT /_cluster/settings
{
"persistent": {
"plugins.ml_commons.only_run_on_ml_node": false
}
}
| `description` | String | A description of the connector. |
| `version` | Integer | The version of the connector. |
| `protocol` | String | The protocol for the connection. For AWS services such as Amazon SageMaker and Amazon Bedrock, use `aws_sigv4`. For all other services, use `http`. |
| `parameter` | JSON array | The default connector parameters, including `endpoint` and `model`. |
`parameter` -> `parameters`
Type is `Map<String, ?>`.
"including `endpoint` and `model`." -> "for example, `endpoint` and `model`."
From the security team: we should call out that all parameters in this block are overridable in a predict request. A user can provide a parameter with the same name in the predict request to override the default value defined in the connector.
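The override rule described in this comment can be illustrated with a short Python sketch. It only models the stated behavior (a request parameter with the same name replaces the connector default) and is not the ML Commons implementation. `effective_parameters` and the sample values are hypothetical.

```python
def effective_parameters(connector_defaults, predict_parameters=None):
    """Merge connector defaults with request parameters; the request wins."""
    merged = dict(connector_defaults)
    merged.update(predict_parameters or {})
    return merged


if __name__ == "__main__":
    defaults = {"endpoint": "api.openai.com", "model": "gpt-3.5-turbo"}
    # Hypothetical override sent with a predict call.
    request = {"model": "gpt-4"}
    print(effective_parameters(defaults, request))
```

Because any default can be replaced this way, values that must not change per request are poor candidates for this block.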
| `version` | Integer | The version of the connector. |
| `protocol` | String | The protocol for the connection. For AWS services such as Amazon SageMaker and Amazon Bedrock, use `aws_sigv4`. For all other services, use `http`. |
| `parameter` | JSON array | The default connector parameters, including `endpoint` and `model`. |
| `credential` | String | Defines any credential variables required to connect to your chosen endpoint. ML Commons uses **AES/GCM/NoPadding** symmetric encryption with a key length of 32 bytes. When a connection cluster first starts, the key persists in OpenSearch. Therefore, you do not need to manually encrypt the key. |
Type: Map<string, string>
ML Commons uses **AES/GCM/NoPadding** symmetric encryption with a key length of 32 bytes. When a connection cluster first starts, the key persists in OpenSearch. Therefore, you do not need to manually encrypt the key.
->
ML Commons uses **AES/GCM/NoPadding** symmetric encryption to encrypt your credentials. When a connection cluster first starts, ML Commons creates a random 32-byte key and persists it in an OpenSearch system index. Therefore, you do not need to manually set the encryption key.
| `protocol` | String | The protocol for the connection. For AWS services such as Amazon SageMaker and Amazon Bedrock, use `aws_sigv4`. For all other services, use `http`. |
| `parameter` | JSON array | The default connector parameters, including `endpoint` and `model`. |
| `credential` | String | Defines any credential variables required to connect to your chosen endpoint. ML Commons uses **AES/GCM/NoPadding** symmetric encryption with a key length of 32 bytes. When a connection cluster first starts, the key persists in OpenSearch. Therefore, you do not need to manually encrypt the key. |
| `action` | JSON array | Tells the connector what actions to run after a connection to ML Commons has been established. |
Tells the connector what actions to run after a connection to ML Commons has been established.
->
Define what actions can run within this connector.
`action_type` | String | Required. Sets the ML Commons API operation to use upon connection. As of OpenSearch 2.9, only `predict` is supported.
`method` | String | Required. Defines the HTTP method for the API call. Supports `POST` and `GET`.
`url` | String | Required. Sets the connection endpoint at which the action takes place. This must match the regular expression for the connection used when [adding trusted endpoints](#adding-trusted-endpoints).
`headers` | String | Sets the headers used inside the request or response body. Default is `application/json`.
Type: Map<String, String>
Default is application/json
->
Default "Content-Type" is "application/json"
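The action fields discussed above can be sketched as a small client-side check in Python. This is an illustration of the documented constraints (required `action_type`, `method`, and `url`; only `predict`; `POST` or `GET`; a default `Content-Type` header), not how ML Commons validates connectors. `normalize_action` and the constants are hypothetical names.

```python
REQUIRED_ACTION_FIELDS = ("action_type", "method", "url")
SUPPORTED_METHODS = ("POST", "GET")


def normalize_action(action):
    """Check required fields and fill in the default Content-Type header."""
    missing = [field for field in REQUIRED_ACTION_FIELDS if field not in action]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    if action["action_type"] != "predict":
        raise ValueError("only the predict action type is supported as of 2.9")
    if action["method"] not in SUPPORTED_METHODS:
        raise ValueError("method must be POST or GET")
    # headers is a map of header name to value; caller values win over the default.
    headers = {"Content-Type": "application/json"}
    headers.update(action.get("headers", {}))
    return {**action, "headers": headers}
```

Treating `headers` as a map rather than a single string matches the reviewer's `Map<String, String>` note.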

### Standalone connector

The connector creation API, `/_plugins/_ml/connectors/_create`, creates connections to third-party ML tools. Using the `endpoint` parameter, you can connect ML Commons to any supported ML tool using its specific API endpoint. For example, to connect to a ChatGPT completion model, you can use the `api.openai.com` endpoint, as shown in the following example:
For example, to connect to a ChatGPT completion model,
->
For example, to connect to a ChatGPT chat model,
```
{% include copy-curl.html %}

If successful, the connector API responds with a `connector_id` and `status` for the connection:
Remove this: "and `status`".
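Because the request body for the standalone connector is not shown in this excerpt, the following Python sketch builds one plausible body for `POST /_plugins/_ml/connectors/_create`. The top-level field names follow the table under review, using the plural `parameters` suggested by the reviewer. The plural `actions`, the `request_body` field, the `${parameters.*}` and `${credential.*}` placeholder syntax, and the `openAI_key` credential name are assumptions and should be checked against the ML Commons documentation.

```python
import json


def build_openai_connector(api_key):
    """Build a request body for POST /_plugins/_ml/connectors/_create."""
    return {
        "name": "OpenAI Chat Connector",
        "description": "Connector to the OpenAI chat completions service",
        "version": 1,
        "protocol": "http",  # non-AWS services use http
        "parameters": {"endpoint": "api.openai.com", "model": "gpt-3.5-turbo"},
        # Hypothetical credential name; ML Commons encrypts credential values.
        "credential": {"openAI_key": api_key},
        "actions": [
            {
                "action_type": "predict",
                "method": "POST",
                # The resolved URL must match a trusted endpoint pattern.
                "url": "https://${parameters.endpoint}/v1/chat/completions",
                "headers": {"Authorization": "Bearer ${credential.openAI_key}"},
                # Assumed field: a template for the body sent to the endpoint.
                "request_body": '{ "model": "${parameters.model}", "messages": ${parameters.messages} }',
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_openai_connector("<your API key>"), indent=2))
```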
- You can use `model_group_id` to register a model version to an existing model group.
- If you do not use `model_group_id`, ML Commons creates a model with a new model group.

The following example registers a model named `openAI-GPT-3.5 completions`:
"The following example registers a model named `openAI-GPT-3.5 completions`:" -> "The following example registers a model named `openAI-gpt-3.5-turbo`:"
{
  "name": "openAI-gpt-3.5-turbo",
  "function_name": "remote",
  "model_group_id": "wlcnb4kBJ1eYAeTMHlV6",
Suggest adding an example for creating this model group:
POST /_plugins/_ml/model_groups/_register
{
"name": "remote_model_group",
"description": "This is an example description"
}
Sample response
{
"model_group_id": "wlcnb4kBJ1eYAeTMHlV6",
"status": "CREATED"
}
POST /_plugins/_ml/models/cleMb4kBJ1eYAeTMFFg4/_predict
{
  "parameters": {
    "model": "gpt-3.5-turbo",
We can remove this line because the model is already defined in the connector.

```json
POST /_plugins/_ml/models/_register
{
This is not a chat model example. Let's stay consistent with the standalone connector.
}
```

After creating the connector, you can retrieve the `task_id`, deploy the model, and use the Predict API, similar to a standalone connector.
"After creating the connector, you can retrieve the `task_id`, deploy the model, and use the Predict API, similar to a standalone connector." -> "After creating the connector, you can use the connector ID to register, deploy, and run predictions with the model."
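The register, deploy, and predict sequence in the suggested wording can be sketched in Python as a set of request builders. The `_register` and `_predict` paths and the `function_name` and `model_group_id` fields appear elsewhere in this thread. The `connector_id` body field and the `_deploy` path are assumptions based on the ML Commons API as I understand it. The function names are hypothetical, and nothing here sends a request.

```python
def register_model_request(name, connector_id, model_group_id=None):
    """Describe the call that registers a remote model for a connector."""
    body = {"name": name, "function_name": "remote", "connector_id": connector_id}
    if model_group_id is not None:
        # Omitting model_group_id lets ML Commons create a new model group.
        body["model_group_id"] = model_group_id
    return "POST", "/_plugins/_ml/models/_register", body


def deploy_request(model_id):
    """Describe the call that deploys a registered model (assumed path)."""
    return "POST", f"/_plugins/_ml/models/{model_id}/_deploy", None


def predict_request(model_id, parameters):
    """Describe the Predict API call for a deployed model."""
    return "POST", f"/_plugins/_ml/models/{model_id}/_predict", {"parameters": parameters}
```

Each builder returns a `(method, path, body)` tuple that can be passed to whichever HTTP client the cluster setup calls for.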
The `parameters` section requires the following options when using `aws-sigv4` authentication:

- `region`: The AWS Region in which the AWS instance is located.
- `service_name`: The name of the AWS service for the connector.
All items defined in `parameters` are overridable and visible (users can see the parameters with the Get Connector API). If users don't want this, they can move these two reserved `aws_sigv4` parameters to `credential`.
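The two placements described in this comment can be compared with a short Python sketch. `region`, `service_name`, `secret_key`, and `session_token` come from the excerpts in this thread. `access_key` is an assumption, since the line that would list it is not shown. `build_sigv4_settings` and its `hide_reserved` flag are hypothetical.

```python
def build_sigv4_settings(region, service_name, access_key, secret_key,
                         session_token=None, hide_reserved=False):
    """Return the protocol, parameters, and credential blocks for aws_sigv4."""
    parameters = {}
    # access_key is an assumed field name; secret_key is listed as required.
    credential = {"access_key": access_key, "secret_key": secret_key}
    if session_token:
        # Optional temporary credentials.
        credential["session_token"] = session_token
    # Values in parameters are visible and overridable; values in credential
    # are not, so the reserved settings can be moved there.
    target = credential if hide_reserved else parameters
    target["region"] = region
    target["service_name"] = service_name
    return {"protocol": "aws_sigv4", "parameters": parameters, "credential": credential}
```

Calling it with `hide_reserved=True` leaves `parameters` empty and keeps `region` and `service_name` alongside the keys in `credential`.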
- `secret_key`: Required. Provides the secret key for the AWS instance.
- `session_token`: Optional. Provides a temporary set of credentials for the AWS instance.

The `parameters` section requires the following options when using `aws-sigv4` authentication:
`aws-sigv4` -> `aws_sigv4`
Please also check all other places.
@@ -0,0 +1,461 @@
---
We should add a security warning for remote connectors. Because users need to configure credentials in the connector, for example, AWS credentials or an OpenAI API key, they should always use a security-enabled cluster. Otherwise, their credentials are not protected, which is risky.
@natebower any mention of "SageMaker" should be "Amazon SageMaker". I saw some instances of "AWS SageMaker" that need to be corrected. The same goes for ChatGPT. I believe our standard is to use the full name, "OpenAI ChatGPT".
@dylan-tong-aws Re: "Amazon SageMaker", that's correct, thanks for the callout. @Naarcha-AWS Can you please global find and replace? Re: ChatGPT, we don't actually have a standard for this, but given that OpenAI simply uses "ChatGPT", I'd prefer that we follow suit. Thanks!
* Add Connectors and ML updates for 2.9 Signed-off-by: Naarcha-AWS <[email protected]>
* Fix code block Signed-off-by: Naarcha-AWS <[email protected]>
* Add Connectors and ML updates for 2.9 Signed-off-by: Naarcha-AWS <[email protected]>
* Fix code block Signed-off-by: Naarcha-AWS <[email protected]>
* Add connector settings and examples Signed-off-by: Naarcha-AWS <[email protected]>
* Add GA warning Signed-off-by: Naarcha-AWS <[email protected]>
* Add final experimental warning Signed-off-by: Naarcha-AWS <[email protected]>
* Address tech review. Fix typos Signed-off-by: Naarcha-AWS <[email protected]>
* Fix bad link. Add next steps section Signed-off-by: Naarcha-AWS <[email protected]>
* Fix typo Signed-off-by: Naarcha-AWS <[email protected]>
* Update cluster-settings.md Signed-off-by: Naarcha-AWS <[email protected]>
* Apply suggestions from code review Co-authored-by: kolchfa-aws <[email protected]> Signed-off-by: Naarcha-AWS <[email protected]>
* Apply suggestions from code review Signed-off-by: Naarcha-AWS <[email protected]>
* Apply suggestions from code review Signed-off-by: Naarcha-AWS <[email protected]>
* Apply suggestions from code review Signed-off-by: Naarcha-AWS <[email protected]>
* Update _ml-commons-plugin/connectors.md Signed-off-by: Naarcha-AWS <[email protected]>
* Change cluster values for boolean. Fix typo. Signed-off-by: Naarcha-AWS <[email protected]>
* Apply suggestions from code review Co-authored-by: Nathan Bower <[email protected]> Signed-off-by: Naarcha-AWS <[email protected]>
* Fix cluser settings Signed-off-by: Naarcha-AWS <[email protected]>
* Add missing config options. More technical feedback. Signed-off-by: Naarcha-AWS <[email protected]>
* Adjust cluster setting description. Signed-off-by: Naarcha-AWS <[email protected]>
* Add updated ChatGPT example Signed-off-by: Naarcha-AWS <[email protected]>
* Add info and example for internal connector. Signed-off-by: Naarcha-AWS <[email protected]>
* One last adjustment. Signed-off-by: Naarcha-AWS <[email protected]>
* Apply suggestions from code review Co-authored-by: Nathan Bower <[email protected]> Signed-off-by: Naarcha-AWS <[email protected]>
* Fix dead link Signed-off-by: Naarcha-AWS <[email protected]>
* Fix one last comment. Signed-off-by: Naarcha-AWS <[email protected]>
* change ordered list to numbered. Signed-off-by: Naarcha-AWS <[email protected]>
---------
Signed-off-by: Naarcha-AWS <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Co-authored-by: kolchfa-aws <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
Fixes #3063